Dogukan Sonmez

Currently in Munich, Germany

Java Groovy Python

No-Sql TDD Scrum

Django Shell Cloud

May 09 2012

Web servers and Node.js

I was going to write something about web servers, but then I thought it would be better to cover node.js as well. So I am going to try to explain how a web server works. I will talk about HTTP web servers first, and then I will give some clues about how node.js works.

Client Server architecture

Before writing about HTTP web servers, the very first thing I want to talk about is client-server architecture. As you can see from the following picture, the web is a client-server architecture. Don't think client-server is the only architecture, of course; there are others on the web, like P2P.

So what is client-server architecture? It is primarily request/reply oriented. The client makes a request to the server and the server gives a response to the client. The server doesn't need to know how many clients are running or what technology runs on them. Clients use a standard protocol such as HTTP to communicate with the server and share data across the network (HTTP is a stateless protocol).
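To make the request/reply idea concrete, here is a minimal sketch in Python of what one HTTP exchange looks like as plain bytes. Nothing touches the network here; the helper names (build_request, parse_request_line, build_response) and the example.com host are my own placeholders, not part of any library.

```python
# A hypothetical illustration of one HTTP request/reply exchange.
# The byte strings stand in for what would travel over a TCP connection.


def build_request(path: str, host: str) -> bytes:
    """Return the bytes a client would send for a simple GET."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "",  # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_request_line(raw: bytes):
    """Server side: split the first line into method, path and version."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = first_line.split(" ")
    return method, path, version


def build_response(body: str) -> bytes:
    """Return the bytes a server would send back for a successful request."""
    payload = body.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


request = build_request("/index.html", "example.com")
method, path, version = parse_request_line(request)
response = build_response("<h1>Hello</h1>")
print(method, path, version)
```

Notice that the server only needs the bytes of the request to build its reply. It knows nothing about what kind of client sent them, which is the point of using a standard protocol.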

Serving Web Site

Up to now we've learnt what client-server architecture is and how it works. Based on this architecture, let's think about what we need to do if we want to host a one-page web site. Since this is not google.com, we don't need to design a scalable server. This is going to be a simple web page that does a simple file operation and gives a response to the client. For this requirement we need a machine to run our web server program and an application to do the logic part of the client request.

Now it's time to design our web server. Our web server is a simple HTTP server that accepts the basic HTTP methods. It is a computer program that listens on a port and accepts HTTP requests.

If we don't care about the response time or the throughput of our server, then this design works for us. This is a single-threaded web server, so if more than one request reaches the server, the later ones wait until the server has finished processing the current request.
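A single-threaded server like this can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names handle, serve and read_all are made up, the request is not really parsed, and the page is a fixed string instead of a file. The demo at the bottom lets two clients connect first, so you can see them being answered strictly one after another.

```python
import socket


def handle(conn: socket.socket) -> None:
    """Read one request and answer with a fixed page (no real parsing)."""
    conn.recv(4096)  # a sketch: assume the whole request fits in one read
    body = b"<h1>Hello from a single thread</h1>"
    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n"
    )
    conn.sendall(head + body)


def serve(listener: socket.socket, max_requests: int) -> None:
    """Accept and answer connections strictly one after another."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()  # blocks until a client is available
        with conn:
            handle(conn)  # the next client waits until this call returns


def read_all(conn: socket.socket) -> bytes:
    """Client side: read until the server closes the connection."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


# Demo: both clients connect and send before the server starts serving.
# The operating system queues them; the server then works through the queue.
listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
port = listener.getsockname()[1]
clients = [socket.create_connection(("127.0.0.1", port), timeout=5) for _ in range(2)]
for client in clients:
    client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
serve(listener, max_requests=2)
replies = [read_all(client) for client in clients]
for client in clients:
    client.close()
listener.close()
print(replies[0].decode("ascii"))
```

While handle is busy with the first client, the second one just sits in the queue. That is fine for a one-page site with a few visitors, and it is exactly what becomes a problem when many clients arrive at once.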

Let's say everything went well, and now we have many clients waiting for a response from our web server, and it seems our web server is not able to answer each request on time. OK, no problem, we can change our design: instead of a single-threaded design we are going to implement a multi-threaded web server.

We changed our web server, and now one main thread listens on the port; when a request comes in, it hands the responsibility to an available thread, which responds to the request. Now our web server software is able to handle requests simultaneously.

Now we have a better web server which is more scalable and whose response time is acceptable for users. But we aren't done yet! If our server creates a new thread for each request, we will have memory problems under load. We can use a fixed number of threads to make it more stable.

Another thing: it used to be OK to open one connection per request, because we would have a single HTML page with only some tables and text and so on, but today's web pages have many links, videos, pictures, animations… It's not possible to load one page with only one request. That's where the persistent connection comes in. You may wonder what a persistent connection or HTTP keep-alive is, and how it helps us use the network more efficiently.
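Here is a rough Python sketch of that design: one main thread accepts connections and a fixed pool of worker threads answers them. The pool size of 2, the "worker" thread name prefix and the function names are assumptions of mine for the demo; a real server would tune the pool size and parse the request properly.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # hypothetical fixed number of worker threads


def handle(conn: socket.socket) -> None:
    """Runs on a worker thread: answer one request, then close the connection."""
    with conn:
        conn.recv(4096)  # a sketch: assume the whole request fits in one read
        body = f"handled by {threading.current_thread().name}".encode("ascii")
        head = (
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n"
        )
        conn.sendall(head + body)


def serve(listener: socket.socket, pool: ThreadPoolExecutor, max_requests: int) -> None:
    """Main thread: only accepts, then hands the connection to the pool."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        pool.submit(handle, conn)  # returns at once; go back to accept()


def read_all(conn: socket.socket) -> bytes:
    """Client side: read until the server closes the connection."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


# Demo: four clients, but never more than POOL_SIZE worker threads.
listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
port = listener.getsockname()[1]
clients = [socket.create_connection(("127.0.0.1", port), timeout=5) for _ in range(4)]
for client in clients:
    client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="worker") as pool:
    serve(listener, pool, max_requests=4)
# Leaving the with-block waits until every submitted request is finished.
replies = [read_all(client) for client in clients]
for client in clients:
    client.close()
listener.close()
for reply in replies:
    print(reply.rsplit(b"\r\n", 1)[1].decode("ascii"))
```

Because the pool is fixed, a burst of requests cannot make the server create threads without limit. Requests that arrive while all workers are busy simply wait in the pool's queue.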

HTTP keep-alive

Let's talk about what HTTP keep-alive is. First of all, the Hypertext Transfer Protocol (HTTP) is a stateless, request/response-based protocol, and it has three well-known versions: HTTP/0.9, HTTP/1.0 and HTTP/1.1. A persistent connection, or HTTP keep-alive, is the idea of using the same TCP connection to send and receive multiple HTTP requests and responses. As I said before, when we load one web page in a browser, many requests and responses go back and forth with the server to load the data on the page. So the idea makes sense: it's better to reuse an already opened connection. It also aims to improve HTTP performance, which is why it is the default in HTTP/1.1. How does it work? When a client makes an HTTP request, it puts a Connection: keep-alive header in the request; when the web server sees this header, it does not close the connection after responding, and it puts Connection: keep-alive in the response headers as well. In HTTP/1.1 the header is not even required: the connection stays open unless one side sends Connection: close.
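You can watch a persistent connection at work with Python's standard library. The sketch below is only a demo under my own assumptions: a tiny http.server handler (the Handler class and the seen_client_ports list are my names) records the client's TCP port for each request, and an http.client connection fetches three made-up paths. If all three requests come from the same client port, they travelled over the same TCP connection.

```python
import http.client
import http.server
import threading

seen_client_ports = []  # one entry per request, recorded by the server


class Handler(http.server.BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes http.server keep the connection open by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # so the next one can follow on the same connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS chooses
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
for path in ("/index.html", "/style.css", "/logo.png"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # drain the body before sending the next request
conn.close()

server.shutdown()
server.server_close()
print(seen_client_ports)
```

Note that this little server is synchronous: while the client keeps the connection open, the one server thread stays tied to it, even between requests.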

So far everything seems OK, but maybe HTTP keep-alive is not a killer feature of HTTP, and you may not want to design a web server around persistent connections. Here comes the big WHY. Loading one web page doesn't take long, often not even seconds, so many times reusing the connection makes no difference. Also, instead of using one single connection, browsers now open parallel connections to load a web page faster. You may ask: if it doesn't matter whether we use a persistent connection or not, why care about it at all? Just ignore it. And I would say you are fifty percent right but fifty percent wrong, because maybe it's not a problem on the browser side, but it is a problem on the server side. Since our web server works synchronously, it allocates one thread to take care of this open, reusable connection, and much of the time that thread waits, doing nothing. Since we designed our server with a fixed number of threads, under heavy load its performance is not going to be good. So it's better to keep the keep-alive timeout short to increase server throughput.

Alright, we designed our scalable web server with the HTTP keep-alive option in mind. So you may ask what the hell node.js has to do with this post. Remember the synchronous process flow: it might not be a good idea to handle millions of requests per second with a synchronous web server. Our web server must always be available. It shouldn't be busy when thousands of users make requests at the same time. This is like a max flow problem, as below.

We have millions of requests per second, a very good application server, and a web server which can only handle a finite number of concurrent connections. Our application server is starving for requests to process, but our web server cannot keep up with it. We've got to do something. So rather than handling each connection synchronously, what if we handle each request asynchronously? What I mean is: accept a connection, hand the responsibility to the application server, and instead of waiting for its response, go and get another connection. It sounds great, doesn't it?
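Here is a sketch of that idea using Python's asyncio, which is standing in for the asynchronous style and is not node.js itself. The application server is faked by ask_app_server, which just sleeps for APP_DELAY seconds; both names, the delay and the /page paths are assumptions for the demo. A single thread accepts five connections and keeps all of them in progress while the fake application server "works".

```python
import asyncio

APP_DELAY = 0.2  # hypothetical time the application server needs per request
in_flight = 0  # requests currently waiting for the application server
peak_in_flight = 0


async def ask_app_server(path: str) -> bytes:
    """Stand-in for the application server: waits without blocking the thread."""
    await asyncio.sleep(APP_DELAY)
    return f"result for {path}".encode("ascii")


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    global in_flight, peak_in_flight
    head = await reader.readuntil(b"\r\n\r\n")  # request line plus headers
    path = head.split(b"\r\n", 1)[0].split()[1].decode("ascii")
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    body = await ask_app_server(path)  # other connections are served meanwhile
    in_flight -= 1
    writer.write(
        b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n" % len(body)
        + body
    )
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int, path: str) -> bytes:
    """Demo client: send one GET and read until the server closes."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n".encode("ascii"))
    await writer.drain()
    reply = await reader.read()
    writer.close()
    await writer.wait_closed()
    return reply


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: OS chooses
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(fetch(port, f"/page{i}") for i in range(5)))


replies = asyncio.run(main())
print("requests in progress at the same time:", peak_in_flight)
```

The web server part never sits idle waiting for one answer. While one request is with the application server, the same thread is already accepting and forwarding the next ones.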

Node.js

OK, now we've got the idea, so it's time to talk about what node.js is. Node.js is JavaScript on the server: it allows you to run JavaScript at the backend, outside the browser. It uses Google's V8 engine, as Chrome does. With node.js, our web server and our application server are the same. It's event-driven, asynchronous, server-side JavaScript with callbacks in action, and that's another reason why it's so fast. Because of that, it is very good when you need to keep several things in progress at the same time. It's good when you need low response time and high concurrency.

There are some key points that make node so fast, like event-driven callbacks. Event-driven means one event triggers another event and so on, or a process starts as a reaction to an event.

For example, think about going to a burger house to order a hamburger. When you arrive at the burger house, an event occurs, because the burger house's state changes from no customer to one customer. You give your order, and now you are waiting to pick it up. They prepare your hamburger, and now your state changes from no hamburger to one hamburger; you leave the burger house, and now they are waiting for new customers.

So this is event-driven, but what is an event-driven callback? The previous story is a synchronous process, because you stand there waiting for them to hand you your hamburger. Think about it: while you are waiting for your hamburger, someone else comes to the burger house, and there is only one person taking orders, so new customers have to wait for you. Meanwhile many new customers arrive and the line keeps getting longer. What if you give your order and have a seat, and they inform you when your hamburger is finished? Then they can take another order right away. This is an event-driven callback.
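The burger house can be written down as a small sketch. I am using Python's asyncio event loop here to play the role that node's event loop plays; the customer names and cooking times are made up. The cashier (take_order) never waits for a burger: it registers a callback (on_ready) and is immediately free for the next customer.

```python
import asyncio

events = []  # what happened at the burger house, in order


def take_order(loop, customer, cooking_time, on_ready):
    """The cashier: note the order, register a callback, move on at once."""
    events.append(f"order taken: {customer}")
    # Ask the loop to call on_ready(customer) later; this call does not wait.
    loop.call_later(cooking_time, on_ready, customer)


async def burger_house():
    loop = asyncio.get_running_loop()
    all_served = loop.create_future()
    waiting = {"Ann", "Bob", "Cem"}  # hypothetical customers

    def on_ready(customer):
        """The callback: runs when this customer's burger is finished."""
        events.append(f"burger ready: {customer}")
        waiting.discard(customer)
        if not waiting:
            all_served.set_result(None)

    # All three orders are taken before any burger is ready.
    take_order(loop, "Ann", 0.3, on_ready)
    take_order(loop, "Bob", 0.1, on_ready)
    take_order(loop, "Cem", 0.2, on_ready)
    await all_served  # keep the shop open until everyone has been served


asyncio.run(burger_house())
for event in events:
    print(event)
```

Ann orders first but her burger takes longest, so Bob and Cem get theirs before her. Nobody had to stand in line behind Ann while her burger was cooking.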

Another example: you call your friend to ask something, and while your friend is looking for an answer you are occupying the line. It's better to say “ok, take your time and call me back when you're done”. Then you don't have to wait on the line; you can hang up and do your stuff, and get the answer when your friend is done.

I hope now you have an idea about how node.js works. Remember: in node, everything runs in parallel, except your code. Remember the burger house example: node.js is not multithreaded, it is single-threaded but event-based. At the burger house there is only one person taking orders; you give your order and leave the line, and someone else gives a new order. In addition, behind the counter there are many people working on preparing orders; that part is like multithreading.
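The "one cashier, many cooks" picture can also be sketched in Python. This is only an analogy under my own assumptions, not a description of node's internals: the cashier coroutine runs on a single thread, while the blocking cook function is handed to a small thread pool that I named "kitchen".

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def cook(order: str):
    """Blocking work done behind the counter, on a kitchen thread."""
    time.sleep(0.1)  # hypothetical cooking time
    return order, threading.current_thread().name


async def cashier(orders):
    """Your code: runs on one thread and never blocks on the cooking."""
    loop = asyncio.get_running_loop()
    cashier_thread = threading.current_thread().name
    with ThreadPoolExecutor(max_workers=3, thread_name_prefix="kitchen") as kitchen:
        # Hand every order to the kitchen at once, then wait for all of them.
        jobs = [loop.run_in_executor(kitchen, cook, order) for order in orders]
        cooked = await asyncio.gather(*jobs)
    return cashier_thread, cooked


cashier_thread, cooked = asyncio.run(cashier(["burger", "fries", "cola"]))
print("orders taken on:", cashier_thread)
for order, thread_name in cooked:
    print(order, "prepared on", thread_name)
```

The order-taking code always runs on the same thread, and only the slow preparation is spread over several threads in the background.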

Node.js is not the only server-side JavaScript platform; there are others, like RingoJS. And if you have read the whole post up to this line, you should also check out what non-blocking I/O is.

Before writing this post I created a simple Java web server application: https://github.com/dogukansonmez/HTTP-SERVER. It accepts only some HTTP methods, not all of them, but it's good to check it out to get an idea of how a web server works. And please have a look at http://www.chromium.org/spdy/spdy-whitepaper; it is a good project about making the web faster.