Distribution and layers
8pm, 2nd May 2007 - Geek, Interesting, Web, Security, Developer, Sysadmin
Lately, I've noticed that applying layers and distribution to a great many things seems to improve nearly every aspect of the thing in question. It may seem obvious once it's pointed out, but for me, at the time, it was astonishing to see a well-known principle of computer science keep on working when applied to areas outside computer science.
I think I was first exposed to distribution when I discovered the distributed.net project. The idea was that a problem that would take one person an entire lifetime to solve can sometimes be solved by 1,000 people in 1/1000th of a lifetime. When a problem can be split up like that (it is said to be parallelisable, or distributable), it makes sense to share the workload out over an appropriate number of people so the job gets finished on time. In the case of distributed.net, a problem that was supposed to be unsolvable in any practical amount of time (decrypting an encrypted message) was actually solved in 22 hours!
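The distributed.net idea can be sketched in miniature: split a keyspace into chunks and hand each chunk to a separate worker. This is a toy sketch - the "secret key", the keyspace size and the chunking scheme are all made up for illustration, and the real project shipped chunks to thousands of volunteers' machines rather than to threads on one box.

```python
from concurrent.futures import ThreadPoolExecutor

SECRET_KEY = 271828  # stand-in for "the key that decrypts the message"

def search_chunk(bounds):
    """Try every candidate key in [start, stop); return the match or None."""
    start, stop = bounds
    for key in range(start, stop):
        if key == SECRET_KEY:  # stands in for an expensive decryption test
            return key
    return None

def distributed_search(keyspace_size, workers=4):
    """Split the keyspace into one chunk per worker and search the chunks in parallel."""
    chunk = keyspace_size // workers
    bounds = [(i * chunk, (i + 1) * chunk) for i in range(workers)]
    bounds[-1] = (bounds[-1][0], keyspace_size)  # last chunk absorbs the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for found in pool.map(search_chunk, bounds):
            if found is not None:
                return found
    return None
```

With 1,000 workers instead of 4, each chunk is 1/1000th of the keyspace - which is exactly the lifetime-into-hours arithmetic above.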
Improving the time to solve a problem by distribution applies not only to large problems that can be split up into chunks but, not surprisingly, also to large numbers of small problems. Web servers are a very common example. You may request a web page from a server and receive a response - your web page - yet when you make the same request later, it may be served by a completely different server. The magic behind the scenes is usually a load balancer in front of a number of web servers. The load balancer makes sure each of the web servers is doing an appropriate amount of work, and it should be invisible to the end user.
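The simplest balancing strategy is round-robin: hand each request to the next server in a fixed rotation. A minimal sketch (the backend names are made up for illustration):

```python
import itertools
from collections import Counter

class LoadBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""
    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def route(self, request):
        # The visitor never sees this choice; the balancer answers on
        # behalf of whichever backend it picked.
        return next(self._rotation)

balancer = LoadBalancer(["web1", "web2", "web3"])
served_by = Counter(balancer.route(f"GET /page/{i}") for i in range(30))
# every backend ends up with an equal share of the 30 requests
```

Real load balancers also track which backends are healthy and how loaded each one is, but the rotation above is the core of the trick.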
There are several advantages to using a load balancer in front of many web servers rather than buying a faster, more expensive machine that can handle the same load all on its own, and the first one is price. A lot of almost nothing adds up to something.
As a website grows, it often starts with just one small web server, but eventually this solitary server won't be able to keep up, and at that point it is much cheaper to buy a second small server and a load balancer than to buy a new server with twice the capacity and throw the old one out. Overall, you may have spent the same amount of money, but the spending was small when your website was small and grew in proportion to it. Other advantages can be even more compelling - one of my favourite benefits of distribution is reliability, because a system is more reliable if it has no single point of failure. If you have 10 web servers and one of them crashes, each of the other 9 has to do about 11% more work, but your website keeps on working. The failure will not affect your visitors.
The two principles at work here (in the distributed.net project and in the web server balancing) are that a lot of almost nothing adds up to something, and that a system is more reliable if it has no single point of failure.
Layers are really just a specialised application of distribution in which all of the elements are chained together and each element in the chain does a slightly different job. A job is only passed along to the next element in the chain if the current element can't complete it. Often, the chain is ordered so as to optimise its own efficiency.
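That pass-it-along behaviour is the classic chain-of-responsibility pattern. A minimal sketch, with invented handler names and rules, ordered so the cheapest element gets first look at every job:

```python
class Link:
    """One element in the chain: handle the job if you can, else pass it on."""
    def __init__(self, name, can_handle, next_link=None):
        self.name = name
        self.can_handle = can_handle
        self.next_link = next_link

    def handle(self, job):
        if self.can_handle(job):
            return self.name                       # this element completed the job
        if self.next_link is not None:
            return self.next_link.handle(job)      # pass it along the chain
        return None                                # fell off the end of the chain

# Jobs are rated by difficulty (an integer); cheaper layers handle easier jobs.
expert = Link("expert", lambda difficulty: True)
generalist = Link("generalist", lambda difficulty: difficulty <= 5, expert)
bot = Link("bot", lambda difficulty: difficulty <= 2, generalist)
```

Calling `bot.handle(1)` stops at the first link, while `bot.handle(9)` travels the whole chain before the expert deals with it.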
To continue with the example of the web servers: in order to speed up response times, webmasters use caching. Caching involves taking the result of a long, slow process and remembering that result. The next time somebody asks us for the result, we just hand them the copy we remembered (our cached copy) rather than calculating the result again. Caching usually improves access speed and reduces calculation time at the expense of using more memory; however, as memory is often quite cheap, this trade-off is usually worthwhile. Caches usually have rules about how long they are allowed to remember a certain result, so that they don't continue to remember a result that is incorrect or stale.
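A tiny sketch of that remember-with-an-expiry idea (the class name, API and time limit are invented for illustration):

```python
import time

class TTLCache:
    """Remember computed results, but only for ttl_seconds, so stale
    answers eventually get thrown away and recomputed."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                        # fresh cached copy: skip the slow work
        value = compute()                          # slow path: do the real calculation
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
def render_page():
    calls.append(1)                                # pretend this is slow
    return "<html>home</html>"

cache = TTLCache(ttl_seconds=60)
cache.get_or_compute("/home", render_page)
cache.get_or_compute("/home", render_page)
# render_page ran only once; the second request was served from memory
```

The expiry time is the "rule about how long to remember" - once it passes, the next request pays the slow-path cost again and refreshes the copy.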
Caching is broadly applicable and can be implemented in many places within the system. The SQL server can cache the results of certain queries so it doesn't have to calculate them again when the web server requests them, and the web server can cache its own copies of those results so it doesn't even have to ask the SQL server for them. Sometimes, in front of the web server, there is another specialised web server called a proxy server, which can cache the page the web server generated from those SQL queries so that it doesn't have to be generated again. As caching happens closer and closer to the source of the request, the advantage grows larger and larger. Sometimes your ISP will cache pages or parts of pages and, rather than asking the website to send its copy to you, will just send their own. Your own computer even caches pages and parts of pages you have requested, so that if you request something a second time, it already has it and gives it to you straight away. In this way, a request for a web page can be distributed over many different computers, and the result is a much faster page on your screen. The other main advantage of distribution still applies here: if the SQL server crashes or the web server crashes, you may not even be aware of it, because your entire page was served from a cache closer to you than either of those two servers.
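The whole stack behaves like a chain of caches: each layer answers if it can, and otherwise asks the layer behind it, remembering the answer on the way back. A toy sketch with just two layers standing in for the browser, proxy, web server and SQL caches (the names and the page-rendering function are made up):

```python
def cache_layer(name, next_lookup, log):
    """Wrap a slower lookup in a cache that remembers its answers."""
    store = {}
    def lookup(url):
        if url in store:
            log.append(f"{name} hit")
            return store[url]
        log.append(f"{name} miss")
        store[url] = next_lookup(url)   # ask the next layer, remember the answer
        return store[url]
    return lookup

log = []
def origin(url):
    log.append("origin rendered page")  # stands in for SQL queries + page build
    return f"<html>{url}</html>"

proxy = cache_layer("proxy", origin, log)
browser = cache_layer("browser", proxy, log)

browser("/home")  # first request travels the whole chain to the origin
browser("/home")  # repeat request is answered by the nearest layer
```

On the repeat request, neither the proxy nor the origin hears anything at all - which is also why an origin crash can go unnoticed by visitors whose pages are still cached nearby.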
Layering applies to more than just caching, however. Spam filtering works quite well with many layers. Your ISP probably employs many layers of filters to prevent spam from reaching your inbox. Acme.com has a very good write-up on filtering spam using layers. One nice thing about these layers of filters is that you can use the results of later filters to modify earlier filters in order to reduce the total workload. Because the filters are applied in an optimised order, a job filtered at an earlier level takes less work to achieve the same result than one filtered at a later level. This is still true even when you ignore the workload of all the filters in between that never see the job. Caching can be seen as a series of filters applied to a request for a web page: as soon as one filter can resolve the request, it does so, and doesn't pass the request any further along the chain.
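The pay-off from ordering the filters can be shown with a toy measurement: the same two spam filters applied in both orders, counting how many individual checks run. The filters, messages and catch rates here are all invented for illustration.

```python
def run_chain(filters, messages):
    """Apply filters in order; a message stops at the first filter that
    rejects it. Return the total number of individual checks performed."""
    checks = 0
    for message in messages:
        for is_spam in filters:
            checks += 1
            if is_spam(message):
                break                   # filtered: later layers never see it
    return checks

blacklist = lambda m: "viagra" in m     # cheap check that catches most spam
bayes = lambda m: "$$$" in m            # pretend this is the expensive check

messages = ["buy viagra"] * 90 + ["win $$$ now"] * 9 + ["hi mum"]

cheap_first = run_chain([blacklist, bayes], messages)      # 110 checks
expensive_first = run_chain([bayes, blacklist], messages)  # 191 checks
```

Putting the filter with the highest catch rate first nearly halves the total work, because 90% of the jobs never reach the second layer at all.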
Another area where layering creates advantages is security, under a principle often known as defence in depth. (Defence in depth also covers other areas, but for the purposes of this discussion, it means layers.) In this case, each new layer is usually created not in front of the existing layer, but behind it. For example, you might place a firewall at the external perimeter of your network to restrict access (layer 1) and then also place firewalls on each of your hosts within the network (layer 2). It may seem that, if the firewalls were configured identically, anything that made it through the first layer would also make it through the second, but it is not so. If an attacker avoided the outer firewall by exploiting an unsecured wireless network set up by an employee within the building, then the firewalls on each host would be the only thing protecting the data within. Having said that, there is nothing that requires the firewalls to be identical. In fact, you should make each firewall as restrictive as possible (but not more so) on a case-by-case basis. Layering improves security because an attacker has to break each layer of security separately, and if any one layer fails, the next likely will not.
So far, these examples have all been very computer-related, but thinking about them made me wonder whether the principles involved might apply to areas not related to computers. It didn't take long to find some. Power generation is a good one. If every home has some alternative source of power generation - a wind turbine, solar panels, whatever they like - then the advantages of distribution take hold. The power station has a reduced load, and if a disaster strikes and brings the power lines down, it affects fewer people, and those less badly. People can grow their own vegetables, too. This means reduced load on farmers (and the land) and more reliability if the farmers are unable to deliver their vegetables for some reason.
Layering applies quite well to call centres. When your phone is first answered, it's just a computer that tries to figure out what you want, answers you if it can, and directs you to the appropriate person if it can't. The first layer is cheap - one computer can handle hundreds of simultaneous connections - and the second layer often works the same way. The first humans you talk to will probably be paid minimum wage and have a book of common questions and answers. If they can't solve your problem, up the chain you go to the more highly paid, more highly skilled staff who can solve the few problems that make it all the way to them. These people could solve any problem that came to them, but it's much cheaper and more efficient to filter out the easy stuff before it gets that far.
Physical security is also enhanced by the application of layers. A guard at the door of a casino might stop most miscreants getting in, but guards patrolling the floors at random can find them even after they have entered, and can certainly help stop a robbery in progress. Casinos are such tempting targets that they need layer after layer after layer: staff training, CCTV cameras, areas with different access controls, strong authentication methods for building access, rotating the floor staff regularly, separating in-house cash across several safes rather than just one... the list goes on.
I'm pretty certain that layers and distribution aren't going to solve all of your problems, but there's a good chance that a lot of them can be made less of a problem by using one of these techniques. See where you can fit them into your life.