As some of you have probably heard me say, we have tested Extreme Scale to over a thousand JVMs. Cool, and everyone runs out to build their applications to deploy single grids on lots and lots of boxes.
Now, while you could do this, we don't recommend it. Why? First issue is testing. How will your testing environment fully test a 1000 server grid? The answer is it won't. It'll test a much smaller grid because of budget, noone is going to buy twice the hardware, especially 1000 servers. Plus, what if you need to concurrently test multiple versions of your application. Requiring a 1000 boxes for each testing thread is not practical. This clearly is risky because you aren't testing the same thing as production.
Next issue is risk. Running a database on a single hard drive is risky. Any problem with the hard drive and you're losing data. Running a growing application on a single grid is similar. While we strive for very high quality it's inevitable that bugs will slip out and there will also, of course, be application bugs. Putting all your eggs in one basket is a risky move that almost never pays off. Splitting the application grid in to pods makes a lot of sense. A pod is a group of servers running a homogenous application stack. Pods might be 20 boxes in size. Rather than having 500 boxes in a single grid, now we'd have 25 pods of 20 boxes instead. A single version of the application stack runs on a pod but different pods may be on different versions of the application stack. The application stack is
- Operating system level
- Hardware level (bios, firmware)
- JVM level
- Extreme Scale level
- Application level
- anything else your application needs to work.
Pods are a nicely sized deployment unit for testing. It's easy to imagine QA having 20 servers to test. It's highly unlikely they would have 500. It also means they are testing the same configuration as production. Production uses grids with a maximum size of 20 servers, i.e. a pod. You can stress test a single pod and know what the capacity is, number of users, amount of data, transaction throughput. This makes planning easy and follows the Extreme Scale mantra of predictable scaling at predictable cost. Does a pod have to be 20 servers? No, this is just a number. It can be any number that makes sense. It should be small enough that if a pod has an issue in production, the fraction of impacted transactions is one that the customer is comfortable with until it's resolved. The bigger the fraction is then the less comfortable customers usually are...
A bug will hopefully only impact a single pod. This in the above example only impacts 4% of the application transactions rather than 100%. Upgrades are easier as they can be rolled out a pod at a time. This is just common sense. If an upgrade to a pod goes wrong then switch the pod back to the old level. Upgrades include application and system updates, any change to the application stack. Upgrades should as a rule only change a single element of the stack at once. This makes problem determination much easier. This isn't possible in all cases but should be strived for as a general principle.
You need a routing layer on top of the pods which needs to be forwards and backwards compatible as pods get software upgrades. A directory is needed to locate which pod has some data. Another Extreme Scale grid can be used for this with a database behind it maybe using write behind. This gives us a two tier solution. Tier 1 is the directory and is used to locate which pod handles a specific transaction. Tier 2 are the pods. Once tier 1 identifies a pod then normal Extreme scale routing routes the transaction to the correct server in the pod, this is usually the server holding the partition for the data used by the transaction. Near caches on the tier 1 can be used to lower the impact of the pod look up step.
This looks a little more complex than having a single grid but the operational, testing and reliability improvements will make it worth while.
This blog post doesn't discuss scalability in the normal sense, i.e. does Extreme Scale scale out to a 1000 or more JVMs. It discusses scalability in terms of operations, planning, risk management. These are usually MORE important than product scalability and unfortunately ignored frequently. Buildling highly available systems requires both. You need a reliable product and you need a reliable process for deploying your application.