Three years ago, processor design trends were worrying. Designs were moving to more cores at lower clock speeds, which meant that software needed to become more parallel to maintain response times. I've already seen customers complain that software runs slower on newer hardware. When they complain, they are looking at response times, not throughput.
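As a rough illustration of why response time and throughput can move in opposite directions, here is a minimal Java sketch. All of the numbers in it (the path length in cycles, the clock speeds and the core counts) are made up for the example and are not measurements, and the model assumes each request runs on a single thread with no cache or memory effects.

```java
class ClockVsCores {
    // Time for one single-threaded request, in milliseconds.
    // Simplifying assumption: the request costs a fixed number of cycles.
    static double responseTimeMs(double cyclesPerRequest, double clockGHz) {
        return cyclesPerRequest / (clockGHz * 1e9) * 1000.0;
    }

    // Requests per second when every core serves independent requests.
    static double throughputPerSec(double cyclesPerRequest, double clockGHz, int cores) {
        return cores * (clockGHz * 1e9) / cyclesPerRequest;
    }

    public static void main(String[] args) {
        double cycles = 30_000_000; // hypothetical path length of one request

        // Hypothetical older chip: 2 cores at 3.0 GHz.
        System.out.printf("old: %.1f ms per request, %.0f requests/s%n",
                responseTimeMs(cycles, 3.0), throughputPerSec(cycles, 3.0, 2));

        // Hypothetical newer chip: 8 cores at 2.0 GHz.
        System.out.printf("new: %.1f ms per request, %.0f requests/s%n",
                responseTimeMs(cycles, 2.0), throughputPerSec(cycles, 2.0, 8));
    }
}
```

With these made-up figures the newer chip serves well over twice as many requests per second, yet each individual request takes about 15 ms instead of about 10 ms, which is the kind of slowdown a customer watching response times would notice.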
However, the other question is what those cores will be used for. At the time, people still ran applications on dedicated boxes. The virtualization trend was just beginning then and is far more entrenched now. People today are not using these boxes to run a single application. They are either virtualizing the boxes or running multiple applications side by side on them to save memory.
So, the question is: has the number of cores per application changed?
The answer may be no. 3 years ago, 2-way or maybe 4-way boxes were common. Intel ships 8-core boxes now and 12/16 cores are just around the corner with the next set of chips. Large SMP boxes like Power/Sun are running virtual machines or the equivalent of Solaris Zones/AIX WPARs. I have a feeling that, in reality, people are using fewer cores per application than they were 3 years ago.
This bodes well for existing applications on multi-core boxes. The crisis thought by many to be coming may be delayed or even mitigated, as the extra cores are used for virtualized configurations rather than for running a single application. Of course, this means response times will be longer, as applications will not be tuned for all those cores, but this may simply be the price of more throughput per server through ever more cores per socket.
Memory will likely constrain this, as application consolidation is only really possible if there is enough memory on the box.
Is it possible that the only market for high-socket-count multi-core servers will be high-end OLTP databases and the like? Will farms of two-socket servers running automated virtualization layers be dominant for virtualizing applications? If the applications are not optimized for lots of cores then a large SMP box may not be the way to go. Farms of 2-socket boxes look attractive if the cost of managing them is reasonable compared to the savings versus a large SMP box. The cost per core will likely stay higher on a large SMP than on a blade-type system.
Customers willing to optimise applications for multi-core will face a new challenge. I expect large SMP servers to have lots and lots of cores, but slower cores. If you can't scale out your application then a large SMP will be the only game in town. However, if your application does scale out then I think farms of blade-type systems will be very cheap, relatively speaking, and cheaper to develop for, as the blade systems will be faster per core than the large SMPs, or at least that's my expectation. Databases are between a rock and a hard place here. The best response times will come from blade-type systems, probably x86 or similar. The best throughput will come from high-end Unix systems, but I don't expect these to match the response times of the blade machines.
However, the cost of developing a single process to scale on those high-end machines (maybe 160+ cores) will be substantial because of the difficulty of achieving very good vertical scaling. Blade systems will have far fewer cores and, as a result, software that saturates the server is cheaper to develop. Thus, blade systems allow scale-out applications to be developed cheaply, run on inexpensive boxes and have good response times. Software that can only scale vertically will remain expensive to develop and will run on expensive high-end boxes. Depending on the application, it will struggle to match the response times of the blade systems, although if it only scales vertically then at the high end of throughput blades just can't do the job. But even 2-socket x86 blades are going to be extremely powerful, so the realm where high-end SMP boxes are ultimately required is pushed significantly upwards, and a lot of people will find that blade systems meet many of the needs currently met by SMP boxes.
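One common way to reason about why vertical scaling gets harder as core counts grow is Amdahl's law, which the post does not name but which fits the argument. The Java sketch below is only an illustration: the 95% parallel fraction is an assumed figure, not a measurement of any real application.

```java
class AmdahlSketch {
    // Amdahl's law: speedup on `cores` cores when `parallelFraction`
    // of the work can run in parallel and the rest stays serial.
    static double speedup(double parallelFraction, int cores) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void main(String[] args) {
        double p = 0.95; // assumed: 95% of the work parallelizes
        for (int cores : new int[] {8, 16, 160}) {
            double s = speedup(p, cores);
            // Efficiency is the share of the box's raw capacity actually used.
            System.out.printf("%3d cores: speedup %.1fx, efficiency %.0f%%%n",
                    cores, s, 100.0 * s / cores);
        }
    }
}
```

Under that assumption the speedup is roughly 5.9x on 8 cores but only about 17.9x on 160 cores, so saturating the large box requires shrinking the serial fraction much further, which is where the development cost goes.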
I guess CTOs staring at the costs of rewriting software may be able to relax a little, but for architects, life moving forward is getting more complex.
Life is getting more complex, as usual :)
But what about applications with high concurrency requirements? I think a lot of web applications utilize existing multi-core boxes just by running several requests simultaneously. I think we will have to wait a long time before the number of cores outgrows the concurrency requirements of even medium-scale web applications. If it does happen, though, it could be a serious problem for a lot of applications. It's not obvious how a single web request can be parallelized, I think.
Posted by: Denis Bazhenov | April 19, 2010 at 07:47 AM
I think that's my point. Java path lengths are getting longer, not shorter: more frameworks, more abstraction and so on. Most requests are still handled by a single thread and so are sensitive to both the clock speed of a single core and L2 cache size (as path length gets bigger, cache hit rates drop).
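The pattern being discussed can be sketched in Java as a fixed thread pool that keeps the cores busy with many requests at once while each individual request still runs start to finish on one thread. The class and method names (`RequestPoolSketch`, `handleRequest`, `serveAll`) are hypothetical, and the loop is only a stand-in for real request work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RequestPoolSketch {
    // Hypothetical stand-in for one request: it runs on a single thread,
    // so its latency depends on the speed of one core.
    static long handleRequest(int requestId) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += (i ^ requestId) & 0xFF;
        }
        return sum;
    }

    // Serves all requests on a fixed pool; extra cores raise throughput
    // but do not make any single request faster.
    static List<Long> serveAll(int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int id = 0; id < requests; id++) {
                final int requestId = id;
                tasks.add(() -> handleRequest(requestId));
            }
            List<Long> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("served " + serveAll(64, cores).size() + " requests");
    }
}
```

Adding cores to this design helps only while there are enough simultaneous requests to fill them; the time spent inside `handleRequest` is unchanged.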
Posted by: Billy Newport | April 19, 2010 at 01:45 PM
Ok, I get your point.
What do you think we should do right now, as developers? I mean, all this parallelization stuff is a long-term process. If we get stuck one day, we won't be able to just turn around and say: "Hey, I know what to do. There are tools, there are languages. Here we go...". Developers should learn some aspects of parallelization models, I think.
And this is a very interesting question. There are several computation models which allow you to parallelize an application. There is the functional programming paradigm and there is message passing. There is nVidia CUDA and friends (IBM Cell, Intel Larrabee etc.), with good vectorization support but requiring explicit knowledge of the underlying hardware. As you mentioned in your interview at QCon, none of them fits naturally into Java.
Does this mean that all we can do right now is just wait?
Posted by: Denis Bazhenov | April 19, 2010 at 05:44 PM