The trend towards multicore is moving along at a fast pace. Architectures like Sun's Niagara seem to be getting copied by the other CPU vendors. The architecture is basically lots of cores, but a low clock speed per core.
This is a problem for Java as:
- Java likely has a longer path length than languages like C, and clock speeds will no longer help with this.
- JavaEE promotes a simple threading model for applications
- Garbage collection remains heavily dependent on clock speed to keep pauses small.
The issue of path length and running in a managed environment means that code will take longer to execute. Yes, there are multiple cores, but from the point of view of an individual request, the lower clock speed is going to hurt response times. All those cores will also be sharing a single cache, so that cache had better have a high hit rate or it'll be even worse.
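The cache point can be made concrete with the textbook average memory access time (AMAT) estimate, where m is the miss rate. The hit time, miss penalty, and miss rates below are illustrative assumptions, not measurements of Niagara or any other chip:

```latex
\begin{aligned}
\mathrm{AMAT} &= t_{\mathrm{hit}} + m \, t_{\mathrm{penalty}}
  \qquad (\text{assume } t_{\mathrm{hit}} = 4,\ t_{\mathrm{penalty}} = 200 \text{ cycles}) \\
m = 0.01:\quad \mathrm{AMAT} &= 4 + 0.01 \times 200 = 6 \text{ cycles} \\
m = 0.05:\quad \mathrm{AMAT} &= 4 + 0.05 \times 200 = 14 \text{ cycles}
\end{aligned}
```

With these assumed numbers, dropping from a 99% to a 95% hit rate more than doubles the average access time, which is why extra cores contending for one shared cache can make slow cores slower still.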
Applications may have to be written to exploit threading if they want very high performance. Developers will be set back 5 years in terms of clock speeds (1 GHz sound good to anyone?), so it may be necessary to heavily multi-thread code to make up for this. This means that, if response time is important, applications may be forced to use APIs like CommonJ for threading to wring the performance out of these boxes. This may actually push people away from JavaEE: once you start down this slope, cutting out 'fluff' (i.e. managed access to resources, etc.) means shorter path lengths, and this will make non-JavaEE or lighter-weight containers more attractive.
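The post names CommonJ; as a swapped-in illustration using the standard java.util.concurrent package instead, here is a minimal sketch of splitting one request's independent, CPU-bound sub-tasks across cores so that its response time depends less on single-core clock speed. The class FanOutRequest, the subTask workload, and the chunking scheme are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: fan one request's independent sub-tasks out across cores, then join. */
final class FanOutRequest {

    // Hypothetical CPU-bound sub-task: sums i*i over the half-open range [from, to).
    static long subTask(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Handles one "request" by splitting [0, n) into `parts` chunks run in parallel.
    static long handleRequest(ExecutorService pool, long n, int parts)
            throws InterruptedException, ExecutionException {
        List<Callable<Long>> tasks = new ArrayList<>();
        long chunk = (n + parts - 1) / parts; // ceiling division so every element is covered
        for (int p = 0; p < parts; p++) {
            final long from = p * chunk;
            final long to = Math.min(n, from + chunk); // empty range if from >= n
            tasks.add(() -> subTask(from, to));
        }
        long total = 0;
        for (Future<Long> f : pool.invokeAll(tasks)) { // invokeAll blocks until all finish
            total += f.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            System.out.println(handleRequest(pool, 1_000_000L, cores));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because invokeAll waits for every chunk, the request's latency becomes roughly that of the slowest chunk rather than the sum of all of them, assuming enough cores are idle.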
The trend then includes some of the following choices:
- Use containers that can be made lightweight or just include exactly what you need, no more no less.
- JavaEE will be forced to evolve towards a scalable implementation, i.e. choose what you want rather than be forced to swallow it all whole and pay the path length price.
- Simpler containers with fewer services but enough for 90% of applications.
- JavaEE evolves to support common threading patterns so that it's easier for normal developers to leverage threading on these slower processors.
- JITs become cleverer in order to compensate for the slower processors.
- Garbage collection gets augmented with malloc/free type operations.
The garbage collection point interests me, as even with Java 5, large heaps containing things like big caches still kill performance with very long pauses. Cache managers really need a malloc/free-type API to directly control the life cycle of the objects in these caches, as it's totally ridiculous to use GC to manage these heaps.
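Java offers no malloc/free for ordinary objects, but as an assumption-laden sketch of the kind of explicit life-cycle control described here, a cache can keep its values outside the GC-managed heap in a direct ByteBuffer and hand out and reclaim fixed-size slots itself. OffHeapSlabCache and its slot layout are hypothetical, not any existing cache manager's API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache whose values live in one direct (off-heap) ByteBuffer carved
 * into fixed-size slots. put() acts like malloc, remove() like free.
 * Not thread-safe; values larger than a slot are rejected.
 */
final class OffHeapSlabCache {
    private final ByteBuffer slab;   // native memory: its contents are not traced by the GC
    private final int slotSize;      // bytes per slot, including a 4-byte length header
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>(); // the "free list"
    private final Map<String, Integer> index = new HashMap<>();       // key -> slot (on-heap)

    OffHeapSlabCache(int slotCount, int slotSize) {
        this.slotSize = slotSize;
        this.slab = ByteBuffer.allocateDirect(slotCount * slotSize);
        for (int i = 0; i < slotCount; i++) {
            freeSlots.push(i);
        }
    }

    /** "malloc": copies value into a slot; returns false if full or value too big. */
    boolean put(String key, byte[] value) {
        if (value.length + 4 > slotSize) return false;
        Integer slot = index.get(key);
        if (slot == null) {
            slot = freeSlots.poll();
            if (slot == null) return false; // no free slot; eviction is the caller's decision
            index.put(key, slot);
        }
        int base = slot * slotSize;
        slab.putInt(base, value.length);      // absolute write of the length header
        for (int i = 0; i < value.length; i++) {
            slab.put(base + 4 + i, value[i]); // absolute write of each payload byte
        }
        return true;
    }

    /** Copies the stored bytes back onto the heap, or returns null if absent. */
    byte[] get(String key) {
        Integer slot = index.get(key);
        if (slot == null) return null;
        int base = slot * slotSize;
        byte[] out = new byte[slab.getInt(base)];
        for (int i = 0; i < out.length; i++) {
            out[i] = slab.get(base + 4 + i);
        }
        return out;
    }

    /** "free": the slot is reusable immediately, with no collection involved. */
    boolean remove(String key) {
        Integer slot = index.remove(key);
        if (slot == null) return false;
        freeSlots.push(slot);
        return true;
    }

    int freeSlotCount() {
        return freeSlots.size();
    }
}
```

Only the small key-to-slot map stays on the heap; the bulk of the cached bytes is never scanned or copied by the collector. A real cache would also need thread safety, eviction, and variable-size allocation.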
It's hard to know what will happen, but the days of clock speed saving people from path length concerns look to be over. Clock speeds will start to regress, allowing chip makers to lower power requirements while adding more and more cores to their processors, and this is going to have a negative effect on Java unless some issues are resolved soon, and not just on the latest JVMs. Many customers will be deploying existing JVMs on multi-core systems and will hit these issues.
Chris,
1) Clock speeds are coming down. The emphasis now is on performance per watt, not performance per thread as it was until AMD kicked Intel around. We'll start to see chips with more and more cores, and the clocks will drop to control heat. You can believe me or not, but I don't think this is in dispute industry-wide.
2) Very few people write very fast Java code. C programmers don't have Java's high-level libraries, and their code tends to be tighter as a result. I agree that JITs have advantages over statically compiled code and, given profiling info etc., can do things impossible with a conventional compiler.
3) Simple threading, if it works, is best, but not all apps work like that. We didn't make CommonJ for nothing; we did it because some applications need very high performance and need a thread model suited to the application to get it. One size does not fit all. If performance
4) Concurrent GC is indeed a beautiful thing, but if applications have large caches or now use those 64-bit address spaces with variable data, then it too will run into issues.
4a) Trust me, you are not going to see tight coupling between #cores and #caches. That's not the direction; the transistors are going into cores. If someone builds a 64-core system, it won't have 64x the cache of a single core.
5) You're assuming only a single process on that set of cores, right? That isn't normal, and it will cause issues with the cache and negate some of the gains you speak of.
The point of these articles isn't to present a dogmatic point of view. It's to air issues and promote debate and awareness and as such I welcome all of the comments :)
Posted by: Billy | November 28, 2006 at 03:59 PM
Very interesting blog.
I believe few languages are as ready for multi-core/multi-threading as Java, thanks to its native support for multithreading.
I agree we now have a new scene, and we must all be prepared to face it. I guess we'll need:
* Better profiler tools to debug and optimize multithreaded applications;
* Enhanced JVMs and GC algorithms that use CPU power spread across several cores and make better use of L2 and L3 cache;
* And, perhaps, enhancements to the Thread API and the Concurrency Utilities package, as well as good docs and examples.
A final point: I also agree that clock speed per core will slow down. But this can be a good thing. It reminds me, for example, of how the Pentium M at nearly half the clock speed could do more than the old Pentium 4: doing more with fewer clock cycles.
Posted by: Alessandro | November 29, 2006 at 08:04 AM
Nice reflection to begin with. What you said seems logical, since clock speeds will eventually come down to curb power consumption.
But it seems this isn't only a Java problem but one for all languages that rely on a GC to do the cleanup, so C# is in the same boat! However, I think the JVM will scale up if used in cluster mode.
Posted by: Tarik Guelzim | November 29, 2006 at 11:20 AM
Hi Billy,
Nice blog !
Do you have any data to substantiate your claim, or is it just your perception? As Scott said earlier, the T2000 server from Sun has outpaced all the 3.x GHz (and higher) processors in terms of performance for most multithreaded Java apps. Hey, we are heading to 64-bit JVMs, where GC will take a lot of cycles and time, but there are changes happening in the JVMs as well, like the introduction of the ParallelOldGC flag. As said earlier, there may be some slowness in single-threaded activity, but we don't do a lot of that with J2EE-based servers...
Posted by: Dileep Kumar | November 30, 2006 at 06:13 PM
Hi Dileep,
The whole "what's better" story is really interesting. The new quad-core Intel in 2-socket form is expected to beat the pants off Niagara, but it illustrates the differences: 'only' 4 cores but a high clock; many cores and a lower clock; or AMD with networked dual-core, high-clock CPUs?
It will be interesting, moving forward, to see whether highly multi-core CPUs like Niagara can run at full speed in typical scenarios.
Posted by: Billy | November 30, 2006 at 06:24 PM
I agree with your comments: Niagara is not the end of the road, and of course Intel and the like can make better or worse processors than what exists today. All I wanted to point out is that so far there don't seem to be any cases to support your initial claim of "multi-core being bad for Java".
Posted by: Dileep Kumar | November 30, 2006 at 07:14 PM
It would be great if you could post a part-two follow-up to this blog. My thought is that a slower clock speed is OK if there are hundreds of cores, as the time difference would be made up by avoiding preemption.
Architectures such as Azul's, where a J2EE application can gain access to 384 cores, shouldn't slow down response times as you state...
Posted by: James | December 02, 2006 at 04:54 PM
This seems to fall in line with some of your reasoning Billy.
http://www.dailytech.com/article.aspx?newsid=5201
Essentially the 'next' processors are lower-clocked Woodcrests. Granted, these are specifically low voltage, but since that also equates to heat, sooner or later we're going to see this type of thing as a necessity for higher-SMP chips.
Posted by: Rob | December 04, 2006 at 03:38 PM
"Java without the Coffee Breaks": http://www.research.ibm.com/people/d/dfb/papers/Bacon01Java.pdf
Posted by: | December 05, 2006 at 07:45 AM
I had practical experience with a dual-processor IBM IntelliStation PC with 700 MHz Pentiums (Windows 2000) in 2001-2002. It ran a heavy J2EE application (JDK 1.3, EJB/Servlets/JSP), WebSphere, Oracle, CVS, many developers connecting to the VisualAge repository, some performance monitoring tools, anti-virus, and often ERwin, and it just kept going and going and going. I don't remember it ever showing CPU utilization of more than 50%, and it ran for weeks without a crash or reboot, even under load tests. That was just amazing...
A normal single-processor PC would have choked on half of that in 2001.
So from that, I think multi-core PCs should greatly benefit Java apps, given Java's normal multi-threading.
Or do you think a multi-core PC is that different from a dual-processor one?
Posted by: Oleg Konovalov | December 09, 2006 at 10:06 PM
Interesting reply here:
http://dev2dev.bea.com/blog/hstahl/archive/2006/12/multicore_is_go.html
H. Stahl seems to think Multi-Core Java is good for you, and has some results to support it...
Posted by: Jack Rogers | December 17, 2006 at 05:41 PM
We've actually been emailing about it, and while we both agree that highly parallel code can be written for GC and systems code, the problem remains that clock speeds look set to fall. That means Java needs a way to bring multi-threading to the masses, and it currently doesn't do that. See my response:
http://www.devwebsphere.com/devwebsphere/2006/12/new_java_langua.html
Posted by: Billy | December 18, 2006 at 09:32 AM
I'm happy to say there's another way to stress test Java's ability to scale on multicore.
There are plenty of frameworks for OLTP and SOA-like patterns, but have you ever tried to do a bulk data processing app with Java? Say, 10 million rows of data, 100 fields wide, and your batch window is 20 seconds?
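As one possible shape for such a job (a swapped-in technique, not how the linked framework works), the JDK's fork/join framework can split a bulk pass over rows in half repeatedly until each piece is small enough to run sequentially on one core. ColumnSum, the long[][] row layout, and THRESHOLD are hypothetical:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical divide-and-conquer pass that sums one column over all rows. */
final class ColumnSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // rows per leaf task; an assumed tuning value
    private final long[][] rows;
    private final int column, from, to;

    ColumnSum(long[][] rows, int column, int from, int to) {
        this.rows = rows;
        this.column = column;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: process sequentially
            long sum = 0;
            for (int r = from; r < to; r++) {
                sum += rows[r][column];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ColumnSum left = new ColumnSum(rows, column, from, mid);
        ColumnSum right = new ColumnSum(rows, column, mid, to);
        left.fork();                           // schedule the left half on another worker
        return right.compute() + left.join();  // do the right half here, then wait for left
    }

    static long sumColumn(long[][] rows, int column) {
        return ForkJoinPool.commonPool().invoke(new ColumnSum(rows, column, 0, rows.length));
    }

    public static void main(String[] args) {
        int n = 100_000, fields = 4;           // scaled far down from 10M rows x 100 fields
        long[][] rows = new long[n][fields];
        for (int r = 0; r < n; r++) {
            rows[r][2] = r;
        }
        System.out.println(sumColumn(rows, 2)); // sum of 0..n-1
    }
}
```

A larger input simply produces more leaf tasks for the pool's workers to steal; whether a real job fits a 20-second window depends on the per-row work, which this sketch does not model.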
Perhaps some of the folks on this thread have time to help build more benchmarks? I've got one on the site and many more in review, about to be published.
http://www.pervasivedatarush.com/beta
Posted by: Emilio | December 22, 2006 at 08:19 AM
Hello,
I cannot agree with your statements. Enterprise applications benefit more from multi-core architectures than desktop applications do, because the threading model there is very simple: each request is handled by a different thread. If the threads don't share too much data, such applications scale pretty well. This is more a question of better or worse application architecture than of Java itself. And Java 5's new multithreading classes are great at simplifying things.
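The thread-per-request model described here can be sketched with java.util.concurrent (written with Java 8 lambda syntax for brevity, although the executor and atomic classes themselves date from Java 5). RequestServer and its handle method are hypothetical stand-ins for a container's request dispatch:

```java
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical server: each request runs start-to-finish on one pooled worker thread. */
final class RequestServer {
    // One worker per core, so independent requests spread across all cores.
    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    // The only shared mutable state: a lock-free counter, so workers barely contend.
    private final AtomicLong handled = new AtomicLong();

    // Hypothetical handler: reads only its own request and touches no shared collections.
    private String handle(String request) {
        handled.incrementAndGet();
        return request.toUpperCase(Locale.ROOT);
    }

    Future<String> submit(String request) {
        Callable<String> task = () -> handle(request);
        return workers.submit(task);
    }

    long handledCount() {
        return handled.get();
    }

    void shutdown() throws InterruptedException {
        workers.shutdown();                             // stop accepting new requests
        workers.awaitTermination(10, TimeUnit.SECONDS); // let in-flight requests finish
    }
}
```

Keeping per-request state local to the handler is what lets this model scale with core count; each piece of shared, lock-protected data added to handle() becomes a point where cores wait on each other.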
You can switch the GC into parallel mode. This is not the default, but you can, and it works fine on our servers (we use J2EE to process lots of SMS messages from all over Poland).
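To check which collector a JVM is actually running after switching flags (for example HotSpot's -XX:+UseParallelGC), the standard java.lang.management API can list the active collectors and the time each has spent collecting. GcReport is a hypothetical helper name, and the collector names it prints vary by JVM and GC choice:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports each active collector's count and accumulated time. */
final class GcReport {
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount()) // -1 if undefined
              .append(", timeMs=").append(gc.getCollectionTime())       // approx. elapsed ms
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it once with default settings and once with the parallel flag shows whether the switch took effect and how collection time compares under the same load.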
Lower clock speeds don't make our applications unresponsive. No one notices whether a request is handled in 50 ms or 100 ms. And if it weren't multicore, we would have to handle more requests on the same core: some requests would simply wait for others to finish. Then it could be noticeable.
--
Our company website: http://www.dinf.pl/
Posted by: Piotr Kołaczkowski | February 16, 2007 at 07:32 AM
Piotr,
Thanks for your comment. Here's a thought: imagine that right now you are running on a POWER5 or a 3.8 GHz Intel CPU and are happy. Now I tell you that you are deploying on a PIII/800 MHz CPU. That is going to hit your response times pretty significantly. The problem is that path lengths are getting much longer due to richer programming models, frameworks, convenience frameworks, etc. Programmers are being saved by Moore's law, and that is coming to an end. When massively multi-core chips hit the market, you'll see a big drop in per-core performance. Code path lengths that were acceptable before will not be acceptable now.
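A back-of-the-envelope way to see this argument: per-request CPU time is roughly the path length N (instructions executed) divided by instructions per cycle (IPC) times the clock frequency f. Assuming, for illustration only, that N and IPC are identical on the two CPUs mentioned:

```latex
\begin{aligned}
t_{\mathrm{CPU}} &= \frac{N}{\mathrm{IPC} \times f} \\
\frac{t_{0.8\,\mathrm{GHz}}}{t_{3.8\,\mathrm{GHz}}} &= \frac{3.8}{0.8} = 4.75
  \qquad (\text{same } N,\ \text{same IPC})
\end{aligned}
```

Real IPC differs a lot between microarchitectures, so this ratio is only a first approximation; what it does show is that a growing N can no longer be hidden by a growing f once f stops rising.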
Posted by: Billy | February 16, 2007 at 10:04 AM
I agree with the original post: these multicore servers take time to do the initial mark and remark, which are stop-the-world operations in the CMS garbage collector. It's not uncommon to spend a week trying to tune GC on a big enterprise application, only to find you still get 6-second pauses whenever a 4 GB old generation gets full or fragmented.
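A simple way to observe such stop-the-world pauses from inside the application, rather than only from GC logs, is a thread that repeatedly sleeps for a short interval and records how late it wakes up. This is a minimal sketch; PauseDetector and the 1 ms interval are illustrative, and OS scheduling delays also show up as overshoot, so it measures total stall time rather than GC time alone:

```java
/** Hypothetical pause detector: large wake-up overshoots suggest the JVM was stopped. */
final class PauseDetector implements Runnable {
    private final long intervalMs;
    private volatile long maxOvershootMs = 0; // worst extra delay seen; written only by run()
    private volatile boolean running = true;

    PauseDetector(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long overshoot = elapsedMs - intervalMs; // time beyond the requested sleep
            if (overshoot > maxOvershootMs) {
                maxOvershootMs = overshoot;
            }
        }
    }

    long maxOvershootMs() {
        return maxOvershootMs;
    }

    void stop() {
        running = false;
    }

    static PauseDetector startDaemon(long intervalMs) {
        PauseDetector d = new PauseDetector(intervalMs);
        Thread t = new Thread(d, "pause-detector");
        t.setDaemon(true); // never keeps the JVM alive on its own
        t.start();
        return d;
    }

    public static void main(String[] args) throws InterruptedException {
        PauseDetector d = startDaemon(1);
        for (int i = 0; i < 200; i++) { // allocate garbage to provoke some collections
            byte[] junk = new byte[1 << 20];
            junk[0] = 1;
        }
        Thread.sleep(200);
        d.stop();
        System.out.println("max observed pause ~" + d.maxOvershootMs() + " ms");
    }
}
```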
Posted by: Rob | July 08, 2007 at 11:17 AM
I agree that Java's path length is long, but you've got to know that new multicore CPUs are designed to execute more instructions per cycle. So even a C2D 6600 at 2.4 GHz runs faster than a P4 HT at 4.0 GHz.
Posted by: Andy | October 17, 2007 at 02:09 PM
I have two questions:
1) Will hardware vendors really decrease clock speeds noticeably in order to increase core counts if they end up seeing that this makes e.g. Java applications behave noticeably more "sluggish"? (Note: they could instead decide to increase core count only as far as clock speeds stay reasonable - why opt for the hard way?)
2) Will "simpler" languages (like C or Fortran) and "less sophisticated" frameworks really not just "shorten path length" but also prove to make multithreaded design easier and less error-prone for the average developer? (Note: even for memory management, going back to malloc/free did not yield better code, less heap fragmentation or shorter allocation times - why go back to the stone age in terms of concurrency support?)
Posted by: martinval | January 18, 2008 at 12:16 PM
My cheap Linux AMD box is twice as fast as a T2000 when running a Java app in WebSphere. Why is that?
Posted by: Vijay | April 21, 2008 at 12:35 PM
Billy,
I disagree with you that multi-core is bad for Java.
You are saying that:
"GC has elements that are multi-threaded but the main task remains single threaded and this thread will be slower on a multi-core CPU and get slower as clock speeds drop as more cores are added."
1) You are very wrong about the way GC works. Maybe you are talking about earlier versions of GC. A good article for GC tuning can be found here: http://java.sun.com/docs/hotspot/gc1.4.2/
In many years of experience, I have still not met a developer who has complained that the bottleneck is in Java's GC rather than in his/her code.
2) Clock speed means something if and only if you are comparing clock speeds on the same architecture. Usually that is not the case: a 2-core CPU will be quite a different animal from a 4-, 8-, etc. core CPU. So forget about clock speed comparisons.
If you visit Dublin give me a buzz and we can continue this dispute in Temple Bar ;).
Posted by: bvh | May 16, 2008 at 10:13 AM