I'm seeing this a lot these days. A customer is using IBM WebSphere eXtreme Scale in front of a database. The DBAs are concerned about how many JDBC connections that the grid is requiring. If it's a larger grid then the problem is especially acute. For example, lets say we have a grid with 200 JVMs in it. All of the grids that I have sized recently are in this range. Lets look at different aspects of this problem
1) WXS as side cache
Here, there are no connections from the grid to the database. The application server JVMs do all the database interaction and hence, the grid adds no additional demands on the DBMS.
2) Preloading WXS with data from a DBMS
Here, preloading it can be interesting. The main issue is usually the time to load the grid with the DBMS data. We usually use parallelism when we do this but that means multiple JDBC connections per preload client. The main issue I see with DBMS preload is simply query complexity slowing down the preload process. Simplifying the query and moving join logic in to the grid side preload process seems to be the key to solve this problem. Once you can achieve a high preload speed per single preload thread then you don't need so many JDBC connections to achieve a specific preload speed.
3) Write through WXS to a DBMS
This is probably the worst offender here. Typically, a WXS Loader consists of a TransactionCallback (like this JDBXTxCallback example) and a Loader. The TransactionCallback makes sure a JDBC connection pool is available for the Loader and the TransactionCallback manages JDBC.begin and JDBC.commit and rollback operations. The Loaders share the connection provided by the TransactionCallback for a specific transaction.
Clearly, preloading the grid will enormously reduce the usage of JDBC connections for grid read operations. However, write operations will still consume one JDBC connection per concurrent WXS transaction. Given, the potential throughput performance of such a grid, clearly, the load is usually too much for most databases. A grid server with 20 JVMs on a modern blade can still achieve 20k transactions a second per blade. Lets start with a grid with a 100% write rate for some peaks. Thats 1000 write transactions per JVM. If each write transaction costs 40ms (remember, the DBMS is directly involved with writes here, this isn't the grid, it's the DBMS) then we could do 25 transactions per thread per JVM (1000 / 40ms). Now, how many threads is that? We'd need 40 (1000 / 25 ops per thread) threads per JVM to achieve this amount of throughput in a single JVM. So, the JVM needs a JDBC connection pool of 40 connections (one per thread) too.
If there are 200 JVMs then we need 8000 (200 x 40) JDBC connections for the grid! Now, we can see why the DBAs may be concerned. A write through grid thats write intensive is a great idea for a grid. But, only with write behind. Write through makes NO SENSE AT ALL for this situation. The DBMS will remain the bottleneck and the customer will likely question why use a grid at all if such a solution with write through was proposed. Remember a 10 blade grid with 20 JVMs per blade and a 100% write rate for some workloads would get you here.
So, write through can still use a lot of DBMS connections if the write transaction percentage is high. If it's lower then these numbers can all be revised downwards. For example, if it was 10% then lets do the math again. One box does 20k tx/sec. 10% is 2000/sec. There are 200 JVMs then thats 10/sec per JVM. One or two connections in the DataSource would likely be enough here depending on response time requirements. This gives us 200 JVMs x 2 connections or 400.
It's very important to share a DataSource within a JVM, do not give each JDBCTxCallback it's own DataSource. The example JDBCTxCallback provide in this blog entry (see the link above) accepts a DataSource reference when it's initialized. You can use Spring to wire it in or change the JDBCTxCallback to get one from JNDI when there is an application server hosting the grid containers. This way, even though there is one JDBCTxCallback PER primary shard in a JVM, at least they will all share the same DataSource instance. This would reduce the connection usage dramatically. This is usually only a problem when used a J2SE hosted WXS container. Customers using WXS inside WebSphere ND will likely be using WebSphere's DataSource implementation and this will naturally result in one DataSource per JVM.
4) Write behind WXS to a DBMS
This approach gives the best opportunity to cut JDBC connection usage. Assuming, we preloaded the grid then the connections for reads can likely be ignored. This leaves writes. Write behind decouples the writes to the database from the grid writes. This means that one WXS write does not mean one DBMS write. This was the major problem with write through.
WXS allow DBMS writes to be triggered every X 'dirty' entries in a Map or when 'dirty' entries in a Map are older than a certain time (5 minutes for example). This means we would see much fewer database writes than grid writes. If we have a grid JVM with 5 primary partitions in it (the normal amount when sized correctly) then there are only 5 potential concurrent writes per second in such a JVM. Each partition will send its changes to the DBMS using a single thread. Therefore, the maximum JDBC connections used for write behind is the number of primary partitions in the JVM which in this case would be 5 PER JVM.
Note, that when boxes fail then the # of primary partitions can be more. Thus, size everything with what you expect to be your minimum # WXS container JVMS to figure this out. It's more likely to be 8-10 primaries in this case. Thus, we'd expect 10 JDBC connections per container if the system was extremely write intensive. However, a very high write rate is unusual. Normally, the write rate is much lower (10-20% might even be high). But, a security system built with a grid that updates last login time for example might result in a WXS write per login so be careful sizing this.
We expect in the worse case scenario with write behind to need one JDBC connection per partition. However, this is only the case when every partition writes at the same time. The big factor here is how long does a write behind transaction take against the database or the 'duty cycle'. If it takes 1 second and we're doing one write behind every 5 minutes then we can likely use one shared JDBC connection in a DataSource per JVM, thats a duty cycle of 1 : 300 (300 seconds in 5 minutes). We are unlikely to get much contention for that one connection as there is a 1:300 change of writes from different partitions clashing within a single JVM. If each write behind tx takes 2 minutes then this clearly changes things given there are 5-10 primaries per JVM. 2 minutes would mean a duty cycle of 120:500 or about 25%. You'd need probably one per primary partition here or 10 per JVM.
But, regardless, it's clear that write behind can reduce the DBMS connection usage significantly when the write behind configuration is 'optimal'. Optimal depends on the situation but as a starting point, 5 minutes or 1000 records (T300;C1000) isn't a bad place to start. The size of the DataSource pool for each JVM depends on the duty cycle of the write behind within a single JVM.
The points on sharing a single DataSource within a WXS JVM equally apply here as to write through.
Summary
I hope this blog entry opens peoples eyes in terms of JDBC/WXS interactions. Most applications elect to use a grid because the DBMS is unable to provide adequate response times under peak loads. If the write rate is very low then there is no problem. But, when the write rate is high then write behind is usually the best scenario here. A high write rate application using write through is likely going to still have performance issues as the DBMS will prevent the grid from achieving linear scaling and good response times.
A quality DataSource implementation is a big deal here. The DataSource thats provided with WebSphere Single Server/Base and WebSphere ND is about as fully developed as exists out there. It is a significant improvement (in my humble opinion) over C3P0 or Apache dbcp. Thus, running WXS inside a single server WAS install is something you could consider instead of a J2SE based WXS implementation for this reason alone, the DataSource implementation.