I'm maintaining a live google document with the best that we're seeing. You can see the document at this link http://bit.ly/wxsbestpractice.
I'm maintaining a live google document with the best that we're seeing. You can see the document at this link http://bit.ly/wxsbestpractice.
Posted at 01:22 PM in WebSphere eXtreme Scale, WebSphere Performance, XTP | Permalink | Comments (1)
|
We just found a customer issue and it's worth talking about. The customer was preloading data into WebSphere eXtreme Scale. They were multi-threading the preload to speed it up. They were using a default fixed size ExecutorService as the pool. They were fetching records from the database and then doing a submit call on the pool to hit the grid.
They were running out of memory on the preload client. The reason is as follows. The thread pool size was fixed. Once those threads are busy then queueing additional items just puts them in the queue. The queue was getting longer and as a result holding more items. Items in a preload case might be 10k records to push in to the grid. Items are big, they can take a lot of space. A lot of space if you are able to grab stuff from the database very quickly.
This means if the thread pool is undersized then the pool will queue up lots of items and when the items are big, your preload client can run out of memory pretty easily. There are a couple of things to do here. But, the first is to fix that thread pool so that this can't happen. If you initialize the pool like this then you'll get a better behavior:
// this means that once there are 3x numThreads jobs queued waiting for
// a thread then it will start running jobs on the submitting thread.
LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>(numThreads * 3);
// Once the queue reports that it is full, the CallerRunPolicy will run job
// on the submitter thread.
ExecutorService p = new ThreadPoolExecutor(numThreads, numThreads, 2L,
TimeUnit.MINUTES, queue,
new ThreadPoolExecutor.CallerRunsPolicy());
This shows us creating a LinkedBlockingQueue with a specific maximum capacity. I used 3 times the thread pool size but you need to figure out what works best. The major change here is the CallerRunsPolicy parameter. Now, that the thread pool uses a fixed size queue then if the queue fills then normally the pool will throw exceptions when this happens. We'd have N threads + M items on the queue at that point. CallerRunsPolicy means that instead of throwing an exception, the job is instead executed on the submitter thread. This will prevent the memory runaway that was happening in the case I described earlier and will throttle the preload operation itself rather than let it go crazy.
The other things to watch here would be to use larger batch sizes. If you are using WXSUtils.putAll_noLoader to bulk load the items in to the grid then make sure you have around 1000 key/value pairs PER partition. This is a good rule of thumb to start from when tuning things. If you have 200 partitions and a batch size for PutAll of 1000 items then thats only 5 per partition which isn't enough to really start to get the benefits of batching the puts. A number per partition in the hundreds is probably better.
If you have a grid with 200 partitions then a thread pool size of 32 isn't great either. You should at least have the same number of threads otherwise you'd overrun the pool instantly when doing any bulk operation. The default pool size for WXSUtils is indeed 32 and is too low for production. You can specify the size of the pool using a constructor parameter or by editing wxsutils.properties if thats how you are connecting to the grid.
I've already updated wxsutils to configure the threadpool as described above given recent experience. I have not changed the default thread pool size from 32 but thats something I'm considering also. Maybe, I'll default it to one or two times the number of partitions in the grid but bounded by some 'reasonable' number.
Anyway, it's worth pointing out the problem and how it's resolved just for every bodies education. An unbounded thread pool happens not just because of a growable thread pool for example but they can be memory unbounded pretty easily as in this case when the pool cannot service the pool queue fast enough.
Posted at 09:34 AM in WebSphere eXtreme Scale, WebSphere Performance, XTP | Permalink | Comments (2)
|
I'm seeing this a lot these days. A customer is using IBM WebSphere eXtreme Scale in front of a database. The DBAs are concerned about how many JDBC connections that the grid is requiring. If it's a larger grid then the problem is especially acute. For example, lets say we have a grid with 200 JVMs in it. All of the grids that I have sized recently are in this range. Lets look at different aspects of this problem
1) WXS as side cache
Here, there are no connections from the grid to the database. The application server JVMs do all the database interaction and hence, the grid adds no additional demands on the DBMS.
2) Preloading WXS with data from a DBMS
Here, preloading it can be interesting. The main issue is usually the time to load the grid with the DBMS data. We usually use parallelism when we do this but that means multiple JDBC connections per preload client. The main issue I see with DBMS preload is simply query complexity slowing down the preload process. Simplifying the query and moving join logic in to the grid side preload process seems to be the key to solve this problem. Once you can achieve a high preload speed per single preload thread then you don't need so many JDBC connections to achieve a specific preload speed.
3) Write through WXS to a DBMS
This is probably the worst offender here. Typically, a WXS Loader consists of a TransactionCallback (like this JDBXTxCallback example) and a Loader. The TransactionCallback makes sure a JDBC connection pool is available for the Loader and the TransactionCallback manages JDBC.begin and JDBC.commit and rollback operations. The Loaders share the connection provided by the TransactionCallback for a specific transaction.
Clearly, preloading the grid will enormously reduce the usage of JDBC connections for grid read operations. However, write operations will still consume one JDBC connection per concurrent WXS transaction. Given, the potential throughput performance of such a grid, clearly, the load is usually too much for most databases. A grid server with 20 JVMs on a modern blade can still achieve 20k transactions a second per blade. Lets start with a grid with a 100% write rate for some peaks. Thats 1000 write transactions per JVM. If each write transaction costs 40ms (remember, the DBMS is directly involved with writes here, this isn't the grid, it's the DBMS) then we could do 25 transactions per thread per JVM (1000 / 40ms). Now, how many threads is that? We'd need 40 (1000 / 25 ops per thread) threads per JVM to achieve this amount of throughput in a single JVM. So, the JVM needs a JDBC connection pool of 40 connections (one per thread) too.
If there are 200 JVMs then we need 8000 (200 x 40) JDBC connections for the grid! Now, we can see why the DBAs may be concerned. A write through grid thats write intensive is a great idea for a grid. But, only with write behind. Write through makes NO SENSE AT ALL for this situation. The DBMS will remain the bottleneck and the customer will likely question why use a grid at all if such a solution with write through was proposed. Remember a 10 blade grid with 20 JVMs per blade and a 100% write rate for some workloads would get you here.
So, write through can still use a lot of DBMS connections if the write transaction percentage is high. If it's lower then these numbers can all be revised downwards. For example, if it was 10% then lets do the math again. One box does 20k tx/sec. 10% is 2000/sec. There are 200 JVMs then thats 10/sec per JVM. One or two connections in the DataSource would likely be enough here depending on response time requirements. This gives us 200 JVMs x 2 connections or 400.
It's very important to share a DataSource within a JVM, do not give each JDBCTxCallback it's own DataSource. The example JDBCTxCallback provide in this blog entry (see the link above) accepts a DataSource reference when it's initialized. You can use Spring to wire it in or change the JDBCTxCallback to get one from JNDI when there is an application server hosting the grid containers. This way, even though there is one JDBCTxCallback PER primary shard in a JVM, at least they will all share the same DataSource instance. This would reduce the connection usage dramatically. This is usually only a problem when used a J2SE hosted WXS container. Customers using WXS inside WebSphere ND will likely be using WebSphere's DataSource implementation and this will naturally result in one DataSource per JVM.
4) Write behind WXS to a DBMS
This approach gives the best opportunity to cut JDBC connection usage. Assuming, we preloaded the grid then the connections for reads can likely be ignored. This leaves writes. Write behind decouples the writes to the database from the grid writes. This means that one WXS write does not mean one DBMS write. This was the major problem with write through.
WXS allow DBMS writes to be triggered every X 'dirty' entries in a Map or when 'dirty' entries in a Map are older than a certain time (5 minutes for example). This means we would see much fewer database writes than grid writes. If we have a grid JVM with 5 primary partitions in it (the normal amount when sized correctly) then there are only 5 potential concurrent writes per second in such a JVM. Each partition will send its changes to the DBMS using a single thread. Therefore, the maximum JDBC connections used for write behind is the number of primary partitions in the JVM which in this case would be 5 PER JVM.
Note, that when boxes fail then the # of primary partitions can be more. Thus, size everything with what you expect to be your minimum # WXS container JVMS to figure this out. It's more likely to be 8-10 primaries in this case. Thus, we'd expect 10 JDBC connections per container if the system was extremely write intensive. However, a very high write rate is unusual. Normally, the write rate is much lower (10-20% might even be high). But, a security system built with a grid that updates last login time for example might result in a WXS write per login so be careful sizing this.
We expect in the worse case scenario with write behind to need one JDBC connection per partition. However, this is only the case when every partition writes at the same time. The big factor here is how long does a write behind transaction take against the database or the 'duty cycle'. If it takes 1 second and we're doing one write behind every 5 minutes then we can likely use one shared JDBC connection in a DataSource per JVM, thats a duty cycle of 1 : 300 (300 seconds in 5 minutes). We are unlikely to get much contention for that one connection as there is a 1:300 change of writes from different partitions clashing within a single JVM. If each write behind tx takes 2 minutes then this clearly changes things given there are 5-10 primaries per JVM. 2 minutes would mean a duty cycle of 120:500 or about 25%. You'd need probably one per primary partition here or 10 per JVM.
But, regardless, it's clear that write behind can reduce the DBMS connection usage significantly when the write behind configuration is 'optimal'. Optimal depends on the situation but as a starting point, 5 minutes or 1000 records (T300;C1000) isn't a bad place to start. The size of the DataSource pool for each JVM depends on the duty cycle of the write behind within a single JVM.
The points on sharing a single DataSource within a WXS JVM equally apply here as to write through.
Summary
I hope this blog entry opens peoples eyes in terms of JDBC/WXS interactions. Most applications elect to use a grid because the DBMS is unable to provide adequate response times under peak loads. If the write rate is very low then there is no problem. But, when the write rate is high then write behind is usually the best scenario here. A high write rate application using write through is likely going to still have performance issues as the DBMS will prevent the grid from achieving linear scaling and good response times.
A quality DataSource implementation is a big deal here. The DataSource thats provided with WebSphere Single Server/Base and WebSphere ND is about as fully developed as exists out there. It is a significant improvement (in my humble opinion) over C3P0 or Apache dbcp. Thus, running WXS inside a single server WAS install is something you could consider instead of a J2SE based WXS implementation for this reason alone, the DataSource implementation.
Posted at 10:10 AM in J2EE, J2SE, nosql, Persistence, WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (0)
Technorati Tags: database, datagrid, datasource, dbms, jdbc, sizing, websphere extreme scale
|
Updating a shared key/value pair in a concurrent environment is usually done in WXS in a pessimistic manner. The flow looks like this:
Begin
V = m.getForUpdate(K)
V++
m.update(K,V)
commit
This gets a lock, reads the value, modifies it and the updates it. The commit releases the lock. This ensures a serializable operation in the sense only one client thread at a time will be able to modify the value and ensures correct operation.
The common problem with this approach though in a contentious environment is threading. A client will use a server JVM thread to do the getForUpdate and it's blocking. We have lock timeouts to time bound the blocking time but lets ignore that for now. Once the lock is acquired, the server thread is released and the client gets control back. The client then sends a request to do the commit which uses another thread temporarily to commit the transaction releasing the lock and allowing others to get the lock in the future.
The problem here is if a lot of clients go after a small group of key locks then the thread pool can become drained because all threads are currently blocked waiting to acquire some shared lock. Those threads can only proceed if they get lock timeouts OR the client holding the lock can commit their transaction thus unlocking the key. The problem is if the client can't get a server thread because all server threads are used waiting for the lock then we end up with a problem because we can't get a thread to execute the commit for the client holding the lock so the lock never gets released. This is why lock time out is important to prevent server threads waiting forever for a lock.
Now, there is another way to do this and it's an optimistic approach. The problem above is that server locks are being held across client requests. Another approach would be for the client to use code like the following:
Loop
V = m.get(K)
V2 = f(V)
if(m.cond_put(K, V, V2) == T) exit loop
End Loop
The code snippet shows an optimistic version. Here, the client gets the current value and remembers it in V. The modified value in kept in V2. There is a new primitive on wxsutils#WXSMap now called cond_put. This takes the key, the old expected value V and the new value V2. The value for K is only updated to V2 if the current value is still V. A flag is returned to indicate whether the entry was set to V2 or not. In the above case, we use a loop to repeat the operation in the case.
This approach results in locks on the server being held for much smaller periods and never across client RPCs. The client needs to use a loop but i think this approach scales better in a contentious environment than the usual approach.
It's also similar to the concurrency techniques used for highly concurrent data structures now. Most of these are based on a compare and swap algorithm which is very similar to cond_put above.
The implementation for a cond_putAll is now in my github wxsutils library if you want to try it out.
Posted at 10:54 AM in WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (0)
|
Many times I've had people compare the performance of WebSphere eXtreme Scale to a local hash map and comment that it's slower. Basically, they are comparing a remote hash map call to a local hash map call and yes, the remote one is much slower. But, this is an apples to oranges comparison. Try caching 600GB in the local hash map on every JVM requiring a cache? But, hey, it's faster, right?
Wrong. If you just need a local hash map then you should use one. A distributed cache like extreme scale brings expandible capacity, high availability of data in the cache, a network accessible remote very large cache, cache stays up when the application jvms are cycled and so on. It's different and thats why the fact that a local hash map is faster isn't really valid.
For a real world example, lets look at WebSphere Commerce Server. It can cache HTML page fragments in a local map (Dynacache) and in extreme scale also. If the cached data in present in both caches then the local map is faster but, that's not why we sell extreme scale to commerce customers.
An extreme scale cache is remote. You can cycle the commerce JVMs and when they restart, they are working with a hot cache straight away. This isn't the case with the local cache. It has to be regenerated and this costs of lot of CPU. This regeneration also hits the database during this time. So, we have extra costs of CPU, longer response times and database load if we cycle a commerce JVM and have to rewarm up its page cache. None of these problems exist with extreme scale as the page fragment cache because the cache contents isn't kept in the commerce JVM at all and therefore it survives a commerce JVM restart.
Now, imagine bringing down a box with 10 JVMs on it. With the local cache, the cost is 10x when you restart the box. With extreme scale, you has the cost of restarting the JVMs but no cache warmup costs at all as the cache survives the JVM/box cycle. It's ten times faster because those ten JVMs do not have to warm up their ten independent page fragment caches again or hit the database and the customers using that box see "normal" response times pretty much as soon as they are online versus waiting for the JVM caches to warm up in the conventional dynacache scenario.
Next, lets look at invalidation. If a large site had 100 JVMs and I update the price of an item, then all 100 JVMs need to regenerate the page fragments for that item every time this happens. They all burn CPU, the response time for serving that item is longer and there is database load. With extreme scale, the page fragments are invalidated but are only regenerated once, the first commerce server to request the page notices that it's not there, generates it once, puts it in the cache and now the other 99 JVMs have the new page even though they didn't generate it. This is a massive load reduction when prices or inventory levels are updated or promotions for product categories become active potentially impacting tens of thousands of skus in the system and the page fragments for those impacted skus. This means you can invalidate prices to match competitive prices in a more timely manner without worrying about high CPU spikes or database spikes. This is a major advantage for extreme scale with commerce server over the dynacache implementation.
Lastly, if each JVM has its own independent cache then potentially depending on which JVM a request is routed to, a different cached page fragment could be served, if pages are being invalidated aggressively then the site becomes a bit of a mess in terms of consistency of what users see across the cluster. WebSphere eXtreme Scale is a single logical cache shared by all commerce JVMs, they don't have a private local cache, they always go remote to extreme scale for page fragments. This means they are always consistent and a user will see the same cached page fragments no matter which JVM they are routed to.
So, when examining whether a customer should use websphere extreme scale with commerce server, it's important to focus on the massive response time, cpu savings and database savings benefits around edge cases, not steady state performance. The steady state CPU load of a commerce cluster with no JVM cycling, additions or data invalidations is likely to be a little higher with extreme scale versus a local dynacache. This is expected. If you have a completely static commerce site then dynacache with disk offload is probably a better solution. But, and I expect this is most customers, the content of the site changes and you want:
Then extreme scale is the way to go, hands down.
Posted at 06:52 PM in Dynacache, IBM WebSphere Commerce Server, WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (1)
|
I speak all the time to corporate customers with well known and busy web sites. Typically, they see under 200 page views/sec. Maybe, a large one has a million logins per hour. The load curves for those systems usually resemble a repeating bath tub with a peak in the morning then a long trough until lunch time, another peak, then a trough till they leave the office and so on.
The advent of mobile browsers and mobile applications on devices like the iPhone, Android and Blackberry are changing this load profile. The trough isn't so deep any more. There is more average load simply because people on an impulse can bring up a mobile application and check something out. They don't need to be sitting at a desktop anymore.
A higher trough isn't so bad. People who were betting on consolidation will get a slightly smaller payback because as average utilization of servers rise, the potential to consolidate them falls. This is a small negative. The higher trough is small news though compared with whats going to happen to peaks. We have a new phenomenom here, the FlashLoad.
FlashLoads
We've all heard of flash floods. The stars align, the heavens open, and we get severe sudden massive flooding of a piece of land. The advent of wide spread smart phones and the availability of corporate mobile applications to make accessing their infrastructure easier is about to invent a new IT term. The flash load event.
Imagine the following scenario. A bank has a mobile application. It's a big bank, 20 or 30 million customers. They have a successful iPhone application downloaded to 6 million iPhones. Sounds great? But, lets say the stock market crashes. Everyone sees it instantly, CNN's mobile news application sends up real time notifications to their iPhone, they don't even need to be watching TV or listening to a radio. The phone is all they need to be aware of whether Kim Kardashian wore green as soon as she stepped out just now or that there is a stock market crash. What do the people do? Why, they crack open the banks mobile application to check the brokerage application. The problem is all 6 million are doing it at almost the exact same time. They were all synchronized through mobile notification technology and other media if they are observing. It's a flash load.
Corporate IT systems are usually not designed to handle a million concurrent logins let alone 6 million. They are usually not designed to handle 6 million concurrent users pulling up their brokerage account details or entering sell orders from mobile phones at the same time. The velocity of a flash load event will be fierce. The load will ramp up almost vertically if it's triggered by a media event (Cable show, Twitter, Facebook, TV, Radio, mobile notification service). Companies with auto provisioning systems won't be able to keep up. Provisioning more virtual machines takes time, minutes for the virtual machines to start, operating systems to boot, middleware to start up and so on. Meanwhile, you are getting beaten with a massive surge in load and it's likely not going to be pretty. By the time those extra resources are provisioned in, it's already over. Companies that consolidated resources have a lot less headroom than those who don't. You can't expect the headroom when non consolidated resources run at 10% while consolidated ones run at 50-60% load. The headroom just isn't there until the extra provisioned resources arrive and it won't be instantly.
Caching is a possible solution to some of these problems. The closer to the user that you can service a transaction using a cache then the less of the infrastructure you will touch with each 'transaction'. Caches can usually handle a lot of load, tens of thousands of requests/sec even on blades. A little investment here and having the data preloaded or warm in the cache could be the difference between a severe outage and a working site. I don't think it's a question of whether to cache or not, it's a question of cache or crash.
The cloud may not save companies here either. Very few corporate applications can run in a hybrid cloud/private data center mode. The cloud also has limited capacity. If the major banks used Amazon EC2 to host servers, a flash load event may even strain their ability to provision ec2 instances to try keep up. They are all looking for extra capacity at the same time. It's like the recent credit crisis but it's a mips crisis instead.
I think mobile applications will allow customers to check their accounts and details in a much more accessible and convenient way than the Web ever did. But, with the availability of instant information in our society through mobile devices, they also bring the potential for a new phenomenon. The flash load. Be prepared once you start shipping mobile applications for your public facing services. It's great to see a successful smart application getting millions of downloads but remember as your success measured by the number of downloads grows, the potential of a flash load is also rising fast too. Be prepared before you discover the limitations of a traditional multi-tiered architecture. It's not just a matter of adding REST services or similar, you need a flash load strategy also because it's guaranteed to come along eventually. It's cache or crash.
Posted at 05:08 PM in High Availability, virtualization, Web/Tech, WebSphere eXtreme Scale, WebSphere Performance, XTP | Permalink | Comments (1)
Technorati Tags: cache or crash, flash load, mobile applications, smartphones
|
There are a lot of applications out there that already use local caches like the many open source ones as well as things like Dynacache. These caches are basically local hash maps in the same address space and effectively have zero access time and cost. Cache misses are zero cost and cache hits/updates are practically zero cost also.
Applications faced with this have been written to expect this speed, i.e. zero cost. This means some applications are written to be extremely 'chatty' with the cache during individual requests. The caches tend to be very fine grained and a single application request may request or update hundreds of key/value pairs from the cache. When the cost of each operation is practically zero then this isn't a problem at all.
The trouble starts when you try to plugin a distributed cache. These caches are typically remote. This means all cache interactions involve an RPC and involved serializing the keys and values to wire format and possibly back again and thus are far from zero cost. Comparing the cost of an RPC with a local hash map lookup, it's a pretty huge difference in code path. Maybe 3 or 4 orders of magnitude. If the application is doing 100 lookups per request then before it was basically 100 x zero ms, now it might be 100 x 1ms even for misses.
This difference can then have a dramatic impact on an applications performance if it interacted with the cache in a very chatty manner. I see this a lot in applications written against a local cost cache because if something is free then developers will use it and then abuse it.
There are a couple of solutions to this.
First, we can modify the application to be less chatty. This typically means the application should store data in the cache in a much coarser form. Rather than have a customer split across 10 maps, a single customer map whose value is a denormalized 'customer' data might be better for example.
Another solution is to use a distributed local cache. WebSphere eXtreme Scale can do this using it's multi-master capabilities (V7.1+). This allows you to set up a single partition grid. The grid is hosted in a pair of container JVMs that we'll call a hub. Each client then instead of just connecting to the hub with our client APIs, they would instead create a local grid with the same objectgrid.xml/deployment.xml but also with a local catalog instance in the client JVM. The client then makes a multi-master link to the hub grid catalog. WXS will then copy everything in the hub grid locally. The client then reads and writes against it's local grid which is now like a spoke in a hub and spoke topology. All changes made to the clients grid are replicated asynchronously to the hub. If other clients did the same thing then the hub will ensure those changes are also copied to the other clients. Any collisions can be handled in the hub using the default arbitrator or an application supplied one.
This approach allows clients to have zero cost reads and writes to a cache bounded by JVM heap limits but it's fully replicated across different clients using a hub and spoke architecture. It would use optimistic versioning through our arbitration logic in the hub. Thus, it's extremely fast but the capacity is limited to the data that fits in a single JVM although this can be quite large these days.
But, the bottom line is that treating a cache as a zero cost access resources is a mistake. There are lots of types of caches these days, the caches can be made to scale out linearly from a capacity and throughput point of view but application developers need to realize this fact and either use coarser grained cache lookups or the local approach with some kind of hub/spoke but with a capacity restriction as it's unlikely a client JVM can host 600GB of data. It is possible in this day and age to have such large JVMs but if there are a lot of clients then cost of memory becomes a deciding factor and moves you back to the remote large cache with a less chatty application/cache interface.
Posted at 03:08 PM in Dynacache, WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (0)
Technorati Tags: cache, datagrid, distributed cache, java
|
I've gotten quite a few questions on the just announced WebSphere eXtreme Scale integration with WebSphere Commerce Server. One of the top themes is around CPU usage.
Duplicate Page Rendering
This is the first major saving. A WXS cache is a separate tier and is shared between all the Commerce server JVMs. Once any commerce JVM renders a page fragment and stores it in the WXS tier then that fragment is immediately available to every other Commerce JVM. If there are N Commerce JVMs then that means the page fragment will be rendered once instead of N times. This is a significant source of CPU savings in the following circumstances:
Cache management costs
Commerce Server using a private disk based cache normally. This cache is private that means the burden of managing that cache is borne totally by that Commerce JVM. The bigger the cache, the more management, the more garbage collection pauses, the more private disk I/O and so on. This cost is per JVM so N JVMs means N times this cost.
A WXS cache is not managed by the Commerce JVMs at all, the costs of managing the WXS cache is shared by the collection of JVMs hosting the grid. This removes this CPU cost from the Commerce cluster and makes that CPU available to do more useful things like sell products. In addition to this CPU saving, WXS scales linearly, no one JVM is responsible for the cache management tasks, the tasks are shared in a uniform manner between all the JVMs hosting the data. There is no single bottleneck. If you make the cache larger then add more JVMs to increase capacity, add CPU for cache request servicing and management and network. We can say that the cost of management for a single WXS JVM stays constant as the cache cluster grows in size. We move the cost of managing the cache away from the client application and in to a more scalable tier.
Summary
So, there are two principal CPU savings.
Posted at 10:49 AM in IBM WebSphere Commerce Server, WebSphere, WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (1)
|
I've written and uploaded a Bloom Filter implementation and uploaded it as part of the wxsutils library I keep on github. You can look at the source code here. If you want to use it with your WebSphere eXtreme Scale project just grab the jar from here and add it to your class path.
A bloom filter is best explained elsewhere in terms of how it works. The use cases for where it's useful are worth going over again here though.
Cassandra uses a bloom filter to track all the keys where are stored on disk in a space efficient data structure, i.e. the bloom filter. It uses this to avoid disk I/Os that would otherwise be used looking up keys definitely not on the disk.
Another more customer friendly example is a fraud detection scenario. Lets say you're a credit card company that has the spending history over 6 months of all your customers. You want to catch purchases to vendors that are not in that history. One way would be to keep all the customer vendor history in a database or in a datagrid and check if a new transaction is for an existing relationship. If not then you'd escalate it for scrutiny. For a large credit card company, this can easily be billions of rows. Lets say 200 million credit cards with 200 vendors each? That could be a lot of memory.
A better approach would be to create a custom bloom filter for each customer. The bloom filter is initialized for the number of actual vendors the customer deals with, maybe 200 vendors on average. Each vendor is then added to the bloom filter. This filter is a byte array when this is done. It's about 256 bytes. Pretty small to store 200 vendor strings, no? This filter can then be loaded in to a DataGrid and when a new transaction comes in, the filter for that customer can check with a 97% accuracy whether the vendor was used before by the customer. This is very accurate and was achieved with a huge memory saving and we can improve the accuracy further at the expense of memory if needed. This is the beauty of bloom filters. This example can be used for:
Another use case would be a security system that wants to track every password every used by a user and try to prevent them using one every again or prevent a customer from using banned passwords. A bloom filter could be used to track existing passwords or the list of banned passwords and allow those lists to be checked quickly without taking a lot of memory or disk space. It has huge advantages in the existing password case as the passwords are not physically stored, the bloom filter is like a fixed size set of password hashes.
IBM WebSphere eXtreme Scale customers can create Maps keyed by customer name whose value is a Bloom Filter. A WXS agent can be used to quickly check if a vendor or some related entity probably not in the 'list' of valid things for that customer. A batch job can generate the bloom filters from the actual vendor data and the grid can be preloaded with the resulting bloom filters periodically. The actual vendor data stays out of the grid.
Security is also improved with a Bloom filter. The grid is not storing any sensitive information. The actual vendors the customer deals with are NOT there. The bloom filter is just a bit array that can be used to check if a vendor is not there, it's not a list of vendors that could be hacked.
Bloom Filters are a very useful data structure to save memory as well as CPU path when checking lists of things for exclusion. Applications can use DataGrids such as IBM WebSphere eXtreme Scale to build extremely scalable, high performance systems exploiting techniques like BloomFilters for fraud detection as well as other scenarios.
Posted at 12:19 AM in Bloom Filter, Event Processing, nosql, WebSphere eXtreme Scale, WebSphere Performance, XTP | Permalink | Comments (2)
|
UPDATED to include compressed 64 bit numbers.
I figured this was worth doing quickly so here are some numbers for the overhead in WebSphere eXtreme Scale for storing 1m objects in a grid. I'm using a single JVM grid in this case but this applies no matter how many JVMs.
I'm storing 1 million Person objects with the following structure:
String firstName;
String surname;
String street;
String city;
String zipcode;
I'm populating these strings with some small sample strings. I'm using a key which is just a person# string.
I did the test with 32 bit and 64 bit Sun Java 6 server JVMs on my macbook. I pulled heap usage using the jmap utility supplied with the JDK (jmap -histo:live pid).
WebSphere eXtreme Scale can store object values in two ways, as POJOs or as a compact serialized form (more efficient than normal serialization, like a byte packed version of the data in the value, only available in 7.0+). The byte mode is enabled by having a copyMode="COPY_TO_BYTES" on the backingmap definition in objectgrid.xml. Lets look at the heap size after a system GC in 64 bit JVM mode:
So, you can see the BYTE representation is much more compact than the POJO version, about double the density.
Lets look at when we use -XX:+UseCompressedOops
You can see that compressed mode significantly reduces the 64 bit overhead by about 25%.
If we do the same thing with 32 bit JVM (-d32 on Mac) then we see
You can see 64 bit has quite a large overhead on Sun. Compressed 64 bit isn't so bad but still isn't as good as straight 32 bit. This is why we recommend 32 bit JVMs with heap sizes under 2.5Gb. It's just most efficient from a memory point of view.
In 32 bit mode, here is a break down for the POJO mode:
The BYTES model looks like:
So, our overhead here is 80 bytes per byte array entry, the values are stored as byte[]s in the 108MB chunk and the hybrid hash map takes again 19MB. So, the overhead is still 99 bytes per entry but each entry is stored more efficiently than in POJO mode and it is this saving which makes the big difference in the heap sizes seen above.
Applications interacting with the grid remotely typically use byte mode. Applications running collocated with the grid for highest performance typically use POJO mode in combination with other features but thats another blog entry.
The WXS code about to be released drops the overhead down further to 64 bytes rather than 80 and that will be available soon.
Each partition primary/replica shard is kept in its own area and we've had that design from the beginning because if we kept all data in all shards in a JVM in the same hash map then splitting them is expensive so we never did it that way.
Hopefully, this helps customers size heaps and memory. Obviously, we are continuously working to improve memory density so we can change this when ever we choose to so individual class names we use are not guaranteed at any time.
Posted at 06:01 PM in nosql, WebSphere eXtreme Scale, WebSphere Performance | Permalink | Comments (6)
|