UPDATED to include compressed 64 bit numbers.
I figured this was worth doing quickly, so here are some numbers for the overhead in WebSphere eXtreme Scale when storing 1 million objects in a grid. I'm using a single JVM grid in this case, but this applies no matter how many JVMs you use.
I'm storing 1 million Person objects with the following structure:
String firstName;
String surname;
String street;
String city;
String zipcode;
I'm populating these strings with some small sample strings. I'm using a key which is just a person# string.
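For readers who want to reproduce a similar data set, here is a minimal sketch of the value class and key scheme. Only the five field names come from the description above; the key format ("person" followed by a number), the sample string values, the Serializable marker and the helper names keyFor and sample are all assumptions made for illustration, and a plain HashMap stands in for the grid map.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the test value class. Only the five field
// names are taken from the post; everything else is an assumption.
final class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    String firstName;
    String surname;
    String street;
    String city;
    String zipcode;

    // Assumed key format: the post only says the key is a "person#" string.
    static String keyFor(int i) {
        return "person" + i;
    }

    // Made-up small sample strings; the original values are not given.
    static Person sample(int i) {
        Person p = new Person();
        p.firstName = "First" + i;
        p.surname = "Last" + i;
        p.street = i + " Main St";
        p.city = "City" + (i % 100);
        p.zipcode = String.valueOf(10000 + (i % 90000));
        return p;
    }

    public static void main(String[] args) {
        // A plain HashMap stands in for the grid map in this sketch.
        Map<String, Person> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(keyFor(i), sample(i));
        }
        System.out.println(map.size() + " entries, e.g. " + keyFor(42)
                + " -> " + map.get(keyFor(42)).surname);
    }
}
```

Small strings like these keep the value payload modest, which makes the fixed per-entry overhead easier to see in a heap histogram.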
I did the test with 32 bit and 64 bit Sun Java 6 server JVMs on my MacBook. I pulled heap usage using the jmap utility supplied with the JDK (jmap -histo:live pid).
WebSphere eXtreme Scale can store object values in two ways: as POJOs or in a compact serialized form (more efficient than normal serialization, like a byte packed version of the data in the value, and only available in 7.0+). The byte mode is enabled by setting copyMode="COPY_TO_BYTES" on the backingMap definition in objectgrid.xml. Let's look at the heap size after a system GC in 64 bit JVM mode:
- 64 bit POJO 782.7MB
- 64 bit BYTE 381.0MB
So, you can see the BYTE representation is much more compact than the POJO version, about double the density.
Let's look at what happens when we use -XX:+UseCompressedOops:
- 64 bit POJO 592.0MB
- 64 bit BYTE 299.4MB
You can see that compressed mode significantly reduces 64 bit heap usage: by about 24% in POJO mode and about 21% in BYTE mode.
If we do the same thing with a 32 bit JVM (-d32 on Mac) then we see:
- 32 bit POJO 523.6MB
- 32 bit BYTE 261.0MB
You can see 64 bit has quite a large overhead on Sun. Compressed 64 bit isn't so bad but still isn't as good as straight 32 bit. This is why we recommend 32 bit JVMs with heap sizes under 2.5GB. It's simply the most efficient option from a memory point of view.
In 32 bit mode, here is a breakdown for the POJO mode:
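The comparisons between the six measurements are easy to recompute. The sketch below hard-codes the heap sizes quoted in this post and derives the relative savings and overheads; the class and method names are made up for illustration, and the percentages are only as good as the single test run they come from.

```java
import java.util.Locale;

// Heap sizes in MB, copied from the measurements quoted in this post.
final class HeapComparison {
    static final double POJO_64 = 782.7, BYTE_64 = 381.0;
    static final double POJO_64_COMPRESSED = 592.0, BYTE_64_COMPRESSED = 299.4;
    static final double POJO_32 = 523.6, BYTE_32 = 261.0;

    // Percentage by which 'after' is smaller than 'before'.
    static double percentSaved(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // Percentage by which 'value' exceeds 'baseline'.
    static double percentOver(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "POJO/BYTE ratio, 64 bit: %.2f%n",
                POJO_64 / BYTE_64);
        System.out.printf(Locale.ROOT, "Compressed oops saving, POJO: %.1f%%%n",
                percentSaved(POJO_64, POJO_64_COMPRESSED));
        System.out.printf(Locale.ROOT, "Compressed oops saving, BYTE: %.1f%%%n",
                percentSaved(BYTE_64, BYTE_64_COMPRESSED));
        System.out.printf(Locale.ROOT, "64 bit over 32 bit, POJO: %.1f%%%n",
                percentOver(POJO_32, POJO_64));
        System.out.printf(Locale.ROOT, "Compressed 64 bit over 32 bit, POJO: %.1f%%%n",
                percentOver(POJO_32, POJO_64_COMPRESSED));
        System.out.printf(Locale.ROOT, "Compressed 64 bit over 32 bit, BYTE: %.1f%%%n",
                percentOver(BYTE_32, BYTE_64_COMPRESSED));
    }
}
```

On these numbers, plain 64 bit costs roughly half again as much heap as 32 bit in POJO mode, while compressed 64 bit narrows the gap to the low teens in percent.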
- num #instances #bytes class name
- ----------------------------------------------
- 1: 6016578 230933320 [C
- 2: 6017049 144409176 java.lang.String
- 3: 1000000 80000000 com.ibm.ws.objectgrid.plugins.CacheEntryHeapFactory$HeapEntry
- 4: 1000000 32000000 sample.bench.Person
- 5: 342796 19860816 [Lcom.ibm.ws.xs.util.AbstractMapEntry;
The BYTE mode breakdown looks like:
- num #instances #bytes class name
- ----------------------------------------------
- 1: 1005960 108283256 [B
- 2: 1000000 80000000 com.ibm.ws.objectgrid.plugins.ByteArrayCacheEntryFactory$ByteArrayEntry
- 3: 1016742 25359464 [C
- 4: 1017199 24412776 java.lang.String
- 5: 342802 19861344 [Lcom.ibm.ws.xs.util.AbstractMapEntry;
So, our overhead here is 80 bytes per byte array entry, the values are stored as byte[]s in the 108MB chunk, and the hybrid hash map again takes about 19MB. The overhead is therefore still about 99 bytes per entry (80 bytes for the entry object plus roughly 19 bytes of hash map), but each value is stored more efficiently than in POJO mode, and it is this saving that makes the big difference in the heap sizes seen above.
Applications interacting with the grid remotely typically use byte mode. Applications running collocated with the grid for highest performance typically use POJO mode in combination with other features, but that's another blog entry.
The WXS code about to be released drops the per-entry overhead further, to 64 bytes rather than 80, and will be available soon.
Each partition primary/replica shard is kept in its own area. We've had that design from the beginning: if we kept the data for all shards in a JVM in the same hash map, splitting them apart would be expensive, so we never did it that way.
Hopefully, this helps customers size heaps and memory. Obviously, we are continuously working to improve memory density and can change this whenever we choose, so the individual class names we use are not guaranteed at any time.
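The per-entry figures can be recomputed from the BYTE mode histogram by dividing each class's total bytes by the 1 million entries stored. The following is a rough sketch with made-up class and method names; it assumes that essentially all of the [B instances and all of the hash map arrays belong to the grid data, which slightly overstates the per-entry cost because a few thousand of those byte arrays belong to the JVM itself.

```java
import java.util.Locale;

// Recomputes per-entry costs from the BYTE mode jmap histogram in this post.
final class PerEntryOverhead {
    static final long ENTRIES = 1_000_000L;

    // Total bytes reported by jmap for the three classes of interest.
    static final long ENTRY_OBJECT_BYTES = 80_000_000L;   // ByteArrayEntry
    static final long HASH_MAP_BYTES = 19_861_344L;       // AbstractMapEntry[]
    static final long VALUE_BYTES = 108_283_256L;         // [B (serialized values)

    // Average bytes attributed to each stored entry.
    static double bytesPerEntry(long totalBytes, long entries) {
        return (double) totalBytes / entries;
    }

    public static void main(String[] args) {
        double entryObject = bytesPerEntry(ENTRY_OBJECT_BYTES, ENTRIES);
        double hashMap = bytesPerEntry(HASH_MAP_BYTES, ENTRIES);
        double value = bytesPerEntry(VALUE_BYTES, ENTRIES);

        System.out.printf(Locale.ROOT, "Entry object:   %.1f bytes%n", entryObject);
        System.out.printf(Locale.ROOT, "Hash map share: %.1f bytes%n", hashMap);
        System.out.printf(Locale.ROOT, "Fixed overhead: %.1f bytes%n",
                entryObject + hashMap);
        System.out.printf(Locale.ROOT, "Value payload:  %.1f bytes%n", value);
    }
}
```

The fixed overhead is independent of the value size, so it matters most when the stored values are small, as they are in this test.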