« March 2009 | Main | May 2009 »
Posted at 04:33 PM | Permalink | Comments (0)
|
Write behind is a technology to write updated records asynchronously to a backend such as a database. The application can 'commit' changes to replicated memory only and then periodically, all updated records are written to the database in a large batch. This eliminates the database from the response time for updates which improves performance and drastically lowers the load on the database by combining lots of small transactions in to a few big ones.
<objectGrid name="Grid">
<bean id="TransactionCallback" className="purequery.loader.PQTxCallback">
<property name="connecturi" type="java.lang.String" value="jdbc:db2://172.16.129.133:50000/SAMPLE"/>
<property name="username" type="java.lang.String" value="db2admin"/>
<property name="password" type="java.lang.String" value="db2admin"/>
</bean>
<bean id="ObjectGridEventListener" className="utils.WriteBehindDumper" />
Posted at 09:46 AM in WebSphere eXtreme Scale | Permalink | Comments (0)
|
I wrote this a couple of weeks ago for the redis API I'm working on. It implements a generic purequery Loader which works for any POJO following some simple rules. The POJO must have public attributes with scalar types. The attributes must have the same name as the database column. Any attributes used as keys must have an Id annotation. The key and value POJOs must share the same key attributes with the id annotations. So, each map has two POJOs, a key and a value. Both have all the key attributes annotated with @Id (the WXS one, not the JPA one), the value POJO also has the non key attributes.
<objectGrid name="Grid">
<bean id="TransactionCallback" className="purequery.loader.PQTxCallback">
<property name="connecturi" type="java.lang.String" value="jdbc:db2://172.16.129.133:50000/SAMPLE"/>
<property name="username" type="java.lang.String" value="db2admin"/>
<property name="password" type="java.lang.String" value="db2admin"/>
</bean>
The Map using the Loader looks like this:
<backingMap name="list-item-string-long" writeBehind="T5;C900" lockStrategy="PESSIMISTIC" evictionTriggers="MEMORY_USAGE_THRESHOLD" ttlEvictorType="LAST_ACCESS_TIME" timeToLive="600" pluginCollectionRef="list-item-string-long"/>
Here is an example loader specification:
<backingMapPluginCollection id="list-item-string-long">
<bean id="Loader" className="purequery.loader.GenericPQLoader">
<property name="tableName"
type="java.lang.String"
value="LISTITEMSTRINGLONG"/>
<property name="className"
type="java.lang.String"
value="data.list.ListItem"/>
</bean>
</backingMapPluginCollection>
Classpath = /Users/bnewport/Desktop/ObjectGridTest/objectgrid.jar:$DEVDIRpq21/db2jcc_license_cisuz.jar:$DEVDIR/pq21/db2cc_license_cu.jar:$DEVDIR/pq21/db2jcc.jar:$DEVDIR/pq21/pdq.jar:$DEVDIR/pq21/pdqmgmt.jar:$DEVDIR/apache-tomcat-6.0.14/lib/tomcat-dbcp.jar
Posted at 11:32 AM in WebSphere eXtreme Scale | Permalink | Comments (0)
|
Check it out on you tube here. Purequery, singletons, caching stored procedures, caching large result sets as pages.
Posted at 06:13 PM | Permalink | Comments (0)
|
The test application I'm using with the Redis API skin is causing some grief today. I had some bugs and some attributes didn't get written but some did. Now, it's a bit of a mess. Makes me wish for transactions. Half applied attributes are no fun to debug. An example might be a User. This is stored a two entries in Redis style. When it all works then it's nice but any problems at all and you'll wish for transactions.
Posted at 10:38 PM | Permalink | Comments (0)
|
I’m impressed with the API. I have it working now and we’re in the process of testing it internally. The current setup uses all public APIs of IBM WebSphere eXtreme Scale and persists to a partitioned replicated grid backed by a DB2 database. I’m currently planning to ship it as a sample with full source code for WebSphere eXtreme Scale V6.1.0.5 very soon. Write behind logic is used to take the database out of the picture from a performance point of view for most use cases.
The database uses a fixed schema and I’m using IBM purequery as the interface to the database so it’s pretty fast and lightweight especially with the write behind logic turned on for batching.
The API layer is extremely similar to Redis except for the following restrictions/additions. These restrictions are easily removed with a small amount of coding for the extra types that are desired. We'll probably add support for Boolean,byte[],Double and Date before shipping the sample.
The API is very easy to use. I’ve been showing it around and the best comment so far is that it looks very grailsy and that’s pretty accurate. Everyone likes it that’s seen it.
From an ease of use point of view or just to hack something together then it’s pretty cool. Very easy to use and explain to someone and you can get productive with it very fast.
Installing it is straight forward. Get DB2, make a database, add 8 tables. Get Purequery jars. Get IBM WebSphere eXtreme Scale, link the library on the classpath, start servers using the supplied xml configuration files. Then use the client library (Java only) in your web application or console application and just code away. Purequery apparently works with other databases besides DB2 so it's not limited to DB2 but it'll definitely go faster with it due to the non JDBC layer used by purequery to talk with DB2.
It scales out nicely. The entries are dynamically partitioned across the set of live server JVMs and IBM WebSphere Scale transparently scales everything out if you start additional JVMs while it’s running. The extra JVMs allow more CPUs/network and memory for servicing requests on the entries. You can configure the level of fault tolerance that you want. Synchronous replication or asynchronous replication. A modest server should be able to handle tens of thousands of requests/sec with very low latency (single digit milliseconds).
Implementation wise, it’s been interesting. The highlights of the implementation so far would be writing a generic pure query POJO loader and using the grid as a 2 layer persistence store to be able to manage the list and set entries. This 2 layer approach was necessary because the application wants to store simple lists and sets as entry values. Persisting this to the database directly wasn’t efficient as the Loader doesn’t know which elements are added/removed to the list/set. The solution was to use multiple maps with agents. The client API library uses Agents to implement the List and Set primitives and they use 3 maps for a list or set. Map 1 has a string key and the value is a List/Set of Longs. This map is queried using normal Map.get verbs on the client to get the current list if needed. This allows the near cache to be used for the list if required. Map 2 is a database backed Map for the list/set header. Map 3 is a database backed Map for the List items or Set items. The agents job is to keep all three maps in sync. Write behind is used for Map 2 and Map 3 to get the database out of the critical path for updates. It sounds a little complicated but it's not a lot of code here.
The database schema is key/value tables (two columns) for String/String, String/Long, Long/String and Long/Long key/value storage and then ListHeadStringLong, ListItemStringLong, SetHeadStringLong and finally SetItemStringLong. So 8 tables in total. I wrote the Loaders and APIs using generics so it’s very easy to add say a value type for Date or Double for example. Examples of these are in the code but I haven’t made the extra tables. Date is easily replaced with the Long support. Dates are just longs anyway counting milliseconds since 1970.
The biggest issue/advantage with it is that the application orients towards a key/column or column oriented style. Each entity, for example: user, ends up having a key/value pair for each attribute. User might have 3 attributes:
This means the memory used is higher than if a POJO was used containing all 4 attributes. But, if we did this then we’d need a BLOB value or we’d need OR mapping which makes it more ‘clumsy’ for want of a better word. I tried trying to add structure but it make it less turnkey and harder to get going so the column oriented approach is what it's using.
The column oriented approach is very easy to use but is not schema oriented. It makes it very easy to add attributes on the fly without restarting the application or grid. This is attractive when prototyping or working with a production application that’s in flux and getting developed at a fast pace.
Reports against the database tables are not easy. The schema doesn’t make sense for most reporting tools. You’d need an export to CSV or another table to do it. But, the trade of here is that it’s easy to develop with despite all this and that’s what makes it very cool to prototype with.
Sharing a grid between several applications is pretty easy also. You can imagine using a prefix on all the keys for a certain application, effectively isolating it for the others.
Overall, I like this a lot. It’s quite powerful and a lot of simple applications could use an API like this and get productive with it rapidly. The fixed database schema and column oriented approach makes it a rapid development system and it’s very flexible/extensible at runtime. WebSphere eXtreme Scale makes the whole thing scale very well as it runs an elastic partitioned in memory data grid in front of the database all but removing the database from the performance picture completely. The ability to turn on the near cache for gets means that in our sample applications, remote calls to the grid are only needed for set/remove and list/set operations which are usually much rarer than get operations.
I think once the sample is published, I can see quite a lot of projects using this API instead of the normal one simply because of the column oriented style advantages and turnkey operation.
Posted at 02:14 AM in WebSphere eXtreme Scale | Permalink | Comments (0)
|