A lot of people may think that eXtreme Transaction Processing (XTP) is a new thing. I disagree. XTP systems have been around as long as computers have processed transactions, and the definition of what counts as extreme has been changing with Moore's law. In the 70s/80s/90s, TPF/CICS/Tandem/Encina/Tuxedo systems defined XTP, and they still do. They handle thousands of flight reservations, credit card authorizations and bank account deposits/withdrawals per second. Mainframes were viewed as the ultimate transaction processing systems, and for certain Type 1 workloads they still define the state of the art today.
In the last ten years a new style of XTP has emerged, with companies like Google, Yahoo etc. using tens of thousands if not millions of servers to provide hyper scalable applications for the web. Investment banks doing algorithmic trading have similar issues with a market data event explosion. Exposing corporate business applications on an ESB as a corporate service or on the web has also caused workloads to increase to surprising levels.
This doesn't redefine what XTP is or make the other systems suddenly not XTP any more. It's a new style of XTP. I'm going to try to define XTP as two basic types, Type 1 and Type 2, with Type 2 having two further subtypes, 2a and 2b.
I think the market for transactional systems can be classified as follows:
Type 1: Non partitionable data models
This is implemented using traditional big iron systems with stateless application servers fronting them. The data model isn't cleanly partitionable, so the only way to scale the datastore is a large SMP or mainframe. Applications can run ad hoc queries or joins over the data model, making partitioning difficult if not impossible. The data model doesn't fit a constrained tree schema. This is what I call Type 1. These systems will scale until the database saturates, which is just fine for a lot of customers. It's your classic two tier architecture. The kink here is that if the data model isn't partitionable then it's extremely difficult to convert to a Type 2 system. But for a lot of customers the load will never reach that point, so again, that's fine. If there is no business need for hyper scaling, then traditional client/server, two tier architectures have worked, do work and will continue to work perfectly for these workloads. These systems can still process thousands or maybe tens of thousands of transactions per second. They just may not be able to process hundreds of thousands or millions of transactions per second, and they will probably never need to, so there is no need to redesign or re-architect these systems (if that is even possible).
Type 2: Partitionable data models
Here, the data model is a constrained tree schema. This means it can be scaled out relatively easily, as the data model and transactional data access pattern fit onto a partitioned architecture cleanly. We can divide Type 2 into two further subtypes.
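
To make the partitioning idea concrete, here is a minimal sketch. It assumes that a constrained tree schema means every record hangs off a single root entity (a customer, say) and that a transaction only touches one tree, so the root key alone can decide which partition owns the data. The names `partition_for` and `PartitionedStore`, the CRC32 hash and the in-memory dicts are all hypothetical choices for illustration, not how any particular product does it.

```python
import zlib


def partition_for(root_key: str, num_partitions: int) -> int:
    """Map a root entity key to a partition using a stable hash."""
    return zlib.crc32(root_key.encode("utf-8")) % num_partitions


class PartitionedStore:
    """Toy store: each partition is a dict standing in for one server."""

    def __init__(self, num_partitions: int = 8):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _owner(self, root_key: str) -> dict:
        index = partition_for(root_key, len(self.partitions))
        return self.partitions[index]

    def put(self, root_key, child_type, child_id, value):
        # Children are stored under their root, so the whole tree
        # lives on the partition chosen by the root key.
        tree = self._owner(root_key).setdefault(root_key, {})
        tree[(child_type, child_id)] = value

    def get_tree(self, root_key):
        # A transaction scoped to one root touches exactly one partition.
        return dict(self._owner(root_key).get(root_key, {}))


store = PartitionedStore(num_partitions=4)
store.put("customer-42", "order", 1, {"total": 10})
store.put("customer-42", "order", 2, {"total": 5})
store.put("customer-7", "order", 1, {"total": 3})
```

An ad hoc join across all customers would have to visit every partition, which is exactly the access pattern that keeps a Type 1 data model from fitting this scheme.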
Type 2a: Partitionable data models, limited need for scale out
Just because the data model is a constrained tree schema (CTS) doesn't mean we have to run out and build these applications with products like eXtreme Scale. These systems can start off looking a lot like a classic Type 1 and may never need to evolve past that. The big thing with a Type 2 system is that moving from a Type 2a to a Type 2b style architecture is relatively easy, given that the data model and transaction types suit partitioning and so can leverage modern technologies like eXtreme Scale or its competitors. Other technologies in this space are Apache Hadoop HDFS/HBase, Amazon SimpleDB, or Microsoft Azure scalable data services.
Type 2a applications don't need hyper scale out. Even if their business grows to its maximum size, a large SMP box or 390 can host the database for the application. The implementors of these applications have choices. They can write them traditionally with an SMP database and a stateless front end. This will work just fine for a lot of applications. As the load increases they may switch to an in memory database and keep going, but eventually that box will max out too. Next, they could shard the database and run multiple instances with application level routing. This requires a lot of infrastructure to automatically handle failures, database shard placement etc., as described in the blog entry. Another option is to put an eXtreme Scale-like product between the traditional database and the application, in combination with write behind, and scale out the app tier and the eXtreme Scale tier. This will provide most customers in 2a with ALL the scalability they would need. A small percentage of customers may still saturate the database on a large SMP, in which case technologies like SolidDB, or a manually sharded database/SolidDB, could be used.
The transitions are:
- Database + stateless app tier
- Sharded database/in memory database + stateless app tier
- Database + eXtreme Scale (IMDG) + grid collocated or front end stateless app tier
- Sharded database/in memory database + eXtreme Scale (IMDG) + grid collocated or front end stateless app tier
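
The write behind option mentioned above is what lets the grid tier absorb load that would otherwise hit the database. As a rough sketch of the idea only: writes land in memory first, repeated updates to the same key are coalesced, and the database sees one batched write later. The `WriteBehindCache` class, its `batch_size` parameter and the plain dict standing in for the database are all hypothetical. A real product would flush asynchronously on a timer and handle failures, which this sketch does not attempt.

```python
class WriteBehindCache:
    """Toy write-behind cache in front of a dict-like backing store."""

    def __init__(self, backing_store, batch_size=100):
        self.backing_store = backing_store  # stand-in for the database
        self.cache = {}                     # in-memory copy served to readers
        self.dirty = {}                     # key -> latest value not yet persisted
        self.batch_size = batch_size

    def put(self, key, value):
        # The write completes in memory; the database is not touched yet.
        self.cache[key] = value
        self.dirty[key] = value             # later writes overwrite earlier ones
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.backing_store.get(key)  # cache miss: read through
        if value is not None:
            self.cache[key] = value
        return value

    def flush(self):
        # Push all pending changes to the backing store as one batch.
        batch, self.dirty = self.dirty, {}
        self.backing_store.update(batch)
        return len(batch)


database = {}
grid = WriteBehindCache(database, batch_size=3)
grid.put("account-1", 100)
grid.put("account-1", 90)   # coalesced with the previous write
```

At this point `database` is still empty while `grid.get("account-1")` already returns the latest value; the database only sees the final state of each key once a flush happens.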
Type 2b: Partitioned data model, full on hyper scale out
Finally, we have 2b. These are full on scale out architectures. I think the biggest difference between 2a and 2b is that the relational or sharded relational database gets replaced with Amazon SimpleDB, Hadoop HBase etc. The architecture and all its middleware components are fully elastic and designed from the ground up for hyper scalable systems. These architectures can scale to thousands of boxes while managing their own availability and elastic scaling automatically. Products in this space are automatically sharded databases (which don't exist right now to my knowledge), eXtreme Scale and its competitors, CouchDB, Apache Hadoop, HDFS, HBase and, for lower qualities of service, caches like memcached.
Scale out companies (over the last 5-10 years) have traditionally started with 2a systems. That was easiest, and as their business became more successful and grew, they transitioned to 2b. Today, given the availability of cloud services like Amazon EC2, someone starting a business like that would likely begin with a 2b architecture from day one.
Summary
So, in conclusion, having these classifications for XTP workloads is useful. It clarifies and separates the various types of applications into two broad camps. It also shows where technologies like WebSphere eXtreme Scale fit most cleanly, which is in 2a/2b. It can be used in Type 1 systems, but the lack of a partitioned data model complicates this significantly.
Cloud is an interesting trigger in that its current availability moves the starting point from 2a to 2b for many startups, I think.