The Java community is currently focused on JPA as the persistence API, but I think it's missing the big picture. JPA combines several facets that need to be split into separate specifications so that we can get a better programming model.
JPA consists of three things: the data model, which can be viewed as the POJOs plus annotations for keys and relationships; an API for interacting with a data source to find and query entities within that data model; and a set of metadata for mapping that data model to a SQL database.
The problem is that they are all combined in a single spec. This couples together facets which are useful in their own right. Let's look at the following problems faced by Java developers.
The web developer
Web session data is still stuck in the dark ages. Session data is kept in a flat Map of key/value pairs. It's difficult for a developer to work with a hierarchical data model for the session, which is a common need, and the API is dumb: it doesn't automatically detect changes and persist just those to the session, as JPA can do using bytecode weaving or straight copy-and-compare approaches. Why can't a developer pick a data model expressed as a POJO object graph, find the root of that graph in a session, walk it, update it and have the changes picked up automatically? If several modules share the session, why should they all need to use the same POJOs? Why can't each module map its own POJOs to the data model for the session? The current servlet spec, and even the proposed ones, are sorely lacking in modernizing the specification down this path.
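To make the copy-and-compare idea concrete, here is a minimal sketch. It assumes the session is a plain Map whose values are immutable (strings, numbers), and the class and method names (SessionChangeTracker, flushChanges) are hypothetical, not part of the servlet spec or any product. A real implementation would need deep copies or bytecode weaving to catch in-place changes to mutable objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical sketch: snapshot the session attributes, then diff to find what changed. */
final class SessionChangeTracker {
    private final Map<String, Object> live;   // the session's attribute map
    private Map<String, Object> snapshot;     // shallow copy taken at the last flush

    SessionChangeTracker(Map<String, Object> sessionAttributes) {
        this.live = sessionAttributes;
        this.snapshot = new HashMap<>(sessionAttributes);
    }

    /** Returns added or changed keys with their new values; removed keys map to null. */
    Map<String, Object> flushChanges() {
        Map<String, Object> changes = new TreeMap<>();
        for (Map.Entry<String, Object> e : live.entrySet()) {
            boolean isNew = !snapshot.containsKey(e.getKey());
            if (isNew || !Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                changes.put(e.getKey(), e.getValue());
            }
        }
        for (String key : snapshot.keySet()) {
            if (!live.containsKey(key)) {
                changes.put(key, null);
            }
        }
        snapshot = new HashMap<>(live);        // start a new comparison window
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        session.put("user.name", "alice");
        session.put("cart.count", 1);

        SessionChangeTracker tracker = new SessionChangeTracker(session);
        session.put("cart.count", 2);          // changed
        session.remove("user.name");           // removed

        // Only the two touched keys are reported, not the whole session.
        System.out.println(tracker.flushChanges());
    }
}
```

Only the returned change set would need to be persisted or replicated, which is the behaviour the session API lacks today.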
The work flow developer or JBoss Seam or Spring Webflow developer
The same thing applies here. Why don't we have a standard data model which can be persisted to a flow/conversational flow engine? Why isn't the specification for the data model for a flow projectable onto the POJO data models a developer wants to use? Seam and Web Flow are attempts to modernize the session management in the servlet spec, but without a common data model and annotations for capturing these meta models, we'll end up with separate incompatible specs and redundant tooling for capturing and interacting with this metadata.
The cache or XTP developer
Most distributed caches still work with a single-level Map of key/value pairs. WebSphere eXtreme Scale is the exception here in that it can layer a POJO graph model on top. This is our EntityManager API. We modeled this after the JPA spec for the pieces related to key and relationship definitions. We support different applications sharing data kept in a shared cache even when each application uses a different set of POJOs for the same data. The respective applications just use different metadata for the mapping to those POJOs from a master schema stored in the shared data grid. We built a standalone component within WXS called the projector, which holds the master metadata. It can project data stored in a TupleStore to POJOs suitably annotated for the mapping and then push any changes from those POJOs back to TupleStores. So far we have implemented TupleStores for our internal Maps of key/value tuples, but in theory this could be used to represent any TupleStore (flow instance state or HTTP session). We shouldn't have had to write that component; it should be part of a J2SE specification built into every JVM. Maybe a first step here, once any internal hurdles are overcome, would be to contribute this component to the community as an initial building block to start to move down this path.
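The projection idea can be sketched in a few lines of plain Java. This is not the WXS projector or its API: the @Attr annotation, the project/unproject methods and the two customer classes are all hypothetical names invented for illustration, and a tuple is assumed to be a simple Map of attribute name to value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ProjectorSketch {
    /** Hypothetical annotation: binds a POJO field to an attribute in the master schema. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Attr { String value(); }

    /** One application's view of a customer tuple. */
    static class BillingCustomer {
        @Attr("id") long customerId;
        @Attr("balance") int balanceCents;
    }

    /** Another application's view of the same tuple, using different POJOs. */
    static class SupportCustomer {
        @Attr("id") long id;
        @Attr("email") String contactEmail;
    }

    /** Builds a POJO of the requested type from the attributes it has mapped. */
    static <T> T project(Map<String, Object> tuple, Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T pojo = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                Attr a = f.getAnnotation(Attr.class);
                if (a != null && tuple.containsKey(a.value())) {
                    f.setAccessible(true);
                    f.set(pojo, tuple.get(a.value()));
                }
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot project to " + type.getName(), e);
        }
    }

    /** Writes changed mapped fields back to the tuple; returns how many attributes changed. */
    static int unproject(Object pojo, Map<String, Object> tuple) {
        try {
            int changed = 0;
            for (Field f : pojo.getClass().getDeclaredFields()) {
                Attr a = f.getAnnotation(Attr.class);
                if (a == null) continue;
                f.setAccessible(true);
                Object value = f.get(pojo);
                if (!Objects.equals(tuple.get(a.value()), value)) {
                    tuple.put(a.value(), value);
                    changed++;
                }
            }
            return changed;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot unproject " + pojo.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("id", 42L);
        tuple.put("balance", 1000);
        tuple.put("email", "a@example.com");

        BillingCustomer billing = project(tuple, BillingCustomer.class);
        billing.balanceCents -= 250;
        unproject(billing, tuple);             // only "balance" is pushed back

        SupportCustomer support = project(tuple, SupportCustomer.class);
        System.out.println(tuple.get("balance") + " " + support.contactEmail);
    }
}
```

Neither class knows about the other, and neither needs to map every attribute. The same two methods would work unchanged whether the tuple came from a cache, an HTTP session or a flow instance.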
The database developer
This is the ONLY developer catered to by JPA, despite the fact that if this spec were split into its facets, many of them could be reused in a standard way for the other developer types. It's just a mixed-up mess that needs to be pulled apart into standalone specifications which can be combined to form a standard persistence programming model for all types of programmer.
Summary
You can see there are some parts of the Java programming model which are sorely lacking. Flows, caches, conversational state, web developers and database developers should all share common specifications for data models, for the mapping of those models to various SQL and non-SQL back-ends, and for common query engines, projection/unprojection etc. We need to pull our collective heads out of the sand and look around for the big picture; otherwise, competitively, it's not looking great.
We need these common specifications so that our data-based tooling in Eclipse etc. can be done once and then reused for all these purposes. The latest example here is the upcoming Criteria API. I'd like to be able to reuse this for WebSphere eXtreme Scale queries, for example, so why is this spec tied to JPA at all? We need a common POJO persistence spec containing abstract queries, factories, and object metadata like keys or relationships. Layered on top of this and its RI component would be JPA, HTTP sessions, caching, web conversation flows and work flows.
Really? We need all this? How come we have been building applications for so long without this? While things aren't perfect, it works. We are able to build highly scalable reliable applications. It seems to me that if these specs come up, it will take way more time and effort to build applications. Sure, maybe IBM will come up with a product for this and sell it, but as we have learned, buying an expensive product doesn't mean your product will be successful.
Posted by: John Smith | February 02, 2009 at 10:40 PM
Yes, we do. What I'm suggesting is that we need to refactor the JPA specification and then reuse the various facets in places where those facets make sense, provide a common programming model and improve tooling support through common specifications. Reuse is key to productivity. None of this has anything to do with an IBM product. Such a factoring would avoid the need to reinvent the wheel when adding new capabilities like DataGrids, conversational state or flow engines. We need reuse here, not reinvention followed by continually revving each specification as people innovate in other similar areas.
Posted by: bnewport | February 02, 2009 at 10:50 PM
Well JBoss Seam and presumably Web Beans add the conversational state. I think you're close to something though on the object graph, but you're not going far enough. Why are we still thinking 2 dimensionally? Why can't I walk the object graph via JSON, update the state and at the end of the conversation get optimistic locking semantics on the server (but only just in time and only transmitting the relevant pieces)? Why to achieve this do I have to have that whole smelly detached session sitting in a clustered cache? That is where the real action is. This might put WebSphere Clustering out of business :-)
Posted by: Andrew C. Oliver | February 02, 2009 at 11:53 PM
I'm right there with you. The projector component we did for eXtreme Scale can do this. We look for the changes and then update the server using either optimistic or pessimistic locking, which is basically what you're saying. I want to decouple this action from whether the server is an HTTP session, a datagrid, a database, a spreadsheet, an XML document, whatever. The projector component we made works with POJOs, but in theory a multi-language projector could be made. The data sent across the wire doesn't care about where it came from; the receiver just has to take the change log and 'apply' it to the server it's fronting.
We're doing some stuff on the JSON front but can't quite talk about it just yet.
Posted by: bnewport | February 03, 2009 at 01:31 AM
I'm working on this in the open source space as well. My plan is an AJAX adapter for the Hibernate session (and possibly something like Data Mapper http://datamapper.org/doku.php once it matures) that collaborates with a front end Conversation Manager. You might also want to talk to fellow IBMer Sam Ruby (http://intertwingly.net/blog/); he and I had a conversation about a year or two ago about the "web transaction" and some of the work he was doing.
Posted by: Andrew C. Oliver | February 03, 2009 at 10:56 AM
The word "reuse" is mentioned here a lot.
IMHO, if there is really even a small chance to refactor the JPA spec in order to improve reuse, the leaders should take into consideration that within bigger business (AKA "enterprise") systems, the POJO model is actually an outdated approach.
Every entity relationship (I'd prefer to say "collaboration" instead) occurs within a specific context (Bounded Context in Domain Driven Design).
So, for example, an entity person can be an employee within one context while being an employer within another context. Its respective data could be saved into distinct storage systems, while its design should be totally decoupled, probably based on something similar to mixins.
http://www.qi4j.org is the closest implementation of this concept while http://www.research.ibm.com/hyperspace is (AFAIK) the first project trying to implement something similar within JVM, although persistence was out of its scope.
When I look back to hypes and the promises of Object Orientation, then components based systems, then services (SOA) and even (lately) dynamic languages, I feel that what our industry missed was a more elaborated approach for domain modeling.
Persistence and some other cross cutting concerns should be built around it.
Posted by: Rodolfo | February 03, 2009 at 12:14 PM
Agreed Rodolfo,
That's exactly what we do in WXS. We have a meta model/schema for the central state or common state and each 'client' has a projection onto a set of contextual objects which have a context-specific mapping from the common state to the client model. We're not tied to specific POJOs at all.
Posted by: Billy | February 03, 2009 at 01:27 PM
Making distinction between separation of concerns in the JPA api is a good point. (Simply consider if the data store is an OO database, in which case RDBMS specific constructs are shown to be orthogonal to the general use-case of an Entity Manager.)
Not sure about the session context objections. Certainly nothing prevents you from mapping each module's "model" to a key in the session, and having that object ("value") be an object tree.
Posted by: Joubin Houshyar | February 04, 2009 at 08:59 AM
http://jcp.org/en/jsr/detail?id=299
Posted by: Andy | February 04, 2009 at 09:46 AM
Andy,
WebBeans isn't the answer here. For now it's for wiring beans together and managing bean lifecycle, but it's not really about managing relationships between beans like a bill of materials etc.
Joubin
I don't see how the session stuff works as you say. The session api is just a Map. A value could be a graph but this doesn't handle different modules sharing sessions with different POJOs. I've seen customers keep session data in a database so they can do exactly this kind of sharing which is fine but a database is the wrong place for this kind of very temporary data, it belongs in a DataGrid or similar technology.
Posted by: Billy | February 04, 2009 at 11:11 AM
Agree on many points. I've said for some time that any "persistence" 'spec' should be layered. If other problem domains can make use of those layers then better still.
* persistence API
* query API
* model definition
* mapping definition
A query API should allow plugging in of new query languages; then the query languages themselves can be developed in isolation and just used through the same structure, e.g. the JDO Query API allows this at a high level. So if an API like that was in place there'd be no need to rework the EntityManager just to add on some "Criteria Query", or some DSL.
A query API should be deployable not just against a datastore, e.g. the JDO Query API can be run against a collection of objects.
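A rough sketch of that shape, in plain Java rather than JDO: the QueryLanguage interface, the register/execute methods and the toy "field=value" language are all hypothetical names made up for illustration. Candidates are assumed to be simple Maps so the example runs against an in-memory collection with no datastore at all.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class PluggableQuerySketch {
    /** Hypothetical SPI: a query language compiles query text into a filter over candidates. */
    interface QueryLanguage {
        Predicate<Map<String, Object>> compile(String queryText);
    }

    private final Map<String, QueryLanguage> languages = new HashMap<>();

    /** New languages are added here without touching the execution API. */
    void register(String name, QueryLanguage language) {
        languages.put(name, language);
    }

    /** Runs a query in the named language against any collection of candidates. */
    List<Map<String, Object>> execute(String language, String text,
                                      Collection<Map<String, Object>> candidates) {
        QueryLanguage ql = languages.get(language);
        if (ql == null) {
            throw new IllegalArgumentException("Unknown query language: " + language);
        }
        Predicate<Map<String, Object>> filter = ql.compile(text);
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> candidate : candidates) {
            if (filter.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    /** Toy language for illustration only: "field=value" matches by string equality. */
    static final QueryLanguage EQUALS = text -> {
        int i = text.indexOf('=');
        if (i < 0) {
            throw new IllegalArgumentException("Expected field=value but got: " + text);
        }
        String field = text.substring(0, i).trim();
        String expected = text.substring(i + 1).trim();
        return row -> expected.equals(String.valueOf(row.get(field)));
    };

    public static void main(String[] args) {
        PluggableQuerySketch queries = new PluggableQuerySketch();
        queries.register("eq", EQUALS);

        List<Map<String, Object>> people = new ArrayList<>();
        Map<String, Object> ann = new HashMap<>();
        ann.put("name", "Ann");
        ann.put("city", "Oslo");
        Map<String, Object> bob = new HashMap<>();
        bob.put("name", "Bob");
        bob.put("city", "Rome");
        people.add(ann);
        people.add(bob);

        System.out.println(queries.execute("eq", "city = Rome", people));
    }
}
```

A criteria-style builder or a new DSL would then be just another QueryLanguage registration, with no change to the entity-facing API.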
JPA is way too specific to RDBMS, but then I seem to remember people saying that 3 yrs ago ;-)
Layering of this process also would lead to shorter spec change lifecycle since changes could be focussed on a particular thing in their own spec (e.g new keywords for JPQL would mean that could be just released rather than having to wait for resolution of some Criteria API). So the JCP wouldn't take so long ...
Posted by: Andy | February 04, 2009 at 12:09 PM
Billy,
I totally agree. I brought this up to our IBM rep about a year ago. I was thinking about how we could scale (using XTP) and still use all the work we did in modeling our domain and persisting it via Hibernate without having to redo all that. I am pretty sure he talked to that group about it.
Posted by: Mark | February 04, 2009 at 01:30 PM
Oh, you forgot to include RIA\RCP developer concerns/issues.
Posted by: Mark | February 04, 2009 at 01:32 PM
Mark,
We've solved that issue in WXS now. Our JPALoader support will read through and write [through/behind] to Hibernate using the JPA interfaces. It also works with OpenJPA or any other JPA implementation. So, annotate your objects in Hibernate/OpenJPA and then the builtin JPALoader will push/pull the POJOs through that OR Mapper. This works for lazy read/preload/sync write/async write.
OK, I'll bite: too many acronyms. What's RIA/RCP?
Posted by: Billy Newport | February 04, 2009 at 02:10 PM
Fully agree. I have issues with JPA in that, in a large shared backend, abstractions are impossible and we quickly experience query explosion. At this point, I am not sure it's worth it. What I would really like is to have LINQ available, to let entities be entities and to let queries live in the appropriate application layer.
Posted by: Casper Bang | February 04, 2009 at 03:04 PM
RIA == rich internet app (Flex, Silverlight, JavaFx and some AJAX)
RCP == Rich client app (Eclipse RCP, Netbeans Platform)
Both typically don't have "sessions" but rather maintain state on the client and use the server as a service.
Thx for the update on WXS. I am hoping to be needing it soon.
Posted by: Mark | February 04, 2009 at 04:28 PM
I agree with you that persistence is not quite working, but I think the problem is more fundamental than that. OOP in its current incarnation as Class Oriented Programming is not working. The problem, as you see in some of the comments here, is that the majority of developers show the same symptom as Pavlov's electricity-treated dogs: they have become so accustomed to the pain that the mind has accepted it, in order to put up with the daily struggles between what they want to do and what they have to do. Yes, Qi4j and COP fix a lot of these problems (all of the above even, as far as I can tell), but how do you explain to people that they ARE IN PAIN when they have managed to fool themselves into thinking they are not, just to get through the day? That's the question...
Posted by: Rickard Öberg | February 04, 2009 at 10:22 PM
More like Lem's "Trurl's Electronic Bard".
Posted by: William Louth | February 05, 2009 at 05:14 AM
Rickard,
JDO supports persistent interfaces so it would work well with Qi4j.
Posted by: Erik Bengtson | February 05, 2009 at 05:33 AM
re: andy. Web Beans is very much a server-side framework that doesn't really answer the problem nor scale well without expensive clustering products.
I want to get data in my AJAX client, change things and do everything in a way that honors transaction isolation, data integrity, etc.
Rickard, I'm interested in hearing ideas that you have other than the usual "I have a secret project that is really great and no one can see it" stuff :-). Shoot me an email or give me a ring.
I'll be publishing within the month.
Posted by: Andrew C. Oliver | February 06, 2009 at 06:04 AM