I had tried using a distributed grid with Lucene before. I was at a customer site in a face-off style engagement against the competitors. We (the competitors and I) each implemented a Lucene Directory on top of our respective caching products. The implementation I wrote is on GitHub here. The customer had a very large Lucene index in memory, about 30GB, and was seeing very long GC pauses (minutes) using the RAMDirectory supplied with Lucene. The plan was to use a grid-based Directory to move the indexes into a shared remote grid.
This proved much slower than the RAMDirectory the customer was already using. Lucene was too 'chatty' with the Directory, and that meant lots of RPCs to the grid. The Directory abstraction is simply too low level. We tried solutions like near caches but, depending on the query, the near cache could need to be very big, which made the solution unviable. Ultimately, the customer didn't go forward with either grid as a Directory store.
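To make the chattiness concrete, here is a minimal sketch in Python of what a Directory-style reader does when its blocks live in a remote store. It is a toy model, not the code from that engagement and not the Lucene API: `RemoteBlockStore`, `GridFile`, `count_rpcs`, the 1 KB block size and the scattered access pattern are all assumptions made up for illustration. Each `get_block` call stands in for one network round trip.

```python
from collections import OrderedDict


class RemoteBlockStore:
    """Hypothetical stand-in for a grid that holds one index file as fixed-size blocks."""

    def __init__(self, data, block_size=1024):
        self.block_size = block_size
        self.blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        self.rpc_count = 0  # every get_block call models one network round trip

    def get_block(self, n):
        self.rpc_count += 1
        return self.blocks[n]


class GridFile:
    """Directory-style reader: each byte-range read becomes one or more block fetches."""

    def __init__(self, store, near_cache_blocks=0):
        self.store = store
        self.capacity = near_cache_blocks
        self.cache = OrderedDict()  # LRU near cache: block number -> bytes

    def _block(self, n):
        if n in self.cache:
            self.cache.move_to_end(n)  # mark as most recently used
            return self.cache[n]
        block = self.store.get_block(n)
        if self.capacity > 0:
            self.cache[n] = block
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return block

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            n, start = divmod(offset, self.store.block_size)
            chunk = self._block(n)[start:start + (end - offset)]
            if not chunk:
                break  # ran off the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


DATA = bytes(i % 251 for i in range(64 * 1024))  # 64 blocks of 1 KB


def count_rpcs(near_cache_blocks, reads=1000):
    """Issue small scattered reads (an assumed access pattern) and count round trips."""
    store = RemoteBlockStore(DATA)
    f = GridFile(store, near_cache_blocks)
    for i in range(reads):
        offset = (i * 7919) % (len(DATA) - 8)
        assert f.read(offset, 8) == DATA[offset:offset + 8]
    return store.rpc_count


if __name__ == "__main__":
    for size in (0, 4, 64):
        print(f"near cache of {size} blocks: {count_rpcs(size)} RPCs")
```

With no near cache, every small read costs at least one round trip. A small near cache barely helps when the reads are scattered, and the round trips only collapse once the cache is big enough to hold essentially the whole file, which is the situation we were trying to get away from.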
However, I was reading about Solandra recently and they have what looks like a better approach. The problem we had was that just implementing a Directory is too low-level: there end up being too many RPCs between the Lucene engine and the grid. Solandra implements a Cassandra store underneath Lucene. They did not implement a Cassandra Directory, which would have had the same issues I just discussed. Instead, they implemented a higher-level plugin. They moved up to implement IndexReader and IndexWriter, and this allows them to push some of the search down into Cassandra's native index/query support rather than just using it as a block store. This seems to reduce the number of RPCs between the store and the Lucene engine considerably.
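The difference between the two levels can be sketched with another toy model in Python. This is not Solandra's code and not Cassandra's or Lucene's API: `GridNode`, `get_chunk`, `search_and`, `client_side_and` and the sample postings are hypothetical names and data chosen purely for illustration. The same store is used two ways, once as a dumb block store that the client pages through, and once as something that can answer the query itself.

```python
class GridNode:
    """Hypothetical store that can serve raw posting chunks or answer a term query."""

    def __init__(self, postings, chunk_size=4):
        self.postings = postings      # term -> sorted list of document ids
        self.chunk_size = chunk_size
        self.rpc_count = 0            # each public call models one round trip

    def get_chunk(self, term, n):
        """Block-store style: return the n-th chunk of a term's posting list."""
        self.rpc_count += 1
        start = n * self.chunk_size
        return self.postings.get(term, [])[start:start + self.chunk_size]

    def search_and(self, terms):
        """Push-down style: intersect the posting lists inside the store."""
        self.rpc_count += 1
        lists = [set(self.postings.get(t, [])) for t in terms]
        return sorted(set.intersection(*lists)) if lists else []


def client_side_and(node, terms):
    """Low-level approach: pull every chunk over the wire, then intersect locally."""
    result = None
    for term in terms:
        docs, n = [], 0
        while True:
            chunk = node.get_chunk(term, n)
            docs.extend(chunk)
            if len(chunk) < node.chunk_size:
                break  # a short (or empty) chunk marks the end of the list
            n += 1
        result = set(docs) if result is None else result & set(docs)
    return sorted(result or [])


# Made-up sample data: which documents contain which term.
POSTINGS = {
    "grid": list(range(0, 100, 2)),
    "lucene": list(range(0, 100, 3)),
}

if __name__ == "__main__":
    low = GridNode(POSTINGS)
    high = GridNode(POSTINGS)
    assert client_side_and(low, ["grid", "lucene"]) == high.search_and(["grid", "lucene"])
    print("block-store RPCs:", low.rpc_count, "push-down RPCs:", high.rpc_count)
```

Both paths return the same documents, but the block-store path pays one round trip per chunk of every posting list while the push-down path pays a single round trip per query. That is the effect I would hope to see from moving the integration point up from Directory to the reader/writer level.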
This looks like a much more performant approach and one that should also work well with a data grid. If I get time, I may try to implement this with a data grid and post the results.