It looks like memcached and Redis users typically store data in a column-oriented style. That is, rather than storing a Person object under a single person key, they store each attribute of the person under a separate key, for example:
<P:123:FirstName,Billy>
<P:123:Surname,Newport>
<P:123:Country,USA>
So, if the object has 3 attributes then there are 3 entries. This is flexible in that it's easy to add attributes when they are needed. Storing the whole object as the value makes that more difficult because the value has a fixed type and it's not easy to add a new attribute to it. You could store a HashMap as the value instead, which makes adding an attribute easy again.
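To make the two flexible layouts concrete, here is a minimal sketch in Python. It uses plain dicts as a stand-in for the remote cache, so no real memcached or Redis client API is shown. The names `attribute_cache` and `entity_cache`, the JSON encoding of the map, and the `Email` attribute and its value are assumptions made up for illustration.

```python
import json

# Hypothetical stand-ins for a remote cache: plain dicts of string keys to string values.
attribute_cache = {}
entity_cache = {}

# Entry per attribute: one cache entry for each attribute of person 123.
attribute_cache["P:123:FirstName"] = "Billy"
attribute_cache["P:123:Surname"] = "Newport"
attribute_cache["P:123:Country"] = "USA"

# Adding an attribute is just one more entry; existing entries are untouched.
# (The Email attribute is an illustrative assumption, not from the original example.)
attribute_cache["P:123:Email"] = "billy@example.com"

# Map per entry: the whole person is one value, here a dict serialized as JSON.
person = {"FirstName": "Billy", "Surname": "Newport", "Country": "USA"}
entity_cache["P:123"] = json.dumps(person)

# Adding an attribute means read, modify, and write back the whole value.
person = json.loads(entity_cache["P:123"])
person["Email"] = "billy@example.com"
entity_cache["P:123"] = json.dumps(person)

print(len(attribute_cache), len(entity_cache))
```

Both layouts accept the new attribute without any schema change. The difference is that the first one now holds four entries for the person while the second still holds one.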
So, you get flexibility with the entry per attribute approach but you pay a heavy performance cost. If a page uses a Person data entry then you may need to make several remote calls to the cache to get this one person. If there is a list of friends then you'd do this for each person. The number of RPCs per page starts to add up pretty fast. I'm pretty sure this is why you see blog entries etc. on why memcached needs crazy fast response times: it's simply because one page calls it so often. So, the entry per attribute approach is an option for environments where extreme schema flexibility is needed, but the price is paid on every page.
An entity based approach, where you store the whole object at the key, uses less memory on the server simply because there are fewer entries per object, i.e. there is one entry per object rather than one per attribute. Fetching a single entity is also cheaper because one RPC returns all of it, rather than one RPC per attribute. This means that if a page has 10 Person objects to be rendered and each Person has 10 attributes then the column approach uses 100 RPCs while the object approach uses 10.
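The arithmetic above can be checked with a small sketch. `CountingCache` is a hypothetical in-memory stand-in for a remote cache that counts one RPC per `get` call; it is not a real client library, and the attribute names and values are generated placeholders.

```python
class CountingCache:
    """Hypothetical in-memory stand-in for a remote cache that counts get() calls."""

    def __init__(self):
        self.data = {}
        self.rpcs = 0

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.rpcs += 1  # each get is treated as one remote call
        return self.data.get(key)


ATTRIBUTES = [f"Attr{i}" for i in range(10)]  # 10 placeholder attribute names
PERSON_IDS = range(10)                        # 10 people on the page

column_cache = CountingCache()
object_cache = CountingCache()
for pid in PERSON_IDS:
    record = {attr: f"value-{pid}-{attr}" for attr in ATTRIBUTES}
    for attr, value in record.items():
        column_cache.set(f"P:{pid}:{attr}", value)  # one entry per attribute
    object_cache.set(f"P:{pid}", record)            # one entry per person


def load_page_column(cache):
    # One get per attribute per person.
    return [{attr: cache.get(f"P:{pid}:{attr}") for attr in ATTRIBUTES}
            for pid in PERSON_IDS]


def load_page_object(cache):
    # One get per person.
    return [cache.get(f"P:{pid}") for pid in PERSON_IDS]


column_people = load_page_column(column_cache)
object_people = load_page_object(object_cache)
print(column_cache.rpcs, object_cache.rpcs)  # 100 10
```

Both loaders return the same ten records; only the number of calls differs.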
The other issue here is that memcached does not offer transactions. Attributes for a single 'object' may hash to different memcached servers and, with UDP especially, updates are not guaranteed to be applied. So, consistency is definitely a problem: an update that touches several attributes may only half work. This isn't acceptable to many, and the entry per object approach avoids this issue even without transactions because the whole object is written in one operation. Sometimes all or nothing is better than halfway.
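A half-applied update can be sketched as follows. `FlakyCache` is a hypothetical stand-in in which writes to selected keys are silently lost, playing the role of a server that is down or a dropped UDP packet. The new `Surname` and `Country` values are made up for illustration.

```python
class FlakyCache:
    """Hypothetical cache whose writes to keys in `dropped_keys` are lost."""

    def __init__(self, dropped_keys=()):
        self.data = {}
        self.dropped_keys = set(dropped_keys)

    def set(self, key, value):
        if key in self.dropped_keys:
            return False  # write lost, nothing stored
        self.data[key] = value
        return True


old = {"FirstName": "Billy", "Surname": "Newport", "Country": "USA"}
update = {"Surname": "Smith", "Country": "Canada"}  # illustrative values

# Entry per attribute: the write for Country is lost, the one for Surname lands.
attr_cache = FlakyCache(dropped_keys={"P:123:Country"})
attr_cache.data.update({f"P:123:{name}": value for name, value in old.items()})
attr_results = {name: attr_cache.set(f"P:123:{name}", value)
                for name, value in update.items()}

# Entry per object: the single write is lost, so the old record stays intact.
entity_cache = FlakyCache(dropped_keys={"P:123"})
entity_cache.data["P:123"] = dict(old)
entity_ok = entity_cache.set("P:123", {**old, **update})

print(attr_cache.data)    # Surname is new, Country is still old
print(entity_cache.data)  # the whole record is still the old one
```

In the per-attribute case the cache ends up holding a person that never existed (new surname, old country). In the per-object case the lost write leaves the old record untouched, so readers see either the old person or the new one.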
I don't think one approach is simply better than the other here. For extreme schema flexibility, the attribute per entry model or the hashmap per entry model is the most flexible. The attribute per entry model definitely has a cost in terms of performance and consistency, though. Fetching exactly the subset of attributes that you need may be a benefit, but the RPC cost cannot be ignored.