Tuesday, March 29, 2011

Understanding Grails GORM Secondary cache.

Hi,
  I have long wanted to find a detailed document explaining how GORM manages the Secondary (second-level) cache for us (via Hibernate, obviously). GORM gives us a sugar-coated way of defining Domains, which allows us to call dynamic finders like find, findAll, findAllByXXX etc., and also to fire Criteria, HQL etc. on those domains.
  You can get the basics of this from the Grails docs, which are good enough for most things. But one area that I feel still lacks focus in the Grails docs is the Secondary cache: they do not describe how caching actually behaves. Part of that behavior really belongs to Hibernate, which is probably why it is not documented well there.

  The reason I am writing this post is this thread: http://grails.1312388.n4.nabble.com/Effect-of-Enabled-Hibernate-Level-2-cache-on-Domain-td3405253.html

Here are some basics that we should at least know before going into further details.
  • There are dedicated regions for the Domain cache and the Query cache (the standard query cache), plus one more special region called "org.hibernate.cache.UpdateTimestampsCache". You will find enough docs online to understand what "org.hibernate.cache.UpdateTimestampsCache" is, so I won't go into its details for now.
  • Don't forget that Hibernate queries get cached only if the Secondary Query cache is enabled.
  • The Domain cache puts a Domain instance (a single row) in its cache region under a key generated as follows: [Fully.Qualified.class.name:PrimaryKey]
  • Domain cache regions can be further customized so that a region stores only a particular type of Domain. This handles situations where the cache for some domain has to expire sooner or later than the others, or where the number of domain instances to be cached (maxElementsInMemory) has to differ from the default cache size.
  • The standard Query cache region is a separate dedicated region, created automatically by Hibernate to store the results of cached queries. It stores the PKs of all the rows returned by the query being fired. Its key is generated from the query statement and the parameters map. This is because the same query can be fired to get different results: for example, to get a paginated view we keep firing the same query and change only the offset/start_index parameter. The results for each page then have to be kept separately in the cache.
  • Even if the query cache is enabled in the Hibernate config (DataSource.groovy in our case), we still have to add cache:true to the params we pass to a dynamic finder to make sure it caches its query. The list() method does not seem to require this; it appears to put its results in the cache whenever the query cache is enabled in the Hibernate config (DataSource.groovy).
  • Declaring cache true in the mapping of a domain allows its instances to be part of the Secondary cache region that is dedicated to storing actual domain objects.
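
To make the three switches above concrete, here is a minimal sketch of where each one lives. It assumes a Grails 1.3-style project using the Ehcache provider; the Book domain, its properties and the author value are hypothetical names used only for illustration, and the three parts belong in three separate files.

```groovy
// --- grails-app/conf/DataSource.groovy (fragment) ---
hibernate {
    cache.use_second_level_cache = true   // turns on the Domain cache
    cache.use_query_cache = true          // turns on the standard Query cache
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}

// --- grails-app/domain/Book.groovy (hypothetical domain class) ---
class Book {
    String title
    String author

    static mapping = {
        // Makes Book instances eligible for the Domain cache region.
        cache true
    }
}

// --- inside a controller or service ---
// A dynamic finder only uses the Query cache when cache: true is passed.
// Each distinct offset produces its own Query cache entry.
def firstPage = Book.findAllByAuthor('Some Author', [cache: true, max: 10, offset: 0])

// A criteria query asks for the same thing with a cache true call.
def sameBooks = Book.createCriteria().list {
    eq('author', 'Some Author')
    cache true
}
```

Leaving out the mapping block while keeping cache: true on the finder gives the first scenario described below; keeping both gives the second.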

Scenarios:
  1. Hibernate Cache + Query Cache is enabled, *NO* mapping of cache true in domain class, but cache : true specified in dynamic finder (findAll, findAllByXXX, criteria etc.):
    • If we fire a dynamic finder (like findAll, findAllByXXX) with Query cache = true:
                i. Query results (PKs) get stored in the cache. But the Domain itself is not cacheable: because we have not declared mapping cache true in the Domain class, Hibernate won't put it in the domain cache. So if you hit the same flow with the same parameters again, Hibernate knows which rows it requires, as it has the PKs of all rows in the result set for the given query and parameters, and it fires an individual query to the DB for each expected row by its PK. So if Hibernate got 10 results the first time, firing the same query with the same params again makes Hibernate fire 10 (YES, 10) select SQL queries, which is really * BAD *
    • If we fire the get(PK) method and the PK we are looking for is NOT in the cache, it will get the row from the DB and CAN NOT put it in the cache. So every time you fire get(PK), it will get the data from the DB.
    • If you do an insert/update anywhere in the table, the whole Query cache for domains of that type becomes invalid. It doesn't matter whether the update changes the Domain instances you are interested in or not. This is logically incorrect, but doing better is practically too difficult to implement, so the Hibernate guys have taken the obvious and simple route of invalidating the cache. The Domain cache in this scenario does not hold Domains of that type anyway, as the cache true mapping is not declared.
  2. Hibernate Cache + Query Cache is enabled, mapping of cache true in domain class, and cache : true specified in dynamic finder (findAll, findAllByXXX, criteria etc.):
    • If we fire a dynamic finder (like findAll, findAllByXXX) with Query cache = true, query results (PKs) get stored in the cache. So if you hit the same flow with the same parameters again, Hibernate knows which rows it requires, as it has the PKs of all rows in the result set. It looks up each PK in the secondary domain cache and finds it there, so there is no need to go to the DB. * GOOD *
    • If we fire the get(PK) method and the PK we are looking for is already in the cache, it will get the instance from the cache; otherwise it will go to the DB and, apart from returning the instance to you, put it in the cache as well. So the next time you fire get(PK), if the cache is still valid, it will get the data from the cache.
    • If the Domain instance you want is already in the Secondary cache but you do an insert/update anywhere in the table, the whole Query cache for domains of that type becomes invalid. It doesn't matter whether the update changes the Domain instances you are interested in or not. This is logically incorrect, but I think doing better is practically too difficult to implement, so the Hibernate guys have taken the obvious and simple route of invalidating the cache. As far as I can tell, the Domain cache itself is handled per instance: only the entry for the instance that actually changed is refreshed or evicted, while a bulk HQL update evicts the whole Domain cache region for that type.
    • Suppose a dynamic finder's results are in the Query cache and it returns 150 rows. The Domain cache then has to store 150 domain instances. When there is not enough space in memory to hold all that data, it should ideally overflow to the disk store, if enabled. In that case your domain instances are stored in a disk file managed internally by Ehcache. You specify where to keep those files on disk in the ehcache.xml file with the <diskStore path="user.home/ehcache_data"/> node.
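
One way to check which scenario you are in is to read Hibernate's own statistics around a repeated finder call. The following is only a sketch under some assumptions: it runs inside a Grails integration test or service where sessionFactory is the injected Hibernate SessionFactory bean, and Book with its author property is a hypothetical domain class. The counters come from Hibernate's Statistics interface.

```groovy
// Assumes: def sessionFactory is injected by Grails (integration test or service).
// Book and the author value are hypothetical.
def stats = sessionFactory.statistics
stats.statisticsEnabled = true
stats.clear()

def runQuery = {
    Book.findAllByAuthor('Some Author', [cache: true])
    // Clear the session (first-level cache) so the next call cannot be
    // answered from the current Hibernate session.
    sessionFactory.currentSession.clear()
}

runQuery()   // first call: should miss the Query cache and populate it
runQuery()   // second call: should be answered from the Query cache

println "query cache hits:    ${stats.queryCacheHitCount}"
println "query cache misses:  ${stats.queryCacheMissCount}"
println "domain cache hits:   ${stats.secondLevelCacheHitCount}"
println "domain cache misses: ${stats.secondLevelCacheMissCount}"
println "entities loaded:     ${stats.entityLoadCount}"
```

If the description of scenario 1 holds, the second call should add to the entities loaded counter without any domain cache hits; in scenario 2 the domain cache hits should go up instead. Turning on SQL logging for the data source is another way to see the per-row selects directly.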

  I will keep adding more details as I learn more about this.
All corrections and inputs are welcome!! Feel free to comment here or reply to me at my email address as mentioned on Facebook.

1 comment:

Nooruddin said...

Hey,

I read a blog of yours that explains about secondary cache for domains.

I have one such domain class called EnrollmentHeaderAttempt which is enabled to use secondary cache.
I have a task in which I am replacing EnrollmentHeaderAttempt.findById() with EnrollmentHeaderAttempt.get() in one action to improve performance, as this action is called a large number of times.

Now I am writing an integration test for this action, and I want to accomplish one thing here: I want to verify that on the first call I get the record from the DB and on the second call I get it from the second-level cache. Is there a way I can do this, for example using the name of the instance or something like that?