I often talk about how much I like clustered caching for a certain set of large-scale systems (Give your DB a break).
I recently had the pleasure of adding some simple caching to a Rails app. This time it didn’t make sense to add memcached and friends (there is some interesting work there with ActiveRecord support) because the problem was simple:
One page in a stats package, hit by ~4 users a couple of times a day, was very slow (30 secs) to generate dynamically. The data is NOT time sensitive.
Since this piece didn’t need to scale to thousands of concurrent users, and stale data was fine, we thought it would be enough to add some HTML caching.
One of our requirements was that we didn’t want one poor sucker to have to wait 30 secs to fill the cache. We wanted the cache to fill itself. Again, we ended up with a simple solution to this rather than some killer cache work.
Here are some steps though:
1. Rails Page Caching
At first it seemed simple to just add page caching.
All you need to do here is:
- Add “caches_page :action1, :action2” to your controller
- Have some way to expire the page. You can create a Cache Sweeper, have an action that calls expire_page, or simply have something that nukes the .html file.
Sounds great! However, all that is happening here is that caches_page makes sure an after_filter is applied that saves the response content out to the file system. By default, a .html file is created in the public directory so Apache can just slurp it right up and serve it without touching Rails. Very fast.
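To make the mechanism concrete, here is a minimal plain-Ruby sketch of what caching a page and expiring it boil down to, using only the standard library. PageCacheSketch and its methods are hypothetical names, not the Rails API. The real thing hooks into the controller as an after_filter and writes under public/.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for what caches_page arranges: once the action has
# rendered, the body is written to <public_dir>/<path>.html so the web
# server can serve it directly next time.
class PageCacheSketch
  def initialize(public_dir)
    @public_dir = public_dir
  end

  # Map "/stats/report" to "<public_dir>/stats/report.html" ("/" maps to index.html).
  def file_for(path)
    name = path.sub(%r{\A/}, "")
    name = "index" if name.empty?
    File.join(@public_dir, "#{name}.html")
  end

  # The "after_filter" step: save the rendered response body to disk.
  def cache_page(path, body)
    file = file_for(path)
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, body)
  end

  # Expiring is just deleting the file ("nuking the html file").
  def expire_page(path)
    FileUtils.rm_f(file_for(path))
  end

  def cached?(path)
    File.exist?(file_for(path))
  end
end
```

Because the web server finds the file before Rails is involved, nothing per-request (such as who is logged in) can influence what gets served.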
Sweeping is where you have to write some code, but it is simple to do, so you can’t complain too much (although it could be simpler still: I would love to see a system where you put expiration info in a cache config and Rails handles expiry automatically).
The problem is that full page caching only works if you CAN actually cache the entire page. Erm, surely that is most of the time, right? Well, not really. If you have any user-specific content on the page, it will not work. This could be anything from a login/register area for people who aren’t logged in, to a ‘Welcome Dion’ link to the user’s account if they are. If you have special UI changes for admin users, it will not work either (e.g. adding ‘Edit’ links next to content to give your admin users a simple interface).
This all means that you often can NOT use full page caching (and this was the case for us).
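Here is a tiny sketch of why user-specific content breaks full page caching. render_stats_page and fetch_page are hypothetical helpers, and the Hash stands in for the .html file, which is keyed only by URL.

```ruby
# Hypothetical page renderer with a per-user header.
def render_stats_page(current_user)
  header = current_user ? "Welcome #{current_user}" : "Login | Register"
  "<div>#{header}</div><div>...expensive stats...</div>"
end

# Whole-page cache keyed only by URL, which is effectively what page caching does.
def fetch_page(cache, url, current_user)
  cache[url] ||= render_stats_page(current_user)
end

page_cache = {}
first  = fetch_page(page_cache, "/stats", "Dion")
second = fetch_page(page_cache, "/stats", nil) # an anonymous visitor
# second is the very same string as first: the anonymous visitor is greeted
# with "Welcome Dion" instead of the login/register links.
puts second == first # prints "true"
```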
2. Action Caching
Another option here is to cache an action rather than the full page. The only difference is that: “unlike page caching, every request still goes through the Action Pack. The key benefit of this is that filters are run before the cache is served, which allows for authentication and other restrictions on whether someone is allowed to see the cache.”
So it still lets you run filters (which we needed, as we didn’t want just anyone viewing the stats), but it has the same whole-page limitations.
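A rough sketch of the ordering that makes action caching different, assuming a hypothetical ActionCacheSketch class with an in-memory Hash as the store: the authorization check runs on every request, and only then is the cached body returned (or rendered once on a miss).

```ruby
# Hypothetical sketch of action caching: the request still reaches the app,
# so a before-filter (here an auth check) runs before any cached body is served.
class ActionCacheSketch
  AccessDenied = Class.new(StandardError)

  # The block is the "filter": it returns true if the user may see the page.
  def initialize(&authorized)
    @store = {}
    @authorized = authorized
  end

  def serve(key, user)
    raise AccessDenied, "not allowed" unless @authorized.call(user) # filter first
    @store[key] ||= yield # then the cached body, rendered only on the first hit
  end
end
```

Usage: `ActionCacheSketch.new { |u| u == "admin" }` builds a cache whose filter only lets admins through; every call to `serve` checks the user before touching the store.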
3. Fragment Caching
Action caching sits on top of Fragment Caching (page caching, by contrast, writes its .html files straight to disk).
Fragment caching is used for caching various blocks within templates without caching the entire action as a whole. This is useful when certain elements of an action change frequently or depend on complicated state while other parts rarely change or can be shared amongst multiple parties.
So, we could simply wrap the content that took a long time to load with:
<% cache do %>
... all of the long running pieces ...
<% end %>
And we can tweak keys via:
<% cache(:action => "list", :action_suffix => "all_topics") do %>
Now we finally have just a piece of content cached, so the rest of the UI is still dynamic (we keep the logged-in features) and all of the filters will of course run.
But why, out of the box, does this version still take a long time to load?
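A minimal sketch of the fragment idea, assuming a hypothetical in-memory store (Rails’ real cache stores and key format differ): only the wrapped block is cached, while the surrounding page, including the per-user header, is rebuilt on every request. The key "stats/all_topics" is illustrative only.

```ruby
# Hypothetical fragment store mimicking <% cache(key) do %> ... <% end %>:
# the block only runs on a cache miss.
class FragmentCacheSketch
  def initialize
    @fragments = {}
  end

  def cache(key)
    return @fragments[key] if @fragments.key?(key)
    @fragments[key] = yield
  end

  def expire_fragment(key)
    @fragments.delete(key)
  end
end

# The page is assembled on every request: the header is always fresh,
# while the slow stats come from the fragment cache.
def render_page(fragments, user, &slow_stats)
  header = user ? "Welcome #{user}" : "Login | Register"
  stats = fragments.cache("stats/all_topics", &slow_stats)
  "#{header}\n#{stats}"
end
```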
- If you are testing in development, caching will not happen for you because config/environments/development.rb will probably have:
config.action_controller.perform_caching = false
You are used to thinking of dev mode as “Rails reloads the files I have changed”. It also turns off this HTML caching. To test, change this setting to true in dev mode, and remember to change it back later ;)
- Where are you DOING the long-running task? Just because you have a cache() wrapping the VIEW doesn’t mean that the controller isn’t running the action. If your action has
@data = long_running_method
that the view then uses, the caching will do nothing for you. Make sure that the action doesn’t do the long-running work, and that it is instead kicked off from within the cache block itself: e.g. by literally calling long_running_method there, or by rendering a partial that does the long-running work (from inside the cache block).
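Both gotchas can be seen in a small plain-Ruby model. StatsPage, eager_request, and lazy_request are hypothetical names; perform_caching mimics the config flag, and slow_calls counts how often the slow work would really run.

```ruby
# Hypothetical model of the stats page. perform_caching plays the role of
# config.action_controller.perform_caching.
class StatsPage
  attr_reader :slow_calls

  def initialize(perform_caching: true)
    @perform_caching = perform_caching
    @slow_calls = 0
    @fragments = {}
  end

  def long_running_method
    @slow_calls += 1
    "expensive stats"
  end

  # Like <% cache do %>: run the block only on a miss, or always if caching is off.
  def cache(key)
    return yield unless @perform_caching
    return @fragments[key] if @fragments.key?(key)
    @fragments[key] = yield
  end

  # Mirrors `@data = long_running_method` in the action: the slow call runs
  # on every request, even when the view's cache block is a hit.
  def eager_request
    data = long_running_method
    cache("stats") { data }
  end

  # The slow call lives inside the cache block, so it runs only on a miss.
  def lazy_request
    cache("stats") { long_running_method }
  end
end
```

With caching on, three eager requests still do the slow work three times, while three lazy requests do it once. With perform_caching: false (the dev default), even the lazy version runs it every time.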
OK, done, right? Not quite. This has only created the cache file. Unless you want it generated once and then used for the rest of time, you need to clean up.
We simply put something in cron to expire that fragment every X minutes (as that was the simplest option). On apps that do more of this caching, we create a cache config object that knows more about expiration and does the deed for us.
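One way such a cache config object could look, as a sketch only: CacheConfig, write, and expire_stale are hypothetical names, TTLs are in seconds, and the store is any Hash-like object responding to []= and delete. A cron job would call expire_stale periodically.

```ruby
# Hypothetical cache config: each fragment key gets a time-to-live, and
# expire_stale deletes whatever has outlived it.
class CacheConfig
  def initialize(store, ttls)
    @store = store         # e.g. a Hash, or anything with []= and delete
    @ttls = ttls           # e.g. { "stats" => 15 * 60 }
    @written_at = {}
  end

  def write(key, value, now = Time.now)
    @store[key] = value
    @written_at[key] = now
  end

  # Expire every key older than its TTL; returns the keys that were removed.
  def expire_stale(now = Time.now)
    stale = @written_at.keys.select { |k| @ttls.key?(k) && now - @written_at[k] >= @ttls[k] }
    stale.each do |k|
      @store.delete(k)
      @written_at.delete(k)
    end
    stale
  end
end
```

Keeping the TTLs in one place means the cron script no longer needs to know which fragments exist or how stale each may get.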
Also, remember the requirement not to screw the first user after expiry? The cron script handles this in a hacky way: right after it nukes the fragment, it requests the page’s URL to regenerate the cache. This isn’t totally trivial, as in our case the page is behind authentication and some filters that can trip the script up.
To solve this, the process uses a cookie to get through the auth piece (a security worry), and we turn off one of the filters (cookies_required) for this beast.
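The expire-then-warm step of the cron script might look roughly like this. expire_and_warm, the URL, and the cookie value are hypothetical; the default fetcher uses Net::HTTP from the standard library, and it is injectable so the logic can be exercised without a running server.

```ruby
require "net/http"
require "uri"

# Hypothetical cron job body: expire the fragment, then immediately request
# the page so the next real visitor hits a warm cache. The cookie gets the
# request past the authentication filter (a security trade-off).
def expire_and_warm(store, key, url, auth_cookie, fetcher: nil)
  store.delete(key)
  fetcher ||= lambda do |uri, cookie|
    req = Net::HTTP::Get.new(uri)
    req["Cookie"] = cookie
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(req)
    end
  end
  fetcher.call(URI(url), auth_cookie)
end
```

Rendering the page is what refills the fragment, so the script itself never writes to the cache; it only deletes and then asks for the page.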
Conclusion
Rails has a bunch of built-in page caching mechanisms, but they aren’t THAT useful out of the box. You need to tweak and play around to get what you need, and most of the time you will NOT use the simple solutions.
For our large scale sites we still love the event-driven memcached approaches.
What have you done?