Marc Hughes


I am a developer from a bit west of Boston.

Cloudfront and GZip

25 Mar 2013

This morning, I worked on a new way of deploying media on scrumdo.com. When I originally set it up years back, we uploaded all of our media to Amazon S3, pointed a Cloudfront (their CDN) distribution at it, and were done with it.

But Cloudfront doesn't gzip. So as our CSS and JavaScript kept getting bigger, the pages were taking longer and longer to load.

There are some hack-ish ways of making S3/Cloudfront gzip content, but they seemed like more trouble than they're worth.

A while back, Amazon introduced custom origins for Cloudfront. That means you can point Cloudfront at your own web server instead of an S3 bucket. Any headers and content your server sends out get cached in Cloudfront for you. So today, I decided to start taking advantage of that.

First, I set up my Django app to collect its static files into Apache's web directory.

local_settings.py

STATIC_ROOT = "/var/www/html/static"

Then, I set up Apache to actually serve those files.

AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE text/x-javascript

AliasMatch ^/static/v[0-9]+/(.*) /var/www/html/static/$1
Alias /static /var/www/html/static/
<Directory /var/www/html/static>
    Order deny,allow
    Allow from all
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
    Header append Cache-Control "public"
</Directory>

The AddOutputFilterByType directives tell Apache to compress responses with those MIME types.
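To make the effect concrete, here is a rough Python sketch of the decision those directives describe: compress only the listed MIME types, and only when the client says it accepts gzip. It is a simplification and not how mod_deflate is implemented; the function name `maybe_compress` and the sample CSS are made up for illustration.

```python
import gzip

# MIME types taken from the AddOutputFilterByType lines above.
COMPRESSIBLE_TYPES = {
    "text/css",
    "application/javascript",
    "application/x-javascript",
    "text/javascript",
    "text/x-javascript",
}


def maybe_compress(body, content_type, accept_encoding):
    """Return (body, extra_headers), gzipping only the listed MIME types."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    # Simplification: a real server also honours q-values like "gzip;q=0".
    client_accepts_gzip = "gzip" in accept_encoding.lower()
    if mime in COMPRESSIBLE_TYPES and client_accepts_gzip:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


# Hypothetical, very repetitive stylesheet just to show the size difference.
css = b".button { color: #333; padding: 4px; }\n" * 200
compressed, headers = maybe_compress(css, "text/css; charset=utf-8", "gzip, deflate")
print(len(css), "->", len(compressed), headers)
```

Images such as PNGs are not in the list, so they pass through untouched; they are already compressed and would gain little.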

The AliasMatch line makes any request to /static/v<number>/ be served directly by Apache instead of the Django app. The version number in the path is matched by a pattern and then ignored, so I can set long expiration dates and version the media via URL. This means all of these URLs point to the same file:

http://www.scrumdo.com/static/v1/images/small-logo.png
http://www.scrumdo.com/static/v2/images/small-logo.png
http://www.scrumdo.com/static/v3/images/small-logo.png
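The mapping can be checked outside Apache with the same regular expression. This is a small sketch in Python, assuming the pattern and target directory from the AliasMatch line; the `resolve` helper is a hypothetical name, and it only mimics the path rewriting, not Apache's request handling.

```python
import re

# Same pattern as the AliasMatch line; the captured group plays the role of $1.
VERSIONED = re.compile(r"^/static/v[0-9]+/(.*)")
STATIC_DIR = "/var/www/html/static/"


def resolve(url_path):
    """Return the file path for a versioned static URL, or None if no match."""
    match = VERSIONED.match(url_path)
    if match is None:
        return None
    return STATIC_DIR + match.group(1)


for version in ("v1", "v2", "v3"):
    print(resolve("/static/%s/images/small-logo.png" % version))
```

All three calls give the same path on disk, which is why bumping the version in the URL is enough to make browsers and the CDN treat the file as new.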

The Django app is smart enough to change its STATIC_URL setting based on some deployment options to take advantage of this. Overall, this reduces the number of If-Modified-Since requests that come back as 304 responses. It has the downside of effectively invalidating ALL cached files when we put up a new release. (I know there are more elegant solutions here, but this is good enough for us for now.)
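As a rough idea of what that can look like in a settings file, here is a minimal sketch. The option names `STATIC_VERSION` and `CDN_HOST` and the helper `build_static_url` are assumptions for illustration, not the actual scrumdo.com settings; the host name is the CDN name used later in this post.

```python
# Hypothetical deployment options; the real names are not shown in this post.
STATIC_VERSION = 3               # bumped on each release
CDN_HOST = "cdn2.scrumdo.com"    # set to None to serve from the app's own host


def build_static_url(version, cdn_host=None):
    """Build a versioned static URL prefix such as /static/v3/."""
    path = "/static/v%d/" % version
    if cdn_host:
        return "http://%s%s" % (cdn_host, path)
    return path


STATIC_URL = build_static_url(STATIC_VERSION, CDN_HOST)
print(STATIC_URL)  # http://cdn2.scrumdo.com/static/v3/
```

Bumping the version number on release changes every static URL at once, which is the blunt "invalidate everything" behavior described above.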

Next, I set up a new Cloudfront distribution to point to my ELB (Amazon Elastic Load Balancer).

http://cdn2.scrumdo.com/static/v3/images/small-logo.png

If you click that link, here's what happened:

  1. Your browser asked cloudfront for the file
  2. Through some DNS magic, you got a cloudfront edge location close to you
  3. Very likely, someone else had previously requested it, so Cloudfront returned the file from its cache
  4. But if not, Cloudfront made a request to our load balancer, which forwarded it along to one of our app servers, and eventually the response made it back to you.
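One way to see which of those paths a request took is to look at the response headers. Cloudfront adds an X-Cache header ("Hit from cloudfront" or "Miss from cloudfront"), and a gzipped response carries Content-Encoding: gzip. The sketch below is a hedged example: `describe_response` and `fetch_headers` are hypothetical helper names, and the sample headers are illustrative values, not captured output.

```python
import urllib.request


def describe_response(headers):
    """Summarise cache and compression headers from a response."""
    # Header names are case-insensitive, so normalise them first.
    lower = {name.lower(): value for name, value in headers.items()}
    x_cache = lower.get("x-cache", "")
    return {
        "gzipped": lower.get("content-encoding", "").lower() == "gzip",
        "served_from_edge_cache": x_cache.lower().startswith("hit"),
        "cache_control": lower.get("cache-control"),
    }


def fetch_headers(url):
    """Request url with gzip allowed and return the response headers as a dict."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


# Illustrative headers for a cached, compressed stylesheet (not real output).
sample = {
    "X-Cache": "Hit from cloudfront",
    "Content-Encoding": "gzip",
    "Cache-Control": "max-age=2592000, public",
}
print(describe_response(sample))
```

To try it against a live URL, pass the result of `fetch_headers(url)` to `describe_response`. Requesting the same file twice should usually show a miss followed by a hit, although a different edge location may answer each time.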

In our next major release, we're combining and minifying a bunch of CSS and JS files into a single file (for each type) using grunt.js, which should help speed up the page loads even more.