Amazon S3 counts as a gadget, right? :-) I've been using it professionally for a while, and of course many of the services we take for granted (until us-east-1 goes down) use it too. Turns out that you can hook it into a homebrew website without very much work...
The other day I traced a period of terrible performance (8s network latency getting out of the house) to a visit from Googlebot-Video/1.0 fetching an old AVI file from a post-hoc image stabilization project (now made mostly redundant by YouTube's built-in stabilization feature.) The file was about 50M, and anyone interested in the project really wants the less-compressed original, so shoving it to YouTube really doesn't help... but it turns out that that's tiny by Amazon S3 standards, and the free tier covers it just fine.
There was a surprisingly small set of steps; I'm posting them here with the actual domains involved, since they're visible and public anyway. You just need to convert them to your own needs...

Create a bucket named avi.thok.org, although something more generic like s3.thok.org would have been a common choice. Do this first, because the bucket namespace is global and isn't checked against DNS registration at all, so there's a very faint chance someone already has a bucket of that name; at this stage, if you find a collision you can just pick a different name, like s3-namespaces-can-you-speak-it.thok.org.

Install s3cmd (just git clone the GitHub version and run it from the checkout - the one in Ubuntu doesn't actually handle puts with redirects.)

s3cmd --configure
and get the Access and Secret keys from the console under "security"; don't bother to configure encryption or https because these are files that are already available by http, you don't want to deal with certificates, and you'll check the md5sums later.

s3cmd put --no-encrypt kicx1440.avi s3://avi.thok.org/me/publish/europython/day2/kicx1440.avi
works just fine, without having to do anything about me/publish directly.

s3cmd setacl --acl-public s3://avi.thok.org/me/publish/europython/day2/kicx1440.avi
makes that single file public. At this point, there's a long convoluted url that will fetch this file, and you could stop here and just change the html that points to it, but let's handle this cleanly...

avi IN CNAME s3.amazonaws.com.
Carlton Bale gets credit for having the first Google hit that actually said this would work. Once you've pushed this through, curl -L -v -I http://avi.thok.org/me/publish/europython/day2/kicx1440.avi works - note carefully, the -I gets curl to do a HEAD (-H was already taken?) so you get back headers, not 100m of video. You should see the Location header taking you over to S3, and then a convincing ETag (md5sum of the file, in this particular case) and Content-Length.

RewriteRule ^/(me/.*\.avi)$ http://avi.thok.org/$1 [R,L]
To pick this apart:

RewriteRule is the Apache swiss-army-knife of URL mangling.

The pattern is anchored (^ for start, $ for end) and grabs everything after the leading slash (thus the slash is outside the grouping parentheses.) Within this part of the path, it has to start with me/ and end with .avi but can have anything at all in between; if we wanted literally all AVI files, we'd drop the me/ part, but I have some small ones elsewhere on the site that I didn't want to bother hunting down and uploading.

The substitution uses avi.thok.org to point to the CNAME we set up above; $1 is the first set of parentheses in the match (so, me/xxx.avi.)

R says to make it a redirect (and because our result starts with http it automatically becomes an "external" redirect, in this case a 302, ie. "don't try to fetch this url, just tell the client to go away and find it themselves.") You can't get theyah from heah, but you can get there from over there... the L is for "last" and just says to stop trying and don't do any more rewriting on this particular result.

/etc/init.d/apache2 reload
or however your system spells that. At this point, you can curl -L -v -I http://www.thok.org/me/publish/europython/day2/kicx1440.avi (note that we're actually starting with the primary domain here, where the original problem started) and follow our HTTP/1.1 302 Found and then Amazon's HTTP/1.1 307 Temporary Redirect, and the bandwidth problem (remember the bandwidth problem? This song's about a bandwidth problem) is now gone.

Future refinements:

Use [R=307] and make the first hop a Temporary Redirect as well. Not sure if that's correct, yet, but given that this all started with a search engine bot that wasn't aware of the human-readable "slow (home)" and "fast (MIT)" alternate links, it's worth looking into.

If thok.org were more of a CMS, automatically noticing avi files and pushing them to Amazon would be a good transparent trick. For a total of five files on a home website? Not actually worth the trouble, even if the logs say I have at least a month before the bot comes around again :-)
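
For anyone who wants to sanity-check the rewrite pattern before reloading Apache, here is a rough Python sketch of what the RewriteRule does to a request path, plus a helper for the "check the md5sums later" step. It is only an approximation: Python's re module stands in for Apache's regex engine, and the names redirect_target and etag_matches are made up for illustration. The ETag comparison assumes a plain single-part upload, where the ETag happens to be the file's MD5, as it was in this particular case.

```python
import hashlib
import re

# Python rendering of the Apache pattern ^/(me/.*\.avi)$ from the RewriteRule.
PATTERN = re.compile(r"^/(me/.*\.avi)$")
# The CNAME'd bucket host from the post; substitute your own.
S3_HOST = "http://avi.thok.org/"


def redirect_target(path):
    """Return the URL the rule would redirect to, or None if the path is left alone."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # $1 in the Apache substitution corresponds to group 1 here.
    return S3_HOST + match.group(1)


def etag_matches(data, etag):
    """Compare an ETag header value (quotes included) with the MD5 of the bytes.

    Assumption: the object was uploaded in one piece, so its ETag is an MD5.
    """
    return hashlib.md5(data).hexdigest() == etag.strip('"').lower()


if __name__ == "__main__":
    print(redirect_target("/me/publish/europython/day2/kicx1440.avi"))
    print(redirect_target("/other/clip.avi"))
```

To use the second helper, read the local file's bytes and pass them in along with the ETag value that curl -I printed.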