Cloud Backup, Redux
Back in April I wrote a post called Overhauling my Digital Life, in which (amongst other things) I wrote about signing up for a cloud backup service.
At the time I picked ADrive as our storage provider for a couple of reasons: the price is extremely reasonable ($25 a year for 100gb), and they support rsync, which makes it easy to write a backup script or two and have the server run them periodically.
This week, while looking through the logs from my backup script, I noticed something alarming: I’d used up my entire 100gb quota, and my backup jobs were failing as a result.
I thought for a while about what to do. ADrive’s next account level up offers 250gb of storage (2.5x as much) but is also 2.5x the price, at $62.50 a year. If you survey the cloud backup marketplace as I did eight months ago you’ll find this to be an extremely reasonable price, but it doesn’t feel like good value to me, for two reasons. First, if I’m going to move up to a larger account I’d like to see a reduction in the per-GB cost, and there isn’t one: the per-GB cost is exactly the same as on my current 100gb plan. Second, it’s far more storage than I actually need. Buying an extra 150gb of space to store the one or two gigabytes that don’t fit in my 100gb plan just doesn’t feel sensible.
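The per-GB point is easy to check. Here’s a small Python sketch, using only the plan prices quoted in this post (illustrative; ADrive’s pricing has surely changed since), that divides each plan’s yearly price by its capacity:

```python
from fractions import Fraction

# ADrive plans as quoted in the post: name -> (capacity in GB, price in USD per year).
# Illustrative only; these are not current prices.
ADRIVE_PLANS = {
    "100gb": (100, Fraction("25.00")),
    "250gb": (250, Fraction("62.50")),
}


def cost_per_gb_year(capacity_gb: int, price_usd: Fraction) -> Fraction:
    """Yearly plan price divided by plan capacity: USD per GB per year."""
    return price_usd / capacity_gb


if __name__ == "__main__":
    for name, (capacity, price) in ADRIVE_PLANS.items():
        rate = cost_per_gb_year(capacity, price)
        print(f"{name}: ${float(rate):.2f} per GB per year")
```

Both plans work out to $0.25 per GB per year, so the bigger plan carries no volume discount at all.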
When I looked at ADrive in the first place, one of the alternatives I considered was Amazon’s AWS. If you’re not familiar with it, Amazon sell services like storage and cloud computing power, and they have some pretty big customers: AWS powers Netflix and Instagram, amongst others. The reason I didn’t choose Amazon in the first place is that AWS really isn’t a consumer-focused service, and you need a much higher degree of technical savvy to use it. It’s also a little more expensive than ADrive for the storage volume I need (3¢ per GB per month comes out to $36 a year for my 100gb backup), but the pricing model places no upper limit on the amount of storage you can use, and you pay only for what you actually use. Perfect.
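To make the difference between the two pricing models concrete, here’s a rough sketch using the prices quoted in this post: metered pricing bills per GB actually stored, while a tiered plan bills for the smallest plan the data fits in. The 102 GB case is hypothetical, standing in for “just over 100gb.”

```python
from fractions import Fraction
from typing import List, Tuple

# Prices as quoted in the post; illustrative only.
AWS_CENTS_PER_GB_MONTH = 3
ADRIVE_TIERS: List[Tuple[int, Fraction]] = [(100, Fraction(25)), (250, Fraction("62.50"))]


def metered_annual_cost(used_gb, cents_per_gb_month: int) -> Fraction:
    """Pay-per-use: billed only for what is stored, with no upper limit."""
    return Fraction(used_gb) * cents_per_gb_month * 12 / 100


def tiered_annual_cost(used_gb, tiers: List[Tuple[int, Fraction]]) -> Fraction:
    """Fixed plans: billed for the smallest plan the data fits in."""
    for capacity, price in sorted(tiers):
        if used_gb <= capacity:
            return price
    raise ValueError(f"no plan holds {used_gb} GB")


if __name__ == "__main__":
    for used in (100, 102):  # 102 GB is a hypothetical "just over the limit" case
        metered = metered_annual_cost(used, AWS_CENTS_PER_GB_MONTH)
        tiered = tiered_annual_cost(used, ADRIVE_TIERS)
        print(f"{used} GB: metered ${float(metered):.2f}/yr, tiered ${float(tiered):.2f}/yr")
```

At 100 GB the fixed plan wins ($25 against $36), but at 102 GB the metered price barely moves ($36.72) while the tiered price jumps to $62.50.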
They also offer an option called Glacier, which on the face of it seems perfect for what I want: it’s a third of the regular price, and it’s designed explicitly as backup storage. If you need to restore files you may have to wait a couple of hours before they’re made available. That would be fine, except that I do incremental backups: each week, month or quarter (depending on what’s being backed up) I synchronize the backup with what’s on my server, sending only files that are new or have changed. To do that, my backup tool needs access to what’s already in the backup so that it knows what to send. Glacier was a non-starter for that reason.
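The incremental approach boils down to comparing two listings. Here’s a simplified, hypothetical sketch of that comparison: it borrows rsync’s default “quick check” idea (a file counts as changed if its size or modification time differs) and takes both listings as plain dictionaries. The key input is `remote`, the listing of what’s already in the backup; without it there’s nothing to compare against, and everything would have to be sent every time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    size: int   # bytes
    mtime: int  # last modification time, seconds since the epoch


def files_to_send(local: Dict[str, FileInfo], remote: Dict[str, FileInfo]) -> List[str]:
    """Paths that are new locally, or whose size or mtime differs from the backed-up copy.

    `remote` must describe what is already in the backup. Deleted files are
    not handled here; a real tool would also report paths only in `remote`.
    """
    return [
        path
        for path, info in sorted(local.items())
        if remote.get(path) != info  # missing (None) or different size/mtime
    ]


if __name__ == "__main__":
    local = {
        "docs/notes.txt": FileInfo(size=20, mtime=500),      # edited since last backup
        "docs/new.txt": FileInfo(size=7, mtime=900),         # never backed up
        "photos/beach.jpg": FileInfo(size=1000, mtime=100),  # unchanged
    }
    remote = {
        "docs/notes.txt": FileInfo(size=20, mtime=300),
        "photos/beach.jpg": FileInfo(size=1000, mtime=100),
    }
    print(files_to_send(local, remote))
```

Only the new and edited files are selected; the unchanged photo is skipped.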
Regardless, I’d all but decided to give AWS a try. I’d signed up for an account, created a storage “bucket,” and was reading online about tools that offer rsync-like functionality but upload to AWS storage. I’d found one that looked good, and noticed that it supported a variety of storage providers besides AWS. One of them was another cloud services company you might have heard of: Google.
I use Google’s consumer services pretty heavily (I have an Android phone and tablet, so it makes sense to), and I’d even used Google’s App Engine once before for a previous project, but I’d never realized that App Engine is part of a wider Google Cloud Platform offering. That offering includes a cloud storage service very similar to Amazon’s S3, but costing just 2¢ per GB per month. For my 100gb that comes to $24 a year, which makes it cheaper than what I was already paying ADrive (by a whole dollar a year). I was sold, and I signed up.
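To put the “whole dollar a year” in context, here’s a sketch that computes Google’s metered cost at the 2¢ price quoted above, along with the break-even volume against ADrive’s fixed plans. It uses this post’s prices only; current pricing differs.

```python
from fractions import Fraction

GCS_CENTS_PER_GB_MONTH = 2  # Google Cloud Storage price quoted in the post


def gcs_annual_cost(used_gb) -> Fraction:
    """Metered yearly cost in USD for a given stored volume."""
    return Fraction(used_gb) * GCS_CENTS_PER_GB_MONTH * 12 / 100


def break_even_gb(fixed_price_usd: Fraction) -> Fraction:
    """Stored volume at which metered storage costs as much as a fixed yearly plan."""
    return fixed_price_usd * 100 / (GCS_CENTS_PER_GB_MONTH * 12)


if __name__ == "__main__":
    print(f"100 GB on Google: ${float(gcs_annual_cost(100)):.2f}/yr")
    for plan_price in (Fraction(25), Fraction("62.50")):  # ADrive 100gb and 250gb plans
        print(f"break-even vs ${float(plan_price):.2f}/yr plan: "
              f"{float(break_even_gb(plan_price)):.1f} GB")
```

Metered storage undercuts the $25 plan up to about 104 GB. Since that plan can’t hold more than 100 GB anyway, the comparison that matters beyond it is against the $62.50 plan, which doesn’t break even until roughly 260 GB.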
The next step was to convert all my rsync-based backup scripts to send data to Google instead of ADrive, and this was extremely easy. Google offers a command-line utility called gsutil which can be used for a variety of functions, including incremental, rsync-style file copying. The whole thing, from signup to finished scripts, took just a couple of hours, including the time it took me to find and read the documentation. The documentation was absolutely necessary here: in contrast to the intuitive ease of use I’ve come to expect from Google’s consumer services, everything I did to set up my Cloud Platform storage seemed foreign and complicated. You really do need a decent amount of technical knowledge to understand it. That’s not necessarily a criticism of Google; I would assume AWS is much the same. It feels complicated because it is complicated. Cloud Platform is a set of tools for developers to use however they see fit, not a single-task solution for consumers like I was used to.
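gsutil’s `rsync` command is what does the incremental copying here. The scripts themselves aren’t included in this post, so the following is a hypothetical Python wrapper around that command; the directory and bucket names are made up, and it assumes gsutil is installed and already authenticated. The flags are standard `gsutil rsync` options: `-r` to recurse, `-d` to delete bucket objects that no longer exist locally, and `-n` for a dry run, plus the top-level `-m` flag to run transfers in parallel.

```python
import shlex
import subprocess
from typing import List


def gsutil_rsync_command(source_dir: str, bucket: str, prefix: str,
                         delete_extra: bool = False, dry_run: bool = False) -> List[str]:
    """Build a `gsutil rsync` invocation mirroring source_dir into gs://bucket/prefix."""
    cmd = ["gsutil", "-m", "rsync", "-r"]  # -m: parallel; -r: recurse into subdirectories
    if delete_extra:
        cmd.append("-d")  # remove objects in the bucket that were deleted locally
    if dry_run:
        cmd.append("-n")  # report what would be copied without copying anything
    cmd += [source_dir, f"gs://{bucket}/{prefix}"]
    return cmd


def run_backup(source_dir: str, bucket: str, prefix: str, **options) -> None:
    """Run the sync, raising CalledProcessError if gsutil exits with an error."""
    cmd = gsutil_rsync_command(source_dir, bucket, prefix, **options)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Something like `run_backup("/srv/photos", "example-home-backup", "photos", dry_run=True)` is a sensible first run. Turning on `delete_extra` deserves extra care, since it makes deletions on the server propagate to the backup.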
Anyway, everything was set up and I was happy… except for one thing. I still had to get the 100gb or so of data from my home server to Google’s. My backup scripts were done and would take care of that the next time they ran, but I knew it would take a very long time starting from a blank slate. When I originally set up my ADrive storage I’m pretty sure the initial backup took several weeks, and I’d had to run it only at night because sending that much data used up all our available bandwidth.
Really what I wanted was a way to import data from ADrive to Google. If I could do that, I wouldn’t have to send 100gb of data from our home server at all: I could just move things across, and my backup scripts would take care of any changes since the last successful ADrive backup. There’s no such service, but wait! Google Cloud Platform exists for developers to create their own services, so why not build what I needed?
When I’d signed up for Google Cloud Platform they’d given me $300 of credit with a 60-day expiry, intended, I guess, to help me play around and get my app off the ground. I’d dismissed it: the only service I needed was cloud storage, and to chew through $300 before it expired I’d have to store 7.5tb of data (at 2¢ per GB per month, that’s $150 a month for two months). But the credit let me explore the other Cloud Platform offerings and use more or less whatever I wanted for free during those first 60 days. In a couple of clicks I’d provisioned and started a Linux VM on Google’s infrastructure and was at the command prompt. I wrote a two-line script to download my backup from ADrive to the VM (using rsync), then re-upload it to Cloud Storage (using gsutil). Our home internet connection would probably max out at about 300kb/s upload, and less with ADrive, where their infrastructure also seems to be something of a limiting factor. Downloading my data from ADrive to Google’s VM isn’t super-speedy either, averaging about 1mb/s, but re-uploading it to my Cloud Storage bucket races along at a staggering 12mb/s and, most importantly, none of this clogs up my home internet connection in any way.
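The two-line relay script itself isn’t reproduced here. A minimal sketch of the same idea, written as Python driving the same two tools, might look like the following. The ADrive rsync address, staging directory and bucket are placeholders, not real values, and it assumes the VM’s disk is large enough to hold the whole backup before it’s re-uploaded.

```python
import subprocess
from typing import List

# Placeholders only: substitute your own values.
ADRIVE_SOURCE = "user@adrive-rsync-host.example:backup/"  # hypothetical rsync source
STAGING_DIR = "/mnt/staging/backup/"                       # hypothetical disk on the VM
DEST_URL = "gs://example-home-backup/"                      # hypothetical bucket


def relay_commands(source: str, staging: str, dest: str) -> List[List[str]]:
    """The two steps of the relay, in the order they must run."""
    return [
        # 1. Pull the existing backup from the old provider onto the VM.
        #    -a (archive) preserves timestamps and permissions; --partial keeps
        #    partly transferred files so an interrupted run can resume.
        ["rsync", "-a", "--partial", source, staging],
        # 2. Push everything from the VM into Cloud Storage, in parallel.
        ["gsutil", "-m", "rsync", "-r", staging, dest],
    ]


def run_relay(source: str = ADRIVE_SOURCE, staging: str = STAGING_DIR,
              dest: str = DEST_URL) -> None:
    for cmd in relay_commands(source, staging, dest):
        subprocess.run(cmd, check=True)  # stop if either step fails
```

Running the steps strictly in sequence keeps things simple; the trade-off is that the upload can’t start until the slower download has finished.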
The VM is running and doing its thing as I write this. I expect it to finish in about 10 hours’ time, at which point I’ll run the backup scripts on my home server to upload anything that was missing from ADrive, and we’ll be done.
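For a rough sense of why the relay was worth it, here’s a back-of-the-envelope estimate using the rates quoted above. The post doesn’t say whether kb/s and mb/s mean bits or bytes, so this sketch assumes kilobytes and megabytes per second in decimal units (if they’re bits, every figure is eight times longer). It also assumes constant rates and that the relay downloads everything before uploading.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours to move size_gb at a constant rate, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 3600


SIZE_GB = 100  # roughly the size of the backup

# Rates quoted in the post, interpreted as megabytes per second (an assumption).
home_upload = transfer_hours(SIZE_GB, 0.3)   # home connection, best case
vm_download = transfer_hours(SIZE_GB, 1.0)   # ADrive -> VM
vm_upload = transfer_hours(SIZE_GB, 12.0)    # VM -> Cloud Storage

if __name__ == "__main__":
    print(f"From home, nonstop: {home_upload:.1f} h")
    print(f"Relay via the VM:   {vm_download + vm_upload:.1f} h "
          f"({vm_download:.1f} h down + {vm_upload:.1f} h up)")
```

Under those assumptions, that’s roughly 93 hours of saturated upload from home against about 30 hours on the VM, with the ADrive download as the bottleneck. Only the home figure competes with everything else on the household connection.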