Synchronizing Files Across Multiple Servers with Amazon S3
By Jason Jason, May 10, 2016

I'm still working with a lot of video files on a daily basis. Most people would simply upload their videos to YouTube or Vimeo and call it good, but when you charge a membership fee for your video content, uploading it to a free service isn't always a good idea, primarily because you lose some control over your content. That's why I needed to figure out a way to keep one central collection of my videos that would sync across all our servers automatically. Though I went through this process using video content, the same principles apply to any type of file data you might have.

The Goal

The goal of this process is to have one central repository where files can be stored and then synchronized across all your other servers automatically. Then, if you need to upload new content or change existing content, all you need to do is change it in the central repository, and the changes will be synced across all your servers automatically.

Using Amazon S3 as the master file repository serves multiple purposes: it's cheap, it's professionally backed up, and it's secure. There are many other services out there that fulfill the same purpose; I just like Amazon S3 the best, personally.

Upload Your Files to Amazon S3

Start by uploading your files to Amazon S3. Our process involves uploading a video to our video processing server before it gets sent to Amazon S3. This video processing server takes almost any video format, stabilizes it, re-encodes it to different resolutions (for example, 1080p, 720p, 480p, and 240p), optionally transcribes it, and then uploads the finished product to Amazon S3. You can read more about our video encoding process in another blog post, Recursively Transfer Entire Directory to Amazon S3 with Laravel 5.2.

Once the files are uploaded into your S3 bucket, you'll need to log in to the server that will receive a copy of the files from S3. For the sake of simplicity, we'll just call this server mirror1.bakerstreetsystems.com.

Installing the Amazon Command Line Interface

After logging in to mirror1.bakerstreetsystems.com via SSH, we need to install the AWS (Amazon Web Services) CLI (Command Line Interface) tool. On Ubuntu 14.04 LTS, this involved running the following command:

sudo pip install awscli

If this command doesn't work, you might need to install Python or Pip, Python's package management system. You can read more about how to do this by following Amazon's installation instructions: Installing the AWS Command Line Interface.
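Before running the install, it can help to see which of the required tools are already present on the server. The following is only a minimal sketch; `have_cmd` and `report_tools` are helper names made up for this example, not part of the AWS CLI.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools prints one "name: found" or "name: missing" line per argument.
report_tools() {
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Check the three tools this section depends on.
report_tools python pip aws
```

Once the install has finished, running `aws --version` should print the installed version, which confirms the CLI is on your PATH.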

Configuring the AWS Command Line Interface

Once the AWS CLI tool is installed, you'll need to configure it to use your AWS access keys. Simply type in the following command:

aws configure

This command will then prompt you for your access key ID, secret access key, default region, and output format. Here's an example of what you might expect:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

To get your AWS Access Key ID and AWS Secret Access Key, you'll need to configure "Security Credentials" from within your AWS Account. Here's a screenshot of what to click:

[Screenshot: where to go to view or edit Amazon Web Services (AWS) security credentials]
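By default, `aws configure` saves what you enter to `~/.aws/credentials` and `~/.aws/config`. Since the first of those holds your secret key, it may be worth making sure only your own user can read it. Here is a rough sketch of one way to do that; `lock_down_aws_dir` is a hypothetical helper name, and the CLI may already create the files with tight permissions on your system.

```shell
#!/bin/sh
# lock_down_aws_dir is a hypothetical helper: it makes the CLI's config
# directory, and the two files inside it, accessible to the owner only.
# The directory defaults to ~/.aws but can be passed as the first argument.
lock_down_aws_dir() {
    dir="${1:-$HOME/.aws}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    chmod 700 "$dir"
    for f in "$dir/credentials" "$dir/config"; do
        if [ -f "$f" ]; then
            chmod 600 "$f"
        fi
    done
    return 0
}
```

You would call `lock_down_aws_dir` with no arguments after running `aws configure`. To double-check which key and region the CLI will actually pick up, `aws configure list` prints a short summary.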

Synchronizing Files

Once the AWS CLI is configured, you just need to run the following command, which copies the files from your bucket to a directory on the server:

aws s3 sync s3://[YOUR BUCKET]/[PATH TO FILES] /[PATH ON YOUR SERVER WHERE YOU WANT FILE TO GO]
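If you'd like to see what the sync would transfer before it touches anything, `aws s3 sync` accepts a `--dryrun` option. Here's a small sketch with the placeholders filled in; the bucket name, prefix, and destination path are made-up values, and `preview_sync` and `run_sync` are wrapper names invented for this example.

```shell
#!/bin/sh
# Placeholder values: replace with your own bucket, prefix, and local path.
BUCKET="example-video-bucket"
PREFIX="videos"
DEST="/var/www/videos"

# preview_sync (hypothetical wrapper): list what a sync would transfer
# without copying anything.
preview_sync() {
    aws s3 sync --dryrun "s3://$BUCKET/$PREFIX" "$DEST"
}

# run_sync (hypothetical wrapper): perform the real transfer from S3
# to the local directory.
run_sync() {
    aws s3 sync "s3://$BUCKET/$PREFIX" "$DEST"
}
```

Running `preview_sync` first and `run_sync` second is a cheap way to catch a mistyped path before a large download starts.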

Setting Up Cron Job

After you've successfully synced all your files from Amazon S3, you might think about setting up a cron job to automatically fetch any changes made to the master Amazon S3 files and sync them to your server. In my case, checking Amazon S3 for changes every 5 minutes or so is probably good. To set up a cron job, type the following at the command prompt:

crontab -e

Then add the following at the bottom of the file before saving:

*/5 * * * * aws s3 sync --delete s3://[YOUR BUCKET]/[PATH TO FILES] /[PATH ON YOUR SERVER WHERE YOU WANT FILE TO GO]

Notice the --delete option in this command. It removes any file in the destination directory that does not exist in the S3 path, so if you delete something from the Amazon S3 bucket, it will be deleted on the synchronized server as well.
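Two things can trip up a cron job like this. Cron usually runs jobs with a very short PATH, so a bare `aws` may not be found, and a sync of large video files can take longer than five minutes, so a second run may start before the first has finished. One possible way around both is a small wrapper script. This is only a sketch: the script name, the `run` argument, and every path and bucket name below are assumptions, and the location of the `aws` binary depends on how it was installed (`command -v aws` will tell you).

```shell
#!/bin/sh
# Hypothetical wrapper, for example saved as /usr/local/bin/sync-mirror.sh.
# All paths and names below are placeholders; each can be overridden
# through an environment variable of the same name.
AWS_BIN="${AWS_BIN:-/usr/local/bin/aws}"
SRC="${SRC:-s3://example-video-bucket/videos}"
DEST="${DEST:-/var/www/videos}"
LOCK_DIR="${LOCK_DIR:-/tmp/sync-mirror.lock}"

sync_once() {
    # mkdir is atomic: it fails if another run already created the lock.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping" >&2
        return 0
    fi
    # Call the CLI by absolute path so cron's short PATH does not matter.
    "$AWS_BIN" s3 sync --delete "$SRC" "$DEST"
    status=$?
    # Release the lock and report the sync's own exit status.
    rmdir "$LOCK_DIR"
    return "$status"
}

# Only sync when invoked as "sync-mirror.sh run".
if [ "${1:-}" = "run" ]; then
    sync_once
fi
```

With the script made executable, the crontab entry would become `*/5 * * * * /usr/local/bin/sync-mirror.sh run`. One caveat: if the script is killed in the middle of a sync, the lock directory stays behind and has to be removed by hand. Where the `flock` utility is available, it avoids that problem.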

References

http://docs.aws.amazon.com/cli/latest/userguide/installing.html

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

