Automatic site backups using cron jobs in Dreamhost

Dock Street Media

Back up your files.  It's said about a million more times than it's heard. Too often we realize we should have backed everything up right after it is too late - when the computer crashes and we lose everything.  Websites are no different.  They need to be backed up on a regular basis to preserve any changes, including image and file uploads in the site directory and content (usually) in the database.

The system described below automatically copies, zips, and stores your website and database in a backups folder on your server and emails you the compressed archives.  This way you do not have to think about it.  Archives of your site are stored in at least two places: (1) on your web server (e.g. Dreamhost), and (2) on your mail server, which should be completely separate from your web server (e.g. Gmail).  You should have backups on your computer or external hard drive as well, but that is up to you.

What I'll be using:

  1. Notepad++ - to edit the shell script
  2. FileZilla - to upload the shell script and set the right permissions
  3. PuTTY (on a Windows machine) - to "clean" the shell script using a command called dos2unix
  4. Dreamhost -> Admin Panel -> Goodies -> Cron Jobs - to set up the cron job
  5. Gmail - to receive the backup email with the attached site archives

If you are using a different hosting company, then you can still use this backup script, but you'll have to figure out on your own how to make it work on your setup.

Step 1: Create the backups directory

Using FileZilla or your preferred FTP client, go to your home directory and create a directory (folder) called backups.  In Dreamhost, this would create the path /home/YOUR_USERNAME/backups (with your own username in place of YOUR_USERNAME).
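If you already have shell access (it is needed in Step 4 anyway), the directory can also be created from the command line. This is only a minimal sketch and assumes that $HOME points at your home directory, which is the usual case when you log in over SSH.

```shell
# Create the backups directory inside the home directory.
# -p makes this a no-op if the directory already exists.
mkdir -p "$HOME/backups"

# Confirm that it is there
ls -ld "$HOME/backups"
```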

Step 2: Create the shell script

Using Notepad++ or your preferred text editor, create a new shell file called site_backup.sh and insert the code below. If you have several websites, name each script after its site, for example: dockstreetmedia_backup.sh.

#!/bin/bash

# specific config variables (EDIT THESE)
HOME="/home/YOUR_USERNAME"
SITEDIR="YOUR_SITE_DIRECTORY"
DBHOST="YOUR_DB_HOSTNAME"
DBUSER="YOUR_DB_USERNAME"
DBPASS="YOUR_DB_PASSWORD"
DBNAME="YOUR_DB_NAME"
EMAIL="YOUR_EMAIL"

# other config variables (DO NOT EDIT THESE)
NOWDATE=$(date +"%y%m%d")
NOWDAY=$(date +"%d")
BACKUPDIR="backups"
MYSQLDUMP="$(which mysqldump)"

# start each run with an empty target path: delete the old one if it exists, then create a new one
TARGETPATH="$HOME/$BACKUPDIR/$SITEDIR/$NOWDAY"
rm -rf "$TARGETPATH"
mkdir -p "$TARGETPATH"

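# create a GZIP of the site directory inside the target path
# (-C switches to $HOME first, so this works no matter where the script is run from)
tar -zcf "$TARGETPATH/${SITEDIR}_$NOWDATE.tar.gz" -C "$HOME" "$SITEDIR"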

# dump the data into a SQL file inside the target path
$MYSQLDUMP -u $DBUSER -h $DBHOST -p$DBPASS $DBNAME | gzip > $TARGETPATH/${DBNAME}_$NOWDATE.sql.gz

# email the GZIP files
mutt $EMAIL -a $TARGETPATH/${SITEDIR}_$NOWDATE.tar.gz -a $TARGETPATH/${DBNAME}_$NOWDATE.sql.gz -s "FULL Backup for $SITEDIR"

# print a message for the cron output email
printf "\t%s has been backed up\n" "$SITEDIR"
 

Modify the variables above to match your configuration:

  • YOUR_USERNAME: the username you use to log in to your server.  It is often the same as your FTP username.
  • YOUR_SITE_DIRECTORY: the folder where your site files are stored and where your domain points.  In Dreamhost, this defaults to examplesite.com when you set up your domain.
  • YOUR_DB_HOSTNAME: the location where your database is stored.
    • In Dreamhost, this defaults to mysql.examplesite.com. You can see the hostname in Goodies -> MySQL Databases.  You can have multiple databases under one hostname and can have several hostnames with access to the same database.
  • YOUR_DB_USERNAME: a user with access to the database.  This could be the same username that you use to access the root directory and FTP, although the two users are set up separately.  You can see the users with access to your site database in Goodies -> MySQL Databases.
  • YOUR_DB_PASSWORD: the password you combine with your database username to access the database.
  • YOUR_DB_NAME: the name of the database tied to your site, e.g. examplesite_wordpress.
  • YOUR_EMAIL: the email address where you want the site archives sent.
    • I would recommend using a mail provider that is not located on the same server, so do not use the Dreamhost hosted email address.  I would also recommend using a mail provider with plenty of space.  I use Gmail since they give you 7.5GB of space as of the date of this post.  However, the email part of this system is best suited to smaller public websites with non-sensitive material.
    • If you have an extra large site or an extra secret site, then I would not recommend emailing the files at all, but instead manually transferring the archives with secure FTP to a computer, external hard drive or separate  backup server.
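A forgotten placeholder is an easy mistake to make when editing seven variables by hand. The sketch below is one way to catch it before anything is deleted or archived. The function name check_config is hypothetical and not part of the original script; it only assumes the variable names used in the config block.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): refuse to run
# while any setting is empty or still holds its YOUR_... placeholder.
check_config() {
    local name value status=0
    for name in HOME SITEDIR DBHOST DBUSER DBPASS DBNAME EMAIL; do
        value="${!name}"    # indirect expansion: the value of the variable named in $name
        if [ -z "$value" ] || [[ "$value" == *YOUR_* ]]; then
            echo "Config error: $name has not been set" >&2
            status=1
        fi
    done
    return "$status"
}
```

If you want to use it, paste the function above the config block and add the line "check_config || exit 1" directly after the config variables.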

Step 3: Upload the file and set the right permissions

Using FileZilla, upload the script to your backups directory.  After it has been uploaded, right click on the file and scroll down to File Permissions.  You can also set the permissions with the chmod command from the shell, or through the Dreamhost WebFTP.  Set the permissions to 700, which means read, write, and execute for the owner only.
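From the shell, the same permissions can be set with chmod. The sketch below wraps it in a small function whose name, secure_script, is made up for this example. Owner-only access matters here because the script holds your database password in plain text.

```shell
#!/bin/bash
# Hypothetical helper: restrict a script to its owner, then list it
# so the new mode can be checked by eye.
secure_script() {
    chmod 700 "$1" && ls -l "$1"
}

# Example call, assuming the script was uploaded to the backups directory:
#   secure_script "$HOME/backups/site_backup.sh"
# The mode column of the listing should then start with -rwx------
```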

Step 4: Clean the script with the dos2unix command

If you created the script on a Windows machine, like I usually do, then when you execute the script you will get an error that reads:

/bin/bash^M: bad interpreter: no such file or directory

Without getting into the specifics, this is because Windows ends each line with a carriage return plus a line feed, while Linux/Unix expects only the line feed.  The leftover carriage return shows up as ^M and gets read as part of the interpreter name.  To clean it, open your shell client, such as PuTTY, and log into your shell account.  The hostname will likely be your website domain (e.g. examplesite.com) and the username will be the same as the username you use in Step 5 below.

Go to the backups directory where your shell script sits by using the cd command (ignore "[server]$" as this is your prompt, which changes depending on your server):

[server]$ cd backups

To make sure you are in the right directory, use the ls command to list its contents.

[server]$ ls

If you can see the shell script, run the following command to "clean" the script:

[server]$ dos2unix site_backup.sh

That's it.  You should not get the above error message now.  Easy.
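If dos2unix is not installed on your server, standard tools can do roughly the same job. The two functions below are a sketch under that assumption. The names has_crlf and strip_cr are invented for this example.

```shell
#!/bin/bash
# Hypothetical helpers for spotting and removing Windows line endings.

# Succeeds if the file contains at least one carriage return (\r, shown as ^M)
has_crlf() {
    grep -q $'\r' "$1"
}

# Rewrite the file without carriage returns, roughly what dos2unix does.
# Writing back with cat keeps the file's existing permissions.
strip_cr() {
    local tmp status
    tmp="$(mktemp)" || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

A typical call would be "has_crlf site_backup.sh && strip_cr site_backup.sh", which only rewrites the file when it actually needs cleaning.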

Step 5: Set up the cron job

In the Dreamhost admin panel, navigate to Goodies -> Cron Jobs and click Add New Cron Job.  Add your details in the Creating New Cron Job form.

Dreamhost - cron interface

Some notes on the fields:

  • USER: this must be a user with Shell access.  You can use your main username or any other username with shell access to the root directory.
  • EMAIL OUTPUT TO: You can, but are not required to, use the same email where you are emailing the archive files.  The output will simply be the line "YOUR_SITE_DIRECTORY has been backed up" printed by the script, plus any errors that may occur.  You can remove the email address once you know the script is running error-free.
  • COMMAND TO RUN: This is the full path to the shell script, e.g. /home/YOUR_USERNAME/backups/site_backup.sh.  Change the value to match the path of your script.
  • WHEN TO RUN: You can set this to any time interval you would like.  If you have a site that is regularly updated, you can use Daily. For most news-type sites with one or two new posts per week, Weekly will work fine.  Some brochure sites that rarely change should use Monthly.  You can also set Custom time intervals, such as once on the 15th of every month, or on every Tuesday at 3:00 am.  To test if the script is working, set the time interval for 5 minutes from the current time and see if you receive the email.
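On hosts that expect raw crontab lines instead of a form, the schedules mentioned above map onto the standard five-field cron format. The sketch below just prints a few sample lines. The function name, the 3:00 am run time and the script path are assumptions for illustration, so adjust them to your setup.

```shell
#!/bin/bash
# Print sample crontab lines for the schedules discussed above.
# Field order: minute hour day-of-month month day-of-week command
print_sample_crontab() {
    # Assumed location of the script from Step 3
    local script="/home/YOUR_USERNAME/backups/site_backup.sh"
    cat <<EOF
# Daily at 3:00 am
0 3 * * * $script
# Weekly, every Tuesday at 3:00 am
0 3 * * 2 $script
# Monthly, on the 15th at 3:00 am
0 3 15 * * $script
EOF
}

print_sample_crontab
```

With direct crontab access, one of these lines would be added through "crontab -e". In the Dreamhost panel the form fills in the same fields for you.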

After a while, your email may start to get clogged up with all these backups, so I would recommend purging all of the old backups from time to time.  I would only keep the last 2 or 3 backups, depending on their size.

That's it.  You now have a system that automatically backs up your website without manually doing much.  If you have multiple sites, repeat these steps using the right configuration variables. The backups directory should look like this:

root
--backups
----examplesite.com
------01
------02
------03
------[...]
------28
------29
------30
------31
----secondexamplesite.com
------07
------14
------21
------28
[etc.]

Each site has a sub-folder for each day on which a backup was made. With examplesite.com, the backups were made on a daily basis. Each folder name is the two-digit day of the month (so the 1st is 01) and is reused every month. This way, the server does not get too clogged with old backups. Each daily site then has up to 31 days of backups.

In the case of secondexamplesite.com, the backups were made on a weekly basis, so the days are staggered.
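The recycling comes from the NOWDAY variable, which holds only the day of the month. The sketch below shows which folder a backup would land in on a few sample dates. It assumes GNU date (the -d option, standard on Linux hosts), and the function name slot_for is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the day-of-month folder that a backup made on
# the given date would be written to. %d is zero-padded, so the 1st is 01.
slot_for() {
    date -d "$1" +"%d"
}

slot_for 2024-01-31    # prints 31
slot_for 2024-02-01    # prints 01, so the folder written on January 1st is replaced
slot_for 2024-03-01    # prints 01 again
```

One consequence of this scheme is that the folders 29, 30 and 31 are only overwritten in months that have those days, so they can hold backups that are older than a month.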

UPDATE: Backing up large sites causes the error:
postdrop: warning: uid=123456: File too large
sendmail: fatal: username(123456): message file too big
Error sending message, child exited 75 (Deferred.).
Could not send the message.

This is because the sendmail application on the server cannot send attachments larger than 1MB. This is not a problem for smaller sites that do not change very often, but it is a problem for large sites that hold a lot of content in the database and accrue a lot of uploaded files.

If you are getting this error or do not want to send your site files in an email attachment, simply remove the following line:

# email the GZIP files
mutt $EMAIL -a $TARGETPATH/${SITEDIR}_$NOWDATE.tar.gz -a $TARGETPATH/${DBNAME}_$NOWDATE.sql.gz -s "FULL Backup for $SITEDIR"

When this line is removed, the script will only back up the site archives to the web server. To satisfy the "two locations" backup rule, you must open FileZilla or your FTP client and manually download the site archives. You can choose how frequently you want to download the new archives and purge the old.
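A middle ground, if you would rather keep the email for small backups, is to check the archive sizes first and only call mutt when they fit. The function below is a sketch of that idea. The name fits_in_email is hypothetical, and the limit you pass in is an assumption you would need to tune for your own server.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the combined size of the given files
# is at or below a limit in bytes (the first argument).
fits_in_email() {
    local limit="$1" total=0 size f
    shift
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1    # fails if the file is missing
        total=$((total + size))
    done
    [ "$total" -le "$limit" ]
}

# Possible use inside the backup script, wrapped around the mutt line:
#   if fits_in_email 700000 "$TARGETPATH/${SITEDIR}_$NOWDATE.tar.gz" \
#                           "$TARGETPATH/${DBNAME}_$NOWDATE.sql.gz"; then
#       ...the mutt line goes here...
#   fi
```

The example limit is set below 1MB on purpose: attachments are base64-encoded when they are emailed, which makes the message roughly a third larger than the files on disk.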

Comments and suggestions are welcome.