#!/bin/sh
""""exec ${PYTHON:-python} -t $0 "$@";" """
# vim: filetype=python noexpandtab


__author__ = "Chad MILLER <" "rsnap" "@" "chad.org" ">"
__copyright__ = "Copyright (c)2008 Chad MILLER"
__license__ = "GPL v2"



# User-serviceable
debug = False
iterator_fmt = "%04d"  # constructs names of new snapshot directories



__doc__ = """\
rsnap
=====

rsnap is a backup and snapshot utility based on rsync.

rsnap is available as a Bazaar branch from
http://web.chad.org/projects/rsnap/

To check out the most recent release,
``$ bzr branch http://web.chad.org/projects/rsnap/``

Usage
=====

``rsnap N source[s] dest [-- rsync_options]``

where N is the number of snapshots you want to keep, source[s] is one or more
source directories (local or remote), and dest is the destination directory.

``rsnap 7 user@server.com:/home/user/ ./backup/``

If you have more than one source, then you probably shouldn't have slashes on
the end of any.

Run this command every day and you will have backups going back a week. The
backup directory will look like this:

-  backup/0000/ -- last backup
-  backup/0001/ -- previous backup
-  backup/0002/ -- even older backup
-     ..
-  backup/0006/ -- oldest backup 

Note that the oldest directory is 0006 and not 0007, because we started
counting at 0.
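The directory names come from the ``iterator_fmt`` setting near the top of
the script (``"%04d"`` by default). A quick sketch of the naming scheme,
assuming that default format and a retention of 7:

```python
# With a retention of 7 and the default "%04d" format, snapshot
# directories are numbered 0000 (newest) through 0006 (oldest).
retention = 7
iterator_fmt = "%04d"

names = [iterator_fmt % n for n in range(retention)]
print(names[0], names[-1], len(names))
```

Changing ``iterator_fmt`` changes only the directory names, not the
rotation logic.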

If you need to pass parameters to rsync, just append them after a single "--".

``rsnap 7 user@server.com:/home/user/ ./backup/ -- --exclude \*.o --bwlimit=8``
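Everything before the ``--`` belongs to rsnap itself; everything after it is
handed to rsync untouched. A minimal sketch of that split (the argument
vector below is illustrative):

```python
# Tokens before the first "--" are rsnap's own arguments; tokens after
# it are passed through to rsync verbatim.
argv = ["rsnap", "7", "user@server.com:/home/user/", "./backup/",
        "--", "--exclude", "*.o", "--bwlimit=8"]

sep = argv.index("--") if "--" in argv else len(argv)
retention = int(argv[1])            # number of snapshots to keep
sources = argv[2:sep - 1]           # one or more rsync sources
destination = argv[sep - 1]         # last positional argument
rsync_options = argv[sep + 1:]      # handed to rsync untouched
print(retention, destination, rsync_options)
```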

Why are snapshots useful?
=========================

Consider the following situation: You have set up a cronjob with rsync to
create a backup of your website to your local computer. You run this cronjob
every day to make sure you're not missing anything when you modify the website
on the remote server. But what happens if someone breaks into your site,
modifies it and you only notice that days, maybe weeks later? Or you rewrote
big chunks of your website and feel this was a mistake but you can't go back
because your backup already synchronized your changes and overwrote the old
version?

That's where snapshots are useful (snapshots are sometimes also referred to
as "checkpoints"). rsnap will not only save the state of your files from the
last backup, it will also retain previous states - as many as you like. This
is done efficiently, so that only files which have changed occupy additional
space. Even better, since rsnap is a thin wrapper around rsync that adds
snapshot creation and rotation, the changes you need to make to your
existing backup script are minimal.

If you are a developer, you might think of tools like CVS as the answer to
the problem mentioned above. While these tools are more powerful, they
require you to learn a special syntax and they are generally more difficult
to use. rsnap provides an ad-hoc, general-purpose, zero-headache backup and
snapshotting tool which is extremely easy to use. Want to restore a backup
from two days ago? Just copy it from the subfolder to your current folder
using "cp". Want to see the differences between two snapshots? Just run
"diff" over those directories.

When to use rsnap and when not
==============================

rsnap is useful if you have lots of small files which change infrequently,
like a website or maybe even source code. It is useful for Maildirs too, but
rsnap won't recognize that a file has moved from new/ to cur/, so sometimes
you will be wasting space (ideas on how to optimize this behaviour are
welcome).

If you have lots of big files that all change frequently, don't use rsnap:
for example, if you have lots of images or Word files you are editing, or if
you are using mbox files. Every time a file changes (or even if only the
file's permissions change, since that information is stored in the inode),
rsnap will create a new copy of it. In such cases, you might consider using
rdiff-backup, which creates efficient binary deltas.

Do not use rsnap to create backups of your MySQL data directory. MySQL data
files tend to change frequently which leaves you with inconsistent (=unusable)
files. This is a limitation of rsync itself, not rsnap. Use mysqldump instead.

The idea behind rsnap came from an article by Mike Rubel, which can be found
at: 

  http://www.mikerubel.org/computers/rsync_snapshots/

That article probably inspired thousands of tools, including this one; I
liked none of the others (see section "Similar tools").

The basic idea of creating snapshots using rsync is to use hardlinks. rsync
can do this automatically with the --link-dest option. For more details,
read the article :)
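A minimal, self-contained demonstration of that hard-link trick, using a
throwaway temp directory rather than a real snapshot pair:

```python
# Two directory entries pointing at the same inode: this is what rsync's
# --link-dest does for every unchanged file, so an unchanged file costs
# (almost) no extra space in the new snapshot.
import os
import tempfile

workdir = tempfile.mkdtemp()
previous = os.path.join(workdir, "file-in-0001")  # "old snapshot" copy
current = os.path.join(workdir, "file-in-0000")   # "new snapshot" name

with open(previous, "w") as f:
    f.write("unchanged file contents\n")

os.link(previous, current)   # hard-link instead of copying

same_inode = os.stat(previous).st_ino == os.stat(current).st_ino
print(same_inode, os.stat(previous).st_nlink)
```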

Requirements
============

* rsync >= 2.5.6

* python >= 2.4

* SSH with no interactive authentication, if you are reading from or writing
  to a remote location.

* On the destination volume, ability to link two filenames to the same storage;
  also called "hard-link".

Installation
============

``$ ./setup.py install --prefix=/usr/local``


Similar tools
=============

* Original rsnap: http://daniel.lorch.cc/projects/rsnap/


Tools that provide snapshots, but are not related to rsync:

* LVM: http://www.tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html

  LVM allows you to create consistent backups at the block level. All the
  changes you make to the "current" Logical Volume propagate down to all
  snapshots you have created for it. So if you have created two snapshots,
  you are using 3 times the space (one for the active volume, two for the
  snapshots) and correspondingly you have 3 times as many writes on your
  disk.

  Snapshots could be used in conjunction with rsnap, though, to create
  a consistent view of your drive.

* rdiff-backup: http://www.nongnu.org/rdiff-backup/

  rdiff-backup uses rdiff to create efficient binary deltas. Restoring the
  last backup is easy, as you can just copy the files out of the backup
  directory. Restoring previous backups requires using rdiff-backup itself,
  which reconstructs the previous state from the deltas it created (stored
  in a subdirectory called rdiff-backup-data). Actually a very nice tool; it
  just didn't handle errors gracefully and I ended up having to intervene
  manually very often. Also, restoring files is not as intuitive as it
  could be.

* CVS / SVN / any other Source Control Management tool

  Very powerful and recommended if you are managing source code. Not
  necessarily useful for everyday use, though (IMO).

Tools based on Mike Rubel's article, all using rsync:

* rsync-incr: http://colas.nahaboo.net/software/rsync-incr/

  Of all the tools I tried, this is the one I liked most. It created
  subdirectories in a fashion I didn't like, though.

* rsnapshot: http://www.rsnapshot.org/

  rsnapshot requires you to write a configuration file which contains all
  the target hosts.

* ribs-backup: http://www.ribs-backup.org/

  Suffers from the same problem as rsnapshot; it tries to be an
  all-in-one solution.

* dirvish: http://www.dirvish.org/

  Same problem as rsnapshot.
"""





# Black box beneath here, unless you know what you're doing.  Beware.






import sys
import os
import subprocess

# Don't change these
shell_true = 0
test_file, test_dir, rem_dir, rem_file, rename, make_dir, fingerprint, get_available_blocks = range(8)
fingerprint_file = ".rsnap_prot0"


def rpc(host, command, *parameters):
	if command == test_file:
		do = ("test", "-f") + parameters
	elif command == test_dir:
		do = ("test", "-d") + parameters
	elif command == rem_file:
		do = ("rm", "-f") + parameters
	elif command == rem_dir:
		do = ("rm", "-r", "-f") + parameters
	elif command == make_dir:
		do = ("mkdir", "-p") + parameters
	elif command == rename:
		do = ("mv", "-f") + parameters
	elif command == fingerprint:
		do = ("touch",) + parameters
	elif command == get_available_blocks:
		if host is None:
			do = ("python", "-c", "import os, statvfs; l=os.statvfs(%r); print l[statvfs.F_BAVAIL]" % parameters,)
		else:
			# ssh removes one level of shell interpretation
			do = ("python -c import\\ os,\\ statvfs\\;\\ l=os.statvfs\\(\\'%s\\'\\)\\;\\ print\\ l[statvfs.F_BAVAIL]" % parameters[0].replace(" ", "\\ ").replace("'", "\\'"),)
	else:
		raise NotImplementedError, command

	if debug: print host, do, parameters

	if host is not None:
		doing = subprocess.Popen(("ssh", "-oControlMaster=no", "-oForwardX11=no", "-oForwardAgent=no", "-oClearAllForwardings=yes", "-oProtocol=2", "-oNoHostAuthenticationForLocalhost=yes", "-q", "-t", "-t", host) + do, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
	else:
		doing = subprocess.Popen(do, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)

	stdout, stderr = [], []

	for out_line in doing.stdout:
		if debug and out_line != "":
			print "stdout:", repr(out_line)
		stdout.append(out_line)

	quash_errors = ('tcgetattr: Inappropriate ioctl for device\n', 'Pseudo-terminal will not be allocated because stdin is not a terminal.\r\n', '\n', '')

	for err_line in doing.stderr:
		if err_line not in quash_errors:
			print "stderr line:", repr(err_line)
		stderr.append(err_line)

	return doing.wait(), stdout, stderr

def require(successful_p, requisite, any_values=[0]):
	if debug: print "Testing return value:  any of %r is success; returned %d" % (any_values, successful_p)
	if successful_p not in any_values:
		raise BackupError(requisite + " (val=%d)" % (successful_p,))

class UsageError(Exception):
	def __init__(self, msg):
		self.msg = msg

class BackupError(Exception):
	def __init__(self, msg):
		self.msg = msg


def read_config_file():
	raise UsageError("no command line parameters and config file not yet supported")


def main(argv=None):
	if argv is None:
		argv = sys.argv

	try:
		try:
			separator_number = argv.index("--")
		except ValueError:
			separator_number = len(argv)

		if separator_number == 1:
			retention, sources, destination_specification, rsync_options = read_config_file()
		else:
			try:
				retention = int(argv[1])
				sources = argv[2:separator_number-1]
				destination_specification = argv[separator_number-1]
				rsync_options = argv[separator_number+1:]
			except (ValueError, IndexError):
				raise UsageError("wrong parameters")
	except UsageError, e:
		print >>sys.stderr, "Usage: rsnap number sources... destination [-- rsync_options*]"
		print >>sys.stderr, e.msg
		return 2

	for rsync_option in rsync_options:
		if rsync_option == "--compress" or rsync_option == "-z" or rsync_option.startswith("--compress-level="):
			print >>sys.stderr, "Compression setting is handled automatically in rsnap.  It's probably unwise to override it."
			break

	# verify that the installed rsync supports --link-dest
	rsync = subprocess.Popen(["rsync", "--help"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
	rsync_help = rsync.communicate()[0]
	if "link-dest" not in rsync_help:
		print >>sys.stderr, "Error: rsync does not support the --link-dest option; consider upgrading to >=2.5.6"
		return 1


	# test whether ionice is installed; if not, run rsync without it
	try:
		ionice = subprocess.Popen(["ionice", "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
		ionice.communicate()
		nicely = ("ionice", "-n", "7")
	except OSError:
		nicely = ()


	# check sanity of options
	for source in sources:
		if source.endswith("/") and len(sources) > 1:
			print >>sys.stderr, "Warning: %r source should probably NOT end with a slash" % (source,)

	# check whether we're working remotely
	if ":" in destination_specification and not destination_specification.startswith(":"):
		dest_host, dest_path = destination_specification.split(":", 1)
		run_prefix = [ "ssh", dest_host]
	else:
		dest_host = None
		dest_path = destination_specification
		run_prefix = []

	assert dest_path.startswith("/"), "Relative paths not allowed for destination"  # just being paranoid.  Perhaps unnecessary

	require(rpc(dest_host, test_dir, dest_path.rstrip("/"))[0], "destination container directory %r must exist" % (dest_path,))

	g_a_b = rpc(dest_host, get_available_blocks, dest_path.rstrip("/"))
	try:
		free_space_at_start = int(" ".join(g_a_b[1]))
	except ValueError:
		free_space_at_start = None
		print >>sys.stderr, "(Couldn't get block info: %r)" % (g_a_b,)
		

	if rpc(dest_host, test_dir, ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), 0))[0] == shell_true:
		# there exists the smallest number, so we must move things up to make room for the next.

		if rpc(dest_host, test_dir, ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), retention - 1))[0] == shell_true:
			# the biggest number exists, so we should expire it.  (With N
			# snapshots kept, the oldest slot is N-1, since we count from 0.)
			require(rpc(dest_host, test_file, ("%s/"+iterator_fmt+"/%s") % (dest_path.rstrip("/"), retention - 1, fingerprint_file))[0], "can't expire.  must not remove directories not created by us")
			require(rpc(dest_host, rem_dir, ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), retention - 1))[0], "oldest destination directory couldn't be removed")

		for n in range(retention - 1, 0, -1):
			# rotate each number upward
			old = ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), n-1)
			new = ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), n)
			rpc(dest_host, rename, old, new)

	try:
		tmp = "%s/%s" % (dest_path.rstrip("/"), "tmp")
		saved = ("%s/"+iterator_fmt) % (dest_path.rstrip("/"), 0)

		require(rpc(dest_host, make_dir, tmp)[0], "new destination dir must be created")

		full_fingerprint_path = "%s/%s" % (tmp, fingerprint_file)
		try:
			require(rpc(dest_host, fingerprint, full_fingerprint_path)[0], "must be able to write a fingerprint file")

			command = []
			command.extend(nicely)
			command.extend(("rsync", "--archive", "--numeric-ids", ("--link-dest=../"+iterator_fmt) % (1,)))
			if dest_host is not None:
				command.append("--compress")
			command.extend(rsync_options)
			command.extend(sources)
			command.append(destination_specification + "/tmp")

			rsync = subprocess.Popen(command)
			require(rsync.wait(), "rsync did not complete successfully", any_values=[0, 23, 24])  # no err, partial with error, partial with vanish

		except:
			print >>sys.stderr, "(Removing fingerprint file)"
			rpc(dest_host, rem_file, full_fingerprint_path)
			raise

		require(rpc(dest_host, rename, tmp, saved)[0], "can't rename 'tmp' location into place, as " + iterator_fmt % (0,))

		
		if free_space_at_start is not None:
			free_space_at_end = int(" ".join(rpc(dest_host, get_available_blocks, dest_path.rstrip("/"))[1]))

			space_consumed_this_run = free_space_at_start - free_space_at_end 
			if space_consumed_this_run * retention > free_space_at_end:
				remaining_prediction = free_space_at_end / space_consumed_this_run
				print >>sys.stderr, "Warning: There may not be enough space for %d snapshots." % (retention,)
				print >>sys.stderr, "         Capacity at this rate:  %d more instances" % (remaining_prediction,)

	except:
		print >>sys.stderr, "(Removing tmp directory, where we were storing this backup)"
		rpc(dest_host, rem_dir, "%s/%s" % (dest_path.rstrip("/"), "tmp"))
		raise


if __name__ == "__main__":
	try:
		sys.exit(main())
	except BackupError, e:
		print >>sys.stderr, e.msg
		sys.exit(1)

