Backing Up and Migrating an SVN Repository

To forestall the question: it's a legacy SVN repository, it's huge, and yes, we plan to migrate to git ... but it's proven difficult. I had to learn how to pack it up and move it - as SVN.

This isn't the only way to move an SVN repo to another server: it would be easier and much faster to simply bundle up /var/svn/repos/coderepo (or wherever your repository resides on the server) and move it to the new machine ... but if you're changing the SVN server version in the process, that will likely NOT work. The dump-and-load method described here is slow, but it's far more portable.

Backup

What I'm retrieving here is just the repository: no ACLs, no hooks, just the commits. All of the backup process can be done on any client machine with a Unix-like command line. What follows is an explanation of the pieces I used to build a Bash script (at the end of the entry) that automates the backup done by hand in this section.

Because I write scripts with the intent that they should be easily repeatable, let's set a target URI for the repository first:

$ uri="svn://oldrepo.example.com/svn/coderepo/"

The first thing I like to know about a repository (at least if I'm backing it up) is the current value of HEAD (if you know git, remember that SVN revision numbers are actual sequential numbers, not hashes):

$ head=$(svn info -r HEAD --show-item revision "${uri}")

I was dealing with a repo containing 27000 commits.

The next building block is the svnrdump command, which allows you to talk to an SVN server remotely.

$ svnrdump dump "${uri}" -r 0:5000 > coderepo-00000-05000.svncommits

If you're wondering about the odd-looking use of leading zeroes in the output filename, it's so the files sort correctly later. This may be useful during the restore, and also makes it easier to see what dumps you've created so far as they sort in order.
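The zero-padded names are easy to generate with printf. This is a minimal sketch, assuming a hypothetical helper called chunk_name and a default width of five digits to match the filenames above:

```shell
#!/bin/bash
# chunk_name is a hypothetical helper for illustration, not part of svn.
# Usage: chunk_name <prefix> <start> <finish> [width]
chunk_name() {
    local prefix="${1}" start="${2}" finish="${3}" width="${4:-5}"
    # %0*d reads the field width from the argument list and pads with zeroes.
    printf '%s-%0*d-%0*d.svncommits\n' \
        "${prefix}" "${width}" "${start}" "${width}" "${finish}"
}

chunk_name coderepo 0 5000
chunk_name coderepo 5001 10000
```

Pass the revision numbers without leading zeroes: printf reads a number such as 08 as (invalid) octal.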

How long this command takes will vary wildly depending on so many things (the processor on the local machine, the processor on the remote, the network speed, the size of the committed files, probably many other factors) that I hesitate to even tell you how long it took for me. With all that in mind, the 5000-commit dumps I was generating averaged about 1.5G in size (trunk is ~5G in size) and took about 10 minutes.

The equivalent can also be run on the SVN server itself, in which case you should use svnadmin dump <local-path-to-repo> -r 0:5000 > coderepo-00000-05000.svncommits instead. If you want to go that route, read up on it: I'm not going to go into detail.

You'll need to repeat this process until you reach your repository's HEAD:

$ svnrdump dump "${uri}" -r  5001:10000 --incremental > coderepo-05001-10000.svncommits
...
$ svnrdump dump "${uri}" -r 10001:15000 --incremental > coderepo-10001-15000.svncommits
...
$ svnrdump dump "${uri}" -r 15001:20000 --incremental > coderepo-15001-20000.svncommits
...
$ svnrdump dump "${uri}" -r 20001:25000 --incremental > coderepo-20001-25000.svncommits
...
$ svnrdump dump "${uri}" -r 25001:27000 --incremental > coderepo-25001-27000.svncommits
...

Again, this will take a long time. 27000 was HEAD for us: you'll have to manage your own revision chunking. Notice the use of --incremental - this should not be used on the first run with revision 0, and has to be used for all chunks after.
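The chunk boundaries are easy to get wrong by one, so it can help to print them before running anything. This is a minimal sketch, assuming a hypothetical helper named plan_chunks. It only prints the revision ranges (plus --incremental where it applies) and never contacts a server:

```shell
#!/bin/bash
# plan_chunks is a hypothetical helper: it prints ranges and runs nothing.
# Usage: plan_chunks <head-revision> <revisions-per-chunk>
plan_chunks() {
    local head="${1}" size="${2}"
    local start=0 finish flag
    while (( start <= head )); do
        if (( start == 0 )); then
            # The first chunk runs from revision 0 and is NOT incremental.
            finish=$(( size ))
            flag=""
        else
            # Later chunks start one past the previous chunk's last revision.
            finish=$(( start + size - 1 ))
            flag=" --incremental"
        fi
        # Never run past HEAD.
        if (( finish > head )); then
            finish="${head}"
        fi
        printf '%d:%d%s\n' "${start}" "${finish}" "${flag}"
        start=$(( finish + 1 ))
    done
}

plan_chunks 27000 5000
```

Each printed line is what would follow -r in the svnrdump calls above.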

Restoration

Restoration - at least for me - was done on the server, and was kind of the reverse of the process above. I assume that you've copied all the dumps created previously into /root/ on the server machine. While the previous steps could be done as a user, this should be done as root:

# svnadmin create /var/svn/repos/coderepo
# svnadmin load   /var/svn/repos/coderepo < coderepo-00000-05000.svncommits
...
# svnadmin load   /var/svn/repos/coderepo < coderepo-05001-10000.svncommits
...
# svnadmin load   /var/svn/repos/coderepo < coderepo-10001-15000.svncommits
...
# svnadmin load   /var/svn/repos/coderepo < coderepo-15001-20000.svncommits
...
# svnadmin load   /var/svn/repos/coderepo < coderepo-20001-25000.svncommits
...
# svnadmin load   /var/svn/repos/coderepo < coderepo-25001-27000.svncommits
...

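Again, this will take some time. For me, like the dumps, it took about 10 minutes each. Since the leading zeroes on the revision numbers in the filenames mean that the files sort in the correct order, it's tempting to just say svnadmin load /var/svn/repos/coderepo < coderepo-*.svncommits ... but that doesn't work: Bash refuses to redirect from a glob that matches more than one file ("ambiguous redirect"). A loop does the job instead: for f in coderepo-*.svncommits; do svnadmin load /var/svn/repos/coderepo < "${f}"; done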
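Loading every chunk in order can be sketched as a small function that stops at the first failure, since each later chunk depends on the ones before it. The function name load_chunks and the SVNADMIN override (handy for a dry run with echo) are assumptions for illustration, not anything svn provides:

```shell
#!/bin/bash
# load_chunks is a hypothetical helper. SVNADMIN may be overridden for a
# dry run (e.g. SVNADMIN=echo); by default the real svnadmin is used.
# Usage: load_chunks <repo-path> <dump-file>...
load_chunks() {
    local repo="${1}"
    shift
    local dump
    for dump in "$@"; do
        echo "Loading ${dump}" >&2
        # Stop at the first failure: later chunks build on earlier ones.
        "${SVNADMIN:-svnadmin}" load "${repo}" < "${dump}" || return 1
    done
}

# The glob expands in sorted order thanks to the zero-padded names:
# load_chunks /var/svn/repos/coderepo coderepo-*.svncommits
```

The glob is expanded into separate arguments here, one file per svnadmin load call, so the chunks are read in the same order they were dumped.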

Backup Script

Putting together all the parts of the backup process, this script automates all the calls to make a full set of incremental backups of a remote repository.

#!/bin/bash
#
# Created:  2019-05-17
# Purpose:
#     Remotely dump an entire SVN repository into suitably sized chunks for
#     backup and later reloading.

# the (default) leading part of the generated filenames - this can be
# changed at the command line:
preface="SVNIncrementalDump"

######################################################################
#                            Help
######################################################################

function help() {
    echo "Usage:"
    echo "    $(basename "${0}") [-h]"
    echo "    $(basename "${0}") -i -u <SVN-URI>"
    echo "    $(basename "${0}") -s <count> [-p <name>] -u <SVN-URI>"
    echo ""
    echo "Uses 'svnrdump' to remotely dump an SVN repo in chunks for backup"
    echo "and/or reloading."
    echo ""
    echo "-h            show this help and exit"
    echo "-i            show revision count ('info') about the given repo"
    echo "-p <name>     preface: changes leading part of generated filenames"
    echo "              from '${preface}' to <name>"
    echo "-s <count>    revision count per chunk"
    echo "-u <SVN-URI>  URI of the SVN repository"
}


######################################################################
#              is_uri, check incoming string
######################################################################

function is_uri() {
    # From https://stackoverflow.com/questions/3183444/check-for-valid-link-url
    # Updated for svn.
    # The document points out that this fails on non-Latin URIs, but the
    # solution is massive and ugly: not implementing now.
    # (should be called "is_latin_uri")
    # The "+" in "svn+ssh" is escaped (unescaped it is a quantifier and the
    # scheme never matches); "^" requires the scheme at the start of the string.
    local regex='^(https?|ftp|file|svn|svn\+ssh)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'
    local string="${1}"
    if [[ ${string} =~ ${regex} ]]
    then
        echo "true"
    else
        echo "false"
    fi
}


######################################################################
#              Dump the repo as chunks of the given size
######################################################################

function dump_chunks() {
    # "start" and "finish" are the first and last rev numbers for the chunk
    # currently being created.

    chunksize=${1}
    uri="${2}"
    head=$(show_rev "${uri}")
    # We need to know the length of head so we can zero-pad all revs to the
    # same length - the "+1" adds room for expansion:
    (( headCharCount = ${#head} + 1 ))

    (( start = 0 ))
    (( finish = chunksize ))

    # "<=" (not "<") so a final chunk containing only HEAD is still dumped
    while (( start <= head ))
    do
        # check finish not greater than head, reset to head if it is
        if (( finish > head ))
        then
            (( finish = head ))
        fi
        if (( start == 0 ))
        then
            INCREMENTAL=""
        else
            INCREMENTAL=" --incremental "
        fi
        zeroPaddedStart=$(  printf "%0${headCharCount}d" "${start}"  )
        zeroPaddedFinish=$( printf "%0${headCharCount}d" "${finish}" )
        # make chunk
        outputName="${preface}.${zeroPaddedStart}-${zeroPaddedFinish}"
        # ${INCREMENTAL} is deliberately unquoted: when empty it must vanish
        svnrdump dump "${uri}" -r "${start}:${finish}" ${INCREMENTAL} > "${outputName}"
        # the next chunk starts one past this chunk's last revision
        (( start = finish + 1 ))
        (( finish = finish + chunksize ))
    done
}


######################################################################
#                    Show revision number for the repo
######################################################################

function show_rev() {
    uri="${1}"
    svn info -r HEAD --show-item revision "${uri}"
}


######################################################################
#                    Process the command line
######################################################################

if [ $# -lt 1 ]
then
    help
    exit 1
fi

INFOWANTED="false"
PREFACEPROVIDED="false"
URIPROVIDED="false"
COUNTPROVIDED="false"

# http://wiki.bash-hackers.org/howto/getopts_tutorial
while getopts :hip:s:u: opt
do
    case "${opt}" in
        h)
            help
            exit 0
            ;;
        i)
            INFOWANTED="true"
            ;;
        p)
            preface="${OPTARG}"
            PREFACEPROVIDED="true"
            ;;
        s)
            count="${OPTARG}"
            # zero or a negative chunk size would make dump_chunks loop forever
            if ! [[ ${count} =~ ^[1-9][0-9]*$ ]]
            then
                echo "Chunk size (-s) must be a positive integer." >&2
                help
                exit 2
            fi
            COUNTPROVIDED="true"
            ;;
        u)
            uri="${OPTARG}"
            URIPROVIDED="true"
            ;;
        \?)
            echo "invalid option: -${OPTARG}" >&2
            help
            exit 1
            ;;
        :)
            echo "option -${OPTARG} requires an argument." >&2
            help
            exit 1
            ;;
    esac
done
shift $((OPTIND-1))

# The option processing below hasn't been well tested and may be buggy ...

if [ "${URIPROVIDED}" == "true" ]
then
    if [ "$(is_uri "${uri}")" == "true" ]
    then
        # status goes to stderr so that -i prints only the revision number
        echo "URI OK" >&2
    else
        echo "URI is not in a recognized format." >&2
        exit 2
    fi
fi

if [ "${INFOWANTED}" == "true" ] && [ "${URIPROVIDED}" == "true" ]
then
    show_rev "${uri}"
    exit 0
fi

if [ "${INFOWANTED}" == "true" ] && [ "${URIPROVIDED}" == "false" ]
then
    echo "An info request requires a URI provided."
    help
    exit 1
fi

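if [ "${COUNTPROVIDED}" == "true" ] && [ "${URIPROVIDED}" == "true" ]
then
    dump_chunks "${count}" "${uri}"
    exit 0
fi

# Anything else (e.g. -s without -u) is an incomplete request.
echo "A dump requires both -s <count> and -u <SVN-URI>." >&2
help
exit 1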

Run the script to see the help and understand the options. The most important are -u <uri> and -s <count> for the number of revisions per chunk. (I don't know why I used "s" ... "c" would have made so much more sense ...)

I hope this helps some poor sucker stuck with an old SVN repo. Good luck.

Bibliography

  • http://svnbook.red-bean.com/ - this is the bible, read it (it's not perfect, but it's damn good - especially considering they give it away for free)
  • I haven't tried the --deltas option, which claims to reduce dump size