How to use svnsync to create a mirror backup of your Subversion repository

Whenever I talk to another developer and find out they’re not using version control (a.k.a. Source Code Management system, or SCM) as part of their workflow, I become a little shocked and horrified. There are just too many great reasons for using version control. Both Git and Subversion are free to use, relatively simple to set up, and give you snapshots to go back to anytime you break something in your code. An SCM is indispensable for any team of more than one developer, but it’s just as useful if you’re on your own.

Tons of developers love Git, and although Git does have some really great features when compared to Subversion, there’s one particular benefit to using Subversion that Git users rarely consider. When you use Git to commit hundreds or even thousands of revisions to your local machine, what happens if your hard drive crashes? Unless you have also set up a remote repository and gotten familiar with pull requests and merges, that history is gone. Git actually requires a little more effort to get this big benefit, whereas it’s built into Subversion by design.

One of the primary benefits of using Subversion as your SCM solution is that it’s like having an insurance policy against your local machine breaking, your laptop being stolen, or your hard drive crashing. Every time you commit, you’re sending that code to another server. Granted, you can do the same thing with a master Git repository on GitHub.com or your own server in the cloud, but unlike Git, Subversion is meant to be run somewhere other than your laptop or workstation.

To extend the insurance metaphor, as great as it is having the code on your laptop backed up to a central Subversion server, it makes just as much sense to protect yourself from the possibility that your Subversion server’s hard drive will crash.

The best way to do this is to create a mirror repository on another server, and use the svnsync program to create a replica of your primary Subversion repository, or repositories, if you have multiple. If you’re familiar with master-slave database replication, Subversion repository replication is quite similar. Following are the steps and a few gotchas I learned this week as I finally had a chance to set up a proper Subversion mirror replication slave server.

Step One: Set up Subversion on a 2nd Server

This guide assumes you’re already running Subversion on one server. We’ll call that source-server from now on. The first step is to set up a second Subversion server to be used as a mirror. We’ll call that mirror-server from here on out.

What’s interesting to note here is that unlike most database replication setups, with Subversion, it doesn’t matter too much what version the server is, or what platform you want to run it on. Unfortunately, I didn’t realize that when I started. I purposely downloaded an older version of VisualSVN 2.06 bundled with Subversion 1.6.17, and later found out I would’ve been better off running the latest VisualSVN 2.5.3 bundled with Subversion 1.7.3. Why? As of Subversion 1.7, you can now use svnsync with the new --allow-non-empty option, which is designed for exactly the situation of starting to sync a mirror when it already has content in it. More details in Chapter 9: Subversion Repository Mirroring in the SVN Book.

In our case, we have our source-server repository running Subversion 1.6.6 on Ubuntu Server 10.04 LTS, and I set up the free version of VisualSVN 2.06 (bundled with Subversion 1.6.x) on a Windows 7 PC. Whether you install from source or from a binary, it’s important to at least know what version of Subversion each server is running.

Step Two: Dump the Source Repository

Unless you’re starting fresh with both an empty source-repo and an empty mirror-repo, chances are you have lots of commits on your current source-repo. You can actually “play back” these changes on the mirror-repo starting from revision 0 forward, but in most cases it’s faster to dump the source and load it onto the mirror. Run the dump on source-server itself, because svnadmin works on a local repository path, not a URL:

#> svnadmin dump /svn/repositories/source-repo > source-repo.dump
#> tar czf source-repo.tgz source-repo.dump

Note for Subversion 1.6 and below: Unfortunately, the svnsync program has a limitation in that it assumes you’re starting your mirror-repo from revision 0. This is fine if your repository is small with only a few revisions, but can be quite slow if the reverse is true. Anyway, we’ll be performing some manual tweaks to our mirror-repo later on using this dump-and-load technique. If you’re already running Subversion 1.7 or greater, they’ve added a new feature to circumvent this limitation.
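
For readers already on 1.7 or later, that feature is the --allow-non-empty option mentioned in Step One. The sketch below is only an illustration of how the choice could be scripted: the function names are made up for this example, the version string is passed in by hand, and it assumes the mirror has already been filled with svnadmin load. Keep in mind that --allow-non-empty trusts you that the mirror's content really matches the source.

```shell
#!/bin/sh
# Sketch only: decide whether "svnsync initialize --allow-non-empty" is usable.
# Function names are hypothetical, not part of Subversion.

# Succeeds when the given Subversion version (e.g. 1.7.3) is 1.7 or newer.
supports_allow_non_empty() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

# $1 = Subversion version on the machine running svnsync
# $2 = mirror URL, $3 = source URL
init_loaded_mirror() {
    if supports_allow_non_empty "$1"; then
        svnsync initialize --allow-non-empty "$2" "$3"
    else
        echo "Subversion $1: set the svn:sync-* properties by hand instead" >&2
        return 1
    fi
}
```

A call might look like init_loaded_mirror 1.7.3 http://mirror-server/svn/mirror-repo http://source-server/svn/source-repo. On 1.6 the function refuses, and the manual steps later in this guide apply.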

Step Three: Create the Mirror Repository

Most guides to using svnsync warn you to never commit to the mirror repository. The reason for this is that you only ever want your replication user (syncuser) to make changes. Otherwise you risk breaking replication on the slave. On the mirror-server open up a terminal or command prompt and type:

#> cd /svn/repos
#> svnadmin create mirror-repo

In my case, using VisualSVN makes this incredibly simple. Just open the administrative interface and right-click on the Repositories container to create a new repository. Don’t check the box to create branches, tags and trunk! Then add a new user, syncuser. This is the only user that will need access to the mirror-server repositories. More on this a little later.

Now at this point, we have an empty mirror-repo at revision 0. Here’s the first gotcha. The svnsync program needs to store some special properties about its own synchronization activities. It does this by setting some properties on the repository at -r 0. In order to do this, there has to be a valid pre-revision property change hook (pre-revprop-change) on the repository that exits with 0. Some tutorials have you simply put the single line exit 0 in your script, but I would recommend against this approach because it leaves the door open for some other user to modify properties and muck up the works. This hook is a perfect place to put your check that only syncuser is allowed to do things. Here’s the script I used for the pre-revision property change hook on Windows:

@echo off
REM Subversion passes the user name as the third argument (%3).
IF "%3" == "syncuser" goto :allow
echo Only syncuser may change revision properties >&2
exit 1
:allow
exit 0

You might get errors such as svnsync: DAV request failed or svnsync: Error setting property 'sync-lock' could not remove a property if you forget this step. It took me quite a while to come up with the above on Windows, since the vast majority of samples online for pre-revision hooks are bash scripts.
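
If your mirror-server runs Linux instead, the same check might be sketched in plain shell as below. This is an illustration rather than the script I run: the function name is invented, and it assumes the replication account is called syncuser as in this guide. Subversion calls the hook with the repository path, revision, user, property name and action, in that order.

```shell
#!/bin/sh
# Sketch of hooks/pre-revprop-change for a Unix-like mirror-server.
# Subversion invokes the hook as: REPOS REV USER PROPNAME ACTION

# Returns 0 only when the third argument (the user) is the replication account.
pre_revprop_change() {
    user="$3"
    if [ "$user" = "syncuser" ]; then
        return 0
    fi
    echo "Only syncuser may change revision properties" >&2
    return 1
}

# When installed as the hook file, finish the script with:
#   pre_revprop_change "$@"
#   exit $?
```

Save it as hooks/pre-revprop-change inside the mirror repository and make it executable, otherwise Subversion will not run it.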

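Before moving on, copy source-repo.tgz over to mirror-server and unpack it next to the new repository:

#> cd /svn/repos
#> tar xzf source-repo.tgz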

Step Four: Load the source repository

This is where these directions differ from most of what you’ll find online. Typically, the next step in other tutorials is to start the synchronization process with

#> svnsync init file:///X:/Repositories/mirror-repo http://source-server/svn/source-repo

I tried that first, and it did work fine, but I could tell shortly that it would take a very, very long time to sync from -r 0 to HEAD over the network. I subsequently got some extra advice on the Subversion user’s mailing list to perform the sync on a local repository, but the technique I describe here works just as well.

We can import our dump file to our mirror repository with the svnadmin load command as follows:

#> svnadmin load mirror-repo < source-repo.dump

This command can take a while to run, proportional to the size and number of commits in your svn dump file.

Step Five: Manually Set Sync Properties

Now we have two repositories that are exact copies of each other, but they aren’t yet linked in a master-slave configuration, and they’re not automatically syncing just yet. If you try to run the svnsync initialize command now, you’ll get the following error:

svnsync: Cannot initialize a repository with content in it

This can be really frustrating if you’ve never used svnsync before. As I said earlier, the svnsync program expects to be initialized on an empty repository at revision 0, to play the revision history forward from there. In this case, our repository has thousands of commits in it already, and we want to start up sync from the current revision forward.

To do this, we have to understand what happens when calling svnsync initialize: the svnsync program creates three special properties at -r 0 for tracking its own syncing activities. These can be seen on an actively mirrored Subversion repository with the svn proplist command.

#> svn proplist --revprop -r0 http://mirror-server/svn/mirror-repo
Unversioned properties on revision 0:
svn:sync-from-uuid
svn:sync-last-merged-rev
svn:date
svn:sync-from-url

You can ignore svn:date; only the svn:sync* properties are relevant to syncing. Okay, now that we know what the unversioned properties on -r 0 are, we’re going to hack our own values into those properties, using the svn propset command. We’ll take these one at a time.

To set the svn:sync-from-uuid property by hand, we need to find out the UUID of the source-server’s source-repo, with

#> svn info http://source-server/svn/source-repo
Authentication realm: <http://localhost:80> Subversion Repository
Password for 'yourusername':
Path: source-repo
URL: http://localhost/svn/source-repo
Repository Root: http://localhost/svn/source-repo
Repository UUID: 9d96f4c0-7d9a-42f6-b8c8-54e79b961fad
Revision: 3738
Node Kind: directory
Last Changed Author: jsmith
Last Changed Rev: 3738
Last Changed Date: 2012-03-01 16:38:38 -0700 (Thu, 01 Mar 2012)

Okay, there we can see it in the output, so copy and paste it — you don’t want to type that. Back on mirror-server we can now issue this command:

#> svn propset --revprop -r0 svn:sync-from-uuid 9d96f4c0-7d9a-42f6-b8c8-54e79b961fad http://mirror-server/svn/mirror-repo
property 'svn:sync-from-uuid' set on repository revision 0

That response means it worked. Next, we can set svn:sync-last-merged-rev, the revision that was last merged. To be safe, check the current revision number of both repositories and use the lower of the two. That will probably be your mirror-repo’s; if it is lower, someone has already committed new code to source-repo since you took the dump.

#> svn propset --revprop -r0 svn:sync-last-merged-rev 3738 http://mirror-server/svn/mirror-repo
property 'svn:sync-last-merged-rev' set on repository revision 0

Again, a successful response. Next, we need to set the source URL on the mirror repository using

#> svn propset --revprop -r0 svn:sync-from-url http://source-server/svn/source-repo http://mirror-server/svn/mirror-repo
property 'svn:sync-from-url' set on repository revision 0

Great, now we’re ready to tell our Subversion mirror to sync:

#> svnsync synchronize http://mirror-server/svn/mirror-repo
Transmitting file data .
Committed revision 3739.
Copied properties for revision 3739.

You may not see a confirmation message exactly like mine… in my case it just means that the mirror was able to fetch 1 new change from source-repo.
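
If you have several repositories to mirror, the manual property steps in this section could be wrapped in a small script. The following is a rough sketch under a few assumptions: the function names are invented for illustration, svn info prints English field labels, and the mirror has already been loaded from a dump of the source. It reads the UUID and revisions from svn info, picks the lower revision, and sets the three svn:sync-* properties on revision 0 of the mirror.

```shell
#!/bin/sh
# Sketch only: seed the svnsync bookkeeping properties on revision 0 of a
# mirror that was filled with "svnadmin load". Function names are made up.

# Print one field (e.g. "Repository UUID" or "Revision") from "svn info".
# $1 = repository URL, $2 = field label as printed by svn info
svn_info_field() {
    svn info "$1" | sed -n "s/^$2: //p"
}

# $1 = source URL, $2 = mirror URL
seed_sync_props() {
    source_url="$1"
    mirror_url="$2"

    uuid=$(svn_info_field "$source_url" "Repository UUID")
    source_rev=$(svn_info_field "$source_url" "Revision")
    mirror_rev=$(svn_info_field "$mirror_url" "Revision")

    # Use the lower of the two revisions, as recommended in the text.
    if [ "$mirror_rev" -lt "$source_rev" ]; then
        last_rev="$mirror_rev"
    else
        last_rev="$source_rev"
    fi

    svn propset --revprop -r0 svn:sync-from-uuid "$uuid" "$mirror_url" &&
    svn propset --revprop -r0 svn:sync-last-merged-rev "$last_rev" "$mirror_url" &&
    svn propset --revprop -r0 svn:sync-from-url "$source_url" "$mirror_url"
}
```

Called as seed_sync_props http://source-server/svn/source-repo http://mirror-server/svn/mirror-repo, it should leave the mirror ready for svnsync synchronize. It still relies on the pre-revprop-change hook from Step Three, so run it as syncuser.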

Last Step: Automate synchronization

Now that we have two Subversion repositories mirrored, we need a post-commit hook on our source-repo that pushes each commit to the mirror. On the source-server, edit the repository’s post-commit hook and add the svnsync line:

#> sudo vi /svn/repositories/source-repo/hooks/post-commit
svnsync --non-interactive --username syncuser --password XXXXXXX sync http://mirror-server/svn/mirror-repo/ &

That should be it. Commit some code as normal (to source-repo), then browse to your mirror-repo or do an svn info on it to make sure your commit made it over to mirror-server. If so, congratulations! You’ve just completed this tutorial and are twice as safe from Subversion hard drive failure as you were before.
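
If you would rather script that check than eyeball it, something along these lines could work. It is a sketch, not part of my setup: the function names are invented, and it assumes svn info prints an English "Revision:" line for both URLs.

```shell
#!/bin/sh
# Sketch: compare the HEAD revision of two repositories via "svn info".
# Function names are illustrative, not part of Subversion.

# Print the "Revision:" value that svn info reports for a URL.
head_revision() {
    svn info "$1" | sed -n 's/^Revision: //p'
}

# $1 = source URL, $2 = mirror URL
# Succeeds when the mirror has caught up with the source.
mirror_is_current() {
    source_rev=$(head_revision "$1")
    mirror_rev=$(head_revision "$2")
    echo "source at r$source_rev, mirror at r$mirror_rev"
    [ -n "$source_rev" ] && [ "$source_rev" = "$mirror_rev" ]
}
```

Because the hook runs svnsync in the background, the mirror can lag for a moment right after a commit, so a failed check is worth retrying before you start worrying.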

One obvious security concern with the post-commit hook is the syncuser password sitting in it. It does not actually need to be stored in clear text in the hook file; I only showed it that way to make the point that your source-server has to be able to reach your mirror-server and have the syncuser password available, whether hashed or otherwise stored. It’s not a big deal in our case, since our repos are on the LAN and nobody can fiddle without access to the box. In any case, there are a variety of methods out there to conceal your Subversion password. Storing encrypted passwords on Ubuntu Server without Gnome Keyring… now that’s a whole other story.
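
For what it’s worth, here is a slightly fuller sketch of what the post-commit hook could look like with the password left out and the result logged. Treat it as an assumption-laden example: the function name, the MIRROR_URL and LOG_FILE variables and the log path are all invented here, and it assumes the syncuser credentials have already been cached for the account the hook runs as.

```shell
#!/bin/sh
# Sketch of hooks/post-commit on source-server.
# MIRROR_URL and LOG_FILE are hypothetical settings for this example.
MIRROR_URL="${MIRROR_URL:-http://mirror-server/svn/mirror-repo/}"
LOG_FILE="${LOG_FILE:-/tmp/svnsync-mirror.log}"

# Push pending revisions to the mirror and record whether it worked.
push_to_mirror() {
    # No --password here: credentials are assumed to be cached already.
    if svnsync --non-interactive --username syncuser sync "$MIRROR_URL" >>"$LOG_FILE" 2>&1; then
        echo "sync ok" >>"$LOG_FILE"
    else
        echo "sync FAILED" >>"$LOG_FILE"
        return 1
    fi
}

# In the real hook, run it in the background so commits are not delayed:
#   push_to_mirror &
```

Logging the outcome matters because the hook runs in the background, so a failed sync would otherwise go unnoticed until you actually need the mirror.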

  • Yuriy

    a very detailed article. I followed the steps and found some option of svnsync to avoid manual property assignment:
    svnadmin load < dump
    svnsync initialize --allow-non-empty

    then sync goes seamlessly without manual intervention.
    Works at least for SVN version 1.7.9

  • Grzegorz Szyszło

    Why do you do that?

    # svnadmin dump http://source-server/svn/source-repo > source-repo.dump
    # tar czf source-repo.tgz source-repo.dump

    First, tar is only ballast when the idea is to compress a single file. Second, you take up space with the uncompressed dump. And third, you waste hard drive performance by temporarily storing a big uncompressed archive. Better to use a pipe, saving space and I/O and ultimately speeding up the operation. You can use this method for backing up svn. For example:

    # svnadmin dump http://source-server/svn/source-repo | gzip > source-repo.dump.gz

    And when restoring the repo, load it like this:

    # zcat source-repo.dump.gz | svnadmin load mirror-repo

    • Noggin182

      The idea is that you run svnsync in your post-commit hook. This means every commit will automatically be synced to your other repository. It’s incremental, and your backup is also on a remote server. Using svnadmin dump is a nice way to back up your entire repository as a snapshot, but it isn’t ideal for backing up every commit in real time. You could dump each revision separately, but then you’d have to manage copying them to a remote server yourself, and you’d have to restore one dump at a time. Doing a full dump on each commit takes far too long.

      They are designed for different things. dump is great for taking a snapshot. svnsync is great for mirroring.

      • Grzegorz Szyszło

        Unfortunately this is a bad idea. SVN has its own method for syncing a remote site. It is more effective in comparison to rsync. Rsync is bad because it takes twice the disk space and must always scan the whole dump. Internally, svnsync is based on SVN’s own versioning, so the remote repository remembers all revisions and only the diffs between revisions are transferred to the remote site. Of course you can compress the stream, because you can sync over http, https or ssh, and all three protocols can compress on the fly. Of course you should also protect the remote repo’s integrity: it should be write protected. This is configurable with a hook; examples are available in many docs.

        • Noggin182

          “Unfortunately this is bad idea. SVN has his own method for syncing remote site”

          I think you might be confused. SVN does have its own method, and it’s called svnsync, the exact thing that the author and I are saying you should use.

          • Grzegorz Szyszło

            I don’t want to argue, it’s nonsense. Summing up:
            1. svnsync is part of svn
            2. by dumping without compression, rsyncing over a network compression protocol and restoring, you:
            a. waste disk space and I/O by making a temporary copy and having rsync scan the full copy (it must calculate md5 for file parts)
            b. partially waste network transmission
            3. by using svnsync over a network compression protocol, you:
            a. save disk space: on the source and destination machines you have only the active copy, without a dump. You also save I/O because you transmit content directly
            b. save network transmission because you transmit only revision diffs. It’s very fast
            c. always have the latest, or nearly latest, active copy available for read-only access.

            About the article, it’s very nice :) I tested this, but with low-level fast compression on the fly to save disk space and I/O. It is about 5%-10% faster than a complete svnsync starting from revision 0, so I decided it is not useful for me. It is useful, though, for making a regular compact backup on tape. I dump directly to tape, or directly to a remote ssh server, using svnadmin dump | bzip2 -3 | ssh someuser@remotehost -C 'cat > /somewhereexternal/backupfile.gz' . Note that encrypting a compressed stream uses less CPU than encrypting an uncompressed one.

    • James Parks

      Yes, Grzegorz, for the initial copy to the mirror, you are right, you can save some space and there’s no need for the additional step or for tar.

      You could also do something like this from the mirror to make the whole thing happen in one go:

      mirror-server: ssh source-server "svnadmin dump /svn/somerepo | gzip -c" | gunzip | svnadmin load /svn/somerepo

      That way, the data stream is gzipped on the server side before ever hitting the wire, and gunzipped on the fly as it’s loaded straight into the target repo on the mirror. That takes no disk space on the source server at all and doesn’t require an intermediate file on the target, either.

  • Amber Garg

    Hi Geoffrey,

    To automate svnsync sync & svnsync copy-revprops, we need to schedule the script in crontab. If so, at what time interval should it run?

    Thanks
    Amber

  • Qingbo Zhou

    Hello – Where did you get your svn? My svnadmin version 1.8.5 can only dump from local paths. But looks like you guys can dump from URLs, which is reassured in the comment by another user… Just curious are you really able to run that command on http://… repos?

  • Xicom

    Thanks,
    For sharing this knowledgeable information about development. It’s ideal information for beginners and for developers who have an interest in development.

    Good Luck.

  • Craig

    Anyone idiotic enough to think that Git only on their local machine will protect them will be idiotic enough to think that SVN only on their local machine will protect them. Like any other SCM, Git is not limited to working on the user’s local machine. I am curious to know where you got the notion that Git users think they should only have a repo on their local machine. Now, January 2013 was practically the stone age, so perhaps you hadn’t heard of things like Github, which are publicly and freely available examples of the fact that Git is not limited to working on the local machine.

    As for this great “benefit of SVN”, it’s simply something that is the driving principle of any centralised SCM, of which SVN is only one example. And it’s actually a limitation of SVN and other centralised SCMs: the only way for multiple people to work on a single project in centralised SCMs is to share a central, single point of failure repository which therefore, by necessity, has to be backed up regularly. The thing about something like Git is that even if the shared repo goes down, nothing is actually lost. As for that terrible burden of having to get knowledgeable about things like pull and merge, presumably you’ve only ever worked on projects that involve one person. Having to merge is part of multiple people working on the same project in any SCM. And SVN’s merge is close to useless, whereas Git’s merge is well known to be amazing and not limited to merging from two branches at a time.

    I’m currently saddled with using SVN and have used several centralised SCMs over the years: SVN, Perforce, CVS and even the ever-awful VSS. And I’ve always found them to be fairly sucky to work with.

    You’ve written an entire article (which goes overboard). You could’ve stopped at SVN dump: compress the dump and copy it off the main server, preferably off-site, at least once a week. We have ours dumping and compressing every day, verifying the history and copying off to a remote server. The only way we actually lose anything is if the central server, the remote server, every developer’s machine and all the backups that are taken off-site are lost. Mirroring on every commit is completely unnecessary unless the number of developers is large and the number of daily commits is large.

    Want to know how you back up a Git repository? Copy the folder to another location.

    • foam exstinguisher

      Flame boy, calm down.

  • JSLP

    If the source server is also running VisualSVN Server, are there any precautions to take? Also, I’m not sure how to “translate” those Linux scripts into DOS batch files.
    On step five, it seems that syncing at the same revision is very important. How do we make sure nobody commits to the source repo while we are doing the manual sync? I can email every user asking them not to commit, but human error can occur. Is there a way to block commits while still letting the source and mirror servers sync the first time?
    Thanks

  • JSLP

    We have to assume there is more than one repo on the source server.
    In those “svn propset” commands, don’t we need to specify the repo? What is the exact syntax?
    I tried “svn propset” but it didn’t work. I’m not a command-line guy, so I’m completely lost. Are there other documents we need to understand PRIOR to following this procedure? There seem to be a lot of “holes” in the procedure, and I can’t see how to fill them in.

  • JSLP

    I sort of succeeded in making a mirror server. But it seems to me that the mirror server is only read-only. If a user commits to it, it does not work.

  • carleeto

    “what happens if your hard drive crashes? Unless you also have set up a remote repository and get familiar with pull requests and merges — Git actually requires a little more effort to get this big benefit”

    No, it doesn’t. All you need is a backup of your .git folder sitting somewhere else. Remotes are one way of doing this. However, a simple backup of your .git folder means you really haven’t lost a lot of work. All that’s left to do is to replace your damaged .git folder with your backed up version and you’re in business. Sure, you might have lost the work that wasn’t backed up, but that’s a function of how periodic your backup is. Either way, you now have a working repository using a simple backup restore. No remotes, No merges.

    Finally, using hooks, you can trigger automated back ups of your .git folder so that it happens every time you commit. Now you have only lost unstaged work, which to me is a good enough trade off. Best practices state that when you’re done coding for the day you at least stash what you have.

    Still no remotes or merges in sight and you’re back in business in no time.


Copyright © 2014, All Rights Reserved. Privacy and Copyright Policies.