Migrating SVN to Git

Overview

SVN migrations to Git can vary in complexity depending on how old the repository is, how many branches were created and merged, and whether you used plain SVN or a facade such as SVK for your branching and merging. If you have a fairly new repository with the standard trunk, branches, and tags directories, your job could be pretty easy. However, if you've done a lot of branching and merging, or your repository follows a non-standard directory layout (or that layout changed over time), you could have a bit more work on your hands.

SVN to Git Conversion

Building Git

The very first thing to do is make sure you are running the newest Git, and for that I recommend building it from source. Grab the tarball and do the usual compile steps:

$ wget http://kernel.org/pub/software/scm/git/git-1.6.5.2.tar.bz2
$ tar xjf git-1.6.5.2.tar.bz2 
$ cd git-1.6.5.2
$ ./configure
$ make

I'm not sure whether this is because I always use local::lib, but partway through the build I tend to get the following error. If you don't get it, you can skip this step:

Only one of PREFIX or INSTALL_BASE can be given

You can fix this by going into the perl/ directory of the Git source tree and running the Perl modules' Makefile.PL by hand:

$ cd perl/
$ perl Makefile.PL 

Then go back up a directory and finish the build

$ cd ..
$ make
$ sudo make install
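Since the point of building from source is to end up with a recent Git, it can be worth checking which version actually runs from your PATH afterwards (an older packaged git that comes earlier on the PATH could shadow the new one). Here is a minimal sketch in plain sh; version_ge and git_version_from_output are hypothetical helper names, and the 1.6.5.2 threshold is only an assumption matching the tarball used above.

```shell
# version_ge A B: succeed (exit status 0) if dotted version A >= version B.
# Fields are compared numerically left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]")
        nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0   # all fields equal
    }'
}

# git_version_from_output "git version 1.6.5.2" prints 1.6.5.2
git_version_from_output() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Usage (hypothetical threshold):
#   version_ge "$(git_version_from_output "$(git --version)")" 1.6.5.2 \
#       && echo "git is new enough"
```

Comparing field by field numerically, rather than as strings, keeps 1.6.10 correctly ahead of 1.6.9.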

Using git-svn

If you have made any merges using SVK, you should grab the git-svn from Sam Vilain's Git branch here. Cloning it with git clone didn't work for me, so I downloaded a tarball instead:

$ wget http://download.github.com/samv-git-ccef8e0.tar.gz

Unpack it and copy the git-svn.perl script into your bin directory as git-svn, making sure it is executable:

$ tar xzvf samv-git-ccef8e0.tar.gz
$ cd samv-git-ccef8e0
$ cp git-svn.perl ~/bin/git-svn
$ chmod +x ~/bin/git-svn

Now you can clone your repository with the git-svn command. The --prefix=svn/ option is necessary because otherwise the tools can't tell SVN revisions apart from imported ones. If you are using the standard trunk, branches, tags layout, just pass --stdlayout as below. If your layout is different, you may have to pass --trunk, --branches, and --tags to identify what is what. For example, if your project lived in trunk/companydir and you branched that instead of trunk, you would probably want '--trunk=trunk/companydir --branches=branches'. You can optionally provide an authors file with --authors-file, and the last argument is the repository URL.

$ git-svn clone --prefix=svn/ --stdlayout --authors-file=authors.txt http://example.com/svn
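If you use --authors-file, the file has to exist before you run the clone. Each line maps an SVN username to a Git identity in the form "svnuser = Full Name <email>". One way to get a starting point, sketched here as a hypothetical helper called authors_template, is to collect the usernames from svn log -q and print a placeholder line for each; the example.com addresses are placeholders you would edit by hand.

```shell
# authors_template: read `svn log -q` output on stdin and print one
# placeholder authors-file line per distinct SVN username.
# Revision lines look like: r42 | jdoe | 2009-10-01 12:00:00 -0400 (...)
authors_template() {
    awk -F '|' '/^r[0-9]+ [|]/ {
        name = $2
        sub(/^ +/, "", name)    # trim leading spaces
        sub(/ +$/, "", name)    # trim trailing spaces
        print name " = " name " <" name "@example.com>"
    }' | sort -u
}

# Usage:
#   svn log -q http://example.com/svn | authors_template > authors.txt
```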

This can take anywhere from a few minutes to a few hours, depending on how big your repository is. When it's done, you should end up with a Git checkout of your repository. If you look at your remote branches, you'll see that the SVN branches were imported under the svn/ prefix we specified:

$ git branch -r
  svn/dbic_versioning
  svn/feed_source
  svn/trunk

At this point it doesn't hurt to create a tarball of your project, so you can get back to this state later if you mess something up.
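The tarball needs to include the .git directory, since everything git-svn imported lives there. A minimal sketch of a timestamped backup; backup_repo is a hypothetical helper, and the directory name svn in the usage line is an assumption based on git-svn naming the clone after the last part of the example URL.

```shell
# backup_repo DIR: write DIR-YYYYMMDD-HHMMSS.tar.gz next to DIR
# (including DIR/.git) and print the archive name.
backup_repo() {
    dir=${1%/}                                   # drop a trailing slash
    [ -d "$dir/.git" ] || { echo "not a git checkout: $dir" >&2; return 1; }
    archive="$dir-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar czf "$archive" "$dir" && printf '%s\n' "$archive"
}

# Usage, from the directory containing the clone:
#   backup_repo svn
```

To roll back, delete the broken checkout and unpack the archive in its place.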

Fixing the Git Repository

There are a few more scripts you need to finish this off, and they live in the git-svn-abandon repository, so clone it:

$ git clone git://github.com/nothingmuch/git-svn-abandon.git 

Now go into that directory and copy the new scripts into your local bin directory, or anywhere else on your PATH:

$ cd git-svn-abandon
$ cp git-svn-abandon-* ~/bin/

You can now run git-svn-abandon-fix-refs, which goes through all the imported refs, recreates properly dated annotated tags, and makes branches out of everything else. It also renames trunk to master.

$ git-svn-abandon-fix-refs
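To make the renaming rules concrete, here is a hypothetical sketch of how each imported ref maps to its new name, assuming the clone used --prefix=svn/ and --stdlayout. This is not the actual git-svn-abandon-fix-refs source: the real script also creates the annotated tags with their original dates, while this only prints what each ref would become.

```shell
# classify_ref REF: print what a fix-refs style pass would turn REF into.
# Assumes refs imported with --prefix=svn/ and --stdlayout.
classify_ref() {
    case $1 in
        refs/remotes/svn/tags/*)
            echo "tag ${1#refs/remotes/svn/tags/}" ;;    # SVN tag -> Git tag
        refs/remotes/svn/trunk)
            echo "branch master" ;;                      # trunk -> master
        refs/remotes/svn/*)
            echo "branch ${1#refs/remotes/svn/}" ;;      # other SVN branches
        *)
            echo "skip $1" ;;                            # not an SVN import
    esac
}

# Usage, to preview the mapping for every imported ref:
#   git for-each-ref --format='%(refname)' refs/remotes/svn |
#       while read -r ref; do classify_ref "$ref"; done
```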

We can see that it took the remote branches our earlier git branch -r command listed and turned them into real Git branches, and it put us on the master branch, which corresponds to the SVN trunk:

$ git branch
  dbic_versioning
  feed_source
* master

Grafting Your Branches

Depending on how accurate you want your history to be, you could skip this step. However, it is a good idea to try to get your merge history right. This requires a little understanding of Git internals, so at a minimum you should read and understand this introduction. There is also a nice presentation on how Git works here that you may want to watch.

So now you understand (you went over those links, right?) that a merge in Git creates a new commit on the graph whose parent pointers point to both the head of master and the head of the branch. To compute that commit's contents, Git takes three snapshots: the merge base (the most recent commit both sides have in common, which already exists in the history) and the two parents. It works out the changes from the merge base to each parent and applies both sets of changes to the merge base. If they don't combine cleanly, a conflict is created and must be resolved. After that, the contents of the earlier revisions matter only as history; the merge commit is a new snapshot, and its two parent pointers are what record that the merge happened. Those parent pointers are exactly what our import is missing, so that is what we have to fix.

This is a snapshot from gitk of my repository. Near the bottom we can see that the branch was created properly, but even though it was merged in SVN, it doesn't appear merged in our Git migration.

The way to fix this is to edit the .git/info/grafts file. With my great Photoshop skills, I've marked what goes in there:

We need to take the commit the first arrow points to and tell Git that its parents are its current parent (the third arrow) as well as the commit where the branch was merged in (the second arrow). So all we have to do is add one line to .git/info/grafts containing the full 40-character hashes of those commits, separated by spaces: the top commit first, then the second, then the third.
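Each grafts line is a commit hash followed by the hashes of the parents it should have, and abbreviated hashes are not accepted. A minimal sketch of appending such a line safely; add_graft is a hypothetical helper, and the GRAFTS_FILE override is an assumption added only so the function can be pointed at another file. (Newer Git releases deprecate the grafts file in favour of git replace --graft, which takes the same commit-then-parents arguments.)

```shell
# add_graft COMMIT PARENT... : append "COMMIT PARENT1 PARENT2 ..." to the
# grafts file (default .git/info/grafts) after checking that every id is
# a full 40-character lowercase hex SHA-1.
add_graft() {
    grafts=${GRAFTS_FILE:-.git/info/grafts}    # hypothetical override
    [ "$#" -ge 2 ] || { echo "usage: add_graft COMMIT PARENT..." >&2; return 2; }
    for id in "$@"; do
        case $id in
            *[!0-9a-f]*) echo "not a full SHA-1: $id" >&2; return 1 ;;
        esac
        [ "${#id}" -eq 40 ] || { echo "not a full SHA-1: $id" >&2; return 1; }
    done
    mkdir -p "$(dirname "$grafts")"
    printf '%s\n' "$*" >> "$grafts"            # ids joined by single spaces
}
```

Since git rev-parse prints full hashes, abbreviated ids copied from gitk can be expanded on the way in, for example add_graft $(git rev-parse FIRST SECOND THIRD), where FIRST, SECOND, and THIRD stand for the commits at the three arrows.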

After adding that line, just reload gitk or GitX to see the history now visually show the branch being merged.

When you're done, run git-svn-abandon-cleanup, which cleans up SVK-style merge commit messages and removes the git-svn-id lines. Just as importantly, the grafts entries are incorporated into the rewritten commits, so the extra merge metadata survives cloning; the grafts file itself stays local to your repository and is not transferred by clone or push.

$ git svn-abandon-cleanup
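The cleanup script handles this for you. To show just the git-svn-id part of the job, here is a hypothetical strip_svn_id function written as a message filter: it reads a commit message on stdin and writes the cleaned message to stdout. It is an illustration, not the script's actual code.

```shell
# strip_svn_id: copy a commit message from stdin to stdout, dropping the
# "git-svn-id: URL@REV UUID" lines git-svn appends and any blank lines
# left dangling at the end. Interior blank lines are kept (as empty lines).
strip_svn_id() {
    sed '/^git-svn-id: /d' | awk '
        /^[ \t]*$/ { blank++; next }              # hold blank lines back
        { while (blank > 0) { print ""; blank-- } print }
    '
}
```

A filter of this shape is what git filter-branch expects for --msg-filter, for example git filter-branch --msg-filter "sed '/^git-svn-id: /d'" -- --all (tags additionally need --tag-name-filter cat). Only run something like that by hand if you are not using the cleanup script, and only on a backup.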

Publish Your Repository

If you followed the last article, you already have a gitosis setup we can use to store our code centrally. Depending on which host it lives on, you can push the repository like so:

$ git remote add origin git@localhost:repository.git
$ git push --all
$ git push --tags

And we’re done.

Conclusion

We've gone over how to migrate an SVN repository to Git, how to deal with some of the complications in how Git interprets a history that came from SVN, and how to go about fixing them. Hopefully we also learned a little about Git in the process.

References