

These are notes taken while investigating how to move this project from a CVS repository to an SVN repository. The investigation involves exploring the cvs2svn script and practicing its use on a copy of the modules that are being considered for migration.

Modify the CVS module to be read-only
This is an optional precaution during practice. During the real move, this should be mandatory. It will prevent others from accidentally modifying the CVS module during the conversion or after it has been migrated to SVN.
There are a number of ways to achieve this. I did the following:

  • Made sure all the files and directories were owned by user "jwang" and group "SSG".
  • Made sure all the files and directories were writable only by user "jwang".
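The permission change above can be sketched in shell. This is a minimal sketch rather than the exact commands used: the function name make_read_only and the throwaway demo directory are assumptions for illustration, and the chown step is left as a comment because it needs suitable privileges.

```shell
#!/bin/sh
# Sketch: make a CVS module directory writable only by its owner.
# make_read_only is a hypothetical helper name, not part of CVS.

make_read_only() {
    module_dir=$1
    # Changing ownership needs suitable privileges, so it stays commented out:
    # chown -R jwang:SSG "$module_dir"
    # Remove write permission for group and others; the owner keeps theirs.
    chmod -R go-w "$module_dir"
}

# Demo on a temporary directory tree instead of a real CVS module.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/hdbstat/client"
touch "$demo/hdbstat/client/Foo.java,v"
chmod -R a+w "$demo/hdbstat"

make_read_only "$demo/hdbstat"

# Count entries still writable by group (020) or others (002).
still_writable=$(find "$demo/hdbstat" \( -perm -020 -o -perm -002 \) | wc -l | tr -d ' ')
echo "entries still writable by group/others: $still_writable"
```

On a real module, the function would be pointed at the module directory under the CVS root, after confirming with the other committers that nobody has work in flight.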

If anyone is aware of a more explicit CVS-way to accomplish this, I'd like to hear about it.

Install the cvs2svn script
This is described in detail in the cvs2svn documentation.

Make a backup copy of the CVS module
This is the copy upon which the cvs2svn script will perform its operations. Working on a copy means that a failed or mistaken conversion step cannot damage the original module.
Again, there are a number of ways to achieve this. I used:

  • cp -dpR /home/cvsroot/hdbstat /path/to/tmp/hdbstat.cvs

For subsequent operations with cvs2svn, the "hdbstat.cvs" directory is used to refer to the copied CVS module directory.
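Before running anything against the copy, it may be worth confirming that it matches the original. The sketch below does this with diff -r on a throwaway directory tree. The demo paths and file contents are made up for illustration, and cp -pPR is used as a POSIX approximation of the GNU cp -dpR shown above.

```shell
#!/bin/sh
# Sketch: copy a module directory and verify the copy with diff -r.
# The directory layout below is a stand-in for a real CVS module.

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/cvsroot/hdbstat/client/Attic"
printf 'head 1.1;\n' > "$demo/cvsroot/hdbstat/client/Foo.java,v"
printf 'head 1.2;\n' > "$demo/cvsroot/hdbstat/client/Attic/Old.java,v"

# -p preserves modes and timestamps, -P does not follow symlinks, -R recurses.
cp -pPR "$demo/cvsroot/hdbstat" "$demo/hdbstat.cvs"

# diff -r exits 0 only when both trees have the same files and contents.
if diff -r "$demo/cvsroot/hdbstat" "$demo/hdbstat.cvs" >/dev/null; then
    copy_status=identical
else
    copy_status=different
fi
echo "backup copy is $copy_status"
```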

Practice run with cvs2svn
Use the --dry-run switch of the cvs2svn script.

  • cvs2svn --dry-run -s /path/to/tmp/hdbstat --trunk=trunk --branches=branches --tags=tags /path/to/tmp/hdbstat.cvs

Or for more complicated conversions, use the options file.

  • cvs2svn --dry-run --options=my.options
  • Note that you don't need to pass -s, because the SVN repository location is specified within the options file; nor do you need to pass the CVS repository location, because it is specified by the ctx.add_project() blocks.

For HDBSTAT, this practice run revealed that the repository was in a "mildly corrupt" state (see the cvs2svn FAQ). This is likely due to direct manipulation of the repository, back when it was less well understood that this is a bad idea. It is unclear to me whether any direct manipulation would cause this corruption or whether only incorrect manipulation was the culprit.

Vinodh tells me he has worked out a set of circumstances under normal CVS operation that can cause this situation. Vinodh, if you could elaborate here, I would be grateful.

To fix the corruption, I followed the advice in the cvs2svn FAQ to remove one or more sets of files. I used find, xargs, cut, and standard error redirection at the shell to remove the files in each of the Attic/ directories in a semi-automated fashion. WARNING: Be careful here. The first script I wrote did not do the right thing. See me for details.
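A safer first step than deleting anything is to list the candidates. The sketch below is not the script described above. It assumes the corruption in question is an RCS file that exists both inside and outside an Attic/ directory, and it only prints the Attic copies so they can be reviewed by hand. The function name list_attic_duplicates and the demo files are hypothetical.

```shell
#!/bin/sh
# Sketch: list ,v files that exist both in Attic/ and in the parent directory.
# Nothing is deleted; the output is meant for manual review.
# Note: file names containing newlines are not handled.

list_attic_duplicates() {
    find "$1" -type f -path '*/Attic/*,v' | while IFS= read -r attic_file; do
        attic_dir=$(dirname "$attic_file")
        # Only consider files sitting directly inside an Attic directory.
        [ "$(basename "$attic_dir")" = Attic ] || continue
        live_file="$(dirname "$attic_dir")/$(basename "$attic_file")"
        if [ -f "$live_file" ]; then
            printf '%s\n' "$attic_file"
        fi
    done
}

# Demo tree: Foo.java,v is duplicated, Bar.java,v lives only in the Attic.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/client/Attic"
touch "$demo/client/Foo.java,v" \
      "$demo/client/Attic/Foo.java,v" \
      "$demo/client/Attic/Bar.java,v"

dupes=$(list_attic_duplicates "$demo")
printf '%s\n' "$dupes"
```

Running it against the copied module (never the original) gives a list that can be checked file by file before anything is removed.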

Real run with cvs2svn

  • cvs2svn -s /path/to/tmp/hdbstat --trunk=trunk --branches=branches --tags=tags /path/to/tmp/hdbstat.cvs
  • NOTE: This directory layout is only one of many possibilities. It follows the SVN book's recommendations to make branching and tagging more straightforward.

Or if you are using an options file to specify the conversion,

  • cvs2svn --options=my.options
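The practice run and the real run can be chained so that the real conversion only starts if the dry run succeeds. The following is a sketch under stated assumptions: the run wrapper, the PRINT_ONLY variable, and the convert function are illustrative names, and by default the script only prints the cvs2svn commands (the same ones shown above) instead of executing them.

```shell
#!/bin/sh
# Sketch: rehearse with --dry-run, then convert for real.
# PRINT_ONLY=1 (the default here) prints each command instead of running it.

PRINT_ONLY=${PRINT_ONLY:-1}

run() {
    if [ "$PRINT_ONLY" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

convert() {
    svn_repo=$1
    cvs_module=$2
    # Rehearse first; stop here if the dry run reports a problem.
    run cvs2svn --dry-run -s "$svn_repo" \
        --trunk=trunk --branches=branches --tags=tags "$cvs_module" || return 1
    # Real conversion with the same layout options.
    run cvs2svn -s "$svn_repo" \
        --trunk=trunk --branches=branches --tags=tags "$cvs_module"
}

plan=$(convert /path/to/tmp/hdbstat /path/to/tmp/hdbstat.cvs)
printf '%s\n' "$plan"
```

Setting PRINT_ONLY=0 would execute the commands, which requires cvs2svn to be installed and the paths to be real.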

Set up the newly created repository for access by the desired server type
This step will likely need to be coordinated with the sysadmin or SVN administrator for your server. For example, if the svnserve daemon will be used, then the newly created repository needs to be:

  1. Copied to a location where the svnserve daemon can see it.
  2. Updated so that the svnowner user has the right permissions.

Test the newly created SVN repository

  1. Use 'svn list' to browse the repository structure and make sure it is correct.
  2. Check out the module at the trunk.
    1. svn checkout svn:// hdbstat/client
    2. Assumes the svnserve daemon is running and is rooted at the right place.
  3. Check out the module at a particular branch.
    1. svn checkout svn:// hdbstat-1.5/client
    2. Assumes the svnserve daemon is running and is rooted at the right place.
  4. Compile.
  5. Run unit tests.
  6. Spot check the log messages for important files, paying attention to committer, commit dates, etc.
  7. Generate a changelog and make sure it "looks ok".
    1. Can't do this yet because the CvsChangelog ant task won't work with svn.
    2. There appears to be a not-yet-a-core-task SvnChangelog task available in antlib. This requires further investigation.
  8. Use the ViewVC webapp to explore further.
    1. Vinodh, I have some configuration questions for you. Please see me.
  9. Run the application.
    1. Not just opening up the application, but running one or more full data analyses that exercise the major (and hopefully, minor) components of the software.
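The first few checks in the list above could be scripted roughly as follows. This is only a sketch: REPO_URL and WORK_DIR are hypothetical placeholders, the trunk/ path assumes the layout chosen during conversion, and the script prints the svn commands by default rather than running them.

```shell
#!/bin/sh
# Sketch: browse, check out, and spot-check the log of a converted repository.
# REPO_URL and WORK_DIR are placeholders, not the real server or paths.

PRINT_ONLY=${PRINT_ONLY:-1}
REPO_URL=${REPO_URL:-svn://svn.example.org/hdbstat}
WORK_DIR=${WORK_DIR:-/path/to/tmp/checkouts}

run() {
    if [ "$PRINT_ONLY" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

smoke_test() {
    # Browse the top-level structure (expect trunk, branches, tags).
    run svn list "$REPO_URL"
    # Check out the trunk only, not the whole repository.
    run svn checkout "$REPO_URL/trunk" "$WORK_DIR/hdbstat-trunk"
    # Show the five most recent commits with changed paths, to spot-check
    # committers and dates.
    run svn log -v --limit 5 "$REPO_URL/trunk"
}

steps=$(smoke_test)
printf '%s\n' "$steps"
```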

Lessons Learned

One of the biggest lessons is that this process is not easy (except perhaps for much smaller codebases without many branches and tags). There are many decisions to be made, amongst them, server type, repository layout, what to migrate, IDE integration, ant task support, et cetera.

Within a set of projects that are grouped together, say, the HDBSTAT client, webapp, and black box tests, it may make sense to treat these as separate repositories because they evolve and exist independently. The jury is still out.

You may not want to check out all the branches and tags. In CVS, you have to pass a specific command-line option to get a tag or branch; if you don't, you get the trunk. In SVN, if you check out at the top level, you get EVERYTHING. In other words, with SVN, the URL you give to "svn checkout" requires more forethought than with CVS.
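To make the point about URLs concrete, here is a small sketch of a helper that builds the checkout URL for the trunk, a branch, or a tag. It assumes the trunk/branches/tags layout used in the conversion. The helper name svn_checkout_url and the base URL are hypothetical, and hdbstat-1.5 is assumed to be a branch name.

```shell
#!/bin/sh
# Sketch: build the URL to pass to "svn checkout" for one line of development,
# instead of checking out the repository root and getting everything.

svn_checkout_url() {
    base=$1
    kind=$2
    name=${3:-}
    case $kind in
        trunk)  printf '%s/trunk\n' "$base" ;;
        branch) printf '%s/branches/%s\n' "$base" "$name" ;;
        tag)    printf '%s/tags/%s\n' "$base" "$name" ;;
        *)      echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

BASE=svn://svn.example.org/hdbstat   # hypothetical server and repository
trunk_url=$(svn_checkout_url "$BASE" trunk)
branch_url=$(svn_checkout_url "$BASE" branch hdbstat-1.5)

echo "svn checkout $trunk_url"
echo "svn checkout $branch_url"
```

Checking out "$BASE" itself would pull the trunk plus every branch and tag into one working copy, which is the situation described above.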

What's nice

One of the nice things I've noticed is that version control operations are now only authenticated if the operation needs authentication (e.g. if you use the svn:// scheme and not the svn+ssh:// scheme).

Binary files

Even though Subversion claims that it does not (by default) modify things like line endings unless you ask it to, the cvs2svn script makes no such claim and in fact, by default, converts line endings to "native".
For binary files properly marked as -kb in CVS, this still appears to be a problem. I encountered this issue while converting Power Atlas: there are .gz files in power_atlas/client/resources/soft_examples that are marked as -kb but were modified during the conversion.
I have not gathered enough evidence yet to produce a bug report. So far, I have used the --no-default-eol switch to force the behavior I want.
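One way to gather that evidence would be to compare the binary files in a CVS checkout byte for byte against the same files in an SVN checkout. The sketch below does this with cmp. The function name compare_trees and the demo files are hypothetical; the demo files are plain bytes standing in for real .gz archives.

```shell
#!/bin/sh
# Sketch: report files matching a pattern that differ (or are missing)
# between a reference tree and a converted tree.

compare_trees() {
    ref_dir=$1
    new_dir=$2
    pattern=$3
    (cd "$ref_dir" && find . -type f -name "$pattern") |
    while IFS= read -r rel; do
        # cmp -s is silent and exits non-zero if the files differ
        # or the second file does not exist.
        if ! cmp -s "$ref_dir/$rel" "$new_dir/$rel"; then
            printf 'DIFFERS: %s\n' "$rel"
        fi
    done
}

# Demo: a.gz had its CRLF rewritten to LF, b.gz is untouched.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/cvs_wc" "$demo/svn_wc"
printf 'x\r\ny' > "$demo/cvs_wc/a.gz"
printf 'x\ny'   > "$demo/svn_wc/a.gz"
printf 'same'   > "$demo/cvs_wc/b.gz"
printf 'same'   > "$demo/svn_wc/b.gz"

report=$(compare_trees "$demo/cvs_wc" "$demo/svn_wc" '*.gz')
printf '%s\n' "$report"
```

Run against real checkouts of the same revision, an empty report would suggest the binaries survived the conversion intact.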
Here are some potentially related threads on the cvs2svn-user mailing list:

  • None