
Splitting a git branch by files modified

Two of the nice things about git are the ability to create lightweight branches, and the ability to rewrite commit history to rationalise things after the fact. As a relatively recent migrant from Subversion, however, these are things that I sometimes forget about, and I therefore don't always make full use of them.

Recently, I was working away on some new developments to TractoR when I realised that I'd been working on three logically separate ideas, touching largely different sets of files, all within one branch. There's nothing inherently wrong with this, but since branches are so nice in git it would have been useful to have a separate branch for each thread of work, making it easier to integrate just one of them into the main codebase later if required. There were quite a few commits associated with each thread, so picking them out by hand to put into three new branches wasn't very appealing. Since the threads concerned different files, though, I thought that the file lists would provide a shortcut to picking up what I wanted in each case.

But a bit of documentation-searching later, I still wasn't clear how to do what I wanted. So I asked for help on Twitter.

First to reply was @mjdominus, who suggested using a loop:

git checkout -b newbranch start^
for commit in $(git rev-list --reverse start..end); do
  git checkout "$commit" -- <files to keep on the new branch>
  git commit -C "$commit"
done

where start and end are the first and last commit on the branch that I want to split. This essentially creates a new branch with the same starting point as the one to be split, then checks out the state of each file whose changes I want to keep at a particular commit in the history and recommits the changes. But, apart from being a little verbose, this has the problem of not behaving as required if (some of) the files of interest don't exist at the starting commit.
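
To make the loop concrete, here is a minimal sketch that runs it in a scratch repository, so it is safe to try. The file names (a.txt, b.txt), the commit messages and the demo identity are all made up for illustration. It also departs from the loop as quoted in two ways, both of which are my own assumptions: the range begins at start^ so that the first commit is replayed as a commit of its own, and a staged-changes check skips commits that leave the kept files untouched, since git commit would otherwise refuse to create an empty commit.

```shell
#!/bin/sh
set -e

# Scratch repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > a.txt
echo base > b.txt
git add a.txt b.txt
git commit -q -m "base"

# Three commits mixing two threads of work: a.txt and b.txt.
echo one >> a.txt; git commit -q -am "a: first change"
start=$(git rev-parse HEAD)
echo one >> b.txt; git commit -q -am "b: first change"
echo two >> a.txt; git commit -q -am "a: second change"
end=$(git rev-parse HEAD)

# Replay only the a.txt thread onto a new branch.
git checkout -q -b newbranch "$start^"
for commit in $(git rev-list --reverse "$start^..$end"); do
  # Take the state of the kept file as of this commit.
  git checkout "$commit" -- a.txt
  # Assumption: skip commits that did not change the kept file.
  git diff --cached --quiet || git commit -q -C "$commit"
done

# Subjects of the commits on the new branch, oldest first.
loop_log=$(git log --reverse --format=%s "$start^..newbranch")
echo "$loop_log"
```

In this toy history the new branch ends up with the two a.txt commits only, and b.txt stays as it was at the starting point.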

My friend Mark suggested another approach by e-mail:

git checkout -b newbranch currentbranch
git filter-branch --prune-empty --index-filter \
  'git rm -r --cached --ignore-unmatch <files to remove on the new branch>' newbranch

This creates the new branch as an exact copy of the current branch, which contains all of the commits from all three threads of development, and then rewrites its history to remove the unwanted files. In practice this gets me fairly close to where I wanted to be, but its effect is to remove files, and I only really wanted to remove commits to files (i.e. to leave them in the state they were in on the upstream branch), rather than to remove the files themselves. I also wanted to stop rewriting history at the point where the new branch split from the upstream branch.
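
The file-removal effect is easy to see in a scratch repository. The sketch below is only an illustration under assumptions of my own: the file names, commit messages and demo identity are invented, and instead of the bare branch name it passes the range upstreambranch..newbranch, which git filter-branch accepts as rev-list arguments, so that rewriting stops at the point where the branch left upstream. It does not fix the other problem: b.txt is still deleted outright on the new branch.

```shell
#!/bin/sh
set -e

# Newer versions of git print a deprecation notice for filter-branch
# and pause; this variable silences it (older versions ignore it).
export FILTER_BRANCH_SQUELCH_WARNING=1

# Scratch repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > a.txt
echo base > b.txt
git add a.txt b.txt
git commit -q -m "base"
git branch -M upstreambranch

# A working branch mixing two threads of work: a.txt and b.txt.
git checkout -q -b currentbranch
echo one >> a.txt; git commit -q -am "a: first change"
echo one >> b.txt; git commit -q -am "b: first change"
echo two >> a.txt; git commit -q -am "a: second change"

# Copy the branch, then strip b.txt from the commits made since upstream.
git checkout -q -b newbranch currentbranch
git filter-branch --prune-empty --index-filter \
  'git rm -r --cached --ignore-unmatch b.txt' \
  upstreambranch..newbranch >/dev/null

# Subjects of the commits left on the new branch, oldest first.
filtered_log=$(git log --reverse --format=%s upstreambranch..newbranch)
echo "$filtered_log"
```

The commit that only touched b.txt becomes empty and is pruned, but the first remaining commit now deletes b.txt instead of leaving it in its upstream state, which is exactly the behaviour I wanted to avoid.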

After a bit more head-scratching I came up with a third alternative:

git checkout -b newbranch upstreambranch
git rev-list --reverse --no-merges upstreambranch..currentbranch -- \
  <list of files to keep on the new branch> | git cherry-pick --ff --stdin

This says: create a new branch from the upstream branch, then find the commits that touch the specified files and apply them to the new branch in order. This is the closest to what I originally had in mind, but it has a drawback: git cherry-pick can't apply a merge commit unless it is told which parent to treat as the mainline (the -m option), and --no-merges skips such commits altogether, so any merges in the current branch may cause problems. In my case this wasn't an issue, so this was the method that I ultimately used.
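
For anyone who wants to try this before touching a real repository, here is a small sketch that builds a scratch history and splits one thread out of it. The file names (a.txt, b.txt), commit messages and demo identity are hypothetical; only the last two git commands are the method itself.

```shell
#!/bin/sh
set -e

# Scratch repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# One commit on the upstream branch.
echo base > a.txt
echo base > b.txt
git add a.txt b.txt
git commit -q -m "base"
git branch -M upstreambranch

# A working branch mixing two threads of work: a.txt and b.txt.
git checkout -q -b currentbranch
echo one >> a.txt; git commit -q -am "a: first change"
echo one >> b.txt; git commit -q -am "b: first change"
echo two >> a.txt; git commit -q -am "a: second change"

# Start from upstream and pick only the commits that touch a.txt.
git checkout -q -b newbranch upstreambranch
git rev-list --reverse --no-merges upstreambranch..currentbranch -- a.txt |
  git cherry-pick --ff --stdin >/dev/null

# Subjects of the commits on the new branch, oldest first.
split_log=$(git log --reverse --format=%s upstreambranch..newbranch)
echo "$split_log"
```

Here the new branch receives the two a.txt commits, while b.txt is left exactly as it is on the upstream branch.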

These three different approaches to the problem have certainly taught me more about git, so that in itself makes this a useful exercise. But I thought I'd also put them up here for posterity, and in case any of them is useful to someone else. It does look like I'm not quite the only one to want to do something like this.