Creating great Pull Requests

Creating a pull request is simple. But creating a really good pull request seems to be a different beast. If you follow some simple rules a great PR is easy to create:

One PR, one Feature (or bugfix)

Keep PRs focused. Do not combine multiple features or bug fixes into a single PR. A focused PR is much easier to review. Thus the maintainer can usually merge it more quickly.

If a PR contains more than one feature, it can only be merged after issues for all features are solved. Sometimes this means it takes very long to merge a PR because of a single controversial feature.

Nevertheless, there are situations where it makes sense to contain multiple bug fixes into a single PR. This can happen after new code analysis tools have been added. Or when new tests have been developed. An indication probably is also that the individual fixes are small. If in doubt, create multiple pull requests.

Use One Commit for Feature-PRs

If you add a single feature, there is no reason to use more than one commit for it. If there would be, you’d actually add more than one feature. What I often see is multiple commits because of follow-on commits. These fix errors in a previous commit. Do not do this! We all make errors, and the need to fix a previous commit is universal. Use “git commit –amend” to apply that fix to the previous commit. Alternatively, squash the commits at a later stage. This makes sense if multiple people work as a team on a single feature.

Be sure to add tests for your feature. You may be tempted to add them in a second commit. If you really want, that is fine. But for new feature, create a single commit that covers it all. Among others, this makes it much easier for others to cherry-pick your feature.

Use Two Commits for Bugfix-PRs

Sounds a bit counter-intuitive, but has to do with the way distribution maintainers (Debian, Fedora, Suse, etc) work. When they backport a bugfix, they usually want to change as little as possible. Help them by making clear what the minimal change is.

Do a commit for the bugfix itself. This commit should contain everything needed to fix the problem – but not more. Most importantly, I recommend not to include new dynamic tests into the bugfix commit. When backporting, the maintainer will possibly not want to add the new tests.

And of course you want dynamic tests, right? The proper process to fix a bug is

  1. create tests that reproduce the problem
  2. craft a patch you think that fixes it
  3. run your new tests to see if it really does
  4. as long as the tests fail, go to step 2

Sweet and easy. Simply put your tests (and all related changes except the bugfix) into a single commit. What now often happens is that the test becomes the first commit, and the bugfix the second. Do not do this! It breaks “git bisect” (a great tool to discover regressions). The problem is that after commit one, tests fail. This violates the policy that each commit by itself should build and run without problems.

The proper sequence is bugfix commit first, testing commit second. Again, it’s a bit counter-intuitive, and you may want to work with two branches and “git cherry-pick” to achieve that. A simpler alternative is to keep the tests local after until the bugfix is finished. Then commit the bugfix first and do a second commit immediately thereafter with the tests.

Write Descriptive Commit Messages

Keep it small but descriptive. Start with a concise title, best with less than 60 characters. “update stream.c” is really a bad title. Go for “imfile: fix segfault when parameter validation fails” if that is what the commit is all about.

If the change is small the title alone can be sufficient. Does it need more explanation? Add it after the title and leave an empty line between them. Try to keep the explanation brief, but do not be shy to write a lot if you really need it. I have had commits with more than hundred lines of commit message. I loved them. The explanation made understanding and reviewing the PR much easier.

Put the text into the commit message – do not only a link to an external reference. Often I see a very brief commit message .. and a wealth of info in the github PR. This is wrong because github may go away. We have used several platforms in the past and they tend to go away (sourceforge is a good example). Putting all the info into the commit itself has big advantages:

  • it is eternally available
  • it is readily available when a developer needs it – “git show” does it all

Github populates the intial PR text from the commit. Write your full commit message first, then create the PR. You will see the text in both places at no extra effort. Of course, this requires that there is only a single commit inside the PR ;-). If I have long text in bugfixing commits, I sometimes tend to create the PR with just the first commit present. This populates the github PR text field. Thereafter I add the test commit.

Add references to external issue tracking systems (bugzilla, github, etc) at the bottom of the commit text. Be sure to include full URLs. I often see “closes #1234“. Of course, you can guess this means a GitHub issue. It’s far better to keep out the guesswork.

Provide Documentation for User-Visible Changes

If you implement a great feature and do not update the doc, nobody knows about it. So the PR would be waste of effort. Do not try to have the maintainer do your homework. This almost never works. Many projects require updated doc before merging.

Depending on project structure, the doc may be kept in a sister project. In rsyslog we do this for access permission reasons. There are different doc and code committers. Rsyslog feature PRs thus require a matching PR in rsyslog-doc.

If you really do not know how to update the doc, contact the maintainers and work with them. Readily-usable text is key. A maintainer is much more likely to do the formatting stuff for you if you provide raw text. Raw text that is really usable…

Keep your PRs in Shape while you Work on Them

If your PR is non-trivial, you probably need to correct and/or add something. A lot can go wrong even for a well done PR:

  • CI may use tools you usually do not use (e.g. static analyzers).
  • CI may include platforms you have never used – they may show so-far unseen build or test failures
  • you may violate project specific coding policies
  • and, and, and, …

Be patient and fix the issues. Work with the maintainers if you are uncertain about a specific problem. Maybe it’s just a false positive (CI systems are notoriously flaky). During this work, you need to change your PR. Keep on to the usual policies.

It may make sense to add multiple commits to the PR while multiple people work on it. This can happen if the maintainers help you setting things straight. Multiple commits are no issue as long as they are temporary. Once all is done, squash your commits. Ask for help if you don’t know how.

Some PRs take a while to be merged. I strongly suggest to rebase frequently. Do this via “git rebase“, not via “git merge“. The latter generates a mess in git history. Frequent rebase is no effort at all, but it saves a lot of time by preventing merge conflicts before they occur. In my experience most merge conflicts are caused by too-far dividing code bases.

Note that some CI systems may require you to rebase frequently. Others do not. The rsyslog CI, for example,  does not require to rebase as long as the PR merges without problems to current master. If there is a merge conflict, all bets are off. Then you need to rebase. Rebasing late tends to be painful. So do it early.

Note: This is my personal opinion on what makes up an ideal PR. Others may have different opinions. Also note that gitlab calls “pull requests” “merge requests”. It’s the same thing.