March 15, 2021

Scrubbing Personal Data From Git Repos

Guides git workflow

Paul Kelly

The problem

I made a booboo. I was trying to do the right thing, honest. I will put it down to being a bit rusty with Git. I was trying to scrub my personal data from a fork of a repo I was working on and put an example config file in place of my actual one, but ended up committing the real one to GitHub instead.

Luckily, without me even having to do anything, I received two emails, one from Mailgun (the exposed secret was my SMTP password for an account I was using for emailing tasks):

Screenshot of a warning email from Mailgun

and one from GitGuardian. Both pointed out how silly I had been and offered steps to resolve it.

Screenshot of a warning email from GitGuardian

The solution

Removing the offending secrets from the repo isn’t as simple as just deleting the files, because they are still in your history.
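To see why deleting isn't enough, here is a small sketch in a throwaway repo. The file name, the fake password and the demo identity are all made up for illustration; none of it comes from my actual config.

```shell
set -eu
# Throwaway repo in a temp directory; nothing here touches a real project.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Commit a fake secret, then "fix" it by deleting the file in a second commit.
printf 'smtp_password = not-a-real-password\n' > config.ini
git add config.ini
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add config"
git rm -q config.ini
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Remove config"

# The file is gone from the working tree, but the previous commit still has it.
git show HEAD~1:config.ini
```

The last command still prints the fake password line, even though config.ini no longer exists on disk. Anyone who can clone the repo can do the same.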

Luckily, a quick search turned up BFG Repo-Cleaner, which would let me scrub the repository's history. I was working on WSL when I messed up, but whatever you are using, you will need Java installed to make it work:

sudo apt install default-jre

Then create a mirror clone of the affected repository:

git clone --mirror git@github.com:your-compromised-repo.git

This gave me a bare repository: no working files to look at, but all of the Git history.

To remove the offending items I created a passwords.txt file listing, one per line, the passwords that needed to be replaced, as well as port numbers, SMTP server addresses and email addresses. All of these would be replaced with ***REMOVED*** once the tool had run.
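As a rough illustration, a file like the one described above could be created as follows. Every value here is a made-up placeholder, not what was in my file. A plain line gets BFG's default replacement, and the `==>` form, as I understand it from the BFG documentation, lets you pick the replacement text yourself.

```shell
# Hypothetical values for illustration only; list your own leaked strings instead.
# One entry per line. The quoted 'EOF' stops the shell from expanding anything.
cat > passwords.txt <<'EOF'
my-old-smtp-password
smtp.example.com
me@example.com==>user@example.com
EOF
```

Keep this file out of the repository itself, since it contains the very secrets you are trying to remove.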

Then run BFG:

java -jar bfg.jar --replace-text passwords.txt your-compromised-repo.git

This ran through the history, changed everything it found, and then asked me to run the following, which needs to happen inside the mirrored repository:

cd your-compromised-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

Now a scan using GitGuardian gave me the all clear:

Screenshot of GitGuardian with all issues resolved
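A local check can complement a hosted scan. This is only a sketch of one way to do it, not something BFG or GitGuardian told me to run: the function name secret_in_history is my own invention, and it relies on git's pickaxe search (-S), which lists commits that added or removed a given string.

```shell
# secret_in_history REPO_DIR SECRET  (hypothetical helper, not part of any tool)
# Exits 0 if some commit on any ref added or removed SECRET, non-zero otherwise.
# Works on bare/mirror clones too, because it only reads history.
secret_in_history() {
    # $1 = path to the repository, $2 = literal string to look for
    matches=$(git -C "$1" log --all --oneline -S"$2")
    [ -n "$matches" ]
}

# Example call (placeholder names):
#   secret_in_history your-compromised-repo.git 'my-old-smtp-password' \
#       && echo "still in history" || echo "not found"
```

A "not found" result here is reassuring rather than conclusive, so I would still treat every leaked credential as compromised and rotate it.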

To get my Mailgun account back up and running I had to change my account password, enable 2FA (I should have done this anyway), and then refresh all my API keys and change all the SMTP passwords I had set up. Turns out I use Mailgun more than I thought, so that took a little while.

However, once it was all done, my account was re-enabled within five minutes of opening a support ticket.

Chalk this one up to a learning experience.