Sometimes I forget that this is a tech blog. Some notes from work.
(If you just want an answer to your problem, scroll to "A clean setup for private repos" below. This solution does not use personal access tokens, so your team doesn't need to create a billion GitHub accounts.)
Colab is Google's version of a Jupyter notebook in the cloud. It has clean APIs to everything Google and is pre-installed with some nice libraries. I begrudgingly admit that it's not bad.
But Colab still has the same code problems that all Jupyter notebooks have:
- no code review
- no unit tests
- no tooling support
- no reuse across notebooks
- no change history
- no namespacing
- no moral decency
The fix for all of these, of course, is to put important code in a repo, then pull that code in as needed. But it's not obvious how to do that.
The setup for public repos
If your repo is public, this is a little easier: just clone it with git from a notebook cell.
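A minimal sketch of the clone step in Python, assuming a hypothetical public repository URL and a destination under /content (Colab's default working directory). The names REPO_URL, DEST, clone_command and clone_if_missing are made up for illustration. A shell one-liner in a notebook cell would do the same job; the function form just makes the cell safe to re-run.

```python
import subprocess
from pathlib import Path

# Hypothetical values for illustration; substitute your own repository.
REPO_URL = "https://github.com/example-org/example-repo.git"
DEST = Path("/content/example-repo")


def clone_command(url: str, dest: Path) -> list:
    """Build the git command for a shallow clone of a public repo."""
    return ["git", "clone", "--depth", "1", url, str(dest)]


def clone_if_missing(url: str, dest: Path) -> bool:
    """Clone the repo unless dest already exists. Returns True if it cloned."""
    if dest.exists():
        return False  # re-running the cell is a no-op
    subprocess.run(clone_command(url, dest), check=True)
    return True


# In a notebook cell you would then call:
# clone_if_missing(REPO_URL, DEST)
```

If you only want to import modules from the clone, appending str(DEST) to sys.path afterwards is enough.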
Then your cloned repository is on the filesystem, and you can do whatever you want with it. If you feel fancy, you can even add a setup.py to your repo and pip install it, though this will cause problems if your dependencies clash with what Colab already provides.
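One way to soften the dependency clash is pip's --no-deps flag, which installs only your package and leaves Colab's preinstalled versions alone. The sketch below is an assumption about how you might wire that up, not part of the original setup; REPO_DIR and both function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the cloned repo (it must contain a setup.py).
REPO_DIR = Path("/content/example-repo")


def pip_install_command(repo_dir: Path, no_deps: bool = True) -> list:
    """Build a pip command that installs the repo into the running kernel.

    sys.executable targets the interpreter the notebook is running on.
    With no_deps=True, pip skips the package's declared dependencies, so
    Colab's preinstalled libraries are neither upgraded nor downgraded.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_deps:
        cmd.append("--no-deps")
    cmd.append(str(repo_dir))
    return cmd


def pip_install(repo_dir: Path, no_deps: bool = True) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(repo_dir, no_deps), check=True)


# In a notebook cell:
# pip_install(REPO_DIR)
```

The trade-off is that any dependency Colab does not already ship has to be installed by hand.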
But if your code is sensitive or proprietary, keeping it public isn't a great business plan.
Some resources online will suggest getting an access token for your account then using that to fetch the repo. But if you work in a larger team, everyone has to set up GitHub, get an access token, and modify the notebook to use it. It's just not a clean or simple solution.
A clean setup for private repos
Here's a clean setup that doesn't require all of your Colab users to create GitHub accounts:
1. Create a new public/private key pair that you will use only for this integration, e.g. with ssh-keygen -t ed25519. Keep it simple: default directory, no passphrase.
2. Set the public key (~/.ssh/id_ed25519.pub) as the deploy key on your private repo. You have the option to enable write access, but this is foolish without good cause. Keep it read-only.
3. Dump the private key (~/.ssh/id_ed25519) into a string that you save in your Colab. Your security instincts will scream at you not to do this. But (a) anyone with access to the Colab could already see your private code anyway and (b) this private key is used only for the repo read, not for anything else.
4. Add some logic to write your private key string to ~/.ssh/id_ed25519. If the filesystem copy is ever lost, this logic will just write the key back to where it needs to be. But before you run this code, please test that the content of your string is identical to the content of the private key file you generated in the first step.
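The identity check on the key string can be sketched as below, run on the machine where the key pair was generated. The function names are hypothetical. One plausible way for the two to differ is a trailing newline lost while pasting, which can make OpenSSH reject the key, so the helper calls that case out.

```python
from pathlib import Path


def key_string_matches_file(key_string: str, key_path: Path) -> bool:
    """Return True only if the string is byte-for-byte identical to the file."""
    return key_string.encode("utf-8") == key_path.read_bytes()


def describe_mismatch(key_string: str, key_path: Path) -> str:
    """Give a hint about the most likely cause when the two differ."""
    original = key_path.read_bytes().decode("utf-8")
    if key_string == original:
        return "identical"
    if key_string.strip() == original.strip():
        return "differs only in leading/trailing whitespace (check the final newline)"
    return "contents differ"


# Example, where PRIVATE_KEY is the string you pasted into the notebook:
# key_string_matches_file(PRIVATE_KEY, Path.home() / ".ssh" / "id_ed25519")
```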
And now you can get your repo. End-to-end, it looks something like this:
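A minimal end-to-end sketch in Python, under stated assumptions: the key text is a placeholder, the repository URL and destination are hypothetical, and the helper names (install_key, trust_github, fetch_repo) are invented for this example. It writes the key with owner-only permissions, records GitHub's host key so ssh does not stop to prompt, then clones or updates the repo.

```python
import subprocess
from pathlib import Path

# Placeholder only: paste the full contents of your deploy key here,
# including the BEGIN/END lines and the final newline.
PRIVATE_KEY = """-----BEGIN OPENSSH PRIVATE KEY-----
<your key material>
-----END OPENSSH PRIVATE KEY-----
"""

# Hypothetical repository and destination.
REPO_SSH_URL = "git@github.com:example-org/example-private-repo.git"
DEST = Path("/content/example-private-repo")


def install_key(key: str, ssh_dir: Path) -> Path:
    """Write the key to ssh_dir/id_ed25519 with owner-only permissions."""
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    key_path = ssh_dir / "id_ed25519"
    key_path.write_bytes(key.encode("utf-8"))  # bytes: no newline translation
    key_path.chmod(0o600)  # ssh refuses private keys readable by others
    return key_path


def trust_github(ssh_dir: Path) -> None:
    """Append GitHub's host key to known_hosts so ssh does not prompt."""
    scan = subprocess.run(
        ["ssh-keyscan", "-t", "ed25519", "github.com"],
        check=True, capture_output=True, text=True,
    )
    with open(ssh_dir / "known_hosts", "a") as f:
        f.write(scan.stdout)


def fetch_repo(url: str, dest: Path) -> None:
    """Clone on the first run, pull on later runs."""
    if dest.exists():
        subprocess.run(["git", "-C", str(dest), "pull"], check=True)
    else:
        subprocess.run(["git", "clone", url, str(dest)], check=True)


# In a notebook cell:
# ssh_dir = Path.home() / ".ssh"
# install_key(PRIVATE_KEY, ssh_dir)
# trust_github(ssh_dir)
# fetch_repo(REPO_SSH_URL, DEST)
```

Note that ssh-keyscan trusts whichever host answers. Writing GitHub's published host key into known_hosts yourself is the stricter option.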
Anyone with access to the Colab will be able to see your repo. So if your repo contains other secrets you don't want people to see, perhaps split it in two and use only the safe version with Colab.
Credit where credit is due: thanks to Felix Müller for describing this approach. My main changes were to use ed25519 and to clean up one of the code examples. I also used this post as an excuse to complain about Jupyter notebooks, which is always a meritorious deed.