tigue.com

Jupyter Book's Many Manifestations of a Notebook

March 03, 2020

[Whiteboard sketch: four representations of an .ipynb file]

This is an addendum to my 2020-01 post, “Jupyter Book to Colab.”

Jupyter Book is a tool that generates static HTML renderings of Jupyter .ipynb files. It can optionally generate links to live Python kernels that can run the code in the original .ipynb files. These links are presented as the “interact” button.

In my earlier post I described how I extended the interact button to work with Colab rather than with one of the Jupyter services (e.g. Binder) that Jupyter Book's interact button already supports.

When discussing this little hack, I have found that this whiteboard sketch helps explain what is going on under the hood. In a Jupyter Book deployment whose interact button points at Colab, any given Jupyter notebook .ipynb file can have four manifestations.

Let’s walk through the five steps.

1. The source notebook at home

A git repository is hosted somewhere, say, Microsoft GitHub (but it could be any git host). In the context of this post the repo is one built out to work with Jupyter Book, which means it is essentially just a collection of Jupyter notebooks and markdown files.
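Because such a repo is mostly notebooks and markdown, a quick inventory of a local clone is easy to script. The sketch below is a hypothetical helper, `book_sources`, using only the Python standard library; it groups the `.ipynb` and `.md` files by type. Skipping `.git` and `.ipynb_checkpoints` is an assumption about what counts as book content, not something Jupyter Book prescribes.

```python
from __future__ import annotations

from pathlib import Path

# Directories holding tooling state rather than book content (an assumption).
SKIP_DIRS = {".git", ".ipynb_checkpoints"}


def book_sources(repo_root: str) -> dict[str, list[str]]:
    """Group a Jupyter Book repo's source files into notebooks and markdown.

    Paths are relative to repo_root, POSIX-style, in sorted order.
    """
    root = Path(repo_root)
    found: dict[str, list[str]] = {".ipynb": [], ".md": []}
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root)
        if SKIP_DIRS.intersection(rel.parts):
            continue  # ignore git internals and Jupyter autosave copies
        if p.is_file() and p.suffix in found:
            found[p.suffix].append(rel.as_posix())
    return found
```

Run against a clone, e.g. `book_sources("my_repo")`, it returns a dict with an `".ipynb"` list and an `".md"` list; `my_repo` is a placeholder path.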

2. Pre-run notebook as HTML

For step 2, the repo has been fetched from GitHub and run through Jupyter Book, producing a bunch of static web content (HTML, JavaScript, CSS, and images).
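Whether a notebook arrives “pre-run” can be read straight from the source file: an .ipynb is JSON, and in nbformat 4 every code cell carries an `outputs` list holding the results of its last execution. A minimal sketch, assuming nbformat 4 input; `is_pre_run` is a hypothetical name, and it says nothing about how Jupyter Book itself decides whether to execute a notebook during a build.

```python
import json


def is_pre_run(notebook_text: str) -> bool:
    """Return True if any code cell in an nbformat-4 notebook has stored outputs.

    notebook_text is the raw contents of an .ipynb file, which is plain JSON.
    """
    nb = json.loads(notebook_text)
    return any(
        cell.get("cell_type") == "code" and bool(cell.get("outputs"))
        for cell in nb.get("cells", [])
    )


if __name__ == "__main__":
    # A tiny hand-written notebook with one executed code cell (illustrative data).
    example = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "source": ["1 + 1"],
                "outputs": [
                    {
                        "output_type": "execute_result",
                        "execution_count": 1,
                        "metadata": {},
                        "data": {"text/plain": ["2"]},
                    }
                ],
            }
        ],
    }
    print(is_pre_run(json.dumps(example)))  # True
```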

Static web sites are the simplest kind of web site: just a file server talking HTTP. In this diagram the example static site is http://static-bar.com.
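To make “a file server talking HTTP” concrete, Python's standard library can serve a directory of generated HTML as-is. This sketch uses `http.server.ThreadingHTTPServer` with `SimpleHTTPRequestHandler` (its `directory` argument needs Python 3.7+); `serve_directory` and the page name `my_file.html` are hypothetical. The Python docs describe `http.server` as unsuitable for production, so a real site such as http://static-bar.com would more likely sit behind a production web server or static hosting service.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve the files under `directory` over HTTP from a background thread.

    port=0 lets the OS pick a free port; read it back from server.server_address.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as site:
        # Stand-in for one page of Jupyter Book's HTML output.
        Path(site, "my_file.html").write_text("<h1>my_file</h1>")
        server = serve_directory(site)
        host, port = server.server_address[:2]
        try:
            # A plain HTTP GET is all a static site has to answer.
            with urllib.request.urlopen(f"http://{host}:{port}/my_file.html") as resp:
                print(resp.status, resp.read().decode())  # 200 <h1>my_file</h1>
        finally:
            server.shutdown()
            server.server_close()
```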

3. Hand off to Colab

This is what Jupyter Book calls interacting: moving from a static web page rendering of an (optionally pre-run) notebook to something backed by a live Jupyter kernel. Normally, Jupyter Book hands off to Binder to provision Jupyter kernels. In my hack, open source Binder is replaced with commercial Google Colab.

The hand off is simply a URL to Colab whose path ends with a mapping to the .ipynb file that Colab should load from GitHub. That mapping results in a URL of the form:

https://colab.research.google.com/github/my_org/my_repo/blob/my_branch/my_file.ipynb
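Building that URL is plain string assembly. A minimal sketch follows; `colab_url` is a hypothetical helper, not part of Jupyter Book, and percent-encoding each piece with `urllib.parse.quote` is a precaution for file names with spaces, not something the URL form above requires.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(org: str, repo: str, branch: str, path: str) -> str:
    """Build the Colab URL that opens `path` on `branch` of the GitHub repo org/repo."""
    rel_path = path.lstrip("/")  # "/docs/x.ipynb" and "docs/x.ipynb" map the same way
    # quote() keeps "/" by default, so nested notebook paths stay nested.
    parts = [quote(org), quote(repo), "blob", quote(branch), quote(rel_path)]
    return "/".join([COLAB_GITHUB_PREFIX, *parts])


if __name__ == "__main__":
    print(colab_url("my_org", "my_repo", "my_branch", "my_file.ipynb"))
    # https://colab.research.google.com/github/my_org/my_repo/blob/my_branch/my_file.ipynb
```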

4. Colab kernel spin-up

Next, the web browser follows the colab.research.google.com URL, loading a new web page. At Colab, an HTTP GET arrives and the URL is parsed. When Colab sees the /github/ part, it knows that the user is requesting that an .ipynb file be fetched from GitHub. The tail of the URL provides the organization, repo name, branch, and relative file path. Colab then fetches the specified file from github.com.
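Colab's server-side code is not public, so the following is only a sketch of the parsing step described above: split the path after /github/ into organization, repo, branch, and file path, then turn those into a GitHub fetch URL. Fetching from raw.githubusercontent.com (GitHub's raw-file host) is an assumption standing in for however Colab actually retrieves the file; `GitHubNotebookRef` and both functions are hypothetical names. The sketch also assumes branch names contain no "/".

```python
from typing import NamedTuple
from urllib.parse import quote, unquote, urlparse


class GitHubNotebookRef(NamedTuple):
    org: str
    repo: str
    branch: str
    path: str


def parse_colab_github_url(url: str) -> GitHubNotebookRef:
    """Split a colab.research.google.com/github/... URL into its GitHub coordinates.

    Assumes the branch contains no "/": the URL alone cannot disambiguate
    "feature/x/file.ipynb" into branch and path.
    """
    parsed = urlparse(url)
    if parsed.netloc != "colab.research.google.com":
        raise ValueError(f"not a Colab URL: {url}")
    # "/github/ORG/REPO/blob/BRANCH/PATH..." -> ["github", ORG, REPO, "blob", BRANCH, PATH...]
    segments = parsed.path.lstrip("/").split("/")
    if len(segments) < 6 or segments[0] != "github" or segments[3] != "blob":
        raise ValueError(f"not a /github/<org>/<repo>/blob/<branch>/<path> URL: {url}")
    org, repo, branch = segments[1], segments[2], segments[4]
    path = "/".join(segments[5:])
    if not path:
        raise ValueError(f"no notebook path in URL: {url}")
    return GitHubNotebookRef(unquote(org), unquote(repo), unquote(branch), unquote(path))


def raw_github_url(ref: GitHubNotebookRef) -> str:
    """GitHub raw-file URL for the notebook (an assumed stand-in for Colab's real fetch)."""
    parts = (ref.org, ref.repo, ref.branch, ref.path)
    return "https://raw.githubusercontent.com/" + "/".join(quote(p) for p in parts)


if __name__ == "__main__":
    ref = parse_colab_github_url(
        "https://colab.research.google.com/github/my_org/my_repo/blob/my_branch/my_file.ipynb"
    )
    print(ref)
    # GitHubNotebookRef(org='my_org', repo='my_repo', branch='my_branch', path='my_file.ipynb')
    print(raw_github_url(ref))
    # https://raw.githubusercontent.com/my_org/my_repo/my_branch/my_file.ipynb
```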

Behind the scenes, Colab spins up a new virtual machine to provide a Jupyter kernel for the request. (Anyone with a Gmail address can have up to two VMs running simultaneously.)

Shortly afterward, the HTTP response reaches the browser, where the user sees the notebook and can run the code.

5. Persisting a modified version

“Playground mode” is the Colab term for a transient, unpersisted version of a notebook running in a Colab VM. If a reader wants to play with and run the code (read: modify the input notebook) and keep a copy, exiting playground mode saves a copy of the modified .ipynb in the user's Google Drive.

The takeaway is that open source tools make it possible to have a static web site showing HTML renderings of .ipynb files. Those static HTML files can then link to Colab (or Binder) to hook the notebook up to a new VM on demand: a static web site linking to free compute.