Converting an IPython notebook to other formats with nbconvert

An IPython notebook is saved as a JSON text file. This file contains the entire contents of the notebook: text, code, and outputs. The matplotlib figures are encoded as base64 strings within the notebook, which makes notebook files standalone but sometimes large.
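As a rough illustration of why embedded figures inflate notebook files, the sketch below base64-encodes some binary data with the standard library. The payload is a made-up stand-in (the PNG signature followed by filler bytes), not a real figure. Base64 maps every 3 bytes to 4 ASCII characters, so the encoded text is about a third larger than the binary data.

```python
import base64

# Stand-in for image data: the 8-byte PNG signature followed by filler bytes.
# (Hypothetical payload; a real figure would be a complete PNG file.)
raw = b'\x89PNG\r\n\x1a\n' + b'\x00' * 100

# Encode to an ASCII string, which is what can be stored in a JSON file.
encoded = base64.b64encode(raw).decode('ascii')

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)

# Compare the binary size with the size of its text representation.
print(len(raw), len(encoded))
```

The same b64decode call can be applied to the string stored in a notebook's image output to recover the original image bytes.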

Note

JSON is a human-readable, text-based, open standard format that can represent structured data. Although derived from JavaScript, it is language independent. Its syntax bears some resemblance to Python dictionaries. JSON can be parsed in many languages, including JavaScript and Python (with the json module in Python's standard library).
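A minimal sketch of the correspondence between JSON and Python values, using only the json module. The document content here is invented for illustration.

```python
import json

# A small JSON document given as text (hypothetical content).
text = '{"name": "test", "cells": [1, 2, 3], "saved": true, "author": null}'

# Parsing: JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
data = json.loads(text)
print(data['cells'], data['saved'], data['author'])

# Serializing back to text; indent makes the result easier to read.
roundtrip = json.dumps(data, indent=1, sort_keys=True)
print(roundtrip)
```

Parsing the serialized text again yields an equal dictionary, which is what makes JSON convenient as a storage format for structured data.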

IPython comes with a tool called nbconvert that can convert notebooks to other formats: raw text, Markdown, HTML, LaTeX/PDF, and even slides with the reveal.js library. You will find more information about the different supported formats in the nbconvert documentation.

In this recipe, we will see how to manipulate the contents of a notebook and how to convert it to other formats.

Getting ready

You need to install pandoc, a tool for converting files from one markup language to another, available at http://johnmacfarlane.net/pandoc/.

To convert a notebook to PDF, you need a LaTeX distribution, which is available at http://latex-project.org/ftp.html. You also need to download the Notebook dataset from the book's website (https://github.com/ipython-books/cookbook-data), and extract it in the current directory.

On Windows, you may need the pywin32 package. If you use Anaconda, you can install it with conda install pywin32.

How to do it...

  1. Let's open the test notebook in the data folder. A notebook is just a plain text file (JSON), so we open it in text mode (r mode) as follows:
    In [1]: with open('data/test.ipynb', 'r') as f:
                contents = f.read()
            print(len(contents))
    3787

    Here is an excerpt of the test.ipynb file:

    {
     "metadata": {
      "celltoolbar": "Edit Metadata",
      "name": "",
      "signature": "sha256:50db..."
     },
     "nbformat": 3,
     "nbformat_minor": 0,
     "worksheets": [
      {
    ...
         "source": [
          "# First chapter"
         ]
        },
      ...
       ],
       "metadata": {}
      }
     ]
    }
  2. Now that we have loaded the notebook in a string, let's parse it with the json module as follows:
    In [3]: import json
            nb = json.loads(contents)
  3. Let's have a look at the keys in the notebook dictionary:
    In [4]: print(nb.keys())
            print('nbformat ' + str(nb['nbformat']) + 
                  '.' + str(nb['nbformat_minor']))
    [u'nbformat', u'nbformat_minor', u'worksheets', u'metadata']
    nbformat 3.0

    Note

    The version of the notebook format is indicated in nbformat and nbformat_minor. Backwards-incompatible changes in the notebook format are to be expected in future versions of IPython. This recipe has been tested with the IPython 2.x branch and the notebook format v3.

  4. The main field is worksheets; there is only one by default. A worksheet contains a list of cells and some metadata. The worksheets field may disappear in a future version of the notebook format. Let's have a look at the contents of a worksheet:
    In [5]: nb['worksheets'][0].keys()
    Out[5]: [u'cells', u'metadata']
  5. Each cell has a type, optional metadata, some contents (text or code), possibly one or several outputs, and other information. Let's look at a Markdown cell and a code cell:
    In [6]: nb['worksheets'][0]['cells'][1]
    Out[6]: {u'cell_type': u'markdown',
             u'metadata': {u'my_field': [u'value1', u'2405']},
             u'source': [u"Let's write ...:\n", ...]}
    In [7]: nb['worksheets'][0]['cells'][2]
    Out[7]: {u'cell_type': u'code',
             u'collapsed': False,
             u'input': [u'import numpy as np\n', ...],
             u'language': u'python',
             u'metadata': {},
             u'outputs': [
                          {u'metadata': {},
                           u'output_type': u'display_data',
                           u'png': u'iVB...mCC\n'}],
             u'prompt_number': 1}
  6. Once parsed, the notebook is represented as a Python dictionary. Manipulating it is therefore quite convenient in Python. Here, we count the number of Markdown and code cells as follows:
    In [8]: cells = nb['worksheets'][0]['cells']
            nm = len([cell for cell in cells
                      if cell['cell_type'] == 'markdown'])
            nc = len([cell for cell in cells
                      if cell['cell_type'] == 'code'])
            print(("There are {nm} Markdown cells and "
                   "{nc} code cells.").format(nm=nm, nc=nc))
    There are 2 Markdown cells and 1 code cells.
  7. Let's have a closer look at the image output of the cell with the matplotlib figure:
    In [9]: png = cells[2]['outputs'][0]['png']
            cells[2]['outputs'][0]
    Out[9]: {u'metadata': {},
             u'output_type': u'display_data',
             u'png': u'iVBORwoAAAANSUhE...ErAAAElTkQmCC\n'}
  8. In general, there can be zero, one, or multiple outputs. Additionally, each output can have multiple representations. Here, the matplotlib figure has a PNG representation (the base64-encoded image) and a text representation (the internal representation of the figure).
  9. Now, we are going to use nbconvert to convert our text notebook to other formats. This tool can be used from the command line. Note that the API of nbconvert may change in future versions. Here, we convert the notebook to an HTML document as follows:
    In [10]: !ipython nbconvert --to html data/test.ipynb
    [NbConvertApp] Writing 187617 bytes to test.html
  10. Let's display this document in an <iframe> (a small window showing an external HTML document within the notebook):
    In [11]: from IPython.display import IFrame
             IFrame('test.html', 600, 200)
  11. We can also convert the notebook to LaTeX and PDF. In order to specify the title and author of the document, we need to extend the default LaTeX template. First, we create a file called mytemplate.tplx that extends the default article.tplx template provided by nbconvert. We specify the contents of the author and title blocks as follows:
    In [12]: %%writefile mytemplate.tplx
             ((*- extends 'article.tplx' -*))
             
             ((* block author *))
             \author{Cyrille Rossant}
             ((* endblock author *))
             
             ((* block title *))
             \title{My document}
             ((* endblock title *))
    Writing mytemplate.tplx
  12. Then, we can run nbconvert by specifying our custom template as follows:
    In [13]: !ipython nbconvert --to latex --template mytemplate data/test.ipynb
             !pdflatex test.tex
    [NbConvertApp] PDF successfully created

    We used nbconvert to convert the notebook to LaTeX, and pdflatex (which comes with our LaTeX distribution) to compile the LaTeX document to PDF. The following screenshot shows the PDF version of the notebook:

How it works...

As we have seen in this recipe, an .ipynb file contains a structured representation of the notebook. This JSON file can be easily parsed and manipulated in Python.
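As one possible illustration of such a manipulation, the sketch below counts cell types and removes the outputs of code cells before serializing the notebook again. The notebook dictionary is a hypothetical, heavily simplified v3-style structure built from the fields shown in this recipe, and strip_outputs is an illustrative helper, not part of IPython.

```python
import json
from collections import Counter

# Hypothetical, simplified v3-style notebook; only fields shown in this
# recipe are included, and the image data is a placeholder string.
nb = {
    "metadata": {"name": ""},
    "nbformat": 3,
    "nbformat_minor": 0,
    "worksheets": [{
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {},
             "source": ["# First chapter"]},
            {"cell_type": "code", "collapsed": False,
             "input": ["import numpy as np\n"], "language": "python",
             "metadata": {},
             "outputs": [{"metadata": {}, "output_type": "display_data",
                          "png": "<base64 data>"}]},
        ],
    }],
}


def strip_outputs(notebook):
    """Return a deep copy of a v3-style notebook with code outputs removed."""
    # A JSON round trip is a simple way to deep-copy JSON-compatible data.
    stripped = json.loads(json.dumps(notebook))
    for worksheet in stripped.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            if cell.get("cell_type") == "code":
                cell["outputs"] = []
    return stripped


# Count the cells of each type in the first worksheet.
cells = nb["worksheets"][0]["cells"]
counts = Counter(cell["cell_type"] for cell in cells)
print(dict(counts))

# The original is left untouched; the copy serializes to a shorter string.
clean = strip_outputs(nb)
print(len(json.dumps(nb)), len(json.dumps(clean)))
```

Since this relies on the worksheets layout of the v3 format, it would need adapting if that field disappears in a later version of the notebook format.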

nbconvert is a tool for converting a notebook to another format. The conversion can be customized in several ways. Here, we extended an existing template using jinja2, a templating package. You will find more information in the documentation of nbconvert.
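Outside a notebook, the same conversions could be launched from a plain Python script. The sketch below only assembles the command lines used in steps 9 and 12, with the --to and --template options shown there. The nbconvert_command helper and the RUN switch are illustrative names, not part of nbconvert, and nothing is executed unless RUN is set to True.

```python
import subprocess


def nbconvert_command(notebook, to, template=None):
    """Build the argument list for the nbconvert command line used in this recipe."""
    cmd = ["ipython", "nbconvert", "--to", to]
    if template is not None:
        cmd += ["--template", template]
    cmd.append(notebook)
    return cmd


html_cmd = nbconvert_command("data/test.ipynb", "html")
latex_cmd = nbconvert_command("data/test.ipynb", "latex",
                              template="mytemplate")
print(" ".join(html_cmd))
print(" ".join(latex_cmd))

# Hypothetical switch: set to True only if IPython with nbconvert is installed
# and the notebook and template files exist.
RUN = False
if RUN:
    subprocess.check_call(html_cmd)
    subprocess.check_call(latex_cmd)
```

Passing the arguments as a list avoids shell quoting issues when a notebook path contains spaces.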

There's more...

There is a free online service, nbviewer, that renders IPython notebooks as HTML dynamically in the cloud. The idea is that we give nbviewer the URL of a raw notebook (in JSON), and we get back a rendered HTML page. The main page of nbviewer (http://nbviewer.ipython.org) contains a few examples.

This service is maintained by the IPython developers and is hosted on Rackspace (www.rackspace.com).

Here are some more references: