Search code examples
htmljupyter-notebooktagsfile-conversion

how to convert a jupyter notebook containing html tags to HTML?


I have a jupyter notebook (python) in which I have HTML code to display sections for example :

<div style="background-color: #1A079B; width:100%" >
  <h2 style="margin: auto; padding: 20px; color:#fff; ">1 - Test </h2>
</div>

enter image description here

What I want is to convert this notebook to HTML. I used this command in a notebook cell :

 !jupyter nbconvert $p1 --to html --output $p2

the output HTML file is created but completely buggy. I think the conversion does not like html embedded in markdown.
output HTML file converted from jupyter notebook

I put the whole code used. It is not very readable it is in ipynb format

        {
     "cells": [
      {
       "cell_type": "markdown",
       "id": "geographic-press",
       "metadata": {},
       "source": [
        "<div style=\"background-color: #1A079B; width:100%\" >\n",
        "<h2 style=\"margin: auto; padding: 20px; color:#fff; \">1 - Test </h2>\n",
        "</div>"
       ]
      },
      {
       "cell_type": "markdown",
       "id": "allied-finish",
       "metadata": {},
       "source": [
        "bla bla bla"
       ]
      },
      {
       "cell_type": "markdown",
       "id": "strong-estonia",
       "metadata": {},
       "source": [
        "# first text"
       ]
      },
      {
       "cell_type": "markdown",
       "id": "aquatic-stopping",
       "metadata": {},
       "source": [
        "<div style=\"background-color: #1A079B; width:100%\" >\n",
        "<h2 style=\"margin: auto; padding: 20px; color:#fff; \">2 - test </h2>\n",
        "</div>"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 1,
       "id": "efficient-grass",
       "metadata": {},
       "outputs": [],
       "source": [
        "import os"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 2,
       "id": "architectural-membership",
       "metadata": {},
       "outputs": [],
       "source": [
        "path_actuel = os.getcwd()+'/documentation/test'\n",
        "filename = \"test.ipynb\"\n",
        "output_name = \"test.html\""
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 3,
       "id": "incomplete-above",
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "o path_actuel: /opt/continuum/project/documentation/test \n",
          "o filename: test.ipynb \n",
          "o output_name: test.html\n"
         ]
        }
       ],
       "source": [
        "print('o path_actuel:', path_actuel, '\\no filename:', filename, '\\no output_name:', output_name)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 4,
       "id": "olympic-magnitude",
       "metadata": {},
       "outputs": [],
       "source": [
        "p1 = path_actuel+'/'+filename\n",
        "p2 = path_actuel+'/'+output_name"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 5,
       "id": "fundamental-screening",
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "o p1: /opt/continuum/project/documentation/test/test.ipynb \n",
          "o p2: /opt/continuum/project/documentation/test/test.html\n"
         ]
        }
       ],
       "source": [
        "print('o p1:', p1, '\\no p2:', p2)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 6,
       "id": "lesbian-pizza",
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "[NbConvertApp] Converting notebook /opt/continuum/project/documentation/test/test.ipynb to html\n",
          "[NbConvertApp] Writing 574385 bytes to /opt/continuum/project/documentation/test/test.html\n"
         ]
        }
       ],
       "source": [
        "!jupyter nbconvert $p1 --to html --output $p2\n",
        "# !jupyter nbconvert $p1 --to html --output $p2\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "id": "competitive-isolation",
       "metadata": {},
       "outputs": [],
       "source": []
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3 [anaconda202011_py38]",
       "language": "python",
       "name": "anaconda-project-anaconda202011_py38-python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.8.5"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 5
    }

If someone has an idea to correct that !!!


Solution

  • OPTION A: Kludge workaround

    I found a way to get the result I think you want using your test notebook code supplied as a starting point. I confirmed that it works whether you are saving the notebook as HTML using the 'File' menu > 'Save and Export Notebook as...' or use the '!jupyter nbconvert' command you had in your notebook.

    First editing inside the notebook view of your supplied notebook I took your markdown cells that contained HTML and made them code cells. Then I added the cell magic code %%html as the first line of each of those new code cells above the HTML contents. Running this results in the same view you had although you now have the input code. Optionally, you can collapse that in the notebook rendering if you want; however, it is moot if your goal is only the HTML rendered result anyway.

    enter image description here

    Now you convert the modified notebook to HTML using your way or the 'FIle' menu. If you view the HTML file produced in a browser or inside JupyterLAb, you'll see it has the cells now vertical again. Except for the addition of the code cells producing the markdown, you have what you want now. Luckily, it is easy to remove the code cells from the HTML. For example, from the top example in your code cell, you remove this block of code in the HTML file:

    <div class="jp-InputPrompt jp-InputArea-prompt">In&nbsp;[2]:</div>
    <div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
         <div class="CodeMirror cm-s-jupyter">
    <div class=" highlight hl-ipython3"><pre><span></span><span class="o">%%html</span>
    <span class="p">&lt;</span><span class="nt">div</span> <span class="na">style</span><span class="o">=</span><span class="s">&quot;background-color: #1A079B; width:100%&quot;</span> <span class="p">&gt;</span>
    <span class="p">&lt;</span><span class="nt">h2</span> <span class="na">style</span><span class="o">=</span><span class="s">&quot;margin: auto; padding: 20px; color:#fff; &quot;</span><span class="p">&gt;</span>1 - Test <span class="p">&lt;/</span><span class="nt">h2</span><span class="p">&gt;</span>
    <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
    </pre></div>
    
         </div>
    

    In other words, the part that corresponds to the starting <div class="jp-InputPrompt jp-InputArea-prompt"> and ends at the indented </div> that matches with the div class="CodeMirror cm-s-jupyter"> div.

    enter image description here

    The use of the '%%html' as a way to find the section to delete in the produced HTML is nice.

    You do that for both and you get what you want.

    enter image description here

    I know that seems inconvenient; however, I suspect this process could be scripted if you use of the HTML in the initial markdown code is consistent enough. (I'd use nbformat to do the step of moving the HTML code from the markdown cell to a code cell with the %%html tag and then I suspect iterating find and delete using a regular expression would work on the produced HTML file.) Your test notebook is very consistent as the content of the markdown cells all begin <div style=. Is that the case in all the notebooks you are working with? I didn't want to try scripting any automation until confirmed.


    OPTION B: Change to using 'div class alert'

    While the div style combo you are using triggers the horizontal rendering in the HTML, in the markdown cells you can use alert divs inside and they won't trigger the weird rendering in the HTML later. This has the advantage that rendering in notebook and HTML is all consistent.

    So in your markdown cells, this could be example use:

    <div class="alert alert-block alert-warning">
    <h2 style="margin: auto; padding: 20px; color:#fff; ">1 - Test </h2>
    </div>
    

    -Or-

    <div class="alert alert-block alert-danger">
    <h2 style="margin: auto; padding: 20px; color:#444; ">1 - Test </h2>
    </div>
    

    -Or-

    <div class="alert alert-block alert-info">
    <h2 style="margin: auto; padding: 20px; color:#222; ">1 - Test </h2>
    </div>
    

    So in the notebook it renders, like this: enter image description here

    And in the HTML it renders, like this: enter image description here

    (NOTE THAT THE VERSION OF THE ORIGNAL HTML SUPPLIED RENDERS CORRECTLY IN THE HTML IF YOU CHANGE IT TO RAW! That's the blue box with '2 - test' text. See below.)

    OPTION C: Switch cells with HTML in them to 'Raw' cells before Conversion step

    This works on the supplied notebook without changing anything except the type of cell where the original HTML styling code is harbored. You change the cell containing the HTML code from being designated as markdown type to 'Raw' type.

    It's not perfect in regards to matching the cell width, and it doesn't work directly in the notebook rendering. But it is the 'path of least resistance' and results in an HTML version of the notebook that doesn't go horizontal. You can get the best of both worlds if you are careful. You can have the version of the notebook saved with the HTML as markdown and then when ready to convert, you just change those cells to 'Raw' and run the conversion.

    Example starting from the original supplied notebook, change the two markdown cells containing HTML to the 'Raw' type: enter image description here

    And then the HTML looks close to the desired result: enter image description here

    The conversion to 'Raw' could be automated by using nbformat if the cells containing markdown and HTML are consistent enough to easily distinguish.