python, parsing, markdown

Convert Markdown table to JSON with Python


I am trying to figure out the easiest way to convert Markdown table text into JSON using only Python. For example, consider this input string:

| Some Title | Some Description             | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game           | 5           |
| Bloodborne | This one is even better      | 2           |
| Sekiro     | This one is also pretty good | 110101      |

The output should be like this:

[
    {"Some Title":"Dark Souls","Some Description":"This is a fun game","Some Number":5},
    {"Some Title":"Bloodborne","Some Description":"This one is even better","Some Number":2},
    {"Some Title":"Sekiro","Some Description":"This one is also pretty good","Some Number":110101}
]

Note: Ideally, the output should be RFC 8259 compliant, i.e. use double quotes " instead of single quotes ' around the keys and string values.

I've seen some JS libraries that do this, but nothing that is Python-only.


Solution

  • You can treat the table as a multi-line string: split it on \n to get the rows, then split each row on | to get the cells.

    Simple code that does this:

    import json
    
    my_str='''| Some Title | Some Description             | Some Number |
    |------------|------------------------------|-------------|
    | Dark Souls | This is a fun game           | 5           |
    | Bloodborne | This one is even better      | 2           |
    | Sekiro     | This one is also pretty good | 110101      |'''
    
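    def mrkd2json(inp):
        """Convert a Markdown table string into a JSON array of row objects."""
        lines = inp.split('\n')
        ret = []
        keys = []
        for i, l in enumerate(lines):
            if i == 0:
                # Header row: the outer pipes produce empty first and last entries.
                keys = [_i.strip() for _i in l.split('|')]
            elif i == 1:
                # Skip the |---|---| separator row.
                continue
            else:
                # Data row: drop the empty edge entries and pair each cell with its header.
                ret.append({keys[_i]: v.strip() for _i, v in enumerate(l.split('|')) if 0 < _i < len(keys) - 1})
        # json.dumps emits double-quoted strings; every cell value stays a string.
        return json.dumps(ret, indent=4)

    print(mrkd2json(my_str))

    Output:
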
    [
        {
            "Some Title": "Dark Souls",
            "Some Description": "This is a fun game",
            "Some Number": "5"
        },
        {
            "Some Title": "Bloodborne",
            "Some Description": "This one is even better",
            "Some Number": "2"
        },
        {
            "Some Title": "Sekiro",
            "Some Description": "This one is also pretty good",
            "Some Number": "110101"
        }
    ]
    

    PS: I don't know of any library that does this; I'll update the answer if I find one.
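
The expected output in the question has "Some Number" as a JSON number (5), while the code above leaves every cell as a string ("5"). Here is a minimal sketch of one way to close that gap, assuming a well-formed table with a header row, a separator row, and no escaped pipes or empty edge cells. The names parse_cell, split_row and markdown_table_to_json are hypothetical helpers made up for this illustration, and only plain integers are converted.

```python
import json


def parse_cell(text):
    """Return an int when the cell parses as an integer, else the stripped string."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return text


def split_row(line):
    """Split one table row into raw cells, ignoring the outer pipes."""
    return line.strip().strip('|').split('|')


def markdown_table_to_json(table):
    """Convert a simple Markdown table into a JSON array of objects."""
    # Drop blank lines so a trailing newline does not create an empty row.
    lines = [line for line in table.splitlines() if line.strip()]
    headers = [cell.strip() for cell in split_row(lines[0])]
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    rows = [
        dict(zip(headers, (parse_cell(cell) for cell in split_row(line))))
        for line in lines[2:]
    ]
    return json.dumps(rows, indent=4)


sample = """| Some Title | Some Number |
|------------|-------------|
| Dark Souls | 5           |"""
print(markdown_table_to_json(sample))
```

Because json.dumps does the serialization, the keys and string values are double-quoted as the question asks. Floats, booleans and empty cells would need extra handling in parse_cell.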