Search code examples
pythonpandasmarkdown

Pandas: how to read from markdown string?


I have a table on a git issue which has data about weekly workers.

The table written with markdown, and is like:

start | end | main | sub   
-- | -- | -- | --    
1/30 | 2/6 | Alice | Bob   
2/6 | 2/13 | Charlie | Dave   

I can get the current date, and can get that markdown text from rest api.

What I get from rest api is a string, which separates lines with \r\n.

And what I wanna do more is to figure out the weekly workers, but I'm stuck on this.

Do you have any good ideas?


Solution

  • For further applications, I think you should form the table become records with each record as a dictionary whose key is the columns' name.

    First, get headers of the table by:

    tableStr = 'start | end | main | sub'\
        + '\r\n' + '-- | -- | -- | --'\
        + '\r\n' + '1/30 | 2/6 | Alice | Bob'\
        + '\r\n' + '2/6 | 2/13 | Charlie | Dave'
    
    headersStr = tableStr[:tableStr.find('\r\n')]
    headers = [h.strip() for h in headersStr.split('|')]
    

    then, parse the table to records by:

    records = []
    for rowStr in tableStr.split('\r\n')[2:]:
        row = [entry.strip() for entry in rowStr.split('|')]
        record = {headers[i]:row[i] for i in range(len(headers))}
        records.append(record)
    
    print(records)
    

    you would get in console:

    [{'start': '1/30', 'end': '2/6', 'main': 'Alice', 'sub': 'Bob'}, {'start': '2/6', 'end': '2/13', 'main': 'Charlie', 'sub': 'Dave'}]
    

    I'm not sure about how you define weekly worker, but you could do something with it. For example:

    reverseSortedBySub = sorted(records, key=lambda x: x['sub'], reverse=True)
    print(reverseSortedBySub)
    

    then you'd get:

    [{'start': '2/6', 'end': '2/13', 'main': 'Charlie', 'sub': 'Dave'}, {'start': '1/30', 'end': '2/6', 'main': 'Alice', 'sub': 'Bob'}]