python string dataframe list

How to separate a list of characters following a sequence in Python?


I am trying to separate the following list:

['\nYear\nMonth\nValue\n', '\n2023\nAugust\n(p) 164.06\n', '\n2023\nJuly\n(sf) (r) 148.02\n']

such that the values will be reflected as shown in the table below:

Year  Month   Value
2023  August  (p) 164.06
2023  July    (sf) (r) 148.02

The first item of the list should supply the three column headers: Year, Month, Value. I used bs4 to scrape a website, but the data came back as a list of newline-delimited strings, which is hard to work with.

I'm hoping someone can share code that converts this list into a pandas DataFrame.

Appreciate your help and thanks in advance!


Solution

  • One possible solution is to use pd.read_csv:

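    from io import StringIO

    import pandas as pd

    lst = [
        "\nYear\nMonth\nValue\n",
        "\n2023\nAugust\n(p) 164.06\n",
        "\n2023\nJuly\n(sf) (r) 148.02\n",
    ]

    # Trim each item, turn its inner newlines into commas, and join the items
    # into one CSV text whose first line is the header row.
    df = pd.read_csv(StringIO("\n".join(s.strip().replace("\n", ",") for s in lst)))
    print(df)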
    

    Prints:

       Year   Month            Value
    0  2023  August       (p) 164.06
    1  2023    July  (sf) (r) 148.02
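
The read_csv approach relies on no field containing a comma, because the inner newlines are replaced with commas before parsing. If that is a concern, a possible alternative is to split each string yourself and pass the rows to the DataFrame constructor. This is a minimal sketch using the same list, assuming no field contains a newline of its own. Unlike read_csv, it does no type inference, so Year stays a string here.

```python
import pandas as pd

lst = [
    "\nYear\nMonth\nValue\n",
    "\n2023\nAugust\n(p) 164.06\n",
    "\n2023\nJuly\n(sf) (r) 148.02\n",
]

# Trim the leading/trailing newline of each item, then split on the inner ones.
rows = [s.strip().split("\n") for s in lst]

# The first row holds the column names; the remaining rows hold the data.
header, *data = rows
df = pd.DataFrame(data, columns=header)
print(df)
```

If a numeric Year column is needed afterwards, df["Year"] = df["Year"].astype(int) should convert it.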