python html pandas parsing stringio

How can I convert this list of strings into a DataFrame?


I have something that looks like this (called lines):

['                id\t    Name\t    Type\t      User\t     Q\t             country\t       Final-score\t       Progress\t                       website',
 'abcde\t                 jen\t      engineer\t  jenabc\t           RUNNING\t         UK\t             75%\t                                N/A',
 'fres\t        Penny\t               dr\t     dr123\t           RUNNING\t         DENMARK\t             67%\t                                N/A'] 

Each quoted string, separated from the next by ',', is one dataframe row. However, I cannot convert the list to a dataframe.

new_df = pd.read_csv(StringIO(",".join(lines[1:])),sep = "\t") 

I use [1:] since the first line is just a comment. I get the error: ParserError: Error tokenizing data. C error: Expected 963 fields in line 3, saw 1099

I'd like my dataframe to use the first row as the headers, with the remaining rows as the contents, split on \t. How can I do this?


Solution

  • from io import StringIO
    import pandas as pd
    df = pd.read_csv(StringIO("\n".join(lines)), sep=r"\s+")
    print(df)
    

    Prints:

          id   Name      Type    User        Q  country Final-score  Progress  website
    0  abcde    jen  engineer  jenabc  RUNNING       UK         75%       NaN      NaN
    1   fres  Penny        dr   dr123  RUNNING  DENMARK         67%       NaN      NaN
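
Why this works, as far as the posted sample shows: read_csv expects one record per text line, so the strings have to be joined with "\n" rather than ",", and sep=r"\s+" treats any run of tabs and spaces as a single separator, which absorbs the padding around each value. The data rows have one field fewer than the header, so the last column is filled with NaN, and "N/A" is read as NaN by default. The sketch below reproduces this with the sample data (padding shortened). It also adds a fallback, tab_lines_to_frame, which is a hypothetical helper and not part of the original answer. It splits on tabs only, which may be safer if a real value can contain spaces, and it assumes no row is longer than the header.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, with the padding shortened for readability.
lines = [
    "   id\t  Name\t  Type\t  User\t  Q\t  country\t  Final-score\t  Progress\t  website",
    "abcde\t  jen\t  engineer\t  jenabc\t  RUNNING\t  UK\t  75%\t  N/A",
    "fres\t  Penny\t  dr\t  dr123\t  RUNNING\t  DENMARK\t  67%\t  N/A",
]

# Approach from the answer: one record per line, any whitespace run separates fields.
df = pd.read_csv(StringIO("\n".join(lines)), sep=r"\s+")


def tab_lines_to_frame(raw_lines):
    """Hypothetical fallback: split on tabs only and strip the padding.

    Assumes the first line is the header and that no row has more
    fields than the header; shorter rows are padded with None.
    """
    rows = [[cell.strip() for cell in line.split("\t")] for line in raw_lines]
    header, body = rows[0], rows[1:]
    padded = [row + [None] * (len(header) - len(row)) for row in body]
    return pd.DataFrame(padded, columns=header)


df_tabs = tab_lines_to_frame(lines)

print(df)
print(df_tabs)
```

Note that the fallback keeps every value as a string, so "N/A" stays as literal text instead of becoming NaN; convert it afterwards if missing values are needed.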