
Read from a txt file and convert into a dataframe in Python


I have a txt file as follows:

sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]

I want to read this and convert it into a dataframe like the desired dataframe shown in the image (don't mind the values in the mean_e_field column, they are just an example; the values should be the same as in the txt file).

I tried the following, but I can't transform the resulting dataframe into my preferred df:

    data = pd.read_csv(filename, sep=",", header=None)
    data

I appreciate your answers in advance.


Solution

  • So, there are several things going on here.

    Your previous data = pd.read_csv(filename, sep=",", header=None) did not work because you told it to split fields on "," and it treats every non-blank line as a row to be split. So, the line sub_ID: ['sub-01','sub-02'] is split into the two fields sub_ID: ['sub-01' and 'sub-02'].
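To see the splitting concretely, here is a minimal sketch that feeds the sample text from the question to the same pd.read_csv call through io.StringIO instead of a real file. The names raw_text and raw are illustrative only.

```python
import io

import pandas as pd

# Sample contents of the txt file from the question (blank lines included).
raw_text = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

# Same call as in the question: blank lines are skipped by default, and every
# remaining line becomes one row, cut into fields wherever a comma appears.
raw = pd.read_csv(io.StringIO(raw_text), sep=",", header=None)

print(raw.shape)            # (3, 2)
print(raw.iloc[0].tolist())  # ["sub_ID: ['sub-01'", "'sub-02']"]
```

Each key ends up glued to the first list item, which is why the result cannot simply be reshaped into the desired columns.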

    The example data you've provided seems to be in YAML format:

    sub_ID: [ 'sub-01','sub-02' ]
    
    ses_ID: [ 'ses-01','ses-01' ]
    
    mean: [ 0.3456,0.446 ]
    

    If it were CSV, the data would look as follows (it does not):

    sub_ID,ses_ID,mean
    sub-01,ses-01,0.3456
    sub-02,ses-01,0.446
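For comparison, if the file really were laid out as CSV (using the values from the question), read_csv would produce the desired columns directly, with no extra arguments. This sketch uses io.StringIO in place of a file; csv_text and df_csv are illustrative names.

```python
import io

import pandas as pd

# The question's data laid out as real CSV: a header line, then one line per row.
csv_text = """sub_ID,ses_ID,mean
sub-01,ses-01,0.3456
sub-02,ses-01,0.446
"""

# The first line is used as the header, so each key becomes a column name.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```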
    

    To read this data into a dataframe, you will either need to preprocess it into another format (e.g. CSV) or read it as YAML (for example with the PyYAML package) into a dict and pass that to pandas.DataFrame.
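If you would rather not add a YAML dependency, one way to do the preprocessing yourself is to split each non-blank line at the first colon and parse the bracketed list with Python's ast.literal_eval (a swapped-in technique, not YAML parsing). This is a minimal sketch that assumes every line has the "key: [ ... ]" shape shown in the question and that all strings are quoted; parse_lists is a hypothetical helper name.

```python
import ast

import pandas as pd


def parse_lists(text):
    """Hypothetical helper: turn lines like "key: [a, b]" into {"key": [a, b]}."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        # Split only at the first colon, so colons inside the list are left alone.
        key, _, value = line.partition(":")
        # literal_eval safely evaluates Python literals such as ['sub-01','sub-02'].
        data[key.strip()] = ast.literal_eval(value.strip())
    return data


sample = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

df = pd.DataFrame(parse_lists(sample))
print(df)
```

To read the real file, pass the file's contents (e.g. open("data.txt").read()) instead of sample. Unlike YAML, literal_eval requires quoted strings, so an unquoted list such as [sub-01, sub-02] would raise an error here.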

    For example:

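    import yaml  # provided by the PyYAML package (pip install pyyaml)

    with open("data.txt", "r") as file:
        try:
            # safe_load parses the YAML text into plain Python objects (here, a dict of lists).
            data = yaml.safe_load(file)
        except yaml.YAMLError as exc:
            # Stop here: if parsing failed, `data` was never assigned.
            raise SystemExit(f"Could not parse data.txt as YAML: {exc}")

    print(data)
    # {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}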
    

    After that, you can create a DataFrame from this dict:

    import pandas as pd

    # Each dict key becomes a column; the lists become the column values.
    df = pd.DataFrame(data)
    df.head()
    
    
    +-----+--------+--------+--------+
    |     | sub_ID | ses_ID |  mean  |
    +-----+--------+--------+--------+
    |   0 | sub-01 | ses-01 | 0.3456 |
    |   1 | sub-02 | ses-01 |  0.446 |
    +-----+--------+--------+--------+
    

    as desired.
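The target dataframe in the question names the last column mean_e_field rather than mean. If you want that name, a rename call afterwards is enough. This sketch rebuilds the same dict by hand as a stand-in for the output of yaml.safe_load so that it runs on its own.

```python
import pandas as pd

# Stand-in for the dict returned by yaml.safe_load in the example above.
data = {
    "sub_ID": ["sub-01", "sub-02"],
    "ses_ID": ["ses-01", "ses-01"],
    "mean": [0.3456, 0.446],
}

df = pd.DataFrame(data)
# rename returns a new DataFrame; only the listed column labels change.
df = df.rename(columns={"mean": "mean_e_field"})
print(df.columns.tolist())  # ['sub_ID', 'ses_ID', 'mean_e_field']
```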

    If some entries in your real file are not valid YAML, yaml.safe_load will raise a yaml.YAMLError, so you will need to preprocess the text before it can be loaded into pandas.