Search code examples
pythonpandasdataframesplit

split pandas datafram based on given row string


I have a text file with a data set of the form

Line 1  
Line 2  
!
1.01499999 0.504999995 6.19969398E-7 5.38933136E-7 1.35450875E-6
1.74000001 0.220000029 7.92876381E-6 4.1831604E-6 6.61433387E-6
2.10750008 0.147500038 2.06282803E-5 9.86384475E-6 9.99511121E-6
2.54500008 0.289999962 1.9321451E-5 9.41255712E-6 1.40418542E-5
3.19000006 0.355000019 2.1970769E-5 1.08561271E-5 1.98473426E-5
4.18249989 0.637500048 1.94816221E-5 1.02610993E-5 2.58007622E-5
6.23999977 1.41999984 1.95048287E-5 1.50751257E-5 2.59193912E-5
8.50250053 0.84250021 -3.13448794E-7 7.45566576E-5 1.58940475E-5 
-------
1.02499998 0.389999986 1.25625479E-6 8.79037373E-7 1.47827166E-6
1.57249999 0.157500029 3.83437873E-6 4.57433907E-6 4.91447827E-6
2.16750002 0.4375 9.80125606E-6 4.12876625E-6 9.34428499E-6
3.00500011 0.399999976 2.13497806E-5 9.10624203E-6 1.67928665E-5
4.25500011 0.850000024 1.35475839E-5 8.33513332E-6 2.37308996E-5
5.84749985 0.742500067 4.97072215E-5 2.55404848E-5 2.50197209E-5
8.25749969 1.66750002 1.56722888E-6 4.85851124E-5 1.65847723E-5 
-------
0.741162002 0.158674002 1.99696819E-6 8.0933961E-7 8.04328749E-7
1.11103797 0.211201996 2.83219379E-6 1.18746482E-6 2.33293395E-6
1.48358989 0.161349952 6.8232016E-6 2.48437755E-6 4.83310259E-6
2.01257992 0.367640018 4.58416616E-6 2.24343057E-6 9.16807585E-6
2.67011499 0.289895058 2.16971075E-5 7.58860506E-6 1.58778421E-5
3.64500499 0.684994936 1.36258495E-5 5.31197475E-6 2.35662919E-5
5.09749985 0.767499924 1.92129683E-5 9.40257451E-6 2.89442723E-5
6.32749987 0.462500095 7.3068717E-5 2.60800098E-5 2.67253181E-5
8.11250019 1.32250023 3.10791838E-5 1.88406557E-5 1.89492275E-5
-------

I want to read it as a pandas dataframe to assign each data set separated by the string ------- so that I can read it in a fashion similar to this:

names = ['column_name1','column_name2','column_name3','column_name4','column_name5'] 
for j in range(5): 
    names.append('cols%i' %j)
    
kwargs = {
        "names":names, 
        "delimiter":' ', 
        "skip_blank_lines":True
        }

skiprows=3
with open('file.txt') as f:
    content = "".join(f.readlines()[skiprows:]) 

dfs = {
    k: pd.read_table(io.StringIO(data), **kwargs)
    for k, data in enumerate(content.split(u'-------')) #dfs.values()[i] will be a separated dataset discriminated by the string `-------`
}

axs1.errorbar(dfs.values()[0].column_name1, dfs.values()[0].column_name2, xerr=dfs.values()[0].column_name3, yerr=dfs.values()[0].column_name4, fmt='.')

The above code returns the error TypeError: 'dict_values' object is not subscriptable.

How can I read the data frame dfs splitting it based on the string ------- in order that I can plot its values in a way similar to that used in the code with ax1.errorbar(), e.g., using dfs.values()[0].column_name1?


Solution

  • After assigning content, you can do something like this:

    dfs = {}
    for i, g in enumerate(content.split("-------")):
        df = pd.read_csv(io.StringIO(g), **kwargs)
        if not df.empty:
            dfs[i] = df
    

    Then this:

    for i, df in dfs.items():
        print(f"{i = }")
        print(df)
        print()
    

    will print:

    i = 0
       column_name1  column_name2  column_name3  column_name4  column_name5  cols0  cols1  cols2  cols3  cols4
    0      1.015000        0.5050  6.199694e-07  5.389331e-07      0.000001    NaN    NaN    NaN    NaN    NaN
    1      1.740000        0.2200  7.928764e-06  4.183160e-06      0.000007    NaN    NaN    NaN    NaN    NaN
    2      2.107500        0.1475  2.062828e-05  9.863845e-06      0.000010    NaN    NaN    NaN    NaN    NaN
    3      2.545000        0.2900  1.932145e-05  9.412557e-06      0.000014    NaN    NaN    NaN    NaN    NaN
    4      3.190000        0.3550  2.197077e-05  1.085613e-05      0.000020    NaN    NaN    NaN    NaN    NaN
    5      4.182500        0.6375  1.948162e-05  1.026110e-05      0.000026    NaN    NaN    NaN    NaN    NaN
    6      6.240000        1.4200  1.950483e-05  1.507513e-05      0.000026    NaN    NaN    NaN    NaN    NaN
    7      8.502501        0.8425 -3.134488e-07  7.455666e-05      0.000016    NaN    NaN    NaN    NaN    NaN
    
    i = 1
       column_name1  column_name2  column_name3  column_name4  column_name5  cols0  cols1  cols2  cols3  cols4
    0        1.0250        0.3900      0.000001  8.790374e-07      0.000001    NaN    NaN    NaN    NaN    NaN
    1        1.5725        0.1575      0.000004  4.574339e-06      0.000005    NaN    NaN    NaN    NaN    NaN
    2        2.1675        0.4375      0.000010  4.128766e-06      0.000009    NaN    NaN    NaN    NaN    NaN
    3        3.0050        0.4000      0.000021  9.106242e-06      0.000017    NaN    NaN    NaN    NaN    NaN
    4        4.2550        0.8500      0.000014  8.335133e-06      0.000024    NaN    NaN    NaN    NaN    NaN
    5        5.8475        0.7425      0.000050  2.554048e-05      0.000025    NaN    NaN    NaN    NaN    NaN
    6        8.2575        1.6675      0.000002  4.858511e-05      0.000017    NaN    NaN    NaN    NaN    NaN
    
    i = 2
       column_name1  column_name2  column_name3  column_name4  column_name5  cols0  cols1  cols2  cols3  cols4
    0      0.741162      0.158674      0.000002  8.093396e-07  8.043287e-07    NaN    NaN    NaN    NaN    NaN
    1      1.111038      0.211202      0.000003  1.187465e-06  2.332934e-06    NaN    NaN    NaN    NaN    NaN
    2      1.483590      0.161350      0.000007  2.484378e-06  4.833103e-06    NaN    NaN    NaN    NaN    NaN
    3      2.012580      0.367640      0.000005  2.243431e-06  9.168076e-06    NaN    NaN    NaN    NaN    NaN
    4      2.670115      0.289895      0.000022  7.588605e-06  1.587784e-05    NaN    NaN    NaN    NaN    NaN
    5      3.645005      0.684995      0.000014  5.311975e-06  2.356629e-05    NaN    NaN    NaN    NaN    NaN
    6      5.097500      0.767500      0.000019  9.402575e-06  2.894427e-05    NaN    NaN    NaN    NaN    NaN
    7      6.327500      0.462500      0.000073  2.608001e-05  2.672532e-05    NaN    NaN    NaN    NaN    NaN
    8      8.112500      1.322500      0.000031  1.884066e-05  1.894923e-05    NaN    NaN    NaN    NaN    NaN