python pandas numpy csv analysis

How to subdivide a textfile into several separate arrays with numpy


I have a text file that I would like to split into 3 separate text files based on the value in one of the columns. For example, if LineID is 1, I want to move all rows with that LineID to a separate array or even a separate text file.

Text file contents:

Num  LineID  ColA  ColB ColC
1 1 7 3.5 89.9
1 2 6.8 3.1 90.02
1 3 7.5 2.9 90
2 1 7.2 3.2 92
2 2 7.1 3.1 89.8
2 3 6.9 2.87 88
3 1 7.3 2.9 90
3 2 7.03 3.04 90
3 3 7.2 3 89.6

I would like to separate this into three arrays or text files based on the LineID value.

First array for LineID = 1

Num  LineID  ColA  ColB ColC
1 1 7 3.5 89.9
2 1 7.2 3.2 92
3 1 7.3 2.9 90

Second array for LineID=2

Num  LineID  ColA  ColB ColC
1 2 6.8 3.1 90.02
2 2 7.1 3.1 89.8
3 2 7.03 3.04 90

Third array for LineID=3

Num  LineID  ColA  ColB ColC
1 3 7.5 2.9 90
2 3 6.9 2.87 88
3 3 7.2 3 89.6

Does anyone have any pointers for how to do this in Python or with NumPy/Pandas?

Ivan offered a good solution (I haven't checked the others yet). The only issue is that it adds an extra number to the start of each line, which corresponds to that line's position in the original array/text file. I have tried it with both ','-separated and ' '-separated csv files and with space-separated txt files, and it comes out the same way.

   Num  LineID  ColA  ColB  CoLC
0    1       1   7.0   3.5  89.9
3    2       1   7.2   3.2  92.0
6    3       1   7.3   2.9  90.0
   Num  LineID  ColA  ColB   CoLC
1    1       2  6.80  3.10  90.02
4    2       2  7.10  3.10  89.80
7    3       2  7.03  3.04  90.00
   Num  LineID  ColA  ColB  CoLC
2    1       3   7.5  2.90  90.0
5    2       3   6.9  2.87  88.0
8    3       3   7.2  3.00  89.6
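
The extra leading number in the output above is most likely the pandas row index, which print shows by default; it is not a column in the data. Here is a minimal sketch of printing one subset without it. It assumes the sample rows from the question are held in an in-memory string (the `sample` variable is only for illustration), rather than read from the real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real run would read the text file instead.
sample = """Num  LineID  ColA  ColB ColC
1 1 7 3.5 89.9
1 2 6.8 3.1 90.02
1 3 7.5 2.9 90
2 1 7.2 3.2 92
2 2 7.1 3.1 89.8
2 3 6.9 2.87 88
3 1 7.3 2.9 90
3 2 7.03 3.04 90
3 3 7.2 3 89.6
"""

# sep=r"\s+" treats any run of whitespace as one separator.
data = pd.read_csv(io.StringIO(sample), sep=r"\s+")
id1 = data[data["LineID"] == 1]

# Filtering keeps the original row labels, which is where 0, 3, 6 come from.
print(list(id1.index))

# to_string(index=False) renders the rows without the index column.
text_without_index = id1.to_string(index=False)
print(text_without_index)
```

If the index should be renumbered rather than hidden, `id1.reset_index(drop=True)` is another option.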

Solution

  • This should help. id1, id2 and id3 each hold the rows for one LineID value; you can write each of them to a file afterwards.

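    import pandas as pd
    
    # sep=r"\s+" splits on any run of whitespace, so the double spaces
    # in the header line do not create empty columns.
    data = pd.read_csv('textfile.txt', sep=r"\s+")
    
    # Boolean masks keep only the rows with the given LineID.
    id1 = data[data['LineID'] == 1]
    id2 = data[data['LineID'] == 2]
    id3 = data[data['LineID'] == 3]
    
    print(id1)
    print(id2)
    print(id3)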
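
As a possible follow-up to the answer above, here is a sketch that splits on every distinct LineID without hard-coding 1, 2 and 3, converts each group to a NumPy array, and writes each group to its own file. The in-memory `sample` string, the `lineid_output` directory and the `lineid_<n>.txt` file names are assumptions made for illustration and do not come from the original post.

```python
import io
from pathlib import Path

import pandas as pd

# Sample rows copied from the question; a real run would read the text file instead.
sample = """Num  LineID  ColA  ColB ColC
1 1 7 3.5 89.9
1 2 6.8 3.1 90.02
1 3 7.5 2.9 90
2 1 7.2 3.2 92
2 2 7.1 3.1 89.8
2 3 6.9 2.87 88
3 1 7.3 2.9 90
3 2 7.03 3.04 90
3 3 7.2 3 89.6
"""
data = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# One DataFrame per distinct LineID value.
groups = {line_id: group for line_id, group in data.groupby("LineID")}

# Plain NumPy arrays, in case arrays are wanted instead of DataFrames.
arrays = {line_id: group.to_numpy() for line_id, group in groups.items()}

# Hypothetical output location; adjust to taste.
out_dir = Path("lineid_output")
out_dir.mkdir(exist_ok=True)

# index=False leaves the row index out of the written files.
for line_id, group in groups.items():
    group.to_csv(out_dir / f"lineid_{line_id}.txt", sep=" ", index=False)
```

Passing index=False to to_csv is what keeps the extra leading number out of the written files.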