python arrays data-processing

Reading in data from a text file and storing it in an array in Python


I'm trying to read data from a text file line by line and store it in a 2D array so that I can process it further at a later stage.

Every time the string 'EOE' is found I would like to move over to a new row and continue reading in entries line by line from the text file.

I can't seem to declare a 2D string array or read in the values successfully. I'm new to Python, coming from C, so my syntax and general Python understanding aren't great.

rf = open('data_small.txt', 'r')
lines = rf.readlines()
rf.close()
i = 0
j = 0

line_array = np.array((200, 200))

for line in lines:
    line=line.strip()
    print(line)
    line_array[i][j] = line
    if line == 'EOE':
        i+=1
    j+=1

rf.close()

line_array

The text file looks something like this:

-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----
Entry1=Processing
Entry2=50.87.78
Entry3=Instance.Test.ID=91
EOE
-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----

and I would like the string array to look something like this. The rows and columns can be transposed, but the overall idea is that each row (or each column) represents one entry terminated by EOE:

array = [
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67', 'EOE'],
['-----', 'Entry1=Processing', 'Entry2=50.87.78', 'Entry3=Instance.Test.ID=91', 'EOE'],
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67', 'EOE']
]

Solution

  • One approach is to build a list of lists, starting a new sub-list each time 'EOE' is read.

    Example:

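    filename = 'data_small.txt'        # file name taken from the question

    res = [[]]                         # start with one empty sub-list
    with open(filename) as infile:
        for line in infile:            # iterate over each line
            line = line.strip()        # strip the newline and surrounding whitespace
            if line == 'EOE':          # 'EOE' marks the end of an entry
                res.append([])         # start a new sub-list
            else:
                res[-1].append(line)   # append to the current (last) sub-list

    print(res)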
    

    Output:

    [['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
     ['-----',
      'Entry1=Processing',
      'Entry2=50.87.78',
      'Entry3=Instance.Test.ID=91'],
     ['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
     ['-----']]
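
The output above differs from the layout asked for in the question in two ways: the 'EOE' marker is dropped from each row, and the trailing '-----' separator produces an extra one-element row. Below is a minimal sketch of a variant that keeps the marker in each row and discards any trailing lines not followed by 'EOE'. It assumes that blank lines should be skipped; the function name `read_records` and its `terminator` parameter are hypothetical, chosen only for illustration.

```python
def read_records(path, terminator="EOE"):
    """Group stripped lines into rows, closing a row at each terminator line."""
    rows = []      # completed rows, one per terminated entry
    current = []   # lines collected for the entry being read
    with open(path) as infile:
        for raw in infile:
            line = raw.strip()
            if not line:
                continue              # assumption: blank lines carry no data
            current.append(line)      # the terminator itself is kept in the row
            if line == terminator:
                rows.append(current)  # entry is complete, store it
                current = []          # start collecting the next entry
    # Anything left in `current` was never terminated and is dropped.
    return rows
```

Calling `read_records('data_small.txt')` on the sample file from the question should give the three five-element rows shown there. Because every row then has the same length, `np.array(rows)` would turn the result into a 2D NumPy string array if one is really needed. Note that `np.array((200, 200))` in the question builds a 1-D array holding the two integers 200 and 200, not a 200x200 array.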