
Read a matrix from a .txt file?


I have a text file with the following contents:

10 1:0.870137474304 2:0.722354071782 3:0.671913562758 
11 1:0.764133072717 2:0.4893616821 3:0.332713609364 
20 1:0.531732713984 2:0.0967819558321 3:0.169802773309 

I want to read the file and build a matrix of the form:

[[10 0.870137474304 0.722354071782 0.671913562758 ]
[11 0.764133072717 0.4893616821   0.332713609364 ]
[20 0.531732713984 0.0967819558321 0.169802773309]]

I know how to split every element except the first column, which has no colon. How should I handle the first column?

matrix = []

lines = open("test.txt").read().split("\n")  # read all lines into an array
for line in lines:
    array [0] = line.split(" ")[0]
    # Split the line based on spaces and the sub-part on the colon
    array = [float(s.split(":")[1]) for s in line.split(" ")]  

    matrix.append(array)

print(matrix)

Solution

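  • You can use a regex. The pattern matches the integer at the start of each line (`^\d+`) and every number that directly follows a colon (`(?<=:)[\d.]+`):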

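    import re
    # list(...) is needed on Python 3, where map() returns a lazy iterator
    with open('filename.txt') as f:
        data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f if line.strip()]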
    

    Output:

    [[10.0, 0.870137474304, 0.722354071782, 0.671913562758], [11.0, 0.764133072717, 0.4893616821, 0.332713609364], [20.0, 0.531732713984, 0.0967819558321, 0.169802773309]]
    

    Edit: to create a NumPy array from the data:

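    import numpy as np
    import re
    # list(...) is needed on Python 3, where map() returns a lazy iterator
    with open('filename.txt') as f:
        data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f if line.strip()]
    new_data = np.array(data)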
    

    Output:

    array([[ 10.        ,   0.87013747,   0.72235407,   0.67191356],
       [ 11.        ,   0.76413307,   0.48936168,   0.33271361],
       [ 20.        ,   0.53173271,   0.09678196,   0.16980277]])
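
If you would rather stay with `split` as in the question, the first column can be peeled off before the colon-separated fields are parsed. This is only a minimal sketch, not part of the answer above: the helper names `parse_line` and `read_matrix` are made up for illustration, and it assumes every line has the shape `label index:value index:value ...` with whitespace between fields.

```python
def parse_line(line):
    """Turn '10 1:0.87 2:0.72' into [10.0, 0.87, 0.72]."""
    # split() with no argument also discards the trailing space and newline
    first, *rest = line.split()
    # The first token has no colon; the remaining ones are 'index:value'
    return [float(first)] + [float(tok.split(":", 1)[1]) for tok in rest]


def read_matrix(path):
    """Read every non-blank line of the file at `path` into a list of rows."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]


sample = "10 1:0.870137474304 2:0.722354071782 3:0.671913562758 "
print(parse_line(sample))
# [10.0, 0.870137474304, 0.722354071782, 0.671913562758]
```

Calling `read_matrix("test.txt")` on the file from the question should give the same nested list as the regex version, and the result can be passed to `np.array` in the same way.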