I have a text file whose contents look like this:
10 1:0.870137474304 2:0.722354071782 3:0.671913562758
11 1:0.764133072717 2:0.4893616821 3:0.332713609364
20 1:0.531732713984 2:0.0967819558321 3:0.169802773309
I want to read the file and build a matrix of the form:
[[10 0.870137474304 0.722354071782 0.671913562758 ]
[11 0.764133072717 0.4893616821 0.332713609364 ]
[20 0.531732713984 0.0967819558321 0.169802773309]]
I know how to split every element except those in the first column. How do I handle the first column?
matrix = []
lines = open("test.txt").read().split("\n") # read all lines into an array
for line in lines:
    array [0] = line.split(" ")[0]
    # Split the line based on spaces and the sub-part on the colon
    array = [float(s.split(":")[1]) for s in line.split(" ")]
    matrix.append(array)
print(matrix)
You can use regex:
import re
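# '(?<=:)[\d.]+' picks up each value after a colon; '^\d+' picks up the leading integer.
# On Python 3, map() returns an iterator, so wrap it in list() to get real rows.
data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', i.strip('\n')))) for i in open('filename.txt')]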
Output:
[[10.0, 0.870137474304, 0.722354071782, 0.671913562758], [11.0, 0.764133072717, 0.4893616821, 0.332713609364], [20.0, 0.531732713984, 0.0967819558321, 0.169802773309]]
Edit: to create a numpy array from data:
import numpy as np
import re
# list() is needed on Python 3; otherwise np.array() receives map objects instead of rows.
data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', i.strip('\n')))) for i in open('filename.txt')]
new_data = np.array(data)
Output:
array([[ 10. , 0.87013747, 0.72235407, 0.67191356],
[ 11. , 0.76413307, 0.48936168, 0.33271361],
[ 20. , 0.53173271, 0.09678196, 0.16980277]])
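If you would rather not use a regex, plain string splitting also works: treat the first token separately and split only the remaining tokens on the colon. This is a minimal sketch, assuming every line has the shape shown in the question (one leading number followed by index:value pairs); parse_line and read_matrix are names made up for illustration.

```python
def parse_line(line):
    """Turn '10 1:0.87 2:0.72' into [10.0, 0.87, 0.72]."""
    # split() with no argument splits on any run of whitespace
    # and ignores the trailing newline.
    first, *rest = line.split()
    # The first token has no colon; the others look like 'index:value'.
    return [float(first)] + [float(tok.split(":")[1]) for tok in rest]


def read_matrix(path):
    """Parse every non-blank line of the file at `path` (hypothetical helper)."""
    with open(path) as fh:
        return [parse_line(line) for line in fh if line.strip()]
```

Calling read_matrix("test.txt") would give a list of rows that can be passed straight to np.array(). Skipping blank lines avoids the error that an empty last line would otherwise cause.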