Hi, I have such array in my .txt file:
n|vechicle.car.characteristics[0].speed|180
n|vechicle.car.characteristics[0].weight|3
c|vechicle.car.characteristics[0].color|black
c|vechicle.car.characteristics[0].fuel|95
n|vechicle.car.characteristics[1].speed|160
n|vechicle.car.characteristics[1].weight|4
c|vechicle.car.characteristics[1].color|green
c|vechicle.car.characteristics[1].fuel|92
n|vechicle.car.characteristics[2].speed|200
n|vechicle.car.characteristics[2].weight|5
c|vechicle.car.characteristics[2].color|white
c|vechicle.car.characteristics[2].fuel|95
And I'd like to parse it into such dataFrame:
speed weight color fuel
0 180 3 black 95
1 160 4 green 92
2 200 5 white 95
That's, how i solved it:
import re
import pandas as pd
df_output_list = {}
df_output_dict = []
match_counter = 1
with open('sample_car.txt',encoding='utf-8') as file:
line = file.readline()
while line:
result = re.split(r'\|',line.rstrip())
result2 = re.findall(r'.(?<=\[)(\d+)(?=\])',result[1])
regex = re.compile('vechicle.car.characteristics.')
match = re.search(regex, result[1])
if match:
if match_counter == 1:
ArrInd = 0
match_counter+=1
#print(df_output_list)
if ArrInd == int(result2[0]):
df_output_list[result[1].split('.')[3]] = result[2]
ArrInd = int(result2[0])
else:
df_output_dict.append(df_output_list)
df_output_list = {}
df_output_list[result[1].split('.')[3]] = result[2]
ArrInd = int(result2[0])
line = file.readline()
df_output_dict.append(df_output_list)
#print(df_output_dict)
df_output = pd.DataFrame(df_output_dict)
print(df_output)
And i found it so complicated. Is it possible to simplify it?
Column names should be parsed automatically.
Read as csv
file with sep='|'
then get last column which contain values and then reshape
in appropriate shape.
>>> columns=['speed','weight','color','fuel']
>>> s = pd.read_csv('filename.txt', sep='|', header=None).iloc[:,-1]
>>> df = pd.DataFrame(s.to_numpy().reshape(-1,4), columns=columns)
>>> df
speed weight color fuel
0 180 3 black 95
1 160 4 green 92
2 200 5 white 95
If you have fix row formate like n|vechicle.car.characteristics[0].speed|180
then we can do this
>>> df = pd.read_csv('d.csv', sep='|', header=None)
>>> columns = df.iloc[:,1].str.split('.').str[-1].unique()
>>> df_out = pd.DataFrame(df.iloc[:,-1].to_numpy().reshape(-1,len(columns)), columns=columns)
>>> df_out
speed weight color fuel
0 180 3 black 95
1 160 4 green 92
2 200 5 white 95