I am extracting data points from a multi line text file and and trying to add the group data to a single row in the dataframe, but I am getting each data point on its own row I would like to flatten this to 2 rows group1 and group2. I am new to python. Also if there is a more efficient way to do this that would be great too. I have tried groupby()
but this did not seem to work? Thanks in advance.
import pandas as pd
data = """
Jan 2024
Group1 02/02/2024
dog 10 20
cat 21 32
Group2 05/02/2024
dog 23 45
cat 45 65
owl 24 12
monthly
Admin 02 22
clean 05 32
"""
extract = []
dog, cat, owl = [], [], []
for line in data.splitlines():
a = c = e = ''
# print(line)
if 'Group' in line:
group = line.rsplit()[0]
if 'dog' in line or 'cat' in line or 'owl' in line:
if line.startswith("dog"):
dog, a, b = line.split()
elif line.startswith("cat"):
cat, c, d = line.split()
elif line.startswith("owl"):
owl, e, f = line.split()
extract.append({
'group': group,
'dog': a,
'cat': c,
'owl': e
})
df = pd.DataFrame(extract)
df = df[['group', 'dog', 'cat', 'owl']]
print(df)
group dog cat owl
0 Group1 10
1 Group1 21
2 Group2 23
3 Group2 45
4 Group2 24
group dog cat owl
0 Group1 10 21
1 Group2 23 45 24
You can collapse rows before constructing a dataframe. This can be achieved by maintaining one dict
column_name->value
per group, refreshing it when a new group starts and adding it as a row right before that. Don't forget to add a row in the very end.
extract = []
row = None
for line in data.splitlines():
if 'Group' in line:
if row is not None: # we have something to add
extract.append(row)
group = line.rsplit()[0]
row = {'group': group} # new group starts - refreshing our dict
if 'dog' in line or 'cat' in line or 'owl' in line:
animal, val1, val2 = line.split()
row[animal] = val1
if row is not None: # a final group
extract.append(row)
df = pd.DataFrame(extract)
df = df[['group', 'dog', 'cat', 'owl']]
print(df)
Output:
group dog cat owl
0 Group1 10 21 NaN
1 Group2 23 45 24