I have an input file, now converted to a pandas.dataframe
. The records/rows are in a sequence which contain related data of the form
survey, a, b, c
section, 1, 2, 3
observation, a, b, c
values, 1, 2, 3
values, 4, 5, 6
observation, d, e, f
values, 7, 8, 9
section, 4, 5, 6
...
The survey record only occurs once. A section may occur multiple times and will contain observation and value records. Observations will always be followed by values sometimes as multiple records.
I am trying to reformat this into rows where each set of values is on a separate row with its corresponding survey, section, and observation.
survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 1,2,3
survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 4,5,6
survey, a,b,c, section, 1,2,3, observation, d, e, f, values, 7, 8, 9
survey, a,b,c, section, 4, 5, 6 and so on....
Can this be done with pandas
or should I iterate through an if, then else structure ?
The methods I've tried so far are the following (these are probably simplistic and heading in the wrong directions):
#pd.DataFrame(hmdDataToProcess.unstack())
#hmdDataToProcess.unstack
#hmdDataToProcess.stack
#pd.melt(hmdDataToProcess, id_vars =[0], value_vars =
['SURVEY','SECTION','OBSERV','OBVAL'])
#df2 = hmdDataToProc0ess.pivot_table(index = [0]).reset_index()
#df2 = df_in.pivot_table(index =
#['Example1','Example2'],columns='VC', values=
#['Weight','Rank']).reset_index()
#hmdDataToProcess.groupby('SECTION').groups #, 'OBSERV', 'OBVAL'
You could do it without using Pandas
s = '''survey, a, b, c
section, 1, 2, 3
observation, a, b, c
values, 1, 2, 3
values, 4, 5, 6
observation, d, e, f
values, 7, 8, 9
section, 4, 5, 6'''
list_s = s.strip().split('\n')
list_s = [x.strip() for x in list_s]
list_s
# ['survey, a, b, c', 'section, 1, 2, 3', 'observation, a, b, c', 'values, 1, 2, 3', 'values, 4, 5, 6', 'observation, d, e, f', 'values, 7, 8, 9', 'section, 4, 5, 6']
for el in list_s:
if el.split(',')[0] == 'survey':
survey = el
if el.split(',')[0] == 'section':
section = el
if el.split(',')[0] == 'observation':
observation = el
if el.split(',')[0] == 'values':
print(f"{survey},{section},{observation},{el}")
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 1, 2, 3
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 4, 5, 6
#survey, a, b, c,section, 1, 2, 3,observation, d, e, f,values, 7, 8, 9