I am working with a Pandas (version 0.17.1) DataFrame that looks like this:
time type module msg_type content
36636 2016-08-25 17:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property A' = some_value_1
36637 2016-08-25 17:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property B' = some_value_2
36638 2016-08-25 17:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property C' = some_value_3
36639 2016-08-25 17:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property D' = some_value_4
36715 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 1' = some_value_a
36716 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 2' = some_value_b
36717 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 3' = some_value_c
36718 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 4' = some_value_d
36719 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 5' = some_value_e
36720 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 6' = some_value_f
36721 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 7' = some_value_g
36722 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 8' = some_value_h
36723 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 9' = some_value_i
36724 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 10' = some_value_j
36725 2016-08-25 17:59:50.964 ERROR MOD_2_NAME STATUS Didn't receive Status Monitoring 'Parameter 11' from MODULE_2!
36726 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 12' = some_value_k
36727 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 13' = some_value_l
36785 2016-08-25 18:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property A' = some_value_1
36786 2016-08-25 18:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property B' = some_value_2
36787 2016-08-25 18:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property C' = some_value_3
36788 2016-08-25 18:59:50.051 INFO MOD_1_NAME STATUS Received Status Monitoring from MODULE_1 'Property D' = some_value_4
36827 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 1' = some_value_a
36828 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 2' = some_value_b
36829 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 3' = some_value_c
36830 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 4' = some_value_d
36831 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 5' = some_value_e
36832 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 6' = some_value_f
36833 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 7' = some_value_g
36834 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 8' = some_value_h
36835 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 9' = some_value_i
36836 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 10' = some_value_j
36837 2016-08-25 19:01:50.964 ERROR MOD_2_NAME STATUS Didn't receive Status Monitoring 'Parameter 11' from MODULE_2!
36838 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 12' = some_value_k
36839 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS Received Status Monitoring from MODULE_2 'Parameter 13' = some_value_l
(The frame has already been reduced to remove rows that are not of interest. That is why the index column has missing numbers)
As you can see there are multiple parameters read from a device at the same time. Each reading is a separate row. I would like to do some "reduction" and "compression" so that each reading is only a single row. I would also like the content
column to be a dictionary so I could easily lookup a particular item of interest. So the result would look like this:
time type module msg_type content
36636 2016-08-25 17:59:50.051 INFO MOD_1_NAME STATUS {'Property A' = 'some_value_1', 'Property B' = 'some_value_2', 'Property C' = 'some_value_3', 'Property D' = 'some_value_4'}
36715 2016-08-25 17:59:50.964 INFO MOD_2_NAME STATUS {'Parameter 1' = 'some_value_a', 'Parameter 2' = 'some_value_b', 'Parameter 3' = 'some_value_c', 'Parameter 4' = 'some_value_d', 'Parameter 5' = 'some_value_e', 'Parameter 6' = 'some_value_f', 'Parameter 7' = 'some_value_g','Parameter 8' = some_value_h, 'Parameter 9' = 'some_value_i', 'Parameter 10' = 'some_value_j', 'Parameter 11' = '', 'Parameter 12' = 'some_value_k', 'Parameter 13' = 'some_value_l'}
36785 2016-08-25 18:59:50.051 INFO MOD_1_NAME STATUS {'Property A' = 'some_value_1', 'Property B' = 'some_value_2', 'Property C' = 'some_value_3', 'Property D' = 'some_value_4'}
36827 2016-08-25 19:01:50.964 INFO MOD_2_NAME STATUS {'Parameter 1' = 'some_value_a', 'Parameter 2' = 'some_value_b', 'Parameter 3' = 'some_value_c', 'Parameter 4' = 'some_value_d', 'Parameter 5' = 'some_value_e', 'Parameter 6' = 'some_value_f', 'Parameter 7' = 'some_value_g','Parameter 8' = some_value_h, 'Parameter 9' = 'some_value_i', 'Parameter 10' = 'some_value_j', 'Parameter 11' = '', 'Parameter 12' = 'some_value_k', 'Parameter 13' = 'some_value_l'}
So basically I would like for all rows with the same value for their time
and module
columns to be "merged" together, with their contents
columns parsed into a dictionary. (There could also be some "missing" or "empty" readings.) I don't want to filter or remove data, just reduce and summarize it.
I am guessing that I need to you some combination of groupby()
, transform()
, and apply()
but I am not sure where to even begin.
Part of my difficulty is that I cannot inspect the result of groupby()
to see if it is doing what I want.
g1 = df.groupby(['module', 'time'])
g1
does not show up in the Spyder variable explorer. print
ing does not show anything. I cannot access attribute index
or call info()
on g1
. But I am having doubts that groupby()
is even worthwhile here... I don't want to eliminate anything.
Been doing some searching to find an example but keep getting what seems like false positives. Any help to get started would be appreciated.
Define a function and use groupby() and then apply():
In [235]: def create_data_dict(rows):
...: return {k:v for k,v in re.findall(r"'([^']*)' = ([^ ]*)", ' '.join(rows.content.astype(str)))}
...:
In [236]: df[df['type'] != 'ERROR'].groupby(['time', 'module', 'msg_type']).apply(create_data_dict).to_frame(name = 'content').reset_index()
Out[236]:
time module msg_type content
0 2016-08-25 17:59:50.051 MOD_1_NAME STATUS {u'Property A': u'some_value_1', u'Property C': u'some_value_3', u'Property B': u'some_value_2', u'Property D': u'some_value_4'}
1 2016-08-25 17:59:50.964 MOD_2_NAME STATUS {u'Parameter 6': u'some_value_f', u'Parameter 7': u'some_value_g', u'Parameter 4': u'some_value_d', u'Parameter 5': u'some_value_e', u'Parameter 2': u'some_value_b', u'Parameter 3': u'some_value_c', u'Parameter 1': u'some_value_a', u'Parameter 8': u'some_value_h', u'Parameter 9': u'some_value_i', u'Parameter 10': u'some_value_j', u'Parameter 12': u'some_value_k', u'Parameter 13': u'some_value_l'}
2 2016-08-25 18:59:50.051 MOD_1_NAME STATUS {u'Property A': u'some_value_1', u'Property C': u'some_value_3', u'Property B': u'some_value_2', u'Property D': u'some_value_4'}
3 2016-08-25 19:01:50.964 MOD_2_NAME STATUS {u'Parameter 6': u'some_value_f', u'Parameter 7': u'some_value_g', u'Parameter 4': u'some_value_d', u'Parameter 5': u'some_value_e', u'Parameter 2': u'some_value_b', u'Parameter 3': u'some_value_c', u'Parameter 1': u'some_value_a', u'Parameter 8': u'some_value_h', u'Parameter 9': u'some_value_i', u'Parameter 10': u'some_value_j', u'Parameter 12': u'some_value_k', u'Parameter 13': u'some_value_l'}