I would imagine what I'm trying to do is fairly simple in pandas but I just can't get it.
Really I want to do this in dataframe-js (or danfojs), but any help in either pandas or dataframe-js will be helpful.
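Essentially: I have a list of dataframes that all have a uuid column, and I want to merge them into a single dataframe with one row per uuid, but some values might be missing. uuid is the only column they all share, so using "merge on" or similar with any other column name isn't an option.
Example dataframes: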
let data1 = [
[['col A', 'uuid'], ['1238', '12']],
[['col B', 'uuid'], ['42.4', '12']],
[['col A', 'uuid'], ['1091', '48']],
[['col B', 'uuid'], ['35.1', '48']],
[['col B', 'uuid'], ['44.4', '77']],
]
desired output (column order doesn't matter):
[
['col A', 'uuid', 'col B'],
['1238', '12', '42.4'],
['1091', '48', '35.1'],
[null, '77', '44.4'] // null, undefined, NaN...doesn't matter for the gaps
]
please help :)
OK, I combined @onyambu's answer with the merge function, which now accepts dataframes of different sizes:
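from functools import reduce

import numpy as np
import pandas as pd

# data1 is the list from the question (it is also a valid Python literal)
# create an initial empty df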
t = pd.DataFrame(columns=['uuid'])
# reduce list of dataframes into one
df = reduce(lambda x,y: x.merge(pd.DataFrame(y[1:], columns=y[0]), how='outer'), data1, t)
# squash rows on `uuid` index with stack/unstack
df = df.set_index('uuid').stack().unstack().reset_index()
# output in original "table" format
df2 = np.r_[df.columns.values[None],df.iloc[:].values].tolist()
print(df2)
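Since the original goal was a JavaScript solution, here is a minimal sketch of the same "one row per uuid" squashing written without any dataframe library, so the logic can be ported more or less line by line. The function name merge_on_key is my own (hypothetical), and it assumes each entry in data1 is a header row followed by one or more value rows, and that every entry contains the uuid column.

```python
# Same input shape as data1 in the question: each entry is [header_row, value_row].
data1 = [
    [['col A', 'uuid'], ['1238', '12']],
    [['col B', 'uuid'], ['42.4', '12']],
    [['col A', 'uuid'], ['1091', '48']],
    [['col B', 'uuid'], ['35.1', '48']],
    [['col B', 'uuid'], ['44.4', '77']],
]


def merge_on_key(tables, key='uuid'):
    """Hypothetical helper: collapse [header, *rows] tables into one table per key value."""
    columns = [key]  # output column order: key first, then the others as first seen
    merged = {}      # key value -> {column name: cell value}
    for header, *rows in tables:
        for name in header:
            if name not in columns:
                columns.append(name)
        for row in rows:
            record = dict(zip(header, row))
            # Later values for the same key and column overwrite earlier ones.
            merged.setdefault(record[key], {}).update(record)
    # Cells that a key never received are filled with None.
    body = [[rec.get(name) for name in columns] for rec in merged.values()]
    return [columns] + body


table = merge_on_key(data1)
for line in table:
    print(line)
```

The result is in the same "table" format as the desired output, with None in the gaps. In JavaScript the dict would become a Map (or plain object) keyed by uuid, and the rest should carry over with only syntax changes.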