I have a list of tuples like:
tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]
And I want to create a DataFrame with a single column that holds each tuple pair, like:
| one col |
| -------- |
| ('foo', 'bar') |
| ('bar', 'foo') |
| ('ping', 'pong') |
| ('pong', 'ping') |
I tried:
df = pd.DataFrame(tuple_lst, columns='one col')
But this throws an error, as it's trying to split the tuples into 2 separate columns. I know that if I pass a list of 2 column names here, it produces a DataFrame with 2 columns, which is not what I want. I guess I could then put these two columns back together into a list of tuples, but that feels like a lot of work to break them up and put them back together. Is there a simpler way to do this? I need the output to be a DataFrame, not a Series, so I can add other columns later on.
Use a dictionary. This ensures the DataFrame constructor doesn't try to interpret the data as 2D:
pd.DataFrame({'one col': tuple_lst})
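As a minimal, self-contained sketch of the dictionary approach (the `first` column is a hypothetical name added only to illustrate that further columns can be attached afterwards):

```python
import pandas as pd

tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]

# The dict maps one column name to the whole list, so each tuple
# is stored as a single cell instead of being unpacked into columns.
df = pd.DataFrame({'one col': tuple_lst})

# Hypothetical extra column, added later as with any other DataFrame.
df['first'] = [pair[0] for pair in tuple_lst]

print(df)
```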
You could also have used a Series and converted it with to_frame:
pd.Series(tuple_lst).to_frame(name='one col')
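A small variation on the Series route, sketched here as an assumption about what may be convenient rather than a recommendation: the column name can also be set on the Series itself, in which case `to_frame()` picks it up without a `name` argument.

```python
import pandas as pd

tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]

# Name supplied when converting to a frame.
df_renamed = pd.Series(tuple_lst).to_frame(name='one col')

# Name supplied on the Series; to_frame() reuses it as the column label.
df_named = pd.Series(tuple_lst, name='one col').to_frame()

print(list(df_renamed.columns), list(df_named.columns))
```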
Or, closer to your original approach, wrap the list in a Series before passing it to the constructor. This could be useful if you have constraints on the format passed to the constructor, although it is not as efficient for small lists:
pd.DataFrame(pd.Series(tuple_lst), columns=['one col'])
Output:
one col
0 (foo, bar)
1 (bar, foo)
2 (ping, pong)
3 (pong, ping)
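To check that the three variants above hold the same data, a quick sketch like the following can be run (the variable names are illustrative only):

```python
import pandas as pd

tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]

df_dict = pd.DataFrame({'one col': tuple_lst})
df_frame = pd.Series(tuple_lst).to_frame(name='one col')
df_ctor = pd.DataFrame(pd.Series(tuple_lst), columns=['one col'])

# Compare column labels and cell contents of each variant.
for label, frame in [('dict', df_dict), ('to_frame', df_frame), ('constructor', df_ctor)]:
    print(label, list(frame.columns), frame['one col'].tolist() == tuple_lst)
```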
For small lists, pd.DataFrame(pd.Series(tuple_lst), columns=['one col']) is not as efficient, but for large lists all solutions are equivalent.
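The timing comparison can be reproduced locally with a rough sketch along these lines. The list sizes and repeat count are arbitrary assumptions, and the absolute numbers will depend on the machine and the pandas version:

```python
import timeit
import pandas as pd

timings = {}

# Hypothetical sizes: one small list and one large list.
for n in (10, 100_000):
    data = [('foo', 'bar')] * n
    candidates = {
        'dict': lambda: pd.DataFrame({'one col': data}),
        'to_frame': lambda: pd.Series(data).to_frame(name='one col'),
        'constructor': lambda: pd.DataFrame(pd.Series(data), columns=['one col']),
    }
    for label, build in candidates.items():
        # Average wall-clock time per call over a few repetitions.
        timings[(n, label)] = timeit.timeit(build, number=5) / 5

for (n, label), seconds in timings.items():
    print(f'n={n:>7}  {label:<12} {seconds:.6f} s')
```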