Turn a list of tuples into pandas dataframe with single column


I have a list of tuples like:

tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]

And I want to create a DataFrame with a single column that holds each tuple pair, like:

|     one col       |
|     --------      |
|  ('foo', 'bar')   |
|  ('bar', 'foo')   |
|  ('ping', 'pong') |
|  ('pong', 'ping') |

I tried:

df = pd.DataFrame(tuple_lst, columns='one col')

But this throws an error because pandas tries to split the tuples into two separate columns. I know that if I pass a list of two column names here, I get a DataFrame with two columns, which is not what I want. I could then combine those two columns back into a list of tuples, but breaking them up only to put them back together feels like a lot of work. Is there a simpler way to do this? I need the output to be a DataFrame, not a Series, so that I can add other columns later on.


Solution

  • Use a dictionary. This ensures the DataFrame constructor doesn't try to interpret the data as 2D:

    pd.DataFrame({'one col': tuple_lst})
    

    You could also build a Series and convert it with to_frame:

    pd.Series(tuple_lst).to_frame(name='one col')
    

    Or, closer to your original approach, wrap the list in a Series before passing it to the constructor. This could be useful if you have constraints on the format passed to the constructor, although it is not as efficient for small lists:

    pd.DataFrame(pd.Series(tuple_lst), columns=['one col'])
    

    Output:

            one col
    0    (foo, bar)
    1    (bar, foo)
    2  (ping, pong)
    3  (pong, ping)
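
As a quick sanity check, the three approaches above can be run side by side. This is a minimal sketch: the variable names df_dict, df_frame, df_ctor and the extra column 'first' are only illustrative and are not part of the original answer.

```python
import pandas as pd

tuple_lst = [('foo', 'bar'), ('bar', 'foo'), ('ping', 'pong'), ('pong', 'ping')]

# Three ways to build a one-column DataFrame whose cells are whole tuples
df_dict = pd.DataFrame({'one col': tuple_lst})
df_frame = pd.Series(tuple_lst).to_frame(name='one col')
df_ctor = pd.DataFrame(pd.Series(tuple_lst), columns=['one col'])

for df in (df_dict, df_frame, df_ctor):
    print(df.shape, df['one col'].tolist() == tuple_lst)

# Because the result is a DataFrame, further columns can be added later.
# 'first' is a hypothetical column holding the first element of each tuple.
df_more = df_dict.copy()
df_more['first'] = [t[0] for t in tuple_lst]
print(df_more)
```

Each cell keeps the full tuple, so the column has object dtype rather than being split into two string columns.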
    

    Timings

    For small lists, pd.DataFrame(pd.Series(tuple_lst), columns=['one col']) is less efficient, but for large lists all three solutions perform about the same.
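
One way to reproduce a comparison like this is with the standard-library timeit module. The sketch below is an assumption about how such timings might be gathered, not the code behind the original measurements. The helper time_constructors, the list sizes, and the repeat count are all hypothetical choices, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def time_constructors(n, repeats=5):
    """Return the best-of-`repeats` time in seconds for each approach on n tuples."""
    data = [('foo', 'bar')] * n
    candidates = {
        'dict': lambda: pd.DataFrame({'one col': data}),
        'to_frame': lambda: pd.Series(data).to_frame(name='one col'),
        'series_ctor': lambda: pd.DataFrame(pd.Series(data), columns=['one col']),
    }
    # Take the minimum over several single runs to reduce noise
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeats))
        for name, fn in candidates.items()
    }


# Hypothetical list sizes, from small to large
for n in (10, 1_000, 100_000):
    print(n, time_constructors(n))
```

Taking the minimum of several runs is a common convention for micro-benchmarks, since slower runs usually reflect interference from other processes rather than the code under test.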