python pandas dataframe sklearn-pandas

How to union multiple columns from one pandas DataFrame into one Series?


I have a DataFrame that actually has more than 20 columns; the example below shows only 4. Each column has the same number of rows. How can I convert it into a new DataFrame that has only one column, with the values of all columns stacked one after another? I will use the new combined DataFrame to calculate some metrics. How do I write neat and efficient code for this? Thank you so much!


import pandas as pd

data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9], "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}
df = pd.DataFrame.from_dict(data)



Solution

  • If you start from your dictionary, use itertools.chain:

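    import pandas as pd
    from itertools import chain

    data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9], "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}

    # Chain the column lists end to end (dicts keep insertion order in Python 3.7+)
    pd.DataFrame({'col': list(chain.from_iterable(data.values()))})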
    

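    Otherwise, if you already have a DataFrame, ravel the underlying NumPy array in column-major (Fortran, 'F') order, so that each column is read top to bottom before moving on to the next: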

    df = pd.DataFrame.from_dict(data)
    pd.Series(df.to_numpy().ravel('F'))
    

    Output:

    0      1
    1      2
    2      3
    3      5
    4      6
    5      7
    6      8
    7      9
    8     10
    9     11
    10    12
    11    14
    12     7
    13     8
    14     9
    15    10
    dtype: int64
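If you would rather stay entirely in pandas, `pd.concat` and `DataFrame.melt` also stack the columns in the same column-by-column order. This is a sketch assuming the `data` dict from the question; the column names `source` and `col` are chosen here purely for illustration. `melt` has the extra benefit of keeping the original column name next to each value, which can help if some metrics need to be computed per source column.

```python
import pandas as pd

data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9], "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}
df = pd.DataFrame.from_dict(data)

# Option 1: concatenate the columns end to end; ignore_index gives a fresh 0..n-1 index
s_concat = pd.concat([df[c] for c in df.columns], ignore_index=True)

# Option 2: melt to long format; "source" holds the original column name,
# "col" holds the stacked values, column by column
long_df = df.melt(var_name="source", value_name="col")
s_melt = long_df["col"]

# Both give the same values as the column-major ravel
s_ravel = pd.Series(df.to_numpy().ravel("F"))
print(s_concat.tolist() == s_ravel.tolist(), s_melt.tolist() == s_ravel.tolist())
```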
    

    Depending on the computation you need to perform, you might not even need to build a DataFrame/Series at all, and can stick to the NumPy array:

    a = df.to_numpy().ravel('F')
    

    Output: array([ 1,  2,  3,  5,  6,  7,  8,  9, 10, 11, 12, 14,  7,  8,  9, 10])
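As a sketch of the metrics step on the flat array, assuming the same `data` dict: the question doesn't say which metrics are needed, so the ones below (mean, sample standard deviation, min, max) are only illustrative. Order-independent statistics like these give the same result whether you ravel in 'F' or 'C' order, and some of them can even be computed on the 2-D array directly without flattening.

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9], "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}
df = pd.DataFrame.from_dict(data)

# Flatten once; 'F' keeps the column-by-column order
a = df.to_numpy().ravel("F")

# Illustrative metrics; none of them depends on the element order
metrics = {
    "mean": a.mean(),
    "std": a.std(ddof=1),  # sample std, matching the ddof=1 default of pandas' Series.std
    "min": a.min(),
    "max": a.max(),
}
print(metrics)

# The same mean straight from the 2-D array, without flattening
print(np.isclose(df.to_numpy().mean(), metrics["mean"]))
```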