I have a data frame that actually has more than 20 columns. The example below gives 4 columns. Each column has an equal number of rows. How do I convert it to a new dataframe (example shown below) which has only one column? I will use the new combined dataframe to calculate some metrics. How do I write neat and efficient code for this? Thank you so much!
import pandas as pd
data={"col1":[1,2,3,5], "col_2":[6,7,8,9], "col_3":[10,11,12,14], "col_4":[7,8,9,10]}
df = pd.DataFrame.from_dict(data)
If you start from your dictionary, use itertools.chain:
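from itertools import chain
import pandas as pd
data={"col1":[1,2,3,5], "col_2":[6,7,8,9], "col_3":[10,11,12,14], "col_4":[7,8,9,10]}
# chain.from_iterable concatenates the lists in the dictionary's insertion order
pd.DataFrame({'col': chain.from_iterable(data.values())})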
Otherwise, ravel the underlying NumPy array in column-major ('F') order, so that the columns are stacked one after another:
df = pd.DataFrame.from_dict(data)
pd.Series(df.to_numpy().ravel('F'))
Output:
0 1
1 2
2 3
3 5
4 6
5 7
6 8
7 9
8 10
9 11
10 12
11 14
12 7
13 8
14 9
15 10
dtype: int64
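To see why the 'F' argument matters, here is a minimal sketch that compares it with the default order on the same example data. The variable names by_column and by_row are only illustrative.

```python
import pandas as pd

data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9],
        "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}
df = pd.DataFrame(data)

arr = df.to_numpy()          # 2D array, shape (4, 4): rows x columns

by_column = arr.ravel("F")   # column-major: all of col1, then col_2, ...
by_row = arr.ravel()         # default row-major ("C"): row 0, then row 1, ...

print(by_column.tolist())
# [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 7, 8, 9, 10]
print(by_row.tolist())
# [1, 6, 10, 7, 2, 7, 11, 8, 3, 8, 12, 9, 5, 9, 14, 10]
```

If the metrics do not depend on element order (a mean or a maximum, for instance), either order gives the same result; 'F' only matters when the values must stay grouped by their original column.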
Depending on the computation to perform, you might not even need to instantiate a DataFrame/Series and can work directly with the array:
a = df.to_numpy().ravel('F')
Output: array([ 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 7, 8, 9, 10])
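As a rough illustration of working on the array directly, the sketch below computes a few summary statistics. The question does not say which metrics are needed, so the mean, median, minimum and maximum here are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 2, 3, 5], "col_2": [6, 7, 8, 9],
        "col_3": [10, 11, 12, 14], "col_4": [7, 8, 9, 10]}
df = pd.DataFrame(data)

# Flatten all columns into a single 1D array (column after column)
a = df.to_numpy().ravel("F")

# Placeholder metrics; swap in whatever computation is actually required
metrics = {
    "mean": float(a.mean()),
    "median": float(np.median(a)),
    "min": int(a.min()),
    "max": int(a.max()),
}
print(metrics)
# {'mean': 7.625, 'median': 8.0, 'min': 1, 'max': 14}
```

Staying with the array skips the cost of building a new pandas object, which may be noticeable when the frame is large.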