I have a dataframe:
import pandas as pd
iris=pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.tail(5)
iris.head(5)
From the iris dataframe I derived the df_setosa, df_virginica, and df_versicolor dataframes:
df_setosa = iris[iris['variety'] == 'Setosa']
df_virginica = iris[iris['variety'] == 'Virginica']
df_versicolor = iris[iris['variety'] == 'Versicolor']
# append the corresponding variety name as a suffix to each dataframe's column names
df_setosa = df_setosa.add_suffix('_setosa')
df_virginica = df_virginica.add_suffix('_virginica')
df_versicolor = df_versicolor.add_suffix('_versicolor')
print(df_virginica.columns)
print(df_versicolor.columns)
print(df_setosa.columns)
print(df_setosa.shape) # 50 rows by 5 columns
print(df_versicolor.shape) # 50 rows by 5 columns
print(df_virginica.shape) # 50 rows by 5 columns
Since each dataframe has a shape of (50, 5), I want to concatenate (or, as we say in R, cbind) the three dataframes.
My attempt:
#### I need help concatenating the three dataframes
concat_df = pd.concat([df_setosa,df_virginica,df_versicolor]) # this returns a lot of NaN
concat_df.shape # this returns a shape of 150 rows by 15 columns instead of 50 rows by 15 columns
The concat_df should have a shape of 50 rows by 15 columns.
Thanks in advance
When you create the "sub" dataframes, reset their indexes. Each subset otherwise keeps the row labels it had in the original iris set, and there's no reason to keep those in this case (apply add_suffix afterwards, as in the question):
df_setosa = iris[iris['variety'] == 'Setosa'].reset_index(drop=True)
df_virginica = iris[iris['variety'] == 'Virginica'].reset_index(drop=True)
df_versicolor = iris[iris['variety'] == 'Versicolor'].reset_index(drop=True)
Then, when you concatenate, make sure you do it horizontally by setting the "axis" argument to 1, like so:
concat_df = pd.concat([df_setosa,df_virginica,df_versicolor], axis=1)
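Putting the steps together, here is a minimal sketch of the whole pipeline. It assumes a small made-up frame named toy in place of the real iris CSV (two rows per variety, one measurement column), so the numbers are illustrative only; the loop and the names parts and wide are mine, not part of the original code.

```python
import pandas as pd

# Toy stand-in for iris (assumed data, not the real CSV): 2 rows per variety.
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor",
                "Virginica", "Virginica"],
})

parts = []
for name in ["Setosa", "Virginica", "Versicolor"]:
    part = (
        toy[toy["variety"] == name]
        .reset_index(drop=True)          # every subset now has index 0..n-1
        .add_suffix(f"_{name.lower()}")  # keeps column names unique after concat
    )
    parts.append(part)

# axis=1 places the subsets side by side instead of stacking them.
wide = pd.concat(parts, axis=1)
print(wide.shape)
print(wide.columns.tolist())
```

With the real data the same steps should give 50 rows by 15 columns, since each variety has 50 rows and 5 columns.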
You can also leave the "reset_index" call for this last step. If you skip it entirely, the concat will still produce 150 rows, because it aligns rows on their index labels (0 to 149 across the three subsets) and fills the cells that have no match with NaN.
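The index alignment described above can be seen on a tiny example. This is only a sketch with two made-up frames (left and right are hypothetical names) whose indexes do not overlap, which is the situation the filtered iris subsets are in.

```python
import pandas as pd

# Two toy frames with non-overlapping indexes, as after boolean filtering.
left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"b": [3, 4]}, index=[2, 3])

# Without resetting, concat aligns on index labels and pads with NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)             # rows = union of both indexes
print(misaligned.isna().sum().sum())

# Resetting the indexes at concat time lines the rows up positionally.
frames = [left, right]
aligned = pd.concat([f.reset_index(drop=True) for f in frames], axis=1)
print(aligned.shape)
```

The list comprehension is one way to leave reset_index for the last step; it avoids repeating the call when each subset is created.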