How to return one column dataframe or single row dataframe as a dataframe or a series?


Given df,

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(6), 'col2': [*'abcdef']})

   col1 col2
0     0    a
1     1    b
2     2    c
3     3    d
4     4    e
5     5    f

Then when selecting a single column using:

df['col1']
# returns a pd.Series

0    0
1    1
2    2
3    3
4    4
5    5
Name: col1, dtype: int32

Likewise, when selecting a single row:

df.loc[0]
# returns a pd.Series

col1    0
col2    a
Name: 0, dtype: object

How can we force a single-column or single-row selection to return a pd.DataFrame?


Solution

  • Getting a single row or column as a pd.DataFrame or a pd.Series

    There are times you need to pass a dataframe column or row as a series, and other times you'd like to view that row or column as a dataframe. I am going to show you a few tricks using single square brackets, [], and double square brackets, [[]], along with reindex and squeeze.

    df[['col1']]
    # Using double square brackets returns a pd.DataFrame
    
       col1
    0     0
    1     1
    2     2
    3     3
    4     4
    5     5
    
    # Also, using pd.DataFrame.reindex we can return a single-column dataframe
    df.reindex(['col1'], axis=1)
    

    Now, let's go the other way, from the dataframe output back to a series:

    # Let's squeeze to get a pd.Series from this dataframe
    df.reindex(['col1'], axis=1).squeeze()
    
    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    Name: col1, dtype: int32
    

    And, likewise with rows:

    df.loc[[0]]
    # Using double square brackets returns a single row dataframe
    
       col1 col2
    0     0    a
    
    # Also using reindex
    df.reindex([0])
    

    Let's squeeze to get a pd.Series from this dataframe:

    df.reindex([0]).squeeze()
    
    col1    0
    col2    a
    Name: 0, dtype: object
    

    The advantage of using pd.DataFrame.reindex over pd.DataFrame.loc is in handling columns or index labels that may or may not be present in your dataframe. Using .loc, you will get a KeyError if the label is not present. Using reindex, however, you will not get an error; the missing row or column is filled with NaN, allowing the code to continue executing.
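To make the difference concrete, here is a small sketch that reuses the question's df and asks for labels that do not exist. The names 'col3' and 10 are made up for illustration, and the behaviour shown assumes a reasonably recent pandas (1.0 or later), where list-based .loc lookups raise on any missing label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(6), 'col2': [*'abcdef']})

# .loc with a list containing a label that is not present raises KeyError
try:
    df.loc[:, ['col3']]          # 'col3' is a hypothetical missing column
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex returns a dataframe anyway; the missing column is all NaN
missing_col = df.reindex(['col3'], axis=1)

# The same applies to a missing row label (10 is not in the index)
missing_row = df.reindex([10])

print(loc_raised)
print(missing_col)
print(missing_row)
```

Whether silently getting NaN is preferable to an exception depends on the situation: it keeps a pipeline running, but it can also hide a misspelled column name.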

    Using pd.DataFrame.squeeze allows you to convert that single column dataframe to a pd.Series without typing in the column header.
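One thing to watch for with squeeze, shown here as a rough sketch on the same df: with no arguments it squeezes every length-1 axis, so a dataframe with exactly one row and one column collapses to a scalar and not to a pd.Series. Passing the axis argument restricts the squeeze to one axis if you want to be sure a series comes back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(6), 'col2': [*'abcdef']})

# Single-column dataframe -> Series named after the column
col_series = df[['col1']].squeeze()

# Single-row dataframe -> Series named after the row label
row_series = df.loc[[0]].squeeze()

# 1x1 dataframe: both axes have length 1, so squeeze returns a scalar
one_cell = df.loc[[0], ['col1']]
scalar = one_cell.squeeze()

# Squeezing only the column axis keeps a length-1 Series
still_series = one_cell.squeeze(axis='columns')

print(type(col_series).__name__, col_series.name)
print(type(row_series).__name__, row_series.name)
print(type(still_series).__name__, len(still_series))
```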