Search code examples
pythonpandasdataframesparse-matrix

Get non empty values of dataframe as a single column


I have a sparse dataframe and would like to get all non empty values as a single column. See the image that I made up to illustrate the problem. I somehow managed to solve it using the python code below. However, I feel there migh be some better | simpler | efficient way to solve it enter image description here

import pandas as pd

list1 = ["x1","x2","?","?","?","?"]
list2 = ["?","?","y1","y2","?","?"]
list3 = ["?","?","?","?","z1","z2"]

df_sparse = pd.DataFrame({"A":list1,"B":list2,"C":list3})
values_vect = []
for col in df_sparse.columns:
  values = [ i for i in list(df_sparse[col]) if i !="?"]
  values_vect.extend(values)
df_sparse["D"] = pd.DataFrame(values_vect,columns=["D"])
display(df_sparse)

Solution

  • df_sparse["D"] = df_sparse.replace("?", np.nan).ffill(axis="columns").iloc[:, -1]
    
    • replace "?"s with NaNs
    • forward fill the values along columns so that non-NaN values will slide to the rightmost positions
    • query the rightmost column, that's where the values are

    to get

    >>> df_sparse
    
        A   B   C   D
    0  x1   ?   ?  x1
    1  x2   ?   ?  x2
    2   ?  y1   ?  y1
    3   ?  y2   ?  y2
    4   ?   ?  z1  z1
    5   ?   ?  z2  z2