Search code examples
pythonpandasmissing-data

Is there a method to save not missing values in another data frame?


I have a data frame by 20441 rows and 158 columns. in each row, there are a lot of "NA" values. so I want to convert it to sth like this:

If a value is not NA, I save it's row name , column name and value in another data frame. for example my first data frame is :

row and column name c1 c2 c3 c4 c5
r1 NA NA NA 5 6
r2 1 3 NA NA NA
row name c1 c2
r1 c4 5
r1 c5 6
r2 c1 1
r2 c2 3

Solution

  • The answer using df.melt works but is slower than the below code. I used %%timeit to measure the time of each code.

    df.melt takes 2.47 ms ± 93.5 and below code takes 314 µs ± 10.4

    if performance is not important then df.melt is better since it is a one line code.

    row = []
    cols = []
    val = []
    for col in data.columns[1:]: 
        for i,e in enumerate(data[col]): 
            if pd.isna(e) == False:
                row.append(data['row&col name'][i])
                cols.append(col)
                val.append(e)  
    new_data = pd.DataFrame(list(zip(row,cols,val)),columns=['row','col','val'])