
Dask dataframe empty values are not replaced with "NA" string


I'm using the dask package for file handling. I tried to fill the blank values of a few particular columns with the string "NA" using the fillna() function, but the blank values are not filled with "NA".

Here is the code I tried:

import pandas as pd
import dask.dataframe as dd
import numpy as np

read = dd.read_csv("multianno.csv",sep=",",low_memory=False, keep_default_na = False, na_values = np.nan)
omim = pd.read_excel('OMIM.xlsx')
Merge = dd.merge(read, omim, on = "Gene.refGene", how ="left")
Merge.fillna({"ID" : "NA", "genename" : "NA", "phenotype" : "NA"})

I expected the following results:

Chr     Start   End     Ref     Alt   Gene.refGene    ID    genename   phenotype
chr1    10617   10637   CGCC    -     NONE;DDX11L17   NA    NA         NA
chr1    12783   12783   G       A     DDX11L1         NA    NA         NA
chr1    13958   13958   C       -     DDX11L1         NA    NA         NA

Instead, I got these results, where the blank cells are still NaN:

Chr     Start   End     Ref     Alt   Gene.refGene    ID    genename   phenotype
chr1    10617   10637   CGCC    -     NONE;DDX11L17   NaN   NaN        NaN
chr1    12783   12783   G       A     DDX11L1         NaN   NaN        NaN
chr1    13958   13958   C       -     DDX11L1         NaN   NaN        NaN

I would appreciate any help with fixing this issue.


Solution

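  • I believe you are not saving the result to a variable. fillna() does not modify the dataframe in place; it returns a new dataframe, so the result has to be assigned back: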

    Merge = Merge.fillna({"ID" : "NA", "genename" : "NA", "phenotype" : "NA"})
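
    The difference can be sketched with plain pandas on made-up toy data. The gene names, IDs, and phenotype values below are hypothetical stand-ins, not taken from the question's files. Dask's fillna likewise returns a new dataframe rather than changing the existing one, so the same pattern should apply there.

```python
import pandas as pd

# Toy stand-ins for the annotation table and the OMIM table (hypothetical values).
left = pd.DataFrame({"Gene.refGene": ["DDX11L1", "GENE_A"], "Start": [12783, 13958]})
omim = pd.DataFrame({"Gene.refGene": ["GENE_A"], "ID": ["ID_1"], "phenotype": ["example"]})

# Left merge: genes without a match in omim get NaN in ID and phenotype.
merged = left.merge(omim, on="Gene.refGene", how="left")

# Calling fillna without keeping the result leaves `merged` unchanged.
merged.fillna({"ID": "NA", "phenotype": "NA"})
unchanged_missing = int(merged["ID"].isna().sum())

# Assigning the result keeps the filled frame.
filled = merged.fillna({"ID": "NA", "phenotype": "NA"})
filled_ids = filled["ID"].tolist()

print(unchanged_missing)
print(filled_ids)
```

    With a Dask dataframe the assigned result is still lazy, so the filled values only show up once it is computed (for example with .compute()) or written out.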