I'm using the dask package for file handling, and I tried to fill the blank values in a few particular columns with the string "NA" using the fillna() function. However, the blank values are not filled with "NA".
Here is the code I tried:
import pandas as pd
import dask.dataframe as dd
import numpy as np
read = dd.read_csv("multianno.csv",sep=",",low_memory=False, keep_default_na = False, na_values = np.nan)
omim = pd.read_excel('OMIM.xlsx')
Merge = dd.merge(read, omim, on = "Gene.refGene", how ="left")
Merge.fillna({"ID" : "NA", "genename" : "NA", "phenotype" : "NA"})
I expected the following results:
Chr Start End Ref Alt Gene.refGene ID genename phenotype
chr1 10617 10637 CGCC - NONE;DDX11L17 NA NA NA
chr1 12783 12783 G A DDX11L1 NA NA NA
chr1 13958 13958 C - DDX11L1 NA NA NA
But instead I got these results, where the blank cells still remained empty:
Chr Start End Ref Alt Gene.refGene ID genename phenotype
chr1 10617 10637 CGCC - NONE;DDX11L17 NaN NaN NaN
chr1 12783 12783 G A DDX11L1 NaN NaN NaN
chr1 13958 13958 C - DDX11L1 NaN NaN NaN
It would be a great help if anyone has a solution to this issue.
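I believe you are not saving the result to a variable. fillna() returns a new DataFrame rather than modifying Merge in place, so the result has to be assigned back: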
Merge = Merge.fillna({"ID" : "NA", "genename" : "NA", "phenotype" : "NA"})
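To show the difference, here is a minimal sketch using plain pandas so that it runs without the original files. The frame `merged` and its values are hypothetical stand-ins for the real merge result, not data from the question. dask.dataframe follows the same convention of returning a new object from fillna(), but it is lazy, so there you would also need something like `.compute()` or `.to_csv()` to actually see the filled values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame; the values are made up for illustration.
merged = pd.DataFrame({
    "Gene.refGene": ["DDX11L1", "GENE_X"],
    "ID": [np.nan, "id_x"],
    "genename": [np.nan, "name_x"],
    "phenotype": [np.nan, "phenotype_x"],
})

fill_values = {"ID": "NA", "genename": "NA", "phenotype": "NA"}

# Calling fillna() without keeping the result leaves `merged` unchanged.
merged.fillna(fill_values)
still_missing = int(merged["ID"].isna().sum())

# Assigning the returned frame keeps the filled values.
filled = merged.fillna(fill_values)
remaining = int(filled["ID"].isna().sum())

print(still_missing, remaining)
```

If the NaN values still show up after assigning, it is worth checking that the later steps (printing, writing to a file) use the reassigned variable and not the old one.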