Search code examples
pythonpandasdataframesyntax-errornan

How to detect #N/A in a data frame (data taken from xlsx file) using pandas?


The blank cells with no data can be checked with: if pd.isna(dataframe.loc[index_name, column_name] == True)

but if the cell has #N/A, the above command does not work nor dataframe.loc[index, column_name] == '#N/A'.

On reading that cell, it shows NaN, but the above codes does not work. My main target is to capture the release dates and store it in a list.

Dataframe Code


Solution

  • If you're reading your dataframe tft from a spreadsheet (and it seems to be the case here), you can use the parameter na_values of pandas.read_excel to consider some values (e.g #N/A) as NaN values like below :

    tft= pd.read_excel("path_to_the_file.xlsx", na_values=["#N/A"])
    

    Otherwise, if you want to preserve those #N/A values/strings, you can check/select them like this :

    tft.loc[tft["Release Data"].eq("#N/A")] #will return a dataframe
    

    In the first scenario, your code would be like this :

    rel_date= []
    
    for i in range(len(tft)):
        if pd.isna(tft["Release Date"])
            continue
        else:
            rel_date.append(int(str(tft.loc[i, "Release Date"]).split()[1]))
    

    However, there is no need for the loop here, you can make a list of the release dates with this :

    rel_date= (
                tft["Release Date"]
                    .str.extract(("Release (\d{8})"), expand=False)
                    .dropna()
                    .astype(int)
                    .drop_duplicates()
                    .tolist()
              )
    print(rel_date)
    [20220603, 20220610]