python, no-data

Replace -3.402823466e+38 with NaN, or remove this value from a dataframe


I'm working with Sentinel-2 images that I clipped to a bbox in QGIS. Some of those images don't completely cover the bbox, so QGIS fills the missing pixels with what I take to be the lowest float value close to 0 as the NoData value. I tried to specify my own NoData value, but it isn't applied to every pixel: I now have the value I asked for (-4.444444), but also -3.402823466e+38 values.

So I tried to remove it from the dataframe I'm working on (the image is opened with rasterio) with:

 df_annee.replace(-3.402823466e+38, np.nan, inplace=True)

I also tried reading the value from the dataframe into a variable and passing that variable to replace:

i = df_mois.loc[:,'Valeurs'][0][5]
df_annee.replace(i, np.nan, inplace=True)

I also tried to replace it directly in the array, in the script I use to build my dataframe:

src = (rasterio.open(each_directory))
array = src.read(1)
array[(array<-2)] = np.nan
array[(array > -2.412823466e+38 ) | (array < -1.401823466e+38 )] = np.nan
array[(array == -3.402823466e+38)] = np.nan

Nothing worked. The problem is that I want to average the values of my images and plot them, but seaborn plots treat -3.402823466e+38 as -3, so my plot ends up wrong, and np.mean() or mean(df) doesn't recognize these values as NoData, so my average ends up really close to 0 even when it shouldn't be.

This loop finds all my values except the -3.402823466e+38 one:

for i in range(len(df_mois.loc[:,"Valeurs"])):
    for y in range(len(df_mois.loc[:,"Valeurs"][0])):
        for x in range(y): 
            #print(df_mois.loc[:,"Valeurs"][i][y][x])
            if df_mois.loc[:,"Valeurs"][i][y][x] < 0.001 and df_mois.loc[:,"Valeurs"][i][y][x] > -0.00001:
                print(df_mois.loc[:,"Valeurs"][i][y][x])
                print('coucou')

Another weird thing is that my numbers are float32, but I have e+38 values in my dataframe...

What can I do?


Solution

  • Why this specific value appears is still unclear to me. However, I was able to resolve my problem by finding where it first started to appear in my images. I didn't notice it at first because I did several processing steps in QGIS, and QGIS was able to handle it.

    At first I:

    1 - superimposed, from Python with otbcli_Superimpose,
    only my bands at a resolution of 20 m onto a band at 10 m to save time, and didn't realise that my images weren't all at the same spatial resolution (because I was working in degrees, EPSG:4326).

    2 - averaged my images with BandMath in QGIS
    The second problem was the NoData values. As with the spatial resolution, BandMath in QGIS was able to handle it (or hide it), but Python was not. Something was wrong, but QGIS carried on without any warning, so I averaged my images with BandMath without noticing anything.

    3 - wanted to plot those images with pandas and seaborn
    That's when I noticed my matrices looked wrong and that I couldn't handle the -3.402823466e+38 value.

    So I redid the whole process with:

    1 - otbcli_Superimpose to align all my images on one reference image
    2 - gdalwarp -cutline to cut all my images the same way, specifying the NoData value with -dstnodata
    3 - gdal_merge.py to merge the images that were not in the same tile
    4 - otbcli_BandMath to average them.
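As a rough illustration of the four steps above, and not the exact commands that were run, the pipeline could be assembled from Python along these lines. All file names, the `NODATA` value and the helper `build_commands` are hypothetical placeholders; the commands are only built and printed here, so nothing is executed. The averaging expression is the simplest possible one and does not treat NoData pixels specially.

```python
import shlex

NODATA = -9999  # assumption: a value that cannot occur in the real data


def build_commands(reference, image, cutline, tile_a, tile_b):
    """Return the four steps as argument lists (all paths are placeholders)."""
    return [
        # 1. Resample `image` onto the grid of `reference`.
        ["otbcli_Superimpose", "-inr", reference, "-inm", image,
         "-out", "superimposed.tif"],
        # 2. Clip with the bbox polygon and write an explicit NoData value.
        ["gdalwarp", "-cutline", cutline, "-crop_to_cutline",
         "-dstnodata", str(NODATA), "superimposed.tif", "clipped.tif"],
        # 3. Mosaic images that fall on different tiles.
        ["gdal_merge.py", "-o", "merged.tif", "-n", str(NODATA),
         "-a_nodata", str(NODATA), tile_a, tile_b],
        # 4. Average the first band of two images.
        ["otbcli_BandMath", "-il", "merged_1.tif", "merged_2.tif",
         "-out", "mean.tif", "-exp", "(im1b1 + im2b1) / 2"],
    ]


commands = build_commands("ref_10m.tif", "band_20m.tif", "bbox.shp",
                          "tile_a.tif", "tile_b.tif")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.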

    I think in the future I won't use QGIS if I need to open my images with Python afterwards. I don't know how to specify some of the parameters, and QGIS runs with them left implicit, which isn't good practice.
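For readers who still need to clean such a raster on the Python side, here is a minimal sketch with NumPy. It assumes the band is a float32 array and that the stray sentinel is the most negative finite float32 value (about -3.4028235e+38, which is also why an e+38 number can appear in float32 data). The helper name `mask_nodata` and the sample array are made up for illustration; this is not the method used in the answer above.

```python
import numpy as np

# Assumption: the stray sentinel is the most negative finite float32 value.
FLOAT32_MIN = np.finfo(np.float32).min


def mask_nodata(band, nodata_values=(FLOAT32_MIN, -4.444444)):
    """Return a float32 copy of `band` with the NoData sentinels set to NaN."""
    out = np.array(band, dtype=np.float32, copy=True)
    for value in nodata_values:
        # Compare against the float32 representation of the sentinel,
        # with a small tolerance instead of strict equality.
        out[np.isclose(out, np.float32(value))] = np.nan
    return out


# Tiny stand-in for one band read with rasterio's src.read(1).
band = np.array([[0.25, FLOAT32_MIN],
                 [-4.444444, 0.75]], dtype=np.float32)

clean = mask_nodata(band)
mean_value = float(np.nanmean(clean))  # mean of the valid pixels only
print(mean_value)
```

The cleaned arrays have to be the ones stored in the dataframe for the plot and the average to change. If the file's NoData tag is set correctly, `src.read(1, masked=True)` in rasterio returns a masked array, which is another way to keep those pixels out of the statistics.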