Tags: python, python-xarray, geotiff, rasterio

Converting a GeoTIFF to a dataframe while preserving lat/lon in columns


I downloaded a geotiff from here: https://www.nass.usda.gov/Research_and_Science/Crop_Progress_Gridded_Layers/index.php

(file also available: https://drive.google.com/file/d/1XcfEw-CZgVFE2NJytu4B1yBvjWydF-Tm/view?usp=sharing)

Looking at one of the weeks in 2021, I'd like to convert the geotiff to a data frame so I have an associated value with each lat/lon pair in the geotiff.

I tried:

import pandas as pd
import rioxarray

fl = 'data/cpc2021/corn/cpccorn2021/condition/cornCond21w24.tif'
da = rioxarray.open_rasterio(fl, masked=True)
df = da[0].to_pandas()
df['y'] = df.index
pd.melt(df, id_vars='y')

However, this returns a dataframe with x and y that don't seem to correspond to the lat/lon. How can I add (or retain) this information while converting?

I expect the lat/lon points to fall within the contiguous US.

Edit: I found a metadata file that gives the projection as NAD_1983_Contiguous_USA_Albers, which I believe corresponds to EPSG:5070 (it also appears later in the same XML file).

I also found the bounding box for lat/lon coordinates:

 <GeoBndBox esriExtentType="search">
     <exTypeCode Sync="TRUE">1</exTypeCode>
          <westBL Sync="TRUE">-127.360895</westBL>
          <eastBL Sync="TRUE">-68.589171</eastBL>
          <northBL Sync="TRUE">51.723828</northBL>
          <southBL Sync="TRUE">23.297865</southBL>

However, I'm still uncertain how to use this information when converting to a dataframe.
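One possible use of the bounding box is as a sanity check once a dataframe has coordinate columns. This is a sketch, not part of the original attempt: rows that fall outside the box are probably not lat/lon at all. The column names `x` and `y`, the helper `inside_bbox`, and the two sample rows are assumptions made for illustration. Only the four bounds come from the metadata above.

```python
import pandas as pd

# Bounding box copied from the GeoBndBox metadata in the question.
WEST, EAST = -127.360895, -68.589171
SOUTH, NORTH = 23.297865, 51.723828


def inside_bbox(df, lon_col="x", lat_col="y"):
    """Return a boolean Series: True where a row lies inside the box."""
    # Hypothetical helper; the column names are assumed, not taken from the file.
    return df[lon_col].between(WEST, EAST) & df[lat_col].between(SOUTH, NORTH)


# Made-up rows: the first is a plausible lon/lat pair, the second uses the
# first x and y values from print(da), which are far too large to be degrees.
sample = pd.DataFrame({"x": [-96.0, -2.305e6], "y": [40.0, 3.181e6]})
mask = inside_bbox(sample)
print(mask.tolist())
```

If most rows fail this check, the coordinates are likely still in the raster's native projection rather than in degrees.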

Result of print(da) is:

<xarray.DataArray (band: 1, y: 320, x: 479)>
[153280 values with dtype=float32]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -2.305e+06 -2.296e+06 ... 1.987e+06 1.996e+06
  * y            (y) float64 3.181e+06 3.172e+06 ... 3.192e+05 3.102e+05
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:           Area
    RepresentationType:      ATHEMATIC
    STATISTICS_COVARIANCES:  0.1263692188822515
    STATISTICS_MAXIMUM:      4.8569073677063
    STATISTICS_MEAN:         3.7031858480518
    STATISTICS_MINIMUM:      2.1672348976135
    STATISTICS_SKIPFACTORX:  1
    STATISTICS_SKIPFACTORY:  1
    STATISTICS_STDDEV:       0.35548448472789
    scale_factor:            1.0
    add_offset:              0.0

Solution

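  • Credit to Jose from the GIS community. The raster's x and y values appear to be in projected metres (the question's metadata points to EPSG:5070), so reproject to EPSG:4326 before converting; x and y then hold longitude and latitude:

    import rioxarray
    import pandas as pd
    
    # fl is the GeoTIFF path defined in the question
    da = rioxarray.open_rasterio(fl, masked=True)
    # Reproject from Albers to geographic coordinates (lon/lat)
    da = da.rio.reproject("EPSG:4326")
    # First band as a 2-D frame: index = y (lat), columns = x (lon)
    df = da[0].to_pandas()
    df['y'] = df.index
    # Long format: one row per (y, x) cell
    df = pd.melt(df, id_vars='y')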
    

    https://gis.stackexchange.com/questions/443801/add-lat-and-lon-to-dataarray-read-in-by-rioxarray/443810#443810
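To make the reshaping step easier to inspect, here is a minimal sketch on a tiny synthetic grid, so it runs without the GeoTIFF. The grid values, the `lat`/`lon`/`value` column names, and the `dropna` step are assumptions for illustration and are not part of the accepted answer. The NaN cells stand in for the nodata pixels that `masked=True` would produce.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for da[0].to_pandas() after reprojection:
# rows are indexed by y (latitude), columns by x (longitude).
lat = pd.Index([40.0, 39.5], name="y")
lon = pd.Index([-96.0, -95.5, -95.0], name="x")
values = np.array([[3.1, np.nan, 3.4],
                   [2.9, 3.0, np.nan]])
wide = pd.DataFrame(values, index=lat, columns=lon)

# Wide to long: one row per (y, x) cell. Naming the columns explicitly
# avoids relying on melt's default names.
long_df = wide.reset_index().melt(id_vars="y", var_name="x", value_name="value")

# Drop masked (NaN) cells and use descriptive column names.
long_df = (
    long_df.dropna(subset=["value"])
    .rename(columns={"y": "lat", "x": "lon"})
    .reset_index(drop=True)
)
long_df["lon"] = long_df["lon"].astype(float)
print(long_df)
```

Dropping the NaN rows is optional; keep them if every grid cell should appear in the output, including cells outside the data area.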