I downloaded a geotiff from here: https://www.nass.usda.gov/Research_and_Science/Crop_Progress_Gridded_Layers/index.php
(file also available: https://drive.google.com/file/d/1XcfEw-CZgVFE2NJytu4B1yBvjWydF-Tm/view?usp=sharing)
Looking at one of the weeks in 2021, I'd like to convert the GeoTIFF to a data frame so that each lat/lon pair in the raster has an associated value.
I tried:
import pandas as pd
import rioxarray
fl = 'data/cpc2021/corn/cpccorn2021/condition/cornCond21w24.tif'
da = rioxarray.open_rasterio(fl, masked=True)
df = da[0].to_pandas()
df['y'] = df.index
pd.melt(df, id_vars='y')
However, this returns a dataframe with x and y that don't seem to correspond to the lat/lon. How can I add (or retain) this information while converting?
I expect the lat/lon points to fall within the contiguous US.
Edit: I found a metadata file that gives the projection: NAD_1983_Contiguous_USA_Albers, which I believe corresponds to EPSG:5070 (this code also appears later in the same XML file).
I also found the bounding box for lat/lon coordinates:
<GeoBndBox esriExtentType="search">
<exTypeCode Sync="TRUE">1</exTypeCode>
<westBL Sync="TRUE">-127.360895</westBL>
<eastBL Sync="TRUE">-68.589171</eastBL>
<northBL Sync="TRUE">51.723828</northBL>
<southBL Sync="TRUE">23.297865</southBL>
However, I'm still unsure how to use this information when converting to a data frame.
Result of print(da):
<xarray.DataArray (band: 1, y: 320, x: 479)>
[153280 values with dtype=float32]
Coordinates:
* band (band) int64 1
* x (x) float64 -2.305e+06 -2.296e+06 ... 1.987e+06 1.996e+06
* y (y) float64 3.181e+06 3.172e+06 ... 3.192e+05 3.102e+05
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
RepresentationType: ATHEMATIC
STATISTICS_COVARIANCES: 0.1263692188822515
STATISTICS_MAXIMUM: 4.8569073677063
STATISTICS_MEAN: 3.7031858480518
STATISTICS_MINIMUM: 2.1672348976135
STATISTICS_SKIPFACTORX: 1
STATISTICS_SKIPFACTORY: 1
STATISTICS_STDDEV: 0.35548448472789
scale_factor: 1.0
add_offset: 0.0
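The x and y values in this output are in projected units (metres, if the CRS really is EPSG:5070) rather than degrees, which is why they do not look like lat/lon. As a rough check of the EPSG:5070 guess, the sketch below transforms one corner of the grid to longitude/latitude with pyproj. It assumes pyproj is installed and uses the rounded upper-left x and y from the output above, so the result is only approximate; the variable names are illustrative.

```python
from pyproj import Transformer

# Assumption: the raster is in EPSG:5070 (NAD83 / Conus Albers), as guessed from the metadata.
# always_xy=True makes the transformer take (x, y) and return (lon, lat).
transformer = Transformer.from_crs("EPSG:5070", "EPSG:4326", always_xy=True)

# Rounded upper-left coordinates copied from the print(da) output.
x_upper_left, y_upper_left = -2.305e6, 3.181e6

lon, lat = transformer.transform(x_upper_left, y_upper_left)
print(f"lon={lon:.2f}, lat={lat:.2f}")
```

If the guess is right, the longitude should come out close to the westBL value in the metadata, and the latitude should fall between southBL and northBL.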
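Credit to Jose from the GIS community: reproject the raster to EPSG:4326 before converting, so that x and y are longitude and latitude in degrees: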
import rioxarray
import pandas as pd
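da = rioxarray.open_rasterio(fl, masked=True)  # fl is the GeoTIFF path from the question
da = da.rio.reproject("EPSG:4326")  # reproject to lon/lat (WGS 84)
df = da[0].to_pandas()  # first band as a 2D table: rows are y, columns are x
df['y'] = df.index
df = pd.melt(df, id_vars='y')  # long format: one row per (y, x) cell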
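To see what the final wide-to-long step produces, here is a minimal sketch on a made-up 2x3 grid, using only pandas and NumPy. The grid values and coordinates are invented for illustration. The sketch also drops NaN cells (which is how masked=True represents nodata) and renames the columns to lat/lon; neither step is part of the code above, so treat them as optional extras.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 grid standing in for da[0].to_pandas() after reprojection:
# the index holds y (latitude) and the columns hold x (longitude).
grid = pd.DataFrame(
    [[3.5, np.nan, 4.1],
     [2.9, 3.8, np.nan]],
    index=pd.Index([40.0, 39.0], name="y"),
    columns=pd.Index([-100.0, -99.0, -98.0], name="x"),
)

# Same wide-to-long step, written with reset_index instead of df['y'] = df.index.
long_df = grid.reset_index().melt(id_vars="y", var_name="x", value_name="value")

# Drop cells without data, rename to lat/lon, and make sure lon is numeric.
long_df = (
    long_df.dropna(subset=["value"])
    .rename(columns={"y": "lat", "x": "lon"})
    .astype({"lon": float})
    .reset_index(drop=True)
)
print(long_df)
```

Each remaining row is one grid cell with its latitude, longitude, and value, which is the shape the question asks for.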