With data like the sample below, which captures measurements at various closely spaced locations:
Lat Long val
35.611053 139.628525 -72.82
35.61105336 139.6285236 -78.04
35.61105373 139.6285223 -72.99
35.61105409 139.6285209 -69.04
35.61105445 139.6285195 -65.4
35.61105482 139.6285182 -66.68
35.61105518 139.6285168 -65.82
35.61105555 139.6285155 -64.47
35.61105591 139.6285141 -71.26
35.61105627 139.6285127 -68.36
35.61105664 139.6285114 -74.48
35.611057 139.62851 -74.27
35.61105736 139.62851 -77.97
35.61105773 139.62851 -68.66
35.61105809 139.62851 -70.21
35.61105845 139.62851 -76.05
35.61105882 139.62851 -88.83
35.61105918 139.62851 -73.17
35.61105955 139.62851 -67.63
35.61105991 139.62851 -71.85
35.61106027 139.62851 -77.42
35.61106064 139.62851 -71.08
35.611061 139.62851 -79.27
I need to perform a binning operation on this data, that is, to get the mean of all the values in val for every 0.1 x 0.1 meter cell. One approach could be to find the corner points (NW, SW, NE and SE), divide the area into a grid of 0.1 x 0.1 meter cells, look up the values within each cell, compute their average, and attribute it to the lat/long at the center of the cell, so that we have results like below.
Lat Long Mean_val Sample_count
While the proposed approach may be naive, I would also like to know whether there is an approach based on pandas.
To do that, you must convert your latitude/longitude coordinates into x, y coordinates in meters.
Here I use the utm module:
x, y, _, _ = utm.from_latlon(latitude, longitude)
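After that, you can create a new column that represents your x, y coordinates in decimeters:
import numpy as np
import utm
def apply_fun(row):
    # Project to UTM (meters), then round to the nearest decimeter.
    x, y, _, _ = utm.from_latlon(row['Lat'], row['Long'])
    return str(np.round(x * 10)) + "|" + str(np.round(y * 10))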
Then add it to your dataframe:
x = df.apply(apply_fun, axis=1)
df.insert(3, 'Group', x)
Then apply the groupby function:
gdf = df.groupby(['Group']).agg({"Lat":["mean"],"Long":["mean","count"],"val":["mean"]})
gdf = gdf.reset_index().drop(columns=['Group'],level=0)
gdf.columns = [' '.join(col) for col in gdf.columns]
And we are done ! :)
To group the data by cells of k1 x k2 meters, just modify this function:
def apply_fun(row):
    # k1 and k2 are the cell sizes in meters along x and y.
    x, y, _, _ = utm.from_latlon(row['Lat'], row['Long'])
    return str(np.round(x / k1)) + "|" + str(np.round(y / k2))
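The same grouping can also be done without a row-wise apply or string keys. The following is only a sketch under some assumptions: the dataframe already has projected coordinates in columns named x and y (hypothetical names, e.g. the easting and northing returned by utm.from_latlon), and cells are defined with floor instead of round, so cell edges fall on multiples of k1 and k2. It also reports the cell center, as asked in the question.

```python
import numpy as np
import pandas as pd

def bin_xy(df, k1=0.1, k2=0.1):
    """Average 'val' over k1 x k2 meter cells of projected coordinates.

    Assumes df has columns 'x' and 'y' in meters (hypothetical names).
    """
    out = df.copy()
    # Integer cell indices; floor puts cell edges on multiples of k1 / k2.
    out["ix"] = np.floor(out["x"] / k1).astype("int64")
    out["iy"] = np.floor(out["y"] / k2).astype("int64")
    g = (out.groupby(["ix", "iy"])["val"]
            .agg(Mean_val="mean", Sample_count="count")
            .reset_index())
    # Cell center, still in projected (x, y) coordinates.
    g["x_center"] = (g["ix"] + 0.5) * k1
    g["y_center"] = (g["iy"] + 0.5) * k2
    return g[["x_center", "y_center", "Mean_val", "Sample_count"]]

# Small synthetic example: two points in each of two neighbouring cells.
demo = pd.DataFrame({
    "x": [0.01, 0.04, 0.12, 0.18],
    "y": [0.02, 0.07, 0.03, 0.09],
    "val": [-70.0, -72.0, -60.0, -64.0],
})
result = bin_xy(demo)
```

To get the cell centers back as Lat/Long, the inverse projection would be needed, for example utm.to_latlon with the zone number and zone letter returned by utm.from_latlon.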
As I indicated previously, to solve this problem we have to convert the lat, long into x, y coordinates.
In the previous solution I converted the lat, long to UTM coordinates. The UTM system is a cartographic projection that divides the earth into 60 longitudinal zones, each split into a northern and a southern half, which gives 120 areas. So when we do:
x, y, zone_number, zone_letter = utm.from_latlon(row['Lat'], row['Long'])
(x, y) is our position within the zone identified by (zone_number, zone_letter). The zone letter is a latitude band, from which the hemisphere follows. We can conclude that our solution works only if all our sensors are in the same UTM zone.
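A quick way to check that condition before binning is sketched below. It relies on the standard rule that UTM zones are 6 degrees of longitude wide, starting at 180°W; the helper names are hypothetical, and the special zones around Norway and Svalbard are deliberately ignored (the utm module handles those itself).

```python
import math

def utm_zone_number(lon_deg):
    """Standard 6-degree UTM zone number (1..60) for a longitude in degrees.

    Simplified: ignores the special zones around Norway and Svalbard.
    """
    lon = ((lon_deg + 180.0) % 360.0) - 180.0  # wrap into [-180, 180)
    return int(math.floor((lon + 180.0) / 6.0)) + 1

def same_utm_zone(points):
    """True if every (lat, lon) pair shares one zone number and hemisphere."""
    keys = {(utm_zone_number(lon), lat >= 0) for lat, lon in points}
    return len(keys) == 1

# First and last points of the sample data from the question.
pts = [(35.611053, 139.628525), (35.611061, 139.62851)]
all_in_one_zone = same_utm_zone(pts)
```

If the points span more than one zone, the x, y values of different zones are not comparable and each zone would have to be binned separately.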
We could also do this conversion with ECEF, which converts lat, long (and altitude) directly into Cartesian x, y, z coordinates. Note that ECEF coordinates are 3D and earth-centered, so their x, y axes do not form a local horizontal plane. I do not know the precision of these methods, and since we need precision to a tenth of a meter, I prefer the UTM conversion, which looks more accurate.
If you want to use the ECEF method, it can be done like this:
import pyproj
# pyproj.transform is deprecated; Transformer is the current API.
# EPSG:4979 = WGS84 lon/lat/height, EPSG:4978 = WGS84 geocentric (ECEF).
to_ecef = pyproj.Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)
def gps_to_ecef_pyproj(lat, lon, alt):
    # With always_xy=True the input order is (lon, lat, alt).
    return to_ecef.transform(lon, lat, alt)
# row is one dataframe row, as in apply_fun.
x, y, z = gps_to_ecef_pyproj(row['Lat'], row['Long'], 0)
(Adapted from the code here: https://gis.stackexchange.com/questions/230160/converting-wgs84-to-ecef-in-python)
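For reference, the geodetic-to-ECEF conversion is a closed-form formula on the WGS84 ellipsoid, so it can be written without pyproj. The sketch below uses only the standard library; the function name is hypothetical, and it is meant to illustrate what the conversion computes, not to replace the pyproj call above.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis, meters
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m=0.0):
    """Convert WGS84 latitude/longitude (degrees) and height (m) to ECEF (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z

# First point of the sample data, at zero height.
x, y, z = geodetic_to_ecef(35.611053, 139.628525, 0.0)
```

Because the result is a 3D earth-centered position, binning on ECEF x and y alone would not correspond to a horizontal 0.1 x 0.1 meter grid on the ground; a planar projection such as UTM fits that purpose more directly.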