Search code examples
pythonpandasnumpygisgeopandas

Convert Regularly Geographic Points into a matrix (keeping the correct order)


Considering the following dataset:

freguesia                       Datetime             C1     latitude    longitude   geometry
0   Parque das Nações   2022-09-07 09:30:00+00:00   72.37   38.753769   -9.095670   POINT (-9.09567 38.75377)
1   Parque das Nações   2022-09-07 09:30:00+00:00   4.65    38.753769   -9.093873   POINT (-9.09387 38.75377)
2   Parque das Nações   2022-09-07 09:30:00+00:00   433.18  38.755170   -9.101060   POINT (-9.10106 38.75517)
3   Parque das Nações   2022-09-07 09:30:00+00:00   274.43  38.755170   -9.099263   POINT (-9.09926 38.75517)
4   Parque das Nações   2022-09-07 09:30:00+00:00   212.09  38.755170   -9.097466   POINT (-9.09747 38.75517)
5   Parque das Nações   2022-09-07 09:30:00+00:00   49.86   38.755170   -9.095670   POINT (-9.09567 38.75517)

You can reproduce this entire dataframe loading the following dict:

{'freguesia': {0: 'Parque das Nações', 1: 'Parque das Nações', 2: 'Parque das Nações', 3: 'Parque das Nações', 4: 'Parque das Nações', 5: 'Parque das Nações', 6: 'Parque das Nações', 7: 'Parque das Nações', 8: 'Parque das Nações', 9: 'Parque das Nações', 10: 'Parque das Nações', 11: 'Parque das Nações', 12: 'Parque das Nações', 13: 'Parque das Nações', 14: 'Parque das Nações'}, 'Datetime': {0: '2022-09-07 09:30:00+00:00', 1: '2022-09-07 09:30:00+00:00', 2: '2022-09-07 09:30:00+00:00', 3: '2022-09-07 09:30:00+00:00', 4: '2022-09-07 09:30:00+00:00', 5: '2022-09-07 09:30:00+00:00', 6: '2022-09-07 09:30:00+00:00', 7: '2022-09-07 09:30:00+00:00', 8: '2022-09-07 09:30:00+00:00', 9: '2022-09-07 09:30:00+00:00', 10: '2022-09-07 09:30:00+00:00', 11: '2022-09-07 09:30:00+00:00', 12: '2022-09-07 09:30:00+00:00', 13: '2022-09-07 09:30:00+00:00', 14: '2022-09-07 09:30:00+00:00'}, 'C1': {0: 72.37, 1: 4.65, 2: 433.18, 3: 274.43, 4: 212.09, 5: 49.86, 6: 3.82, 7: 173.22, 8: 75.16, 9: 506.67, 10: 433.19, 11: 136.86, 12: 2.24, 13: 0.0, 14: 0.0}, 'latitude': {0: 38.7537686909, 1: 38.7537686909, 2: 38.7551697675, 3: 38.7551697675, 4: 38.7551697675, 5: 38.7551697675, 6: 38.7551697675, 7: 38.7565708166, 8: 38.7565708166, 9: 38.7565708166, 10: 38.7565708166, 11: 38.7565708166, 12: 38.7565708166, 13: 38.7565708166, 14: 38.7565708166}, 'longitude': {0: -9.09566985629, 1: -9.09387322572, 2: -9.10105974799, 3: -9.09926311742, 4: -9.09746648685, 5: -9.09566985629, 6: -9.09387322572, 7: -9.10285637856, 8: -9.10105974799, 9: -9.09926311742, 10: -9.09746648685, 11: -9.09566985629, 12: -9.09387322572, 13: -9.09207659515, 14: -9.09027996458}}

If I load into geopandas, I get these points regularly divided:

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

crs={'init':'epsg:4326'}
gps_data = gpd.GeoDataFrame(
    test, geometry=gpd.points_from_xy(test.longitude, test.latitude, crs=crs))

gps_data.plot()

enter image description here

My question is:

How to convert these points into a matrix, with C1 column values, respecting the points location of the plot ?

Expected output:

[
[173.22, 75.16, 506.67, 433.19, 136.86,   2.24,   0 ,   0],
[0,     433.18, 274.43, 212.09, 49.86,    3.82,   0,    0],
[0,        0,     0,       0,   72.37,    4.65,   0,    0]]

Solution

  • Wow! A pandas problem with a reprex, I am duly impressed.

    The (lat, lng) spacing is quite regular, so the grid is easily regenerated.

    def main():
        df = pd.DataFrame(dataset)
        df["Datetime"] = pd.to_datetime(df.Datetime)
        print(make_grid(df))
    
        crs = {"init": "epsg:4326"}
        gps_data = gpd.GeoDataFrame(
            df, geometry=gpd.points_from_xy(df.longitude, df.latitude, crs=crs)
        )
        gps_data.plot()
        plt.show()
    
    
    def make_grid(df):
        lat_start = df.latitude.min()
        lng_start = df.longitude.min()
        lat_delta = df.latitude.diff().max()
        lng_delta = df.longitude.diff().max()
        i_max = j_max = 0
        d = {}
        for _, row in df.iterrows():
            i = round((row.latitude - lat_start) / lat_delta)
            j = round((row.longitude - lng_start) / lng_delta)
            d[(i, j)] = row.C1
            i_max = max(i, i_max)
            j_max = max(j, j_max)
    
        g = np.zeros((i_max + 1, j_max + 1))
        for (i, j), v in d.items():
            g[i_max - i, j] = v
    
        return np.array(g)
    

    output:

    [[173.22  75.16 506.67 433.19 136.86   2.24   0.     0.  ]
     [  0.   433.18 274.43 212.09  49.86   3.82   0.     0.  ]
     [  0.     0.     0.     0.    72.37   4.65   0.     0.  ]]