Search code examples
sql-serverdata-warehousedimensional-modeling

Dimension for geozones or Lat & Long in data warehouse


I have a DimPlace dimension that has the name of the place (manually entered by the user) and the latitude and longitude of the place (automatically captured). Since the Places are entered manually the same place could be in there multiple time with different names, additionally, two distinct places could be very close to each other.

We want to be able to analyze the MPG between two "places" but we want to group them to make a larger area - i.e. using lat & long put all the various spellings of one location, as well as distinct but very close locations, in one record.

I am planning on making a new dimension for this - something like DimPlaceGeozone. I am looking for a resource to help with loading all the lat & long values mapped to ... something?? Maybe postal code, or city name? Sometimes you can find a script to load common dimensions (like DimTime) - I would love something similar for lat & long values in North America?


Solution

  • I've done something similar in the past... The one stumbling block I hit up front was that 2 locations, straddling a border could be physically closer together than 2 locations that are both in the same area.

    I got around it by creating a "double grid" system that causes each location to fall into 4 areas. That way 2 locations that share at least 1 "area" you know they are within range of each other.

    Here's an example, covering most of the United States...

    IF OBJECT_ID('tempdb..#LatLngAreas', 'U') IS NOT NULL 
    DROP TABLE #LatLngAreas;
    GO
    
    WITH 
        cte_Lat AS (
            SELECT 
                t.n,
                BegLatRange = -37.9 + (t.n / 10.0),
                EndLatRange  = -37.7 + (t.n / 10.0)
            FROM
                dbo.tfn_Tally(1030, 0) t
            ),
        cte_Lng AS (
            SELECT 
                t.n,
                BegLngRange = -159.7 + (t.n / 10.0),
                EndLngRange = -159.5 + (t.n / 10.0)
            FROM
                dbo.tfn_Tally(3050, 0) t
            )
    SELECT 
        Area_ID = ROW_NUMBER() OVER (ORDER BY lat.n, lng.n),
        lat.BegLatRange, 
        lat.EndLatRange, 
        lng.BegLngRange, 
        lng.EndLngRange
        INTO #LatLngAreas
    FROM
        cte_Lat lat
        CROSS JOIN cte_Lng lng;
    
    
    SELECT 
        b3.Branch_ID,
        b3.Name,
        b3.Lat,
        b3.Lng,
        lla.Area_ID
    FROM
        dbo.ContactBranch b3    -- replace with DimPlace
        JOIN #LatLngAreas lla
            ON b3.Lat BETWEEN lla.BegLatRange AND lla.EndLatRange
            AND b3.lng BETWEEN lla.BegLngRange AND lla.EndLngRange;
    

    HTH, Jason