Search code examples
pythonalgorithmbing-maps

Decompress Bing Maps GeoData (borders) with Python


I have been trying to decompress the Bing Maps location/border shape algorithm using Python. My end goal is to have custom regions/borders created from combining multiple counties and cities, and save the location data to our database for faster and more accurate location based analysis.

My strategy is as follows, but I'm a little stuck on #2, since I can't seem to accurately decompress the code:

  1. Retrieve County/City borders from Bing Maps GeoData API - They refer to it as "shape"
  2. Decompress their "shape" data to get the latitude and longitude of the border points
  3. Remove the points that have the same lat/lng as other shapes (The goal is to make one large shape of multiple counties, as opposed to 5-6 separate shapes)
  4. Compress the end result and save in the database

The function I am using seems to work for the example of 'vx1vilihnM6hR7mEl2Q' provided in the Point Compression Algorithm documentation. However, when I insert something a little more complex, like Cook County, the formula seems to be off (tested by inserting several of the points into different polygon mapping/drawing applications that also use Bing Maps). It basically creates a line at the south side of Chicago that vigorously goes East and West into Indiana, without much North-South movement. Without knowing what the actual coordinates of any counties are supposed to be, I'm not sure how to figure out where I'm going wrong.

Any help is greatly appreciated, even if it is a suggestion of a different strategy.

Here is the python code (sorry for the overuse of the decimal format - my poor attempt to ensure the error wasn't a result of inadvertently losing precision):

safeCharacters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'

def decodeBingBorder(compressedData): latLng = [] pointsArray = [] point = [] lastLat = Decimal(0) lastLng = Decimal(0)

# Assigns the number of of each character based on the respective index of 'safeCharacters'
# numbers under 32 indicate it is the last number of the combination of the point, and a new point is begun
for char in compressedData:
    num = Decimal(safeCharacters.index(char))
    if num < 32:
        point.append(num)
        pointsArray.append(point)
        point = []
    else:
        num -= Decimal(32)
        point.append(num)

# Loops through each point to determine the lat/lng of each point
for pnt in pointsArray:
    result = Decimal(0)

    # This revereses step 7 of the Point Compression Algorithm https://msdn.microsoft.com/en-us/library/jj158958.aspx
    for num in reversed(pnt):
        if result == 0:
            result = num
        else:
            result = result * Decimal(32) + num

    # This was pretty much taken from the Decompression Algorithm (not in Python format) at https://msdn.microsoft.com/en-us/library/dn306801.aspx
    # Determine which diaganal it's on
    diag = Decimal(int(round((math.sqrt(8 * result + 5) - 1) / 2)))

    # submtract the total number of points from lower diagonals, and get the X and Y from what's left over
    latY = Decimal(result - Decimal(diag * (diag + 1) / 2))
    lngX = Decimal(diag - latY)

    # undo the sign encoding
    if latY % 2 == 1:
        latY = (latY + Decimal(1)) * Decimal(-1)
    if lngX % 2 == 1:
        lngX = (lngX + Decimal(1)) * Decimal(-1)
    latY /= 2
    lngX /= 2

    # undo the delta encoding
    lat = latY + lastLat
    lng = lngX + lastLng
    lastLat = lat
    lastLng = lng

    # position the decimal point
    lat /= Decimal(100000)
    lng /= Decimal(100000)

    # append the point to the latLng list in a string format, as opposed to the decimal format
    latLng.append([str(lat), str(lng)])

return latLng

The compressed algorithm:

1440iqu9vJ957r8pB_825syB6rh_gXh1-ntqB56sk2B2nq07Mwvq5f64r0m0Fni11ooE4kkvxEy4wzMuotr_DvsiqvFozvt-Lw9znxH-r5oxLv9yxCwhh7wKnk4sB8o0Rvv56D8snW5n1jBg50K4kplClkpqBpgl9F4h4X_sjMs85Ls6qQi6qvqBr188mBqk-pqIxxsx5EpsjosI-8hgIoygDigU94l_4C

This is the result:

[['41.46986', '-87.79031'], ['41.47033', '-87.52569'], ['41.469145', '-87.23372'], ['41.469395', '-87.03741'], ['41.41014', '-86.7114'], ['41.397545', '-86.64553'], ['41.3691', '-86.47018'], ['41.359585', '-86.41984'], ['41.353585', '-86.9637'], ['41.355725', '-87.43971'], ['41.35561', '-87.52716'], ['41.3555', '-87.55277'], ['41.354625', '-87.63504'], ['41.355635', '-87.54018'], ['41.360745', '-87.40351'], ['41.362315', '-87.29262'], ['41.36214', '-87.43194'], ['41.360915', '-87.44473'], ['41.35598', '-87.58256'], ['41.3551', '-87.59025'], ['41.35245', '-87.59828'], ['41.34782', '-87.60784'], ['41.34506', '-87.61664'], ['41.34267', '-87.6219'], ['41.34232', '-87.62643'], ['41.33809', '-87.63286'], ['41.33646', '-87.63956'], ['41.32985', '-87.65056'], ['41.33069', '-87.65596'], ['41.32965', '-87.65938'], ['41.33063', '-87.6628'], ['41.32924', '-87.66659'], ['41.32851', '-87.71306'], ['41.327105', '-87.75963'], ['41.329515', '-87.64388'], ['41.32698', '-87.73614'], ['41.32876', '-87.61933'], ['41.328275', '-87.6403'], ['41.328765', '-87.63857'], ['41.32866', '-87.63969'], ['41.32862', '-87.70802']]


Solution

  • As mentioned by rbrundritt, storing the data from Big Maps is against the terms of use. However, there are other sources of this same data available, such as http://nationalmap.gov/boundaries.html

    In the interest of solving the problem, and to store this and other coordinate data more efficiently, I solved the problem by removing the 'round' function when calculating 'diag'. This should be what replaces it:

    diag = int((math.sqrt(8 * result + 5) - 1) / 2)
    

    All of the 'Decimal' crap I added is not necessary, so you can remove it if you wish.