
Calculate the midpoint of JSON polylines (Python 2.7.0, no libraries)


Scenario:

I have a system that makes a request to a web service.

  • The web service returns a JSON object.
  • The JSON object contains polyline vertices in an array.

A small snippet from the JSON object would be:

{
  "objectIdFieldName": "OBJECTID",
  "globalIdFieldName": "",
  "geometryType": "esriGeometryPolyline",
  "spatialReference": {
    "wkid": 476,
    "latestWkid": 476
  },
  "fields": [
    {
      "name": "OBJECTID",
      "alias": "OBJECTID",
      "type": "esriFieldTypeOID"
    }
  ],
  "features": [
    {
      "attributes": {
        "OBJECTID": 3311
      },
      "geometry": {
        "paths": [
          [
            [
              675844.1562959617,
              4861766.9811610579
            ],
            [
              675878.30397594348,
              4861792.5977392439
            ],
            [
              675891.38832408097,
              4861800.4024024364
            ],
            [
              675902.17710777745,
              4861804.9933949765
            ],
            [
              675912.27726199664,
              4861808.2070551421
            ],
            [
              675923.52513550781,
              4861810.2730065044
            ],
            [
              675934.77300901897,
              4861811.1911861338
            ],
            [
              675943.03676202707,
              4861811.1911861338
            ],
            [
              675951.07095439639,
              4861810.502546167
            ],
            [
              675961.17111910321,
              4861808.6661449578
            ],
            [
              675970.35304125212,
              4861806.1411667075
            ],
            [
              675981.51595173683,
              4861800.7007851209
            ],
            [
              675998.03647276573,
              4861792.2469376959
            ]
          ]
        ]
      }
    },

**The JSON object has been cut off.**

The full JSON object can be found here: JSON Polylines


Question:

Using the JSON vertices, I would like to calculate the midpoints of the polylines (see green dots below):

  • Some of the lines (OBJECTIDs 3716 and 3385) are multi-part. In this case, the midpoint should only be generated for the longest part of the line (not the other parts).
  • For the purpose of solving this problem, the JSON text could be saved as a text file and loaded into the Python script. In that case, Python's json library could be used, despite the catch mentioned below.
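A minimal sketch of that loading step, assuming the response text is already available as a string. Here raw_text is a hypothetical, heavily trimmed stand-in for the real response (made-up OBJECTIDs and coordinates, only the keys the midpoint script reads). With a saved file you would call json.load on the open file instead of json.loads. Counting the paths of each feature shows which features are multi-part.

```python
import json

# Hypothetical, heavily trimmed stand-in for the web service response:
# only the keys the midpoint script needs are kept.
raw_text = '''
{
  "features": [
    {"attributes": {"OBJECTID": 1},
     "geometry": {"paths": [[[0.0, 0.0], [10.0, 0.0]]]}},
    {"attributes": {"OBJECTID": 2},
     "geometry": {"paths": [[[0.0, 0.0], [0.0, 5.0]],
                            [[1.0, 1.0], [1.0, 9.0]]]}}
  ]
}
'''

# Parse the text into nested dicts/lists (json.load(f) for an open file).
dataset = json.loads(raw_text)

# Number of parts (paths) per feature; more than one means multi-part.
parts_per_feature = {}
for ft in dataset['features']:
    oid = ft['attributes']['OBJECTID']
    parts_per_feature[oid] = len(ft['geometry']['paths'])

for oid in sorted(parts_per_feature):
    print('%d: %d part(s)' % (oid, parts_per_feature[oid]))
```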

[Image: the polylines, with the desired midpoints marked as green dots]


The output would look like this (the formatting can be different):

OBJECTID  MIDPOINT_X    MIDPOINT_Y
2165      676163.9343   4861476.373
2320      676142.0017   4861959.66
2375      676118.1226   4861730.258
2682      676060.3917   4861904.762
2683      675743.1666   4861724.081
2691      676137.4796   4861557.709
3311      675916.9815   4861809.071
3385      676208.669    4861536.555
3546      676262.2623   4861665.146
3547      676167.5738   4861612.699
3548      676021.3677   4861573.141
3549      675914.4334   4861669.87
3550      675866.6003   4861735.572
3551      675800.1232   4861827.482
3552      675681.9432   4861918.989
3716      675979.6493   4861724.323

The Catch:

This needs to be done in Python 2.7.0, since my system uses Jython 2.7.0.

  • It's important to note that I can't import any Python libraries into the Jython implementation in the system that I'm using. So, unfortunately, the script should not import any Python libraries (other than the json library, for testing).

Is it possible to calculate the midpoints of a JSON polylines using Python 2.7.0 (without importing libraries)?


Solution

  • Yes, you can compute the midpoints of your polylines using only built-in Python functions, without importing any libraries.

    Let's break down your requirements:

    • iterate over each feature in the features field of your object
    • compute the length of a polyline
    • pick the longest polyline (path) of each feature when it has more than one
    • find the midpoint of that polyline: as suggested in the comments, walk along the polyline summing the lengths of its segments until you find the two vertices between which the midpoint lies, then compute its position with vector math
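To make the walking step concrete, here is a worked toy example (the L-shaped polyline is made up, not taken from your data). Its segments are 3 and 4 units long, so the total length is 7 and the midpoint sits 3.5 units along the line, that is 0.5 units into the second segment. This sketch interpolates with a fraction t of the segment, which is equivalent to moving along the segment's unit vector.

```python
# Toy L-shaped polyline: 3 units along x, then 4 units along y.
line = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]]

# Length of each segment (Euclidean distance, square root via ** 0.5).
seg_lengths = []
for i in range(len(line) - 1):
    dx = line[i + 1][0] - line[i][0]
    dy = line[i + 1][1] - line[i][1]
    seg_lengths.append((dx ** 2 + dy ** 2) ** 0.5)

half = sum(seg_lengths) / 2.0  # 7 / 2 = 3.5

# Walk the segments until the running total reaches the half length.
walked = 0.0
for i, seg in enumerate(seg_lengths):
    if walked + seg >= half:
        t = (half - walked) / seg  # fraction of this segment still to cover
        (x1, y1), (x2, y2) = line[i], line[i + 1]
        midpoint = [x1 + t * (x2 - x1), y1 + t * (y2 - y1)]
        break
    walked += seg

print(midpoint)  # [3.0, 0.5]
```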

    So we need a few helper functions to compute the distance between two points, then the length of a polyline, and so on:

    # Euclidean distance between two points
    def get_distance(p1, p2):
        return sum([(x-y)**2 for (x,y) in zip(p1, p2)]) ** (0.5)
    
    # Length of a polyline by summing the length of its segments
    def get_distance_line(line):
        total = 0
        for start_index in range(len(line) - 1):
            stop_index = start_index + 1
            total += get_distance(line[start_index], line[stop_index])
        return total
    
    # Get the polyline with the longest distance
    # within a list of polyline
    def get_longest(li):
        return max(li, key=get_distance_line)
    
    # Compute the point located at distance `_target_dist`
    # from `p1` along the p1-p2 segment
    def _get_pt_at_dist(p1, p2, _target_dist):
        # Define the vector from p1 to p2
        vx = p2[0] - p1[0]
        vy = p2[1] - p1[1]
        # Compute the length of the vector
        lv = (vx ** 2 + vy ** 2) ** 0.5
        # Compute the unit vector (the vector with length 1)
        nv = [vx / lv, vy / lv]
        # Compute the target point
        return [
            p1[0] + nv[0] * _target_dist,
            p1[1] + nv[1] * _target_dist,
        ]
    
    # Get the point at a specific distance along a polyline:
    # - 1st step: find the two vertices enclosing `target_dist`
    # - 2nd step: interpolate the point between those two vertices
    def get_point_at_distance(line, target_dist):
        sum_dist = 0
        for start_index in range(len(line) - 1):
            stop_index = start_index + 1
            n_dist = get_distance(line[start_index], line[stop_index])
            if sum_dist + n_dist > target_dist:
                # We have found the two enclosing points
                p1, p2 = line[start_index], line[stop_index]
                _target_dist = target_dist - sum_dist
                return _get_pt_at_dist(p1, p2, _target_dist)
            else:
                sum_dist += n_dist
    
        raise ValueError("target distance is greater than the length of the line")
    

    Now let's iterate over your data (I named your parsed JSON object dataset) and use these functions to compute the midpoints:

    result = {}
    
    for ft in dataset['features']:
        paths = ft['geometry']['paths']
    
    # Pick the longest path, or default to
    # the only existing one:
        if len(paths) == 1:
            p = paths[0]
        else:
            p = get_longest(paths)
    
        # Compute the distance
        # and the half of the distance
        # for this polyline
        distance_line = get_distance_line(p)
        middle_dist = distance_line / 2
    
        # Compute the midpoint and save it
        # into a `dict` using the `OBJECTID`
        # attribute as a key
        midpoint = get_point_at_distance(p, middle_dist)
        result[ft['attributes']['OBJECTID']] = midpoint
    

    The resulting object:

    {3311: [675916.9814634613, 4861809.071098591],
     3385: [676208.6690235228, 4861536.554984818],
     2165: [676163.9343346333, 4861476.37263185],
     2682: [676060.391694662, 4861904.761846619],
     2683: [675743.1665799635, 4861724.081134027],
     2691: [676137.4796253176, 4861557.709372229],
     2375: [676118.1225925689, 4861730.258496471],
     2320: [676142.0016617056, 4861959.660392438],
     3716: [675979.6112380569, 4861724.356721315],
     3546: [676262.2623328466, 4861665.145686949],
     3547: [676167.5737531717, 4861612.6987658115],
     3548: [676021.3677275265, 4861573.140917723],
     3549: [675914.4334252588, 4861669.870444033],
     3550: [675866.6003329497, 4861735.571798388],
     3551: [675800.1231731868, 4861827.48182595],
     3552: [675681.9432478376, 4861918.988687315]}
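If you want the output in the tabular layout from the question, a sketch like the following could format the dict using only %-formatting, so no imports are needed. The three entries are copied from the result above; the column widths and the 4-decimal precision are arbitrary choices, not something the question requires.

```python
# A few entries copied from the result dict above (OBJECTID -> [x, y]).
result = {
    3311: [675916.9814634613, 4861809.071098591],
    2165: [676163.9343346333, 4861476.37263185],
    2320: [676142.0016617056, 4861959.660392438],
}

# Header padded to the same column widths as the data rows.
lines = ['%-9s %-13s %s' % ('OBJECTID', 'MIDPOINT_X', 'MIDPOINT_Y')]
for oid in sorted(result):  # ascending OBJECTID, as in the question
    x, y = result[oid]
    lines.append('%-9d %-13.4f %.4f' % (oid, x, y))

print('\n'.join(lines))
```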
    

    Note to the OP: at first I picked the path with the largest number of nodes instead of the path with the longest distance (using something like def get_longest(li): return max(li, key=len)), and I got a (visual) result much closer to the one you provided. Maybe that's what you meant by "the longest part of the line", but I wasn't sure!
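To illustrate how the two selection rules can disagree, here is a small sketch on a hypothetical two-part feature (part_a, part_b and path_length are made-up names for illustration, not from the data). Part A has more vertices but is short; part B has only two vertices but is much longer, so max with key=len and max with a length key pick different parts.

```python
# Hypothetical two-part feature: part A has more vertices, part B is longer.
part_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]  # 5 nodes, length 4
part_b = [[0.0, 10.0], [20.0, 10.0]]                                 # 2 nodes, length 20
paths = [part_a, part_b]

def path_length(line):
    # Sum of Euclidean segment lengths, no imports needed.
    total = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total

by_nodes = max(paths, key=len)           # rule: largest number of nodes
by_length = max(paths, key=path_length)  # rule: longest distance

print('by nodes picks part A: %s' % (by_nodes is part_a))
print('by length picks part B: %s' % (by_length is part_b))
```

Which rule matches "the longest part of the line" is up to you; the length-based rule is the literal reading.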