Search code examples
pythonloopsfor-loopbig-opolygon

How to to remove a double/nested for-loop? String to float transformation in python


I have the following polygon of a geographic area that I fetch via a request in CAP/XML format from an API

The raw data looks like this:

<polygon>22.3243,113.8659 22.3333,113.8691 22.4288,113.8691 22.4316,113.8742 22.4724,113.9478 22.5101,113.9951 22.5099,113.9985 22.508,114.0017 22.5046,114.0051 22.5018,114.0085 22.5007,114.0112 22.5007,114.0125 22.502,114.0166 22.5038,114.0204 22.5066,114.0245 22.5067,114.0281 22.5057,114.0371 22.5051,114.0409 22.5041,114.0453 22.5025,114.0494 22.5023,114.0511 22.5035,114.0549 22.5047,114.0564 22.5059,114.057 22.5104,114.0576 22.512,114.0584 22.5144,114.0608 22.5163,114.0637 22.517,114.0657 22.5172,114.0683 22.5181,114.0717 22.5173,114.0739</polygon>

I store the requested items in a dictionary and then work through them to transform to a GeoJSON list object that is suitable for ingestion into Elasticsearch according to the schema I'm working with. I've removed irrelevant code here for ease of reading.

# fetches and store data in a dictionary
r = requests.get("https://alerts.weather.gov/cap/ny.php?x=0")
xpars = xmltodict.parse(r.text)
json_entry = json.dumps(xpars['feed']['entry'])
dict_entry = json.loads(json_entry)

# transform items if necessary
for entry in dict_entry:

    if entry['cap:polygon']:
        polygon = entry['cap:polygon']
        polygon = polygon.split(" ") 
        coordinates = []
        # take the split list items swap their positions and enclose them in their own arrays
        for p in polygon:
            p = p.split(",")
            p[0], p[1] = float(p[1]), float(p[0]) # swap lon/lat
            coordinates += [p]

        # more code adding fields to new dict object, not relevant to the question

The output of the p in polygon loop looks like:

[ [113.8659, 22.3243], [113.8691, 22.3333], [113.8691, 22.4288], [113.8742, 22.4316], [113.9478, 22.4724], [113.9951, 22.5101], [113.9985, 22.5099], [114.0017, 22.508], [114.0051, 22.5046], [114.0085, 22.5018], [114.0112, 22.5007], [114.0125, 22.5007], [114.0166, 22.502], [114.0204, 22.5038], [114.0245, 22.5066], [114.0281, 22.5067], [114.0371, 22.5057], [114.0409, 22.5051], [114.0453, 22.5041], [114.0494, 22.5025], [114.0511, 22.5023], [114.0549, 22.5035], [114.0564, 22.5047], [114.057, 22.5059], [114.0576, 22.5104], [114.0584, 22.512], [114.0608, 22.5144], [114.0637, 22.5163], [114.0657, 22.517], [114.0683, 22.5172], [114.0717, 22.5181], [114.0739, 22.5173] ]

Is there a way to do this that is better than O(N^2)? Thank you for taking the time to read.


Solution

  • O(KxNxM)

    This process involves three obvious loops. These are:

    1. Checking each entry (K)
    2. Splitting valid entries into points (MxN) and iterating through those points (N)
    3. Splitting those points into respective coordinates (M)

    The amount of letters in a polygon string is ~MxN because there are N points each roughly M letters long, so it iterates through MxN characters.

    Now that we know all of this, let's pinpoint where each occurs.

    ENTRIES (K):
        IF:
            SPLIT (MxN)
            POINTS (N):
                COORDS(M)
    

    So, we can finally conclude that this is O(K(MxN + MxN)) which is just O(KxNxM).