Search code examples
cluster-analysisdata-miningpartitioningdbscanbigdata

spatial partitioning algorithms for 1D dataset?


Area divided into 10000 squares and data set has traffic vol per sq

This is Grid which represents geographical area in 10,000 squares, where each square is 55225 square meter.

The dataset has the traffic volume per square raning from 100 to 1000.

for ex:

square 1 - 100,

square 2 - 500

.

.

Square 10,000 - 800

Now, I want to partition this area in such a way that each partition may have different area but will carry similar amount of traffic, standard deviation of traffic among partitions should be minimum.Any suggestion for the spatial partition algorithm?


Solution

  • There are a few decisions you have to make in order to inform your procedure. The first question that comes to mind is if the number of partitions is defined? The second question is if there are any geometric restrictions on a group, i.e. must they be contiguous, or is any particular shape ideal? The third question is regarding how good is good enough? There is often a huge difference in the run time of an algorithm that provides an ideal answer (perhaps a greedy algorithm) and an algorithm that provides an optimal answer (perhaps an exhaustive or "brute force" approach). You will get a minimum standard deviation by grouping any 2 sectors that have the same volume, as your groups will will each have 0 standard deviation. Any way, this sounds a lot like a expanded bin packing problem and you should probably start your literature review there.

    You need to pack your bins in order...

    Here I selected center points for my circles biased on the highest traffic flow and filled them in from there.

    class trafficNode:
        def __init__(self,v,i):
            self.cluster = None
            self.value = v
            self.index = i
            self.occupied = False
        def occupy(self):
            self.occupied=True
    
    def tryAdd(xList,mList,irow,icol):
        try:
            if not(mList[irow][icol] in xList and !mList[irow][icol].occupied):
                xlist.append(mList[irow][icol])
        except IndexError:
            chill = None
        return(xlist)
    
    class cluster:
        def __init__(self):
            self.nodes = []
        def getTotal(self):
            total = 0
            for k in self.nodes:
                total += k.value
            return(total)
        def addNode(self,n):
            self.nodes.append(n)
        def getNeighbors(self,m,r = 0):
            neighbors = []
            for k in self.nodes:
                i = k.index()
                for k2 in range(0,4):
                    if k2==0:
                        neighbors = tryAdd(neighbors,m,i[0]+0,i[1]+1)
                    elif k2==1:
                        neighbors = tryAdd(neighbors,m,i[0]+1,i[1]+0)
                    elif k2==2:
                        neighbors = tryAdd(neighbors,m,i[0]+0,i[1]-1)
                    elif k2==3:
                        neighbors = tryAdd(neighbors,m,i[0]-1,i[1]+0)
                    if r != 0:
                        if k2==0:
                            neighbors = tryAdd(neighbors,m,i[0]+1,i[1]+1)
                        elif k2==1:
                            neighbors = tryAdd(neighbors,m,i[0]+1,i[1]-1)
                        elif k2==2:
                            neighbors = tryAdd(neighbors,m,i[0]-1,i[1]+1)
                        elif k2==3:
                            neighbors = tryAdd(neighbors,m,i[0]-1,i[1]-1)
            return(neighbors)
        def seed(self,m,irow,icol):
            self.nodes.append(m[irow][icol])
            m[irow][icol].occupy()
        def propogate(self,m,target):
            total = 0
            for k in self.nodes:
                total += k.value
            s = 1
            while total<target:
                s = 1 if !s else 0
                lastTotal=Total
                n = self.getNeighbors(m,s)
                if len(n==0):
                    break;
                else:
                    if(abs(target-(total+sum([k.value for k in n])))<abs(target-total)):
                        for k in n:
                            self.nodes.append(k)
                            m[k.index[0]][k.index[1]].occupy()
                    else:
                        break;
        def contains(self,i):
            ret = False
            for k in self.nodes 
                if k.index == i
                    ret = False
                    break;
            return(ret)
    
    def parseData(d,s): # Where d is the source datafile and s is the number of units per row.
        ret = []
        f = open(d,"r")
        text = f.read()
        lines = f.split("\n")
        n = 0
        r = 0
        temp = []
        for k in lines:
            v = k.split(" - ")[1]
            n+=1
            temp.append(trafficNode(v,(r,n)))
            if n == s:
                n = 0
                r += 1
                ret.append(temp)
                temp = []
        return(ret)
    
    def mapTotal(m):
        return sum([sum([k2.value for k2 in k]) for k in m])
    
    def pTotal(m,n):
        return(mapTotal/n)
    
    import sys
    
    infile = sys.argv[1]
    ncols = sys.argv[2]
    ntowers = sys.argv[3]
    m = parseData(infile,ncols)
    s = pTotal(m,ntowers)
    
    spots = [k.index for k in m if !k.occupied]
    clusters = []
    while len(spots > 0):
        spotVals = [m[k[0]][k[1]].value for k in spots]
        nextSpotIndex = spots[spotVals.index(max(spotVals))]
        clusters.append(cluster)
        clusters[n].seed(self,m,nextSpotIndex[0],nextSpotIndex[1])
        clusters[n].propogate(m,s)
        spots = [k.index for k in m if !k.occupied]
    

    That said I haven't tested it yet... Does your data come as that image or another file?