python dataset comparison numpy scipy

Matching two grids for data analysis, is there a good algorithm for my problem?


I'd like to compare two differently spaced datasets in Python. For every point I want to find the closest (nearest-neighbour) match and regrid the data, as in this example:

Dataset A:

ALTITUDE[m]   VALUE
1.            a1
2.            a2
3.            a3
4.            a4

Dataset B:

ALTITUDE[m]   VALUE
0.7           b1
0.9           b2
1.7           b3
2.            b4
2.4           b5
2.9           b6
3.1           b7
3.2           b8
3.9           b9
4.1           b10

The values ai and bi are double-precision numbers, but some fields are NaN.

I'd like to transform dataset B to the altitude grid of dataset A, but since dataset A contains fewer altitude levels than dataset B, I'd like to average the B values that fall closest to each A level (using the median here):

ALTITUDE[m]   VALUE
1.            median(b1,b2)
2.            median(b3,b4,b5)
3.            median(b6,b7,b8)
4.            median(b9,b10)

i.e. the closest altitude levels have been found and averaged over.
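A minimal numpy sketch of this B-to-A step, assuming the altitude grids from the example. The numbers in `val_b` are made-up placeholders for b1..b10 (with one NaN), and the names `nearest` and `b_on_a` are only illustrative. Each B level is assigned to its closest A level, and each group is then reduced with `np.nanmedian` so that NaN fields are ignored.

```python
import numpy as np

alt_a = np.array([1.0, 2.0, 3.0, 4.0])
alt_b = np.array([0.7, 0.9, 1.7, 2.0, 2.4, 2.9, 3.1, 3.2, 3.9, 4.1])
# Placeholder values standing in for b1..b10 (made up, one NaN field).
val_b = np.array([10.0, 12.0, 20.0, np.nan, 24.0, 30.0, 31.0, 35.0, 40.0, 42.0])

# For every B altitude, the index of the closest A altitude.
# The (len(B), len(A)) distance matrix is fine for small grids.
nearest = np.abs(alt_b[:, None] - alt_a[None, :]).argmin(axis=1)

# Median of all B values assigned to each A level, ignoring NaN.
b_on_a = np.full(alt_a.shape, np.nan)
for i in range(alt_a.size):
    group = val_b[nearest == i]
    if np.any(~np.isnan(group)):  # leave NaN if nothing usable was assigned
        b_on_a[i] = np.nanmedian(group)

print(b_on_a)
```

For long, sorted grids the full distance matrix could probably be replaced by `np.searchsorted` on the midpoints between A levels, but that is an optimisation not needed for the example.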

Conversely, if I want to match dataset A to the grid of dataset B, dataset A should look like this (nearest neighbour):

ALTITUDE[m]   VALUE
0.7           a1
0.9           a1
1.7           a2
2.            a2
2.4           a2
2.9           a3
3.1           a3
3.2           a3
3.9           a4
4.1           a4
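The A-to-B direction is a plain nearest-neighbour lookup. A small numpy-only sketch, again with made-up placeholder numbers in `val_a` for a1..a4 (one of them NaN) and illustrative variable names:

```python
import numpy as np

alt_a = np.array([1.0, 2.0, 3.0, 4.0])
alt_b = np.array([0.7, 0.9, 1.7, 2.0, 2.4, 2.9, 3.1, 3.2, 3.9, 4.1])
# Placeholder values standing in for a1..a4 (made up, one NaN field).
val_a = np.array([100.0, np.nan, 300.0, 400.0])

# Index of the closest A altitude for every B altitude.
nearest = np.abs(alt_b[:, None] - alt_a[None, :]).argmin(axis=1)

# Fancy indexing copies the matching A value onto each B level;
# NaN fields in A are simply carried over.
a_on_b = val_a[nearest]

print(a_on_b)
```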

Maybe this even has a name (I imagine it is a common problem), but I don't know it and thus cannot search for it. I believe there is an efficient way of doing this, apart from the obvious solution of coding it myself (which I'm afraid would be inefficient and introduce many bugs).

Preferably using numpy.

EDIT: Thanks to all four contributors for your input. I learned a bit, and I apologize for not asking very clearly; I was still in the process of understanding the problem myself. Your answers pointed me towards interp1d, and this answer allowed me to abuse it for my purposes. I will post the result shortly. I can accept only one answer, but any one of them would do.


Solution

  • Have a look at numpy.interp:

    http://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html

    (EDIT: numpy.interp only provides linear interpolation, which evidently is not what the OP is looking for. Instead, use one of the scipy methods, such as interp1d with kind='nearest'.)

    http://docs.scipy.org/doc/scipy/reference/interpolate.html

    It sounds like you want to use the altitude points of one dataset to interpolate the values of the other. This can be done fairly easily with either the numpy function or one of the scipy interpolation methods.
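As a rough sketch of the interp1d suggestion, here is dataset A mapped onto the altitude grid of B with kind='nearest'. The numbers in `val_a` are made-up placeholders for a1..a4. Since 0.7, 0.9 and 4.1 lie outside the range of A, the sketch assumes that clamping to the first and last A value is acceptable, and passes them via `fill_value` together with `bounds_error=False`.

```python
import numpy as np
from scipy.interpolate import interp1d

alt_a = np.array([1.0, 2.0, 3.0, 4.0])
val_a = np.array([100.0, 200.0, 300.0, 400.0])  # placeholders for a1..a4
alt_b = np.array([0.7, 0.9, 1.7, 2.0, 2.4, 2.9, 3.1, 3.2, 3.9, 4.1])

# Nearest-neighbour "interpolation" of A's values as a function of altitude.
# Outside [1, 4] the end values are used (an assumption of this sketch).
nearest_a = interp1d(
    alt_a,
    val_a,
    kind="nearest",
    bounds_error=False,
    fill_value=(val_a[0], val_a[-1]),
)

a_on_b = nearest_a(alt_b)
print(a_on_b)
```

Going the other way (B onto the grid of A) with interp1d would pick a single nearest B value per A level rather than a median over several, so the averaging described in the question needs an extra grouping step.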