python pandas numpy stata

What is the Python equivalent of Stata's mkspline?


In Stata, mkspline automatically creates variables containing a linear spline given a series of knot point values...

mkspline knot1 30 knot2 40 knot3 50 knot4 = v1

Here is the result of running this in Stata on a series of values. It essentially splits each value across the segments between the knots, so the new variables in each row sum to v1. Sorry, I don't know the technical mathematical or statistical term for this, only the overall concept.

v1  knot1  knot2  knot3  knot4
10     10      0      0      0
20     20      0      0      0
30     30      0      0      0
40     30     10      0      0
50     30     10     10      0
60     30     10     10     10
70     30     10     10     20
80     30     10     10     30
90     30     10     10     40
100    30     10     10     50
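
For reference, the pattern in this table can be written down compactly. This is only a sketch inferred from the rows above: the symbols k_1 = 30, k_2 = 40, k_3 = 50 (the knots), v (a value of v1) and s_j (the j-th new variable, so s_1 is knot1) are labels introduced here, not Stata names.

```latex
\begin{aligned}
s_1 &= \min(v,\, k_1) \\
s_j &= \max\bigl(\min(v,\, k_j),\, k_{j-1}\bigr) - k_{j-1}, \qquad 1 < j < n \\
s_n &= \max(v,\, k_{n-1}) - k_{n-1}
\end{aligned}
```

For example, v = 60 gives s_1 = 30, s_2 = 10, s_3 = 10 and s_4 = 10, which matches the v1 = 60 row.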

Is there an equivalent to this in Python with Numpy or Pandas or similar?


Solution

  • I don't think there is a built-in function for that in NumPy or pandas.


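    Try it with numpy (assuming df is a DataFrame with a non-negative v1 column):

    import numpy as np
    thresh = [0, 30, 40, 50]  # 0 followed by the knots; the leading 0 assumes v1 >= 0
    # Amount by which each value exceeds each threshold, floored at 0
    diffs = np.maximum(df[['v1']].to_numpy() - thresh, 0)
    # Cap every segment except the last at its width (30, 10, 10)
    diffs[:, :-1] = np.minimum(diffs[:, :-1], np.diff(thresh))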
    

    Output:

    array([[10,  0,  0,  0],
           [20,  0,  0,  0],
           [30,  0,  0,  0],
           [30, 10,  0,  0],
           [30, 10, 10,  0],
           [30, 10, 10, 10],
           [30, 10, 10, 20],
           [30, 10, 10, 30],
           [30, 10, 10, 40],
           [30, 10, 10, 50]])