Different results with weighted t test between R and Python

I'm running a weighted t test in Python and am seeing different results. It appears that the issue is the degree of freedom calculation. Would like to understand why I'm seeing different outputs.

Here is some sample code.

In R:

library(weights)
x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)

weightsa = c(rep(1,8), rep(2,8))
weightsb = c(rep(2,8), rep(1,8))

wtd.t.test(x = x,
           y = y,
           weight = weightsa,
           weighty = weightsb, samedata=F)

$test [1] "Two Sample Weighted T-Test (Welch)"

$coefficients
    t.value          df     p.value 
-1.88907197 29.93637837  0.06860382 

$additional Difference     Mean.x     Mean.y   Std. Err   -80.50000 
267.12500  347.62500   42.61352

In Python:

import numpy as np
from statsmodels.stats.weightstats import ttest_ind
x = np.asarray([373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260])
y = np.asarray([411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303])

weightsa = [1] * 8 + [2] * 8
weightsb = [2] * 8 + [1] * 8

ttest_ind(x, y, usevar='unequal', weights=(weightsa, weightsb))

(-2.3391969704691085, 0.023733058922455107, 45.90244683439944)

P value is .06 in R, .02 in Python.
The R source code uses the Satterthwaite formula for degrees of freedom:

df <- (((vx/n) + (vy/n2))^2)/((((vx/n)^2)/(n - 1)) + 
      ((vy/n2)^2/(n2 - 1)))

The Python function source code also purports to use this formula:

def dof_satt(self):
        '''degrees of freedom of Satterthwaite for unequal variance
        '''
        d1 = self.d1
        d2 = self.d2
        #this follows blindly the SPSS manual
        #except I use  ``_var`` which has ddof=0
        sem1 = d1._var / (d1.nobs-1)
        sem2 = d2._var / (d2.nobs-1)
        semsum = sem1 + sem2
        z1 = (sem1 / semsum)**2 / (d1.nobs - 1)
        z2 = (sem2 / semsum)**2 / (d2.nobs - 1)
        dof = 1. / (z1 + z2)
        return dof

The numerator here looks the same but the denominator appears quite different.

Solution

The problem you had here is that weights::wtd.t.test() has a (to me, strange) default argument mean1 = TRUE, which governs "whether the weights should be forced to have an average value of 1" (from help("wtd.t.test")).

If we use mean1 = FALSE, we get the same behavior as ttest_ind():

wtd.t.test(x = x,
           y = y,
           weight = weightsa,
           weighty = weightsb,
           samedata = FALSE,
           mean1 = FALSE)

$test
[1] "Two Sample Weighted T-Test (Welch)"

$coefficients
    t.value          df     p.value 
-2.33919697 45.90244683  0.02373306 

$additional
Difference     Mean.x     Mean.y   Std. Err 
 -80.50000  267.12500  347.62500   34.41352