I am using the `statsmodels.stats.multitest.multipletests` function to correct p-values I have stored in a DataFrame:
```python
import pandas as pd
import statsmodels.stats.multitest as multi

p_value_df = pd.DataFrame({"id": [123456, 456789], "p-value": [0.098, 0.05]})

for _, row in p_value_df.iterrows():
    p_value = row["p-value"]
    print(p_value)
    results = multi.multipletests(
        p_value,
        alpha=0.05,
        method="bonferroni",
        maxiter=1,
        is_sorted=False,
        returnsorted=False,
    )
    print(results)
```
I would really like to add each element of the returned tuple as a new column in `p_value_df`, but I'm a bit stuck. I've tried converting the results to a list and using `zip(*tuples_converted_to_list)`, but because some of the values are floats, this throws an error. I'd also like to unwrap the array elements, so that `array([False])` becomes just `False`.

Can anyone recommend a strategy for doing this?
I would use a list comprehension to build a nested list of the `multipletests` results, pass it to the `DataFrame` constructor, and finally `join` it with the original `p_value_df`:
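```python
import numpy as np
import pandas as pd
import statsmodels.stats.multitest as multi

def fn(pval):
    # Run the Bonferroni correction on a single p-value
    return multi.multipletests(
        pval, alpha=0.05, method="bonferroni",
        maxiter=1, is_sorted=False, returnsorted=False,
    )

# One row per p-value: unwrap one-element arrays (e.g. array([False]) -> False)
# and keep the float outputs (alphacSidak, alphacBonf) as they are
l = [
    [e[0] if isinstance(e, np.ndarray) and e.size == 1 else e
     for e in fn(pval)]
    for pval in p_value_df["p-value"]
]

# Name the four elements of the returned tuple and attach them as columns
cols = ["reject", "pvals_corrected", "alphacSidak", "alphacBonf"]
out = p_value_df.join(pd.DataFrame(l, columns=cols))
```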
Output:

```python
print(out)
```

```
       id  p-value  reject  pvals_corrected  alphacSidak  alphacBonf
0  123456    0.098   False            0.098         0.05        0.05
1  456789    0.050    True            0.050         0.05        0.05
```
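Note that `pvals_corrected` equals the raw p-values here. Because each p-value is passed to `multipletests` on its own, every call sees a family of m = 1 tests, so Bonferroni changes nothing. `multipletests` accepts an array-like of p-values, so calling it once on the whole column, e.g. `multi.multipletests(p_value_df["p-value"], alpha=0.05, method="bonferroni")`, corrects them jointly. The sketch below reproduces both cases by hand with NumPy, assuming the textbook Bonferroni definition (corrected p = min(m·p, 1), reject when p ≤ α/m); `bonferroni` is a hypothetical helper for illustration, not part of statsmodels.

```python
import numpy as np
import pandas as pd

def bonferroni(pvals, alpha=0.05):
    """Hypothetical helper: Bonferroni over all p-values passed in together."""
    p = np.asarray(pvals, dtype=float)
    m = p.size                          # number of tests in this family
    corrected = np.minimum(p * m, 1.0)  # p * m, capped at 1
    reject = p <= alpha / m             # compare raw p to alpha / m
    return reject, corrected

p_value_df = pd.DataFrame({"id": [123456, 456789], "p-value": [0.098, 0.05]})

# Row by row (what the loop / list comprehension does): each family has m = 1,
# so the corrected p-values equal the raw ones.
per_row = [bonferroni([p]) for p in p_value_df["p-value"]]

# Whole column at once: m = 2, so each p-value is doubled before comparison.
reject, corrected = bonferroni(p_value_df["p-value"])
p_value_df["reject"] = reject
p_value_df["pvals_corrected"] = corrected
print(p_value_df)
```

With both rows treated as one family, the corrected p-values become 0.196 and 0.1 and neither row is rejected at α = 0.05, whereas the row-by-row version rejects the second row (0.05 ≤ 0.05). Which one you want depends on whether the rows really form one family of tests.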