python-3.x pandas statistics-bootstrap scikits

Python Pandas: bootstrap confidence limits by row rather than entire dataframe


I am trying to compute bootstrap confidence limits for each row, regardless of the number of rows, and build a new dataframe from the output. I can currently do this for the entire dataframe, but not by row. The data in my actual program looks similar to this:

    0   1   2
0   1   2   3
1   4   1   4
2   1   2   3
3   4   1   4

I want the new dataframe to look something like this with the lower and upper confidence limits:

    0   1   
0   1   2   
1   1   5.5 
2   1   4.5 
3   1   4.2 

The current generated output looks like this:

     0   1
 0  2.0 2.75

The Python 3 code below generates a mock dataframe and computes the bootstrap confidence limits for the entire dataframe. The result is a new dataframe with just 2 values, an upper and a lower confidence limit, rather than 4 sets of 2 (one for each row).

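import pandas as pd
import numpy as np
import scikits.bootstrap as sci

# Mock data: each cell holds a two-element list
zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]], [[4, 2], [1, 4], [4, 6]],
                   [[1, 2], [2, 3], [3, 6]], [[4, 2], [1, 4], [4, 6]]])
print(zz)

x = zz.dtypes
print(x)

# Keep only the first element of each cell, giving a 4x3 numeric dataframe
a = pd.DataFrame(np.array(zz.values.tolist())[:, :, 0], zz.index, zz.columns)
print(a)

# Problem: this returns one (lower, upper) pair for the whole dataframe
b = sci.ci(a)
b = pd.DataFrame(b)
b = b.T
print(b)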

Thank you for any help.


Solution

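  • scikits.bootstrap assumes that data samples are arranged by row, not by column: each row is treated as one observation, and resampling draws whole rows. To get limits for each row of the dataframe instead, pass the transpose and use a statfunction that doesn't combine columns, such as an average taken along axis 0.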

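    import pandas as pd
    import numpy as np
    import scikits.bootstrap as sci
    
    # Mock data: each cell holds a two-element list
    zz = pd.DataFrame([[[1, 2], [2, 3], [3, 6]], [[4, 2], [1, 4], [4, 6]],
                       [[1, 2], [2, 3], [3, 6]], [[4, 2], [1, 4], [4, 6]]])
    print(zz)
    
    x = zz.dtypes
    print(x)
    
    # Keep only the first element of each cell, giving a 4x3 numeric dataframe
    a = pd.DataFrame(np.array(zz.values.tolist())[:, :, 0], zz.index, zz.columns)
    print(a)
    
    # Transpose so the three values of each original row become the samples,
    # and average along axis 0 so each original row keeps its own statistic
    b = sci.ci(a.T, statfunction=lambda x: np.average(x, axis=0))
    # Transpose the result to get one pair of limits per original row
    print(b.T)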
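
To see what "by row" means without depending on scikits.bootstrap, the same idea can be sketched with NumPy alone. This is a plain percentile bootstrap of the row mean, a simpler method than the one the answer above relies on, so the limits it produces should not be expected to match scikits.bootstrap output. The function name `rowwise_bootstrap_ci`, its parameters, and the `lower`/`upper` column labels are hypothetical choices made for this illustration.

```python
import numpy as np
import pandas as pd


def rowwise_bootstrap_ci(df, n_samples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits of the mean, one pair per row.

    Hypothetical helper for illustration; not part of scikits.bootstrap.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(dtype=float)            # shape (n_rows, n_cols)
    n_cols = values.shape[1]

    # Draw column positions with replacement; the same draws are reused
    # for every row, so identical rows get identical limits.
    idx = rng.integers(0, n_cols, size=(n_samples, n_cols))

    # values[:, idx] has shape (n_rows, n_samples, n_cols); averaging over
    # the last axis gives one resampled mean per row and per replicate.
    means = values[:, idx].mean(axis=2)          # shape (n_rows, n_samples)

    lower = np.percentile(means, 100 * alpha / 2, axis=1)
    upper = np.percentile(means, 100 * (1 - alpha / 2), axis=1)
    return pd.DataFrame({"lower": lower, "upper": upper}, index=df.index)


# Same numeric dataframe as `a` in the question
a = pd.DataFrame([[1, 2, 3], [4, 1, 4], [1, 2, 3], [4, 1, 4]])
ci = rowwise_bootstrap_ci(a)
print(ci)
```

The result has one row per input row regardless of how many rows there are, which is the shape the question asks for. With only three values per row, any bootstrap interval is very coarse, so the numbers are mainly useful for checking the layout.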