python numpy random iteration statistics-bootstrap

Bootstrapping function grinds to a halt, due to Python pseudorandom generator?


I am working on a bootstrapping procedure for visual fixation data and would appreciate others' insights on a problem I am having. I suspect that either I am missing something about how the random number generator (random.randrange) works, or the problem exposes my novice understanding of numpy array iteration and slicing. As a psychologist with only hobby-level programming experience, I would not be surprised if it turns out I am doing this in a really backwards way.

When you perform statistical analysis on visual fixation data, you often need to take center bias into account: observers tend to fixate near the center of an image at first and more randomly across the image later. This bias causes a temporal correlation between fixations, so an ROC (Receiver Operating Characteristic) analysis of such data needs a baseline built with a specific kind of bootstrap method.

In this case, the data resides in a numpy array named original. This array has shape (22, 800, 15, 2), where the dimensions are [observer, image, fixation, (x, y)]. So there are 22 observers, 800 images, and 15 fixations per observer per image, each stored as an (x, y) pair.
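To make the layout concrete, here is a minimal sketch using a synthetic stand-in for the data. The values are random and the NaN placement is made up; only the shape follows the description above. It assumes Python 3 and a NumPy version that provides np.random.default_rng.

```python
import numpy as np

# Synthetic stand-in for the real data: the values are made up.
n_observers, n_images, n_fixations = 22, 800, 15
rng = np.random.default_rng(0)
original = rng.random((n_observers, n_images, n_fixations, 2))

# Hypothetical gap: observer 3 made only 12 fixations on image 10.
original[3, 10, 12:] = np.nan

# One (x, y) pair: observer 3, image 10, first fixation.
xy = original[3, 10, 0]                  # shape (2,)

# The sixth fixation of every observer on every image.
same_time = original[:, :, 5, :]         # shape (22, 800, 2)

# Number of fully valid (non-NaN) fixations at each fixation index.
valid_per_index = (~np.isnan(original).any(axis=-1)).sum(axis=(0, 1))
```

A count like valid_per_index can be a useful sanity check, because it shows how many replacement candidates exist at each fixation index before any shuffling starts.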

In the bootstrap, we want to replace each fixation with another fixation that occurs somewhere in the set of all other images and all observers, but at the same point in time (in this case: the same fixation index, which is axis 2 of original).

I think this means that we have to do the following:

  1. Create a new array with the same shape as original. This array will be called shuffled.
  2. Check whether the current x or y in original is NaN (with np.isnan, since NaN never compares equal to itself). If so, do not change this fixation. Otherwise continue.
  3. Choose a random fixation from the subset of original that satisfies the following index: [all observers, all images except the current image, current fixation]. If it contains NaN, pick another random fixation until it does not.
  4. Write the random fixation into shuffled at the current location.
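The four steps above can be sketched as follows. This is only one possible reading of them, not the code from the question: the function name shuffle_fixations and its rng parameter are hypothetical, it uses np.random.default_rng instead of random.randrange, and it assumes Python 3 with a recent NumPy. Rather than retrying until a non-NaN fixation turns up, it filters NaN candidates out of the pool first and then draws from what remains.

```python
import numpy as np


def shuffle_fixations(original, rng):
    """Replace each valid fixation with a valid fixation drawn from another
    image (any observer) at the same fixation index.
    `original` itself is not modified."""
    n_obs, n_img, n_fix, _ = original.shape
    # Step 1: new array; fixations containing NaN stay as they are (step 2).
    shuffled = original.copy()
    for img in range(n_img):
        for fix in range(n_fix):
            # Observers whose current fixation has both x and y.
            valid = ~np.isnan(original[:, img, fix, :]).any(axis=1)
            if not valid.any():
                continue
            # Step 3: candidates from all observers and all *other* images,
            # with NaN candidates removed up front.
            pool = np.delete(original[:, :, fix, :], img, axis=1).reshape(-1, 2)
            pool = pool[~np.isnan(pool).any(axis=1)]
            if len(pool) == 0:
                raise ValueError(
                    "no valid replacement for image %d, fixation %d" % (img, fix)
                )
            # Step 4: one random candidate (with replacement) per valid fixation.
            picks = rng.integers(len(pool), size=int(valid.sum()))
            shuffled[valid, img, fix] = pool[picks]
    return shuffled


# Small made-up example.
rng = np.random.default_rng(0)
demo = rng.random((4, 6, 3, 2))
demo[0, 1, 2] = np.nan
result = shuffle_fixations(demo, rng)
```

Filtering the pool before drawing means an empty pool raises an error immediately instead of looping forever, which makes a shortage of valid candidates easier to spot.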

I have a function that takes the array original and does what is described above, with one slight modification: when only one of the original x, y pair is NaN, it sets only that x or y in the random fixation to np.nan. When I stepped through the loops, I saw good results. After about 10 iterations I was satisfied that all the data looked correct, so I removed the raw_input() breakpoints I had set and let the function process all of the data without interruption. When I did so, I noticed that the function slows down with each iteration and grinds to a halt when it reaches observer=0, image=48.

My code is as follows:

for obs_index, obs in enumerate(original):
    for img_index, img in enumerate(obs):
        print obs_index, img_index
        for fix_index, fix in enumerate(img):

            # start with NaN so that the search loop below runs at least once
            rand_fix = (np.nan, np.nan)

            while np.isnan(rand_fix[0]) or np.isnan(rand_fix[1]):

                rand_obs = randrange(observers)
                rand_img = img_index

                while rand_img == img_index:
                    rand_img = randrange(images)

                rand_fix = original[rand_obs, rand_img, fix_index]

            # do the following because sometimes only x or y in the original is NaN
            if np.isnan(fix[0]):
                rand_fix[0] = np.nan
            if np.isnan(fix[1]):
                rand_fix[1] = np.nan

            shuffled[obs_index, img_index, fix_index] = rand_fix

When this function finishes, shuffled should contain correctly shuffled fixation data for use in ROC-analysis.


Solution

  • SOLVED

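    I came up with the following code, which no longer slows down. It reads the random x and y from original as separate scalar values, and it skips the search entirely when the current fixation contains NaN: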

        for obs_index, obs in enumerate(original):
            for img_index, img in enumerate(obs):
                for fix_index, fix in enumerate(img):
    
                    x = fix[0]
                    y = fix[1]
                    rand_x = np.nan
                    rand_y = np.nan
    
                    if not(np.isnan(x) or np.isnan(y)):
                        while np.isnan(rand_x) or np.isnan(rand_y): 
    
                            rand_obs = randrange(observers)
                            rand_img = img_index
    
                            while rand_img == img_index:
                                rand_img = randrange(images)
    
                            rand_x = original[rand_obs, rand_img, fix_index, 0]
                            rand_y = original[rand_obs, rand_img, fix_index, 1]
    
                    shuffled[obs_index, img_index, fix_index, 0] = rand_x
                    shuffled[obs_index, img_index, fix_index, 1] = rand_y
    

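    I also fixed the way the new fixation is assigned to its location in shuffled, so that it follows numpy indexing properly. In the earlier version, rand_fix = original[rand_obs, rand_img, fix_index] was a view into original, so rand_fix[0] = np.nan wrote NaN back into original. The pool of valid fixations therefore kept shrinking, which is the likely reason the search loop ran longer and longer. Note that this version sets both x and y to NaN whenever either original coordinate is NaN.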
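
The difference between the two versions comes down to NumPy views. Indexing an array with integers only, as in original[rand_obs, rand_img, fix_index], returns a view of the array's memory, so assigning to an element of the result also changes original. A minimal sketch on a small made-up array (the values and indices here are hypothetical):

```python
import numpy as np

# Tiny made-up array with the same axis layout: [observer, image, fixation, (x, y)].
original = np.arange(24, dtype=float).reshape(2, 3, 2, 2)

# Integer (basic) indexing returns a view, not a copy.
view = original[0, 1, 0]
view[0] = np.nan                          # also overwrites original[0, 1, 0, 0]
leaked = np.isnan(original[0, 1, 0, 0])   # the NaN has reached `original`

# An explicit copy is independent of `original`.
safe = original[1, 2, 1].copy()
safe[0] = np.nan
untouched = not np.isnan(original[1, 2, 1, 0])

shares = np.shares_memory(view, original)
```

Reading single elements such as original[i, j, k, 0] yields scalars rather than views, which is why the rewritten loop leaves original unchanged.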