I am working on a kind of bootstrapping procedure for visual fixation data, and would appreciate other people's insights on a problem I am having. I suspect that either I'm missing something about how the random number generator (random.randrange) works, or the problem reflects my novice understanding of numpy array iteration and slicing. As a psychologist with only hobby-level programming experience, I would not be surprised if it turns out I'm doing this in a really backwards way.
When you want to perform statistical analysis on visual fixation data, you often need to take center bias into account: observers tend to fixate near the center of an image at first and more randomly across the image later. This bias causes a temporal correlation between fixations, and an ROC (Receiver Operating Characteristic) analysis performed on such data needs a baseline based on a specific kind of bootstrap method.
In this case, the data resides in a numpy array named original. This array is of shape (22, 800, 15, 2), where the dimensions indicate [observer, image, fixation, (x, y)]. So there are 15 fixations per observer per image.
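To make the layout concrete, here is a small sketch using a toy array. The name toy and the sizes are made up for illustration and are much smaller than the real data; only the order of the axes matches original.

```python
import numpy as np

# Hypothetical toy data: 3 observers, 4 images, 5 fixations, (x, y)
n_observers, n_images, n_fixations = 3, 4, 5
toy = np.arange(n_observers * n_images * n_fixations * 2, dtype=float)
toy = toy.reshape(n_observers, n_images, n_fixations, 2)

# A single fixation is an (x, y) pair
fixation = toy[1, 2, 3]      # observer 1, image 2, fixation 3
x, y = fixation

# All fixations that share one fixation index, across observers and images
same_index = toy[:, :, 3]    # shape (3, 4, 2)
```

The last slice is the pool that replacement fixations are drawn from for fixation index 3.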
In the bootstrap, we generally want to replace each fixation with another fixation that occurs somewhere in the set of all other images and all observers, but at the same point in time (in this case: the same fixation index, axis 2 of original).
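I think this means that we have to do the following:
1. Loop over every observer, image and fixation index.
2. For each fixation, pick a random observer and a random image other than the current one.
3. Take the fixation at the same fixation index from that observer and image.
4. If that fixation is NaN, draw again until a valid one is found.
5. If the original fixation is NaN, store NaN in the shuffled data as well.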
I have a function that takes the array original and does what is described above, with the slight modification that when only one of the original x, y pair is NaN, it sets only that x or y in the random fixation to np.nan. When I stepped through the loops, I saw good results. After about 10 iterations I was satisfied, as all the data looked perfect, so I removed the raw_input() breakpoints I had set and let the function process all of the data without interruption. When I did so, I noticed that the function slows down with each loop and grinds to a halt when it reaches observer=0, image=48.
My code is as follows:
    for obs_index, obs in enumerate(original):
        for img_index, img in enumerate(obs):
            print obs_index, img_index
            for fix_index, fix in enumerate(img):
                # do the following because sometimes only x or y in the original is NaN
                rand_fix = (np.nan, np.nan)
                while np.isnan(rand_fix[0]) or np.isnan(rand_fix[1]):
                    rand_obs = randrange(observers)
                    rand_img = img_index
                    while rand_img == img_index:
                        rand_img = randrange(images)
                    rand_fix = original[rand_obs, rand_img, fix_index]
                # do the following because sometimes only x or y in the original is NaN
                if np.isnan(fix[0]):
                    rand_fix[0] = np.nan
                if np.isnan(fix[1]):
                    rand_fix[1] = np.nan
                shuffled[obs_index, img_index, fix_index] = rand_fix
When this function finishes, shuffled should contain correctly shuffled fixation data for use in ROC-analysis.
SOLVED
I came up with the following code, which no longer slows down:
    for obs_index, obs in enumerate(original):
        for img_index, img in enumerate(obs):
            for fix_index, fix in enumerate(img):
                x = fix[0]
                y = fix[1]
                rand_x = np.nan
                rand_y = np.nan
                if not(np.isnan(x) or np.isnan(y)):
                    while np.isnan(rand_x) or np.isnan(rand_y):
                        rand_obs = randrange(observers)
                        rand_img = img_index
                        while rand_img == img_index:
                            rand_img = randrange(images)
                        rand_x = original[rand_obs, rand_img, fix_index, 0]
                        rand_y = original[rand_obs, rand_img, fix_index, 1]
                shuffled[obs_index, img_index, fix_index, 0] = rand_x
                shuffled[obs_index, img_index, fix_index, 1] = rand_y
I also fixed the way the new fixation was assigned to the location in shuffled, to follow numpy indexing properly.
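For anyone hitting the same slowdown: as far as I can tell, the cause was that original[rand_obs, rand_img, fix_index] returns a view into original rather than a copy, so rand_fix[0] = np.nan in my first version wrote NaN back into the source data. Every pass added more NaNs, and the while loop needed more and more draws to find a valid fixation. Below is a minimal sketch of that behaviour, using a made-up array called data as a stand-in for original.

```python
import numpy as np

# Hypothetical stand-in for original: 2 observers, 3 images, 4 fixations, (x, y)
data = np.ones((2, 3, 4, 2))

view = data[0, 1, 2]            # basic indexing returns a view, not a copy
view[0] = np.nan                # this also overwrites data[0, 1, 2, 0]
nans_after_view = int(np.isnan(data).sum())

safe = data[1, 2, 3].copy()     # an explicit copy is independent of data
safe[0] = np.nan                # data is left untouched this time
nans_after_copy = int(np.isnan(data).sum())
```

Taking a .copy() of the slice, or reading x and y out as scalars as in the code above, keeps original intact.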