Selecting specific elements and finding their median using NumPy

I have the following three data sets.

Basically, I want to create a column whose elements are the medians of the corresponding elements of the second column across the data sets. The first elements of the second column in each set are (3,7,8), with median 7; the second elements are (5,4,3), with median 4; and the third elements are (6,9,2), with median 6. So I want my output to be a NumPy array like [(7,4,6)].

I tried the following approach:

import numpy as np
filelist=[]
for i in range (1,4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" %i)
for fname in filelist:
    data=np.loadtxt(fname)
    x=data[:,1]
    for j in range (0,3):
        y=np.median(x[j,1]) # tried this method and thought would get the arrays i want (3,7,8) , (5,4,3) and (6,9,2) and their medians
        print(y)

I received the following error: IndexError: too many indices for array

Any suggestion would mean a lot.

Solution

Slice the second column of each array and use np.median along the first axis (axis=0) -

np.median([a[:,1],b[:,1],c[:,1]],axis=0)

Or stack the inputs into a single array, slice out the second column, and then use np.median -

np.median(np.asarray([a,b,c])[...,1], axis=0)

Or call np.median directly on the list, which converts it to an array under the hood, and then slice the result -

np.median([a,b,c],axis=0)[:,1]

So, if the inputs are arrays, go with the first method for efficiency. Otherwise, the latter two work just as well with either arrays or lists.

Sample run -

In [10]: a = np.array([[2,3],[4,5],[5,6]])
    ...: b = np.array([[5,7],[7,4],[9,9]])
    ...: c = np.array([[1,8],[2,3],[3,2]])
    ...: 

In [11]: np.median([a[:,1],b[:,1],c[:,1]],axis=0)
Out[11]: array([ 7.,  4.,  6.])
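As a quick check, a sketch along these lines should confirm that all three forms agree on the sample arrays above. The names m1, m2 and m3 are only illustrative and do not appear in the question.

```python
import numpy as np

# Sample arrays from the run above
a = np.array([[2, 3], [4, 5], [5, 6]])
b = np.array([[5, 7], [7, 4], [9, 9]])
c = np.array([[1, 8], [2, 3], [3, 2]])

# Method 1: slice each second column first, then take the median across arrays
m1 = np.median([a[:, 1], b[:, 1], c[:, 1]], axis=0)

# Method 2: stack into one (3, 3, 2) array, keep the last-axis index 1, then median
m2 = np.median(np.asarray([a, b, c])[..., 1], axis=0)

# Method 3: median over every column first, then keep the second column
m3 = np.median([a, b, c], axis=0)[:, 1]

print(m1)
print(np.array_equal(m1, m2) and np.array_equal(m1, m3))
```

The third form computes medians for every column before discarding all but one, which is why the first form is the cheaper choice when only one column is needed.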

To make it work with the code posted in the question:

import numpy as np

# Build the list of filenames
filelist = []
for i in range(1, 4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" % i)

# Collect the second column of each file
data_list = []
for fname in filelist:
    data = np.loadtxt(fname)
    data_list.append(data[:, 1])

# Median across files, element by element
desired_output = np.median(data_list, axis=0)
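As for the IndexError in the question: data[:,1] is already one-dimensional, so indexing it with two indices, as in x[j,1], fails. The sketch below reproduces that and then runs the loop approach end to end. It is only an illustration: the in-memory io.StringIO objects are hypothetical stand-ins for the A_*.txt files, and their contents are assumed to match the sample arrays above.

```python
import io
import numpy as np

# Hypothetical stand-ins for A_1.txt, A_2.txt, A_3.txt (contents are assumed,
# mirroring the sample arrays a, b and c)
fake_files = [
    io.StringIO("2 3\n4 5\n5 6\n"),
    io.StringIO("5 7\n7 4\n9 9\n"),
    io.StringIO("1 8\n2 3\n3 2\n"),
]

data_list = []
for f in fake_files:
    data = np.loadtxt(f)   # shape (3, 2)
    x = data[:, 1]         # shape (3,): a 1-D array
    try:
        x[0, 1]            # two indices on a 1-D array
    except IndexError as err:
        print("IndexError:", err)
    data_list.append(x)

# Median across the three files, position by position
desired_output = np.median(data_list, axis=0)
print(desired_output)
```

Collecting the columns first and taking a single median at the end is what lets np.median see all three files at once. Inside the original loop, only one file's column is available at a time.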