Tags: python, numpy, vectorization, google-colaboratory

Encountering a strange issue with numpy.mean when using np.vectorize


I am writing a function and passing a nested list to it. The goal is to find the mean of each nested list. For example, given the input below:

[[1914.05004882812], [1930.65002441406, 1934.5]]

I want the output to be the mean of each individual list, like this:

[1914.05004883 1932.57501221]

This works for some inputs, but for others I get the nested list back unchanged. Below are the function and its output. Can you please let me know what I have done wrong here? I am learning Python, so please excuse me if the terminology I use is incorrect. I am using Google Colab to test this.

import numpy as np

def groupmeancalc(groups):
  print('type of group is ' + str(type(groups)))
  print(groups)
  print('size of group is ' + str(len(groups)))
  if len(groups)==1:
    nparrr=np.array(groups)
    nparr=np.mean(nparrr)
    print('Below are the values')
    print(nparr)
  else:
    nparrr=np.array(groups)
    nparr=np.vectorize(np.mean)(nparrr)
    print('Below are the values')
    print(nparr)

Below are the outputs that come out as expected.

type of group is <class 'list'>
[[1918.80004882812], [1938.69995117188, 1940.05004882812]]
size of group is 2
Below are the values
[1918.80004883 1939.375     ]


type of group is <class 'list'>
[[5510.2001953125, 5519.9501953125], [5545.10009765625]]
size of group is 2
Below are the values
[5515.07519531 5545.10009766]

Below is where I do not get the expected output. Here the nested list is not flattened, and where a sublist has two or more elements, the mean is not calculated.

type of group is <class 'list'>
[[1355.59997558594], [1363.0], [1372.05004882812]]
size of group is 3
Below are the values
[[1355.59997559]
 [1363.        ]
 [1372.05004883]]


type of group is <class 'list'>
[[1349.15002441406, 1351.5], [1358.90002441406, 1359.0]]
size of group is 2
Below are the values
[[1349.15002441 1351.5       ]
 [1358.90002441 1359.        ]]

Solution
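
As far as I can tell, the difference between the working and failing cases comes from how np.array interprets the nested list, not from np.mean itself. When the sublists have different lengths, the result is a 1-D array whose elements are Python lists, so np.vectorize hands each list to np.mean. When the sublists all have the same length, the result is an ordinary 2-D float array, so np.vectorize hands each single number to np.mean and the array comes back unchanged. The sketch below illustrates this with values from the question. It passes dtype=object explicitly for the ragged case, on the assumption that a recent NumPy is in use (older versions built the object array implicitly, which is presumably what happened in the Colab session).

```python
import numpy as np

ragged = [[1918.80004882812], [1938.69995117188, 1940.05004882812]]
rect = [[1349.15002441406, 1351.5], [1358.90002441406, 1359.0]]

# Sublists of different lengths: a 1-D array whose elements are lists.
ragged_arr = np.array(ragged, dtype=object)
print(ragged_arr.shape, ragged_arr.dtype)

# Sublists of equal length: a regular 2-D array of floats.
rect_arr = np.array(rect)
print(rect_arr.shape, rect_arr.dtype)

# np.vectorize calls the function once per array element.
# For ragged_arr an element is a whole sublist, so a mean is computed.
ragged_means = np.vectorize(np.mean)(ragged_arr)
# For rect_arr an element is a single float, so nothing is averaged.
rect_means = np.vectorize(np.mean)(rect_arr)

print(ragged_means)
print(rect_means)
```

Printing the shape before calling np.vectorize is a quick way to check which of the two cases a given input falls into.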

  • If you need a list as output, I think map will work:

    def map_f(x):
        return list(map(np.mean,x))
    

    Try:

    map_f([[1914.05004882812], [1930.65002441406, 1934.5]])
    # approximately [1914.05004883, 1932.57501221]
    
    map_f([[1349.15002441406, 1351.5], [1358.90002441406, 1359.0]])
    [1350.3250122070299, 1358.9500122070299]
    
    map_f([[1355.59997558594], [1363.0], [1372.05004882812]])
    [1355.59997558594, 1363.0, 1372.05004882812]
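
If an array is wanted instead of a list, as in the output shown in the question, one option is to compute the means with a comprehension and convert the result at the end. This is only a minimal sketch: group_means is a hypothetical name, and it assumes every sublist is non-empty.

```python
import numpy as np

def group_means(groups):
    """Return the mean of each sublist as a 1-D float array."""
    # Iterating over the outer list never builds a 2-D array,
    # so equal-length and ragged inputs are treated the same way.
    return np.array([np.mean(g) for g in groups], dtype=float)

same_length = group_means([[1349.15002441406, 1351.5], [1358.90002441406, 1359.0]])
singletons = group_means([[1355.59997558594], [1363.0], [1372.05004882812]])
ragged = group_means([[1914.05004882812], [1930.65002441406, 1934.5]])

print(same_length)
print(singletons)
print(ragged)
```

Because the mean is taken one sublist at a time, the result always has one entry per sublist, regardless of whether the sublists share a length.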