python python-3.x numpy histogram valueerror

NumPy histogram - ValueError: range parameter must be finite - input array is okay

I'm struggling to understand this error. Below is an example that works, followed by the one I'm interested in, which doesn't.

I have to analyse a dataset called sys_prices that contains hourly prices for an entire year. After various transformations it is a numpy.ndarray with 8785 rows and 1 column: every row is a numpy.ndarray with a single element, a numpy.float64 number.

The code not working is the following:

import numpy as np

stop_day = 95
start_day = stop_day - 10 # 10 days before
stop_day = (stop_day-1)*24
start_day = (start_day-1)*24

pcs=[] # list of prices to analyse
for ii in range(start_day, stop_day):
    pcs.append(sys_prices[ii][0])

p, x = np.histogram(pcs, bins='fd')

The *24 converts day numbers into row indices of the dataset, since the data has hourly resolution.
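As a minimal sketch, assuming sys_prices has shape (8785, 1) as described (random synthetic prices stand in for the real data), the day-to-row conversion and the loop can be expressed with a single slice of column 0. The helper name `day_window` is hypothetical.

```python
import numpy as np

# Synthetic stand-in for sys_prices: 8785 hourly values in one column (assumption).
rng = np.random.default_rng(0)
sys_prices = rng.uniform(20.0, 60.0, size=(8785, 1))

def day_window(stop_day, n_days=10, hours_per_day=24):
    """Hypothetical helper: turn 1-based day numbers into [start_row, stop_row)."""
    start_day = stop_day - n_days
    start_row = (start_day - 1) * hours_per_day
    stop_row = (stop_day - 1) * hours_per_day
    return start_row, stop_row

start_row, stop_row = day_window(95)
print(start_row, stop_row)  # 2016 2256

# Loop version from the question...
pcs_loop = []
for ii in range(start_row, stop_row):
    pcs_loop.append(sys_prices[ii][0])

# ...and the equivalent slice of column 0.
pcs_slice = sys_prices[start_row:stop_row, 0]

print(len(pcs_slice))                       # 240
print(np.array_equal(pcs_loop, pcs_slice))  # True
```

The slice returns a 1-D array view, so it can be passed to np.histogram directly just like the list.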

I expect to pass the list pcs to np.histogram and get the histogram counts in p and the bin edges in x.

I expect this because the following code works:

from datetime import date

import numpy as np

start_day = 1
start_month = 1
start_year = 2016
stop_day = 1
stop_month = 2
stop_year = 2016
num_prices = (date(stop_year, stop_month, stop_day) - date(start_year, start_month, start_day)).days*24

jan_prices = []
for ii in range(num_prices):
    jan_prices.append(sys_prices[ii][0])

p, x = np.histogram(jan_prices, bins='fd') # bin the data

The difference between the two snippets is that the failing one analyses only 10 days, counting backwards from a chosen day of the year, while the working one uses all the prices in January (i.e. the first 744 values of the dataset).
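The figure of 744 follows from 31 days times 24 hours. As a small sketch, assuming the same (8785, 1) layout (here filled with synthetic values equal to the row index), the January loop is equivalent to taking the first 744 rows of column 0:

```python
from datetime import date

import numpy as np

# Hours between 1 Jan 2016 and 1 Feb 2016 (January has 31 days).
num_prices = (date(2016, 2, 1) - date(2016, 1, 1)).days * 24
print(num_prices)  # 744

# Synthetic stand-in for sys_prices (assumption): value == row index.
sys_prices = np.arange(8785, dtype=np.float64).reshape(-1, 1)

# The January loop is equivalent to slicing the first 744 rows of column 0.
jan_prices = sys_prices[:num_prices, 0]
print(jan_prices.shape)  # (744,)
```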

Stranger still: I tried different values for stop_day, and 95 raises the error while 99, 100 or 200 don't.
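One way to narrow this down is to check whether each 10-day window contains a NaN. The sketch below uses synthetic data with a single NaN placed at a hypothetical row 2090 (chosen for illustration only, not the real location of any gap in sys_prices); `window_has_nan` is a hypothetical helper.

```python
import numpy as np

# Synthetic stand-in for sys_prices with one missing value at hypothetical row 2090.
sys_prices = np.full((8785, 1), 40.0)
sys_prices[2090, 0] = np.nan

def window_has_nan(prices, stop_day, n_days=10):
    """Hypothetical helper: True if the window ending at stop_day contains a NaN."""
    start_row = (stop_day - n_days - 1) * 24
    stop_row = (stop_day - 1) * 24
    return bool(np.isnan(prices[start_row:stop_row, 0]).any())

for stop_day in (95, 99, 100, 200):
    print(stop_day, window_has_nan(sys_prices, stop_day))
# 95 True
# 99 False
# 100 False
# 200 False

# Row indices of every NaN in the column, to locate the offending hours.
print(np.flatnonzero(np.isnan(sys_prices[:, 0])))  # [2090]
```

Running the same check on the real sys_prices would show which windows contain missing values.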

Could you help me?

Solution

The problem occurs because, by default, np.histogram uses the minimum and maximum of pcs as the range of the bins. Since your dataset contains NaNs, the minimum and maximum both become NaN, and a NaN range is not finite. You can fix this by passing np.nanmin and np.nanmax as the range parameter:
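The failure can be reproduced with a handful of hypothetical prices containing one NaN. The exact wording of the ValueError differs between NumPy versions, so the sketch only records that one is raised.

```python
import numpy as np

pcs = [30.5, 42.0, np.nan, 38.25, 51.0]  # hypothetical prices with one NaN

# With a NaN present, the plain min/max are NaN...
print(np.min(pcs), np.max(pcs))        # nan nan
# ...while the NaN-aware versions ignore it.
print(np.nanmin(pcs), np.nanmax(pcs))  # 30.5 51.0

# Without an explicit range, np.histogram derives it from min/max and fails.
error = None
try:
    np.histogram(pcs, bins='fd')
except ValueError as err:
    error = err
    print("ValueError:", err)
```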

p, x = np.histogram(pcs, range=(np.nanmin(pcs), np.nanmax(pcs)), bins='fd')
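A short usage sketch on hypothetical data: with an explicit finite range, values outside it (and NaN, which fails every comparison) are left out of the counts, so p only counts the non-NaN prices.

```python
import numpy as np

pcs = [30.5, 42.0, np.nan, 38.25, 51.0, 44.0, 39.5]  # hypothetical prices with one NaN

p, x = np.histogram(pcs, range=(np.nanmin(pcs), np.nanmax(pcs)), bins='fd')

print(p.sum())      # 6: the NaN is not counted
print(x[0], x[-1])  # 30.5 51.0
```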

I think this is better than the accepted answer, since it does not require modifying pcs.
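For comparison, here is a sketch of a common alternative that builds a NaN-free copy with a boolean mask before binning (an illustration of "modifying pcs", not necessarily what the accepted answer does). On this hypothetical data both approaches give the same counts and edges, while the range-based version leaves pcs untouched.

```python
import numpy as np

pcs = np.array([30.5, 42.0, np.nan, 38.25, 51.0, 44.0, 39.5])  # hypothetical prices

# Range-based approach: keep pcs as-is, pass a NaN-aware range.
p1, x1 = np.histogram(pcs, range=(np.nanmin(pcs), np.nanmax(pcs)), bins='fd')

# Alternative: build a NaN-free copy and let np.histogram pick the range itself.
clean = pcs[~np.isnan(pcs)]
p2, x2 = np.histogram(clean, bins='fd')

print(np.array_equal(p1, p2), np.allclose(x1, x2))  # True True
print(np.isnan(pcs).sum())  # 1: pcs itself still contains its NaN
```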