Tags: python, pvlib

ValueError: Big-endian buffer not supported on little-endian compiler


I am modeling a PV array using pvlib, and sometimes when I try to access the weather forecast data I get the following error:

ValueError: Big-endian buffer not supported on little-endian compiler

I am unsure why it happens only sometimes and not on every run. Below is the code I am running; the last line is the one causing the error. Any help in solving this would be greatly appreciated, thank you!!

# built-in python modules
import datetime
import inspect
import os
import pytz

# scientific python add-ons
import numpy as np
import pandas as pd

# plotting
# first line makes the plots appear in the notebook
%matplotlib inline 
import matplotlib.pyplot as plt
import matplotlib as mpl

#import the pvlib library
from pvlib import solarposition, irradiance, atmosphere, pvsystem
from pvlib.forecast import GFS
from pvlib.modelchain import ModelChain

pd.set_option('display.max_rows', 500)

latitude, longitude, tz = 21.300268, -157.80723, 'Pacific/Honolulu' 

# specify time range.
# start = pd.Timestamp(datetime.date.today(), tz=tz)
pacific = pytz.timezone('Etc/GMT+10')
# print(pacific)
# datetime.datetime(year, month, day, hour, minute, second, microsecond, tzinfo)
start2 = pd.Timestamp(datetime.datetime(2020, 2, 10, 13, 0, 0, 0, pacific))
# print(start)
# print(start2)
# print(datetime.date.today())

end = start2 + pd.Timedelta(days=1.5)

# Define forecast model
fm = GFS()

# get data from location specified above
forecast_data = fm.get_processed_data(latitude, longitude, start2, end)
# print(forecast_data)

Solution

  • I think I have a solution now. For some reason, these UNIDATA DCSS queries occasionally return big-endian data, which is not compatible with the pandas DataFrame or Series objects, as discussed here. I found the function in pvlib that takes the data from NetCDF4 and builds the pandas DataFrame: it lives in pvlib/forecast.py and is called _netcdf2pandas. I will copy the source code below:

    data_dict = {}
    for key, data in netcdf_data.variables.items():
        # if accounts for possibility of extra variable returned
        if key not in query_variables:
            continue
        squeezed = data[:].squeeze()
        if squeezed.ndim == 1:
            data_dict[key] = squeezed
        elif squeezed.ndim == 2:
            for num, data_level in enumerate(squeezed.T):
                data_dict[key + '_' + str(num)] = data_level
        else:
            raise ValueError('cannot parse ndim > 2')
    
    data = pd.DataFrame(data_dict, index=self.time)
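
    As a toy illustration of the squeeze-and-transpose logic above (using a made-up variable name, temperature, and dummy values in place of a real NetCDF variable):

    ```python
    import numpy as np
    import pandas as pd

    # Dummy stand-in for a NetCDF variable of shape (time, 1, level):
    # squeeze() drops the singleton dimension, leaving a 2-D (time, level) array.
    raw = np.arange(8.0).reshape(4, 1, 2)
    squeezed = raw.squeeze()
    assert squeezed.ndim == 2

    # Each level becomes its own column, suffixed with its index,
    # just as in the ndim == 2 branch above.
    data_dict = {}
    for num, data_level in enumerate(squeezed.T):
        data_dict['temperature' + '_' + str(num)] = data_level

    df = pd.DataFrame(data_dict)
    print(df.columns.tolist())  # ['temperature_0', 'temperature_1']
    ```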
    

    The goal is to squeeze the NetCDF4 data down into individual pandas Series, save each Series to a dictionary, then load the dictionary into a DataFrame and return it. All I did was add a check that determines whether the squeezed array is big-endian and, if so, converts it to little-endian. My revised code is below:

    data_dict = {}
    for key, data in netcdf_data.variables.items():
        # if accounts for possibility of extra variable returned
        if key not in query_variables:
            continue
        squeezed = data[:].squeeze()
    
        # If the data is big endian, swap the byte order to make it little endian
        if squeezed.dtype.byteorder == '>':
            squeezed = squeezed.byteswap().newbyteorder()
    
        if squeezed.ndim == 1:
            data_dict[key] = squeezed
        elif squeezed.ndim == 2:
            for num, data_level in enumerate(squeezed.T):
                data_dict[key + '_' + str(num)] = data_level
        else:
            raise ValueError('cannot parse ndim > 2')
    
    data = pd.DataFrame(data_dict, index=self.time)
    

    I used this Stack Overflow answer to determine the endianness of each array. The SciPy documentation gave me some clues as to what kinds of data these byte orders may be.
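
    For reference, the check can be exercised on a synthetic big-endian array (a minimal sketch, not pvlib code; note that ndarray.newbyteorder was removed in NumPy 2.0, so the dtype-level equivalent is used here):

    ```python
    import numpy as np
    import pandas as pd

    # A synthetic big-endian float64 array ('>f8'), like the data the
    # GFS queries occasionally return.
    big = np.array([1.0, 2.0, 3.0], dtype='>f8')
    print(big.dtype.byteorder)  # '>'

    # Swap the bytes in memory, then reinterpret them with the swapped
    # byte order (equivalent to byteswap().newbyteorder() on older NumPy,
    # where ndarray.newbyteorder still existed).
    native = big.byteswap().view(big.dtype.newbyteorder())

    # The values are unchanged, and the buffer is now in native order.
    assert np.array_equal(native, big)
    s = pd.Series(native)
    print(s.dtype.byteorder == '>')  # False
    ```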

    Here is my pull request to pv-lib that fixes the problem for me. I hope this helps. I still don't know why the problem was inconsistent: about 95% of the time my call to get_processed_data would fail; whenever a run happened to succeed, I thought the problem was gone, and then pandas would throw the endian error again on a later run. After applying the fix to pv-lib, I no longer get any errors from pandas about big or little endians.