Search code examples
pythonpvlib

Computing clear sky radiations for a single day using pvlib (Python)


I am facing problems in being able to plot a single graph that presents the clear sky model for a specific location, along with the global irradiation measured for the same respective day.

I need help, because I've been studying for a relatively short time and I've been stuck on it for two weeks now. I've already researched many sources, tried to make several changes to the code, but I still haven't succeeded.

Basically I collected data provided by a collection station, I checked through pandas which day would be the max ghi day for the location. So, my goal would be to model what a clear sky day would be like, using the Ineichein and Solis models, and compare them with the measured the ghi data for the day.

Basically, either I get stuck in a chain of errors, or the generated graph does not match what would be expected, or simply the measured data is not plotted.

I will leave the entire sequence of my code, as well as the csv file with the data. I would be very grateful if someone could help me.

**csv file:**https://drive.google.com/file/d/1Dfrx1Mj7hcb4p5eRnyZYRTLBRhmESxfn/view?usp=sharing

my code:

import pandas as pd
import pvlib
from pvlib import clearsky, atmosphere, solarposition
from pvlib.location import Location
from pvlib.iotools import read_tmy3
import matplotlib.pyplot as plt
location = Location(latitude=-15.60083, longitude=-47.71306, tz="America/Sao_Paulo", altitude=1023, name="Brasilia station")

data = pd.read_csv('brjan19.csv', index_col=0, sep=';', usecols = ['id', 'year', 'day', 'glo'], low_memory=False)

data = data.rename(columns={'glo': 'ghi'})
data['ghi'] = pd.to_numeric(data['ghi'], errors='coerce')`

# Create the date column by specifying the year (2019), month (January) and day
data['date'] = pd.to_datetime(data['year'].astype(str) + '-01-' + data['day'].astype(str), format='%Y-%m-%d')

# Select data only for the 13th and 14th
data_subset = data[(data['date'].dt.day == 13) | (data['date'].dt.day == 14)]

# Set 'date' as the index
data_subset.set_index('date', inplace=True)

# Resample to 2 minutes and calculate the average
ghi = data_subset['ghi'].resample('2T').mean()

# Get Clear Sky values
cs = location.get_clearsky(ghi.index)

cs = cs.loc[ghi.index]
#print(data_subset)

# Plot
plt.figure(figsize=(10, 4))
cs['ghi'].plot(label='Clear Sky GHI')
ghi.plot(label='Measured GHI')
plt.title('Ineichen, Brasília station GHI (Days 13 and 14)')
plt.ylabel('Irradiance $(W/m^2)$')
plt.xlabel('Time')
plt.legend(loc='lower right')
plt.show()

enter image description here enter image description here

When I got home to analyze more calmly and do tests, I noticed that there was another problem. The highest ghi values ​​recorded for this day exceed the value of 1000 W/m^2, however these values ​​are not being plotted on the graph.

I tried just removing the decimal points to see if that would solve the problem. However, making this change only the measured ghi data was plotted and the clear sky model resulted in 0.0

I tried creating a new separate variable in the dataframe to try to plot the ghi values ​​separately, but I was also unsuccessful. I'll leave a printout of what the expected result would be like for the ghi.

[enter image description here3


Solution

  • To understand what is wrong, try going step by step through the code and examining the value of each variable. By doing this, you can see that the ghi variable is messed up:

    In [5]: ghi
    Out[5]: 
    date
    2019-01-13 00:00:00    208.038157
    2019-01-13 00:02:00           NaN
    2019-01-13 00:04:00           NaN
    2019-01-13 00:06:00           NaN
    2019-01-13 00:08:00           NaN
                              ...    
    2019-01-13 23:52:00           NaN
    2019-01-13 23:54:00           NaN
    2019-01-13 23:56:00           NaN
    2019-01-13 23:58:00           NaN
    2019-01-14 00:00:00    229.454473
    Freq: 2T, Name: ghi, Length: 721, dtype: float64
    

    Why is this? Well, it's because the dates you're using don't have hour/minute values, only daily. You have to read and make use of the minuto column in the data file too. Here is a revision of your code that I think does what you want:

    import pandas as pd
    from pvlib.location import Location
    import matplotlib.pyplot as plt
    
    location = Location(latitude=-15.60083, longitude=-47.71306, tz="America/Sao_Paulo", altitude=1023, name="Brasilia station")
    data = pd.read_csv(r"C:\Users\ksande\Downloads\brjan19.csv", sep=';')
    data = data.rename(columns={'glo': 'ghi'})
    # GHI values are floats, but the same symbol is used for the decimal place and
    # the thousands marker, e.g. "1.262.000" == 1262.000
    has_multiple_markers = data['ghi'].str.count("\.") > 1
    data.loc[has_multiple_markers, 'ghi'] = data.loc[has_multiple_markers, 'ghi'].str.replace(".", "", n=1)
    data['ghi'] = pd.to_numeric(data['ghi'])
    
    data['date'] = pd.to_datetime(data['year'].astype(str) + '-01-' + data['day'].astype(str), format='%Y-%m-%d')
    data['date'] = data['date'] + pd.to_timedelta(data['minuto'], unit='min')
    
    data_subset = data[(data['date'].dt.day == 13) | (data['date'].dt.day == 14)]
    data_subset.set_index('date', inplace=True)
    ghi = data_subset['ghi']
    cs = location.get_clearsky(ghi.index)
    
    
    # Plot
    plt.figure(figsize=(10, 4))
    cs['ghi'].plot(label='Clear Sky GHI')
    ghi.plot(label='Measured GHI')
    plt.title('Ineichen, Brasília station GHI (Days 13 and 14)')
    plt.ylabel('Irradiance $(W/m^2)$')
    plt.xlabel('Time')
    plt.legend(loc='lower right')
    plt.show()
    

    enter image description here