How do I format Year-Month data with row = 'year' and columns = 'months'

I'm working with a data set from a url, which I've messily formatted to the following:

url = r"https://psl.noaa.gov/enso/mei/data/meiv2.data"
r = requests.post(url) #read url
s = r.text #url data to string
df = pd.read_csv(io.StringIO(s), sep=',', header=None, quoting=csv.QUOTE_ALL) #make pd dataframe
df2 = df.drop(index=[0, 46, 47, 48, 49]) #drop the unnessicary data
df3 = df2[0].str.split('\s+', expand = True) #reformat to new df
df3.columns = ['Year', 'DJ', 'JF', 'FM', 'MA', 'AM', 'MJ', 'JJ', 'JA', 'AS', 'SO', 'ON', 'ND'] #name the columns

Printing df3 results in the dataframe: that looks like this

This is fine to look at, but I want to create a time-series plot of MEI from 1979-2023 including each month. That is to say, the X-axis from Dec-Jan 1979 thru Nov-Dec 2023 with MEI values on the Y-axis.

So far, I've tried transposing the dataframe, but ran into the same issues when formatting a plot.

Solution

The first thing I would do, to simplify your code, is point out that read_csv() is capable of doing a bunch of the transformations you're doing, just by specifying options. For example, you can split fields by whitespace by including delim_whitespace=True.

import pandas as pd

months = ['DJ', 'JF', 'FM', 'MA', 'AM', 'MJ', 'JJ', 'JA', 'AS', 'SO', 'ON', 'ND']

df = pd.read_csv(
    "https://psl.noaa.gov/enso/mei/data/meiv2.data",
    delim_whitespace=True,
    names=['Year', *months],
    skiprows=1,
    skipfooter=4,
    na_values=[-999],
    engine='python'
)

The next step is to "unpivot" the data. I use a function called DataFrame.melt() to do this. By setting value_vars=months, I am telling it that each value in each row and column should become a new row, labeled by what row (year) and what column (month) it was in.

df = df.melt(id_vars=['Year'], var_name='month', value_vars=months)

This gives us a dataframe with year, seasonal code, and value:

   Year month  value
0  1979    DJ   0.47
1  1980    DJ   0.35
2  1981    DJ  -0.33

Next, I map this month code into an integer to represent the month of the year. I turn DJ into 1, JF into 2, and so on. This step is a little scientifically iffy - it sounds like this data represents a span of time, and those spans of time overlap. You should ask someone who understands the dataset if what I'm doing here makes sense.

df['month'] = df['month'].map({m: i + 1 for i, m in enumerate(months)})

Next, I take the year column and month column and combine it into a single date column. This allows me to use Pandas to plot the data as a time-series.

df['date'] = pd.to_datetime(df['Year'].map(str) + '-' + df['month'].map(str))

This next step is purely aesthetic: I find it nicer to work with sorted dataframes.

df = df.sort_values('date')

Finally, after preparing the data in the correct format, you can make a plot with:

df.plot(x='date', y='value')

Output: