I want to look at the prices of the S&P 500, one of the most commonly followed equity indices (roughly speaking, it tracks the performance of the stocks of the 500 largest American companies).
I used the Python library yfinance, which pulls data directly from Yahoo Finance, and ran the following:
import yfinance as yf
full_history = yf.Ticker("^GSPC").history(period="max", interval="1d")
This gives me the history of the S&P 500 index dating back to 30 December 1927.
However, I noticed the following: When running
print(full_history.index[full_history["Open"] == 0])
one gets
DatetimeIndex(['1962-01-02', '1962-01-03', '1962-01-04', '1962-01-05',
'1962-01-08', '1962-01-09', '1962-01-10', '1962-01-11',
'1962-01-12', '1962-01-15',
...
'1982-04-05', '1982-04-06', '1982-04-07', '1982-04-08',
'1982-04-12', '1982-04-13', '1982-04-14', '1982-04-15',
'1982-04-16', '1982-04-19'],
dtype='datetime64[ns]', name='Date', length=5075, freq=None)
Indeed, it seems that the opening prices between 1962 and 1982 are all set to 0, i.e. they are missing.
Is this particular to Yahoo Finance, or is there some reason why the S&P 500 opening prices for this period are unknown?
The data itself gives a hint: before 1962, all four price columns (open, high, low, close) contain the same value on any given day. Only in 1962 do high, low and close start to differ (with open being 0).
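One way to check this pattern yourself is to count, per calendar year, how many rows have identical OHLC values and how many have an open of 0. The following is only a minimal sketch: the function name summarize_ohlc_regimes is made up for illustration, and it assumes the frame has a DatetimeIndex and the "Open", "High", "Low" and "Close" columns that yfinance returns.

```python
import pandas as pd


def summarize_ohlc_regimes(history: pd.DataFrame) -> pd.DataFrame:
    """Count, per calendar year, rows with O == H == L == C and rows with Open == 0.

    Hypothetical helper, not part of yfinance. Assumes a DatetimeIndex and
    the columns "Open", "High", "Low", "Close".
    """
    ohlc = history[["Open", "High", "Low", "Close"]]

    # True on days where all four prices match the close exactly.
    all_equal = ohlc.eq(ohlc["Close"], axis=0).all(axis=1)

    # True on days where the open is recorded as 0 (effectively missing).
    zero_open = ohlc["Open"] == 0

    flags = pd.DataFrame(
        {"rows": 1, "all_equal": all_equal, "zero_open": zero_open},
        index=history.index,
    )

    # Summing booleans counts the True values within each year.
    return flags.groupby(history.index.year).sum()
```

Applied to the frame from the question, summarize_ohlc_regimes(full_history).loc[1960:1963] should make the switch around 1962 visible, if the pattern is as described above.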
This article provides another hint: computerized recording of trades first emerged in the early sixties, with the next 'revolution' occurring around 1983. This is purely circumstantial evidence (and I'm not a historian), but it clearly suggests a structural break in the way the data was recorded (or even generated!).
Trading back in 1927 was nothing like it is today...