I'm attempting to complete a kernel density estimate of tornadoes in the CONUS by storm mode. I'm trying to recreate some figures from this paper (Smith et al., 2012) which is a KDE of tornado events by convective mode per decade on a 40km grid. Each event is a series of lats and lons marked by the scatter plot from 2003 to 2011. Talking with the authors, their estimates were made using an ARCGIS Density function. Looking at the info sheet for that function, I tried to recreate it, though I was only able to find a gaussian kernel (instead of quartic) and I had to develop my own scale factor as ArcGis automatically outputs events per unit area.
My KDE attempt
I was able to replicate the first figure but following figures appear to have a strange overestimate? There is a large hole in the middle of the density estimate much higher than it should be. Am I perhaps missing something with the bandwidth or the actual estimate? Im not super familiar with these types of statistics. For the scale factor, I calculated the area of each grid cell (1600 km2) and divided by the number of decades (0.9). Is there any reason im missing why the estimate would appear this way?
Here is my second KDE estimate and the following figure.
#Load 40km RAP Grid
f = np.load("/Users/andrewlyons/Downloads/pperf_grid_template.npz")
lon,lat = f['lon'], f['lat']
f.close()
#grid = np.zeros_like(lon)
#NDFD = pj.Proj("+proj=lcc +lat_1=25 +lat_2=25 +lon0=-95 +x_0=0 +y_0=0 +ellps=WGS84 #+datum=WGS84 +units=m +no_defs")
#X,Y = NDFD(lon,lat)
#data_crs= ccrs.LambertConformal(central_longitude=-95.0, central_latitude=25.0)
proj = ccrs.PlateCarree()
fig = plt.figure(figsize=(12, 10))
ax = fig.add_subplot(1, 1, 1, projection=proj)
# the extents of any plotted data
ax.set_extent([-125,-61,22, 49])
states_provinces = cfeature.NaturalEarthFeature(
category='cultural',
name='admin_1_states_provinces_lines',
scale='50m',
facecolor='none')
data = qlcs03
k = kde.gaussian_kde([data['slon'],data['slat']])
#set 40km grid
xi,yi =lon,lat
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
# Make the plot
#Scale factor
zi=(zi*(1600*(10/9)))
c =ax.contourf(xi, yi, zi.reshape(xi.shape),colors='k',levels=[0.5,1,2,3,4,5,6,7,8],alpha=0.17,transform=proj)
cs = ax.contour(xi, yi, zi.reshape(xi.shape),colors='k',levels=[0.5,1,2,3,4,5,6,7,8],linewidths=2.5,transform=proj,zorder=9)
ax.clabel(cs, fontsize=16, inline=False, colors ='r')
ax.scatter(data['slon'],data['slat'],color ='k',marker = '.',s=0.1)
ax.add_feature(cfeature.BORDERS)
ax.add_feature(states_provinces)
ax.add_feature(cfeature.LAND,facecolor ='white')
ax.add_feature(cfeature.COASTLINE,zorder=11)
ax.add_feature(cfeature.OCEAN,facecolor='white',zorder=10)
#plt.colorbar(c)
#plt.colorbar(cs)
plt.title("Convective Mode KDE Test QLCS Tor 2003-2011 events per decade 40km grid N="+str(len(data)))
plt.show()
For anyone stumbling across this, I believe I found the answer. After talking with the authors, I think their original manuscript had an error. The original estimate was multiplied by the wrong scale factor which resulted in a density of events per year not per decade. Thus, my plot is most likely correct. I will close this thread.