Search code examples
rplotcontourkernel-density

Contour plot: density of sites with latitude and longitude locations


I have data of this format and want to make a contour plot. When I try to use the density(z) I get error message "x must be numeric". I'm not sure how transform my data to make it the correct format to generate the contour. I just want it to be based on the density of points as the two columns represent my long/lat.

z <- c(
  c(8.83,8.89),
  c(8.89,8.94),
  c(8.84,8.9),
  c(8.79,8.852),
  c(8.79,8.88),
  c(8.8,8.82),
  c(8.75,8.78),
  c(8.8,8.8),
  c(8.74,8.81),
  c(8.89,8.99),
  c(8.97,8.97),
  c(9.04,9.08),
  c(9,9.01),
  c(8.99,8.99),
  c(8.93,8.97)
)
z <- matrix(z, ncol = 2, byrow = TRUE)

Solution

  • density() is used for univariate density estimation. Since you have two independent variables: long and lat, you should use kde2d() from R's default package MASS.

    library(MASS)
    fit <- kde2d(z[,1], z[,2])
    contour(fit$x, fit$y, fit$z)
    ## show original data locations
    points(z, pch = 19, col = 4)
    

    contour


    Follow-up

    If you look at ?kde2d:

    Usage:
    
         kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))
    

    The default number of cells along each of x and y are n = 25, which gives you a 25 * 25 grid. Density estimation is done on this grids. Perhaps you are wondering why estimation is done on a regular grid. Because such grid is like pixels of a digital photo. Grid/raster like object is convenient for visualization. Actually, if you want computer to proceed 3D graph, you have to give it a raster like object.

    In practice, you should choose n according to how many data you have. Note that a 25 * 25 grid has 625 cells, this is quite fair when you have 1000 data points. You can also try n = 50. Setting n is very similar to setting number of bins when you produce a histogram. As n increases, your resulting estimation is more jagged. Consider the histogram example if you are unclear:

    x <- rnorm(200)
    hist(x, breaks = 10)
    hist(x, breaks = 20)
    

    Precisely, density estimation is different to histogram; the former is a kernel smoother, while the latter is a primitive bin smoother. But the choice of n (refinement) does has equal effect.