Friday, June 24, 2011

Empirical confidence intervals, redux

Edit:
Evidently I've re-invented the wheel (yet again!).




There's nothing quite like spending an afternoon trying to get R to Plot What I Mean. In hopes of getting a paper off my desk and out the door, I got to revisit the overplotting conundrum previously presented. I wanted to plot this:

but allow the viewer some concept of the relative density of points near 0% error . There's way too much overplotting here to tell by eye. With only a little reflection, it was obvious solution I offered before was jingus, chiefly because of the sparsity the data (and their non-rectangular hull) prevented the 2D kernel estimate from working very well. You can see this in action if you a command that almost Plots What I Mean--smoothScatter, which uses a 2D kernel estimate to plot most of the data, then overplots outliers:


A kernel density estimator is too fancy for this problem. All that you really need to do is divide the abscissa into a grid that has enough points at each slice that taking the pth percentile of all the points in the slice gives you a decent estimate of the true value of the percentile there. This is easy to do with quantcut (make the slices with equal number of observations) or cut2 (make the slices with at least n observations). Then you can connect the pth percentile at each slice with linear or loess interpolation. I even cooked up a panel function that you can drop straight into your favorite lattice xyplot:

> xyplot(y~x, data=comp3, cex=.5, xlim=c(-5, 1e3), ylab="Percentage error", xlab=expression(lambda), ylim=c(-100, 100), panel=panel.confbars, npt=15, conf=.05)
Make npt larger to get a denser interpolation grid, or smaller to get smoother interpolation. conf gives the 100-2*conf % confidence interval you wish to bound, so the default is a 95% confidence interval.

panel.confbars = function(x, y, npt=25, conf=.025, ...){
require(gtools)
panel.xyplot(x, y, ...)

xcut = quantcut(x, q=seq(0, 1, length.out=npt))
xval = quantile(x, p=seq(0, 1, length.out=npt))
yval = matrix(unlist(tapply(y, xcut, quantile, p=c(conf, 1-conf))), nrow=2)

lconf = approxfun(xval[-1], yval[1,])
uconf = approxfun(xval[-1], yval[2,])

panel.loess(xval, lconf(xval), type="l", col="red", lwd=2)
panel.loess(xval, uconf(xval), type="l", col="red", lwd=2)

}

No comments:

Post a Comment