Comments on Random Fluctuations: "Bayesian analysis: Comparing algorithms Part 1?" (feed last updated 2013-06-26)

----------------------------------------------------------------------

If you're trying to learn Bayesian data analysis using MCMC methods, I can happily recommend a book. (I happen to be the author of the book :-) but apparently other people seem to like it too.) Here are links to the book's blog and web page:

http://doingbayesiandataanalysis.blogspot.com/
http://www.indiana.edu/%7Ekruschke/DoingBayesianDataAnalysis/

Happy Bayesian analyzing!

— John K. Kruschke (https://www.blogger.com/profile/17323153789716653784), 2011-10-06 21:52

----------------------------------------------------------------------

I usually use the metrop() function I mentioned above, but for some special applications I've written my own samplers (e.g., to sample different groups of variables in different blocks, each with its own proposal distribution). I've also worked with more advanced Monte Carlo algorithms (like Hamiltonian Monte Carlo, which uses gradient information to speed up convergence, assuming it's easy and fast to compute partial derivatives of your posterior).

By the way, for future reference, the chains you plotted for the Metropolis algorithm are symptomatic of either a too-small step size or a correlated posterior, or both.

What you see in the traces is that the algorithm is taking very small steps to gradually approach a good region of probability space.
That either means the step size is too small and can simply be increased, or else the posterior distribution is so correlated that with independent steps you <i>have</i> to take small steps.

The two solutions I mentioned are the typical remedies: either increase the step size, or take correlated steps. If there are a lot of correlated variables, some people like to break them into small blocks of related parameters (as I mentioned above) and try to move each block separately, so a failure to move in one block won't affect the parameters in the other blocks.

By contrast, the graphical diagnostic of a step size that's too <em>large</em> is a chain that looks very "stepwise": you get lots of rejections, so the chain stays in the same place for a long time before finally moving.

One rule of thumb is to adjust the step sizes until your acceptance rate is around 20%. (This isn't a hard rule, and the effective sample size is a better measure of how much information is in your chain.) If your acceptance rate is low, take smaller steps; if it's high, take bigger steps. In your original example, I think it was producing around a 70% acceptance rate, suggesting you could take bigger steps in at least one of the variables.

— Nathan Urban (http://www.princeton.edu/~nurban/), 2011-08-24 14:09

----------------------------------------------------------------------

Thank you very much, Dr. Urban. I will definitely try your suggestions!
I agree, the existing packages for R will work much better than my home-brewed code, but I do feel that I have a somewhat better understanding of how the algorithms are supposed to work now that I have made some attempt at writing them from scratch.

In your research, do you use the existing packages, or have you written custom-tailored code for your use?

Thank you again!

— Avi (https://www.blogger.com/profile/17021838649882950034), 2011-08-24 12:45

----------------------------------------------------------------------

Actually, the simplest thing you could do to improve performance is just to make the theta step size much bigger. Try sd=10000 instead of sd=100. You can see from the Gibbs plot that the spread of theta is of order 10^4. (Again, in an adaptive sense, you could compute the standard deviation of the thetas you got in the first, poorly converged Metropolis run, and use that as the standard deviation of your proposal distribution.)

— Nathan Urban (http://www.princeton.edu/~nurban/), 2011-08-24 11:48

----------------------------------------------------------------------

The Metropolis algorithm isn't performing well here because the joint posterior for alpha and theta is highly correlated (plot alpha against theta to see this). Your proposal distribution is proposing moves for alpha and theta that are independent of each other, but it would be better to propose moves in a correlated direction along the high-probability surface.

In this case, it's probably better to just do Gibbs, but in cases where you can't write down the necessary exact conditional distributions, you can try an adaptive Metropolis approach: compute the covariance of the poorly converged chain and use it as the proposal covariance to compute a second, better-converged chain.
For example, once you've computed alpha and theta as above, you can compute the covariance of this "preliminary" chain:

precov = cov(cbind(alpha, theta))

Then run the Metropolis algorithm again, except this time propose a new alpha and a new theta at the same time in your proposal distribution, drawn from a (correlated) bivariate normal distribution centered on the existing point:

new.point = rmvnorm(1, mean=c(alpha[i], theta[i]), sigma=precov)
alpha[i+1] = new.point[1]; theta[i+1] = new.point[2]

(The rmvnorm() function is in the 'mvtnorm' package; note that its covariance argument is the lowercase 'sigma'. Or you can implement multivariate normal sampling yourself.)

It's not always optimal for the proposal distribution to be identical to the covariance of the target posterior, so you may have to scale the precov matrix (e.g., multiply it by 0.5) to get a good acceptance rate (around 25%).

This won't work perfectly, since a multivariate normal proposal distribution doesn't look like the actual posterior (which is curved in alpha-theta space), but it will work much better than independent proposals. (I tried it on your problem and it looks a lot closer to the Gibbs output.)

See Roberts and Rosenthal, "Examples of Adaptive MCMC", for more advanced methods.

Also, I know this was a teaching example for yourself, but you can get faster performance by using an R package for MCMC, like the metrop() function in the 'mcmc' package.

— Nathan Urban (http://www.princeton.edu/~nurban/), 2011-08-24 11:39
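[Editor's note] The two-stage recipe described in the last comment can be sketched end to end in R. This is a minimal, self-contained illustration, not the original code: the alpha/theta model from the post isn't shown here, so the target below is a stand-in (a deliberately correlated 2-D Gaussian log-posterior), and the multivariate normal proposal is drawn by hand via a Cholesky factor rather than with mvtnorm's rmvnorm().

```r
# Two-stage adaptive Metropolis sketch (base R only).
# Stand-in target: a strongly correlated 2-D Gaussian "posterior",
# standing in for the curved alpha-theta posterior in the post.

set.seed(1)

target.cov <- matrix(c(1, 0.95, 0.95, 1), 2, 2)

# Unnormalized log-posterior of the stand-in target.
log.post <- function(p) -0.5 * sum(p * solve(target.cov, p))

# One draw from N(mean, sigma) using the Cholesky factor of sigma.
rmvn <- function(mean, sigma) {
  as.vector(mean + t(chol(sigma)) %*% rnorm(length(mean)))
}

# Random-walk Metropolis with an arbitrary proposal covariance.
metropolis <- function(n, start, prop.cov) {
  chain <- matrix(NA_real_, n, 2)
  chain[1, ] <- start
  accepts <- 0
  for (i in 2:n) {
    prop <- rmvn(chain[i - 1, ], prop.cov)
    # Accept with probability min(1, post(prop)/post(current)).
    if (log(runif(1)) < log.post(prop) - log.post(chain[i - 1, ])) {
      chain[i, ] <- prop
      accepts <- accepts + 1
    } else {
      chain[i, ] <- chain[i - 1, ]
    }
  }
  list(chain = chain, rate = accepts / (n - 1))
}

# Stage 1: independent (diagonal) proposals -- the poorly mixing setup.
run1 <- metropolis(5000, c(0, 0), diag(0.5, 2))

# Stage 2: reuse the preliminary chain's covariance as the proposal,
# scaled (e.g. by 0.5) to push the acceptance rate toward ~25%.
precov <- cov(run1$chain)
run2 <- metropolis(5000, c(0, 0), 0.5 * precov)

cat("stage 1 acceptance rate:", round(run1$rate, 2), "\n")
cat("stage 2 acceptance rate:", round(run2$rate, 2), "\n")
```

On this stand-in target, the stage-2 proposal inherits the alpha-theta correlation from the preliminary chain, which is exactly why correlated joint proposals mix better than independent per-variable steps.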