Monday, August 22, 2011

Bayesian analysis: Comparing algorithms Part 1?

I recently had the opportunity to engage in some Bayesian analysis at work. I was able to state the problem in terms of the lognormal distribution, and took advantage of JAGS and its integration with "R" via the R2jags package. The client was very happy, and life moved on. However, as the underlying problem may be better described by a transformed beta than by a lognormal undergoing transforms (say that five times quickly!), it got me thinking about how to teach myself at least rudimentary Bayesian analysis using MCMC methods. As my main text, I have been using Dr. Scott M. Lynch's excellent "Introduction to Applied Bayesian Statistics and Estimation for Social Scientists". Starting small (I haven't worked on the four-parameter distribution yet), I decided to play with data for which I knew the answer, and see if I could write my own Gibbs or Metropolis-Hastings algorithms. I picked a gamma distribution as my test case.
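
For anyone curious what the JAGS route can look like, here is a minimal R2jags sketch for a simple lognormal model. To be clear, this is my own illustration and not the actual work model: the priors, the fake data, and the names y, mu, and tau are all assumptions.

library(R2jags)

# A hypothetical lognormal model -- priors and variable names are illustrative only
model.string <- "
model {
  for (i in 1:N) {
    y[i] ~ dlnorm(mu, tau)    # JAGS parameterizes dlnorm by the mean and precision of log(y)
  }
  mu  ~ dnorm(0, 1.0E-6)      # vague prior on the log-scale mean
  tau ~ dgamma(0.001, 0.001)  # vague prior on the log-scale precision
}"
model.file <- tempfile(fileext = ".txt")
writeLines(model.string, model.file)

y <- rlnorm(50, meanlog = 10, sdlog = 1.5)  # fake data just to make the sketch runnable
fit <- jags(data = list(y = y, N = length(y)), inits = NULL,
            parameters.to.save = c("mu", "tau"),
            model.file = model.file,
            n.chains = 3, n.iter = 5000)
print(fit)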


I'm no expert on MCMC methods, so I'm not even going to attempt to explain how this works, but I did see how the different algorithms behave, and it is rather insightful. For the gamma distribution, while I always parameterize it using shape and scale, for the purposes of Gibbs sampling I used shape/rate. The nice thing about Gibbs sampling is that as long as you can sample from a distribution which is proportional to the actual conditional distribution, you're fine, and do not have to calculate the constants of proportionality that bring the distribution back to an integral of 1.


In the case of the gamma distribution, and using a flat prior, it is the likelihood which dominates the conditional distribution. Using the shape/rate parameterization $(\alpha, \beta)$, the likelihood over the $n$ observations $x_1, \ldots, x_n$ is:

$$L(\alpha, \beta \mid x) = \prod_{i=1}^{n} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x_i^{\alpha - 1} e^{-\beta x_i}$$

When looking for a conditional distribution for alpha, this does not resolve into anything remotely nice (although see Fink (1997) "A Compendium of Conjugate Priors" (pdf) for more), so Gibbs sampling will have to give way to a Metropolis algorithm. I'm not sophisticated enough to use a non-symmetrical proposal distribution, so it is not a Metropolis-Hastings algorithm; maybe in a future post. However, when looking at beta, and discarding every factor that does not involve it, the core of the likelihood is rather gamma-like. In fact, it is proportional to a gamma distribution with rate equal to the sum of the observed data points and shape equal to one more than the product of the number of observations and the underlying alpha.
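
To make that last sentence concrete, here is the step being described, written out (my own reconstruction, using the same flat prior): keep only the factors of the likelihood that involve $\beta$,

$$p(\beta \mid \alpha, x) \;\propto\; \prod_{i=1}^{n} \beta^{\alpha} e^{-\beta x_i} \;=\; \beta^{n\alpha} e^{-\beta \sum_i x_i},$$

which is the kernel of a gamma distribution with shape $n\alpha + 1$ and rate $\sum_i x_i$, exactly the draw used in the Gibbs code below.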

With that observation, and with the algorithm properly written (which took more time than I care to admit), a comparison can be made between straight Metropolis and Gibbs sampling for the scale parameter.

Here is a "pure" Metropolis algorithm to calculate the posterior distribution, assuming the input data is gamma (this and all "pretty" R code created by Pretty R at inside-R.org):

MHGam <- function (x, n.iter=10000, n.burnin=0){
  # Random-walk Metropolis for both gamma parameters: shape (alpha) and scale (theta)
  alpha <- matrix(.5, n.iter)
  theta <- matrix(10, n.iter)
  acca <- 0   # acceptance count for alpha
  accq <- 0   # acceptance count for theta
  # Log-posterior: with flat priors this is just the gamma log-likelihood (up to a constant)
  lnpost <- function(a, q, z){
    LL <- (a-1)*log(z) - (z/q) - a*log(q) - lgamma(a)
    return(sum(LL))
  }
  for (i in 2:n.iter) {
    # Sample theta with a symmetric normal random-walk proposal
    acc <- 1
    theta[i] <- theta[i-1] + rnorm(1, mean=0, sd=100)
    if (theta[i] <= 0) {acc <- 0; theta[i] <- theta[i-1]}
    LLold <- lnpost(alpha[i-1], theta[i-1], x)
    LL <- lnpost(alpha[i-1], theta[i], x)
    if ((LL - LLold) < log(runif(1))) {
      theta[i] <- theta[i-1]
      LL <- LLold          # keep LL consistent with the retained theta
      acc <- 0
    }
    accq <- accq + acc
    # Sample alpha with a symmetric uniform random-walk proposal
    acc <- 1
    alpha[i] <- alpha[i-1] + runif(1, min=-.5, max=.5)
    if (alpha[i] <= 0) {acc <- 0; alpha[i] <- alpha[i-1]}
    if ((lnpost(alpha[i], theta[i], x) - LL) < log(runif(1))) {
      alpha[i] <- alpha[i-1]
      acc <- 0
    }
    acca <- acca + acc
    if (i %% 100 == 0) {print(c(i, alpha[i], theta[i], acca/i, accq/i))}
  }
  # Posterior means after discarding the burn-in draws
  amean <- mean(alpha[(n.burnin+1):n.iter])
  qmean <- mean(theta[(n.burnin+1):n.iter])
  Fit <- list(alpha=alpha, theta=theta, amean=amean, qmean=qmean)
  return(Fit)
}

The above code just hands the parameter sampling back and forth between alpha and theta (which is the scale, not the rate) and will, hopefully, eventually settle down. I added two extra arguments for the number of iterations and the burn-in, and eventually I could add another, outer loop for multiple chains.
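
As a hint of what that outer loop might look like, here is a minimal sketch of a multi-chain wrapper. It is purely illustrative and not part of the original code; since MHGam fixes its own starting values internally, the chains here differ only through their random proposals.

# Hypothetical wrapper: run MHGam several times and collect the results in a list
MHGamChains <- function(x, n.chains = 3, n.iter = 10000, n.burnin = 0) {
  chains <- vector("list", n.chains)
  for (j in 1:n.chains) {
    chains[[j]] <- MHGam(x, n.iter = n.iter, n.burnin = n.burnin)
  }
  return(chains)
}

# Usage: fits <- MHGamChains(Test, n.chains = 3, n.iter = 50000)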


The next code takes advantage of Gibbs sampling for the scale parameter:

MHGamGib <- function (x, n.iter=10000, n.burnin=0){
  # Gibbs step for the scale parameter (theta), random-walk Metropolis for the shape (alpha)
  alpha <- matrix(.5, n.iter)
  theta <- matrix(10, n.iter)
  acca <- 0   # acceptance count for alpha
  # Log-posterior: with flat priors this is just the gamma log-likelihood (up to a constant)
  lnpost <- function(a, q, z){
    LL <- (a-1)*log(z) - (z/q) - a*log(q) - lgamma(a)
    return(sum(LL))
  }
  for (i in 2:n.iter) {
    # Sample theta directly: the rate 1/theta is conditionally
    # Gamma(shape = n*alpha + 1, rate = sum(x)), so draw the rate and invert it
    theta[i] <- 1/rgamma(1, rate=sum(x), shape=alpha[i-1]*length(x)+1)
    # Sample alpha with a symmetric uniform random-walk proposal
    acc <- 1
    alpha[i] <- alpha[i-1] + runif(1, min=-.5, max=.5)
    if (alpha[i] <= 0) {acc <- 0; alpha[i] <- alpha[i-1]}
    if ((lnpost(alpha[i], theta[i], x) - lnpost(alpha[i-1], theta[i], x)) < log(runif(1))) {
      alpha[i] <- alpha[i-1]
      acc <- 0
    }
    acca <- acca + acc
    if (i %% 100 == 0) {print(c(i, alpha[i], theta[i], acca/i))}
  }
  # Posterior means after discarding the burn-in draws
  amean <- mean(alpha[(n.burnin+1):n.iter])
  qmean <- mean(theta[(n.burnin+1):n.iter])
  Fit <- list(alpha=alpha, theta=theta, amean=amean, qmean=qmean)
  return(Fit)
}

Note that I'm still parameterizing with the scale: I sample the rate and use its reciprocal.
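
Spelled out (my own gloss on that sentence), the Gibbs step draws the rate and inverts it, which is the same as drawing the scale from an inverse-gamma distribution:

$$\beta \mid \alpha, x \sim \mathrm{Gamma}\!\left(n\alpha + 1,\ \textstyle\sum_i x_i\right) \quad\Longrightarrow\quad \theta = 1/\beta \sim \mathrm{Inv\text{-}Gamma}\!\left(n\alpha + 1,\ \textstyle\sum_i x_i\right)$$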

To set up, run, and compare the models, I used the following code:

set.seed(22)
 
Test <- rgamma(100, shape=2.5, scale=1e5)
 
MHT<-system.time(MH<-MHGam(Test, n.iter=50000))
 
MHGT<-system.time(MHG<-MHGamGib(Test, n.iter=50000))

The Gibbs version runs about 40% faster, comparing MHT with MHGT (at least on my laptop), which stands to reason, as Gibbs sampling is Metropolis sampling where every proposal is accepted. What is more fascinating is how quickly the parameter values converge.
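
A quick sanity check (my own addition, so the exact numbers will depend on the seed and on how much burn-in is discarded) is to compare the posterior means stored in each fit against the generating shape of 2.5 and scale of 1e5:

# Posterior means from each sampler versus the values used to simulate Test
c(true.shape = 2.5, metropolis = MH$amean, gibbs = MHG$amean)
c(true.scale = 1e5, metropolis = MH$qmean, gibbs = MHG$qmean)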

Using the following code to overlay the trace plots for theta:

plot(MHG$theta, type="l", col="red")
points(MH$theta, type="l", col="blue")


We see that the Gibbs sampling (in red) gets to the right neighborhood for theta much more quickly than the Metropolis (blue), and gives a more robust (less serially correlated) answer.
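
To put a number on "less serially correlated" (my own addition, not in the original post), the autocorrelation of each theta chain can be inspected directly; the Gibbs chain should show a much faster decay:

# Autocorrelation of the two theta chains
acf(MHG$theta, lag.max = 50, main = "Gibbs theta")
acf(MH$theta, lag.max = 50, main = "Metropolis theta")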

What is also surprising is how the better estimates for theta affect the random-walk Metropolis algorithm for alpha. Using the following code:

plot(MH$alpha, type="l", col="blue")
points(MHG$alpha, type="l", col="red")

We get the corresponding trace for alpha (again, Gibbs in red and Metropolis in blue).

Personally, I found that writing the algorithms and seeing the direct results helped me understand the concepts better than just reading, and I'll certainly try to use Gibbs sampling where I can!

Tuesday, June 07, 2011

Social media redux

I'm certain that by now, everyone knows about Congressman Anthony Weiner, his social media behavior, and yesterday's press conference. I'm not going to discuss Weiner's immoral and, dare I say it, despicable behavior in his online actions and the ensuing attempt at a coverup. What I do want to mention is that this serves to reiterate one of the important points I mentioned in yesterday's blog post: that anything you say or post is there forever. If you would not want something on the front page of your local tabloid, don't post it anywhere on the Internet. Of course, if you always attempt to behave appropriately, chances are you won't get into such a mess to begin with, but as we say in the business - that is out of scope for this discussion.

Monday, June 06, 2011

Social Media guidelines specific for (re)insurers

With social media here to stay, and many of us (actuaries, cat modelers, and other (re)insurance company personnel) actively involved in social media on both personal and professional levels, it behooves us to remember certain social niceties. Everything that you post on the Internet is pretty much there for posterity; 50 years from now, the Wayback Machine will very likely be able to retrieve that one time you posted a picture of yourself at the holiday party—after spiked eggnog #7. My personal, primary rules, whether on Twitter, LinkedIn, Facebook, Wikipedia, or some random bulletin board, are:
  1. Never post anything on-line that you would be uncomfortable saying to someone's face in public.
  2. Never post anything on-line that you would be devastated seeing on the front page of the New York Times.
I am all for having a certain amount of silliness and fun on-line; I mean, where are you going to let your hair down, at work? However, there is a big difference between good-natured silliness and flaming. Too many people believe that there is anonymity on the Internet, and thus act selfishly, spitefully, and all too often just plain cruelly, with the mistaken assumption that they will never be uncovered. Besides being awful and immature behavior, which speaks volumes about the person demonstrating it, this can be a serious operational risk hazard for the person's employer, and thus potentially grounds for termination. For example, it has not even been a year since a Massachusetts schools supervisor lost her job over a Facebook post. To demonstrate that Santayana's maxim still holds, a teacher in Brooklyn did not learn from history, and seems to be in danger of losing her job for the same reason.

Mairi Mallon, founder and managing director of rein4ce, and @reinsurancegirl on Twitter, has recently authored a detailed post on her blog about this very topic (from which I shamelessly stole the idea). I highly recommend that anyone even tangentially involved in social media read it. Better yet, subscribe to her blog (which can be found in the blogroll at the side of this post). I think her summary is important enough that it bears repetition:

  1. Know and follow our (INSERT LINK TO OWN CORPORATE GUIDELINES) corporate guidelines. The same rules apply online.
  2. Users are personally responsible for the content they publish on blogs, Facebook, LinkedIn, Twitter or any other form of user-generated media.
  3. Identify yourself—name and, when relevant, role within the organisation—when you discuss company or company-related matters. You must make it clear that you are speaking for yourself and not on behalf of the company.
  4. Respect copyright, fair use and financial disclosure laws.
  5. Don’t provide our or another’s confidential or other proprietary information. Ask permission to publish or report on conversations that may be deemed to be private or internal to the company.
  6. Don’t cite or reference clients, partners or suppliers without their approval. When you do make a reference, where possible link back to the source.
  7. Don’t use ethnic slurs, personal insults, obscenity, or engage in any conduct that would not be acceptable in our workplace (Mallon 2011).
Engaging with social media can be exciting and rewarding, both on a personal and a professional level; make certain that you don't hurt yourself, your image, your integrity, or your future when you do so. Everyone wants to be the next Richard Branson; I'm less certain about who wants to be the next Aleksey Vayner.

References:
Mallon, Mairi. "Social Media guidelines — a freebie for insurance and reinsurance bods." Weblog entry. Reinsurance girl's blog, June 6, 2011. Accessed June 6, 2011. http://www.rein4ce.co.uk/blog/?p=315

Sunday, June 05, 2011

Solvency 2, what I don't understand

There has been much fanfare about Solvency II, and how it will be coming to the US now that it is pretty much a fait accompli as a requirement for UK and European insurance companies starting in 2013. Perhaps I am missing something basic, but below are some of the issues that I have with Solvency II:
  1. Isn't Solvency II based on the same principles as Basel II/III? Now, how well has that worked for Europe? Greece, Iceland, Ireland, anyone?
  2. The first pillar of Solvency II (Article 75 (1)(b)) requires that "...liabilities shall be valued at the amount for which they could be transferred, or settled, between knowledgeable willing parties in an arm's length transaction." I don't know about the rest of you, but there really is no liquid market for (re)insurer loss reserves. I've been involved in pricing a few attempted loss portfolio transfers--none of which came to fruition, mind you--and each one was really a bespoke transaction. Different counter-parties to the same transaction will arrive at different values for the reserves. So how is Solvency II going to handle this?
  3. At its heart, the capital requirement is still a Value at Risk (VaR) measure, albeit at the 99.5th percentile. When will people learn that VaR is a rather non-robust statistic? It is a point on the cumulative frequency distribution, with no "knowledge" of what lies above or below it. It is very susceptible to discontinuities (think step function), and its components are non-additive. At the very least, the measure should be based on TVaR, the expected value above a given point. As an expectation, it is additive in its components (co-TVaR measures exist and are meaningful), and, as a first moment, it reflects to some extent the entire distribution in the tail above the point, not just the point itself. As we all know, actually seeing any particular result from a continuous distribution almost never happens, which is why we talk in intervals and not points. (The short R sketch after this list shows the difference in practice.)
  4. While Operational Risk is certainly a significant factor in a company's risk profile, I have yet to see any good measure or process to quantify the expected value of said risk. 
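
As a toy illustration of the VaR versus TVaR point in item 3 (my own example, with made-up lognormal losses, and in no way a Solvency II calculation), the two measures at the 99.5th percentile can be compared on simulated data:

set.seed(1)
losses <- rlnorm(100000, meanlog = 15, sdlog = 1.2)  # an invented heavy-tailed book of losses

p <- 0.995
VaR <- quantile(losses, p)           # a single point on the distribution
TVaR <- mean(losses[losses > VaR])   # the expected loss given we are beyond that point
c(VaR = unname(VaR), TVaR = TVaR)
# TVaR exceeds VaR because it reflects the whole tail beyond the 99.5th percentile,
# not just the single value sitting at it.
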
Unlike the banking industry, the insurance industry, especially in the US, has been rather stable for the past century or so, and while I agree that certain elements of Statutory Accounting, such as the holding of nominal reserves, may seem somewhat extreme with respect to other businesses, it has done rather well in ensuring a viable and solvent insurance industry. Why are we rushing into something which has neither the provenance (Basel) nor the track record (not even officially implemented in Europe) to justify adding a new accounting and solvency standard?

Wednesday, May 18, 2011

R-bloggers

As I've decided to try to blog a little more often now, and touch on "R" every now and then, I decided to take R-bloggers up on their standing offer to include R-related feeds at their site. So everything I tag with "rstats" (you can guess where that came from) should flow through to them. I've added them to my (tiny) blogroll at the side of the blog, but if you just cannot wait to see what they are all about (and I recommend it), you can just go here.

Sunday, May 15, 2011

Why method of moments doesn't always work

A number of years ago, someone asked me, "Why does my company need actuaries to fit curves? Once I have the mean and standard deviation of my losses, isn't that enough?" I explained to him that not every distribution is completely determined by its mean and standard deviation (as the normal and lognormal are), and as I did not have "R" installed on my laptop at that point, I demonstrated it to him in Excel. Having wanted to start blogging about "R", even ever so infrequently, I figured I'd toss together a little code to demonstrate.

The example I gave was to compare a gamma and a Pareto distribution, each of which has a mean of 10,000 and a CV of 150% (making the standard deviation 15,000). I will spare all of you the algebra, but suffice it to say that, using the Klugman-Panjer-Willmot parameterization (which most casualty actuaries have used for the past 20 years or so), the parameters of the gamma would be theta (R's scale) = 22500 and alpha (R's shape) = 4/9. The equivalent Pareto would have theta (R's scale) = 26000 and alpha (R's shape) = 3.6.
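
For anyone who wants the spared algebra checked numerically, here is a quick verification (my own addition) that both parameter sets reproduce the target mean of 10,000 and standard deviation of 15,000, using the standard moment formulas for each distribution:

# Gamma(shape = a, scale = t): mean = a*t, variance = a*t^2
a.g <- 4/9; t.g <- 22500
c(mean = a.g * t.g, sd = sqrt(a.g * t.g^2))

# Pareto(shape = a, scale = t), Klugman-Panjer-Willmot form:
# mean = t/(a-1), E[X^2] = 2*t^2/((a-1)*(a-2))
a.p <- 3.6; t.p <- 26000
m1 <- t.p/(a.p - 1)
m2 <- 2*t.p^2/((a.p - 1)*(a.p - 2))
c(mean = m1, sd = sqrt(m2 - m1^2))
# Both should return 10000 and 15000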

Graphing the two (and Hadley, please forgive me for using default R plotting; I left my ggplot book in the office, mea culpa), you can easily see how the distributions are rather different.

To make things easier for me, I used the actuar package to do the graphing:

library(actuar)
curve(dpareto(x, shape=3.6, scale=26000), from=0, to=100000, col="blue")
curve(dgamma(x, shape=4/9, scale=22500), from=0, to=100000, add=TRUE, col="green")
Created by Pretty R at inside-R.org

Obviously, the tails of the distributions, and thus the survival function at a given loss size, are different for the two, notwithstanding their sharing identical first two moments. So this was just a brief but effective visualization of how the first two moments do not contain all the information needed to find a "best fit," and why we like to use distributional fitting methods (maximum likelihood, maximum spacing, various minimum-distance metrics like Cramer-von Mises, etc.) to get a better understanding of the potential underlying loss processes.
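
To see the tail difference numerically rather than graphically (my own follow-up, using the same actuar parameterization as the plot above), the survival probabilities at a few loss sizes can be compared directly:

# Probability of a loss exceeding each threshold under the two fitted forms
x0 <- c(50000, 100000, 250000)
rbind(pareto = ppareto(x0, shape=3.6, scale=26000, lower.tail=FALSE),
      gamma = pgamma(x0, shape=4/9, scale=22500, lower.tail=FALSE))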

Friday, May 13, 2011

New look!

I decided to change things around a bit, and applied a new template and color scheme. Maybe now my next post won't take another five months! Let me know if you like the new look.

Wednesday, January 12, 2011

It has been a while

It has been quite a while since I posted an update here, possibly because the longer I work, the less time I seem to have. Just to share something: I recently had some fun programming a random gamma variate generator in VBA for some copula work. While I'm certain there are much more elegant implementations out there, it was good to figuratively roll up my sleeves and dust off some VBA programming skills. Now I can keep that line on the resume!