Sunday, September 16, 2007 - Posts

Using a log-transform to avoid the underflow problem in computing posterior probabilities

Last month, during my science internship, I worked with latent-class models, and most of the time we used the posterior class-membership probabilities as the criterion for putting users into cohorts. At first, I implemented a Bayesian approach for computing the posteriors in Stata. It was pretty straightforward, and it seemed to work fine... until we ran into a huge dataset. I then spent roughly half an hour on the output message, and the source of the error turned out to be an underflow problem in computing the posteriors. In particular, it is a problem with multiple choices (repeated observations) under local independence: the joint probability is a product of the probabilities of many events that are assumed independent given the class, and the product eventually becomes too small to be represented even with a double-precision floating-point number. Surprisingly, this problem occurs more often than I would have thought. To deal with the new dataset, I had to add some code to check for this problem, and I implemented a new algorithm that uses the log-transform when underflow/overflow is detected.
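To make the idea concrete, here is a minimal sketch of the log-transform trick in Python (not my original Stata code). It assumes the per-observation probabilities are strictly positive and already available as logs; the function name `posterior_log_space` and its arguments are made up for illustration. Instead of multiplying probabilities, it sums their logs, then subtracts the largest log-joint before exponentiating so that the normalizing constant can be computed safely.

```python
import math

def posterior_log_space(log_priors, log_likelihoods):
    """Posterior class-membership probabilities computed in log space.

    log_priors[c]      : log P(class c)
    log_likelihoods[c] : list of log P(observation t | class c), one per
                         repeated observation (local independence assumed)
    Both names are hypothetical; all probabilities are assumed > 0.
    """
    # Log of the joint probability: a sum of logs instead of a product.
    log_joint = [lp + sum(ll) for lp, ll in zip(log_priors, log_likelihoods)]

    # Shift by the maximum so the largest term becomes exp(0) = 1.
    m = max(log_joint)
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint))

    # Normalized posteriors; very unlikely classes may round to 0.0.
    return [math.exp(v - log_norm) for v in log_joint]

# Two classes, 2000 repeated observations each.
log_priors = [math.log(0.5), math.log(0.5)]
log_likelihoods = [[math.log(0.1)] * 2000, [math.log(0.2)] * 2000]
print(posterior_log_space(log_priors, log_likelihoods))
```

In this example the direct products 0.1^2000 and 0.2^2000 both underflow to zero in double precision, so the naive ratio would be 0/0, while the log-space version still returns posteriors that sum to one.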

While I wrote quite a bit of documentation for my code, I had always wanted to summarize the work in a more technical way. So I wrote a short paper today.
posted by wenyang with 0 Comments