In a project I’m working on I’ve come across the necessity to sample a vector of values from a Dirichlet distribution.
I don’t know you, but during my undergraduate degree we were never taught how to write code to do this type of stuff, i.e. sampling from different probability distributions, etc. Fortunately Wikipedia, being a good starting point for most topics, comes to the rescue.
After reading about random number generators it became sort of clear to me (yes, sorry, if you thought otherwise, no I’m not clever) how I should go about it.
According to Wikipedia generating random numbers from a probability distribution f(x) is equivalent to transforming, somehow, a random number generated from a normal distribution. Amongst all the possible methods, two methods are explicitly mentioned there: The inversion method and The acceptance-rejection method.
Bluntly said, the inversion method involves sampling a random number u from a uniform distribution and then finding x such that u = F(x) where F(.) is the cumulative distribution function (CDF) of the distribution we wish to sample from (always making sure that some conditions are met). Simple, right? Right.
Being the lazy sod that I am this method seemed all right for me so I went for it.
Even though the algorithm is straightforward there’s one issue which must be addressed: searching for that x which corresponds to u = F(x).
From the top of my head three different types approaches could be tried:
- Do a linear search for x so simply try F(i * epsilon) = u. As soon as you have a match, return i * epsilon (epsilon is the precision desired) — this approach is not too bad (and by bad I mean good) for generating a single random value
- Pre-calculate all the F(x) values and store them in a array. Then array[i] = F(i * epsilon). Do a search on the array. This approach is not too bad for generating several values (it improves on the previous approach)
- Do a binary search using F(x) = u as stopping criteria. So, do a binary search for x in [a, b) and test if |F(x) - u| < epsilon if so, stop and return x
In a follow up post I will describe all three approaches and include some java code for each of them. In the meantime, if you can come up with a better approach I’d love to hear from you.
PS: If you want to read more about sampling from probability distributions you can always read this free book: Non-Uniform Random Variate Generation.
Post a Comment