# Sampling from N data points based on a given probability distribution

#### winecoding

##### New Member
I have N data points, and each data points is associated with a probability. The sum of the N probabilities equals to one. I need to sample these N data points based on the given probability vector. Are there any algorithms that can allow me to do this kind of sampling?

#### BGM

##### TS Contributor
http://en.wikipedia.org/wiki/Inverse_transform_sampling

Suppose you want to simulate a discrete distribution with support $$\{x_1, x_2, \ldots, x_N\}$$ (can be countably infinitely many) with probabilities $$p_i, i = 1, 2, \ldots, N$$ respectively.

You can use the inverse transform sampling as usual (now the CDF is just a step function with jumps):

1. Generate $$U \sim \text{Uniform}(0,1)$$

2. Assign $$\text{Sum} = 0, i = 0$$
Repeat $$i = i + 1, \text{Sum} = \text{Sum} + p_i$$ Until $$\text{Sum} > U$$

3. Output $$x_i$$

Note that if $$p_i$$ is sorted in descending order, then this algorithm will be most efficient. Some well known discrete distributions have more efficient algorithm but the above algorithm is applicable to all discrete distributions.