Random draw from a restricted area of the Pareto distribution

#1
We are using R to run a Pareto/NBD model. One step requires drawing random numbers from a restricted area of the Pareto distribution, where the restriction is based on a given value. Does anyone know how to do this?
To explain this with a brief example (see the tables below), the existing values to start from are:
- Value 1: vector with n numbers (A, B, …, N).
- Value 2: vector with n numbers (AA, BB, …, NN).
Every Value 1 has a corresponding Value 2, where Value 1 < Value 2.
- Parameters (s, β) of the Pareto distribution

Now we need n random draws (i.e. as many random draws RD as there are values in Value 2) from the area of the Pareto distribution (and only from there) where RD ≥ Value 1.

Example
Value 1; Value 2; Random draw pRD from the Pareto distribution, restricted to the area where
A = 10; AA = 20; pRDA ≥ A
B = 20; BB = 21; pRDB ≥ B
C = 5; CC = 30; pRDC ≥ C
N = 30; NN = 35; pRDN ≥ N

Any suggestions on how to do this in R?


One suggestion is to run this "restricted random draw" in the following way:
a) Calculate the cumulative distribution function (cdf) of the Pareto distribution at Value 1.
b) Draw a uniform random number between this calculated cdf and 1.0.
c) Invert the drawn uniform random number using the cdf of the Pareto distribution to get a valid random draw from the desired area.
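Steps a) to c) could be sketched in base R roughly as follows. This is only a sketch under assumptions: it takes the Pareto type I cdf F(x) = 1 - (β/x)^s for x ≥ β, the function name rpareto_restricted is made up for illustration, and s = 2, β = 5 are placeholder parameters rather than fitted values. Because the distribution is continuous, the difference between > and ≥ at the boundary has probability zero.

```r
# Sketch: draws from Pareto(shape = s, scale = beta) restricted to x >= lower.
# Assumes the Pareto type I cdf F(x) = 1 - (beta / x)^s for x >= beta.
# rpareto_restricted is a hypothetical helper name, not a package function.
rpareto_restricted <- function(lower, s, beta) {
  stopifnot(s > 0, beta > 0)
  lower <- pmax(lower, beta)                  # no mass lies below beta
  cdf_lower <- 1 - (beta / lower)^s           # step a: cdf at Value 1
  u <- runif(length(lower), min = cdf_lower, max = 1)  # step b: uniform on (cdf, 1)
  x <- beta * (1 - u)^(-1 / s)                # step c: inverse cdf
  pmax(x, lower)                              # guard against floating-point rounding
}

set.seed(1)
value1 <- c(A = 10, B = 20, C = 5, N = 30)   # Value 1 from the example
pRD <- rpareto_restricted(value1, s = 2, beta = 5)  # s and beta are placeholders
```

The function is vectorised, so one call returns one restricted draw per element of Value 1 without any loop.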


(1) Value 1;
(2) Value 2, with Pareto parameters (s, β);
(3) a) calculate the cumulative distribution function (cdf) of the Pareto distribution at Value 1;
(4) b) draw a uniform random number uRD between this calculated cdf and 1.0 (actually I am not certain whether it is indeed > or ≥);
(5) c) invert uRD using the cdf of the Pareto distribution to
(6) arrive at the desired pRD for every value.

(1); (2); (3); (4); (5); (6)
A = 10; AA = 20; cdfA; cdfA < uRDA < 1.0; invert uRDA using the Pareto cdf; pRDA ≥ 10
B = 20; BB = 21; cdfB; cdfB < uRDB < 1.0; invert uRDB using the Pareto cdf; pRDB ≥ 20
C = 5; CC = 30; cdfC; cdfC < uRDC < 1.0; invert uRDC using the Pareto cdf; pRDC ≥ 5
N = 30; NN = 35; cdfN; cdfN < uRDN < 1.0; invert uRDN using the Pareto cdf; pRDN ≥ 30

Does anyone know how to write this in R code?
I hope the explanation is comprehensible.
Any advice is highly appreciated - Thanks a million!!!!!
 

JesperHP

TS Contributor
#2
I'm not sure I get what you are writing.

You can get an overview of what distributions are available in R here: http://cran.r-project.org/web/views/Distributions.html

There are several packages with Pareto distributions. However, there are different types of Pareto distributions, so you need to figure out which one you are supposed to be using.

Wikipedia says that the conditional distribution of a Pareto is itself a Pareto: http://en.wikipedia.org/wiki/Pareto_distribution#Conditional_distributions


So the way I understand what you are saying is that you start out with two column vectors:
\( X=
\begin{bmatrix} x_{11} & x_{21} \\
\vdots & \vdots \\
x_{1n} & x_{2n}
\end{bmatrix}
\)
And it is the case that
\( \forall j: x_{1j} < x_{2j} \)



Then you make a random vector \( x_{31},...,x_{3n} \) where each variable in the random vector is from a Pareto distribution but with a minimum value specified by column 1, hence:
\(
x_{3j} \sim \bar{F}(x \mid x_m = x_{1j})
\)


If this conditional variable is just a new Pareto, as it says on Wikipedia, I believe the problem reduces to making one random draw from the properly defined Pareto distribution (one for each row of the matrix). So where you specify the minimum of the Pareto you might simply be able to give it a vector, the vector being column 1 of the above matrix \(X\).
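A minimal sketch of that idea, assuming a Pareto type I distribution and that every value in column 1 is at least as large as the original minimum: conditional on exceeding x_{1j}, the variable is again Pareto with the same shape and with minimum x_{1j}. The helper rpareto_basic below is a hypothetical base-R sampler written for illustration (not a package function), and the shape s = 2 is a placeholder.

```r
# Sketch: X | X >= lower is again Pareto with the same shape and minimum = lower
# (assumes Pareto type I and lower >= the original minimum).
# rpareto_basic is a hypothetical inverse-cdf sampler, vectorised over scale.
rpareto_basic <- function(n, scale, shape) {
  scale * runif(n)^(-1 / shape)
}

set.seed(42)
lower <- c(10, 20, 5, 30)   # column 1 of X (illustrative values from post #1)
s <- 2                      # placeholder shape parameter
draws <- rpareto_basic(length(lower), scale = lower, shape = s)
```

Each element of draws uses its own minimum, so no draw can fall below the corresponding entry of column 1.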

I would gladly give you some code, but I do not have any previous experience using the Pareto distribution, do not know which type you want to use, and am generally unsure that I catch your drift....
 
#3
Dear JesperHP,

Thank you for your response. Sorry for the delay in replying; I am just back from a business trip.
Concerning the Pareto distribution: following http://stats.stackexchange.com/questions/27426/how-do-i-fit-a-set-of-data-to-a-pareto-distribution-in-r we calculate the Pareto distribution parameters as follows:
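(1)
# Maximum-likelihood estimates for a Pareto distribution
pareto.MLE <- function(X)
{
  n <- length(X)
  m <- min(X)                    # estimate of the minimum (beta)
  a <- n/sum(log(X)-log(m))      # estimate of the shape (s)
  return( c(m,a) )
}
mle <- pareto.MLE(ta.data$T)     # fit once and reuse the result
par.eto.par <- c(s=mle[2], beta=mle[1])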

[NOTE: ta.data$T: the data in question]

(2) For the current random draw from an unrestricted Pareto distribution we wrote the following code:
library(VGAM)
random <- rpareto(l, location=beta, shape=s)

[Note: s and beta are calculated in (1)]

(3) What we need instead of (2) is a random draw from a RESTRICTED Pareto distribution, using the Pareto parameters calculated in (1), where random ≥ ta.data$T

This means that I actually
- start with one value, i.e. ta.data$T,
- then make a random draw from a Pareto distribution using the parameters (s, beta) estimated in (1),
- and this newly drawn random number should be ≥ the given value from ta.data$T.

Problem:
If the area from which to draw a new number is not restricted, this new number can be smaller than ta.data$T.
- One could loop and let R make random draws until the new number fulfills the requirement random ≥ ta.data$T. That is not efficient; it takes a lot of time.
- Alternative: restrict the area from which the random numbers are drawn.
BUT: I don't know how to restrict the area where the random draw occurs.
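To put a rough number on why the loop is slow: assuming a Pareto type I distribution, the chance that an unrestricted draw lands at or above a threshold v is (β/v)^s, so the loop needs on average (v/β)^s attempts per value. The parameters below (β = 5, s = 2) and the thresholds are placeholders for illustration, not values fitted to ta.data$T.

```r
# Sketch: expected number of unrestricted draws needed by the rejection loop.
# Assumes Pareto type I, where P(X >= v) = (beta / v)^s for v >= beta.
beta <- 5                         # placeholder minimum parameter
s <- 2                            # placeholder shape parameter
threshold <- c(10, 20, 5, 30)     # illustrative thresholds
accept_prob <- (beta / threshold)^s   # probability a single draw is accepted
expected_draws <- 1 / accept_prob     # mean of the geometric waiting time
print(expected_draws)                 # 4 16 1 36
```

The cost grows quickly for thresholds far into the tail, which is why drawing directly from the restricted area is preferable.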

Thus the question is: how do I tell R to use only the area of a given Pareto distribution, with the parameters estimated in (1), where random ≥ ta.data$T?

What I have written under ‘one suggestion’ came from someone who calculated a restriction for this random draw in C++; I don’t know whether this is a way to go for R. If you think yes, I’ll describe it again.

Thanks again for any thoughts!