Hello,
I'm working on a problem where I have 30% of a full dataset and I have to estimate the generalization error.
To be more precise, let's say I have the transaction data for the clients of a bank which has 30% of the country's market.
I can easily get the mean, standard deviation and so on of my dataset but I can't figure out how to extrapolate.
I know all about basic statistics and so I went through all my lectures to try to find the answer.
I would like to calculate the sampling error of my dataset.
To explain clearly:
population: the information of all transactions in a certain country.
mean : m (unknown)
standard deviation: sigma (unknown)
my dataset : the information of the transactions of the clients of 1 bank owning 30% of the country's market (so 30% of the whole big dataset).
mean : m* (known)
standard deviation : s (known)
The problem is that in all the examples I see, the formulas include the standard deviation of the population, which I don't have.
I used this formula:
m = m* ± 1.96 * sigma / √n, where n is the size of my sample
They use "sigma" which I don't know.
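For reference, here is a rough sketch of how I would compute that interval in Python. It assumes that I can simply plug my sample standard deviation s in place of sigma, which is exactly the step I'm unsure about. The function name and the numbers are made up for illustration and are not my real data.

```python
import math


def mean_confidence_interval(sample_mean, sample_std, n, z=1.96):
    """Approximate 95% interval for the population mean m.

    Assumption: the unknown population sigma is replaced by the
    sample standard deviation s (the substitution I am asking about).
    """
    margin = z * sample_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical values: m* = 100, s = 20, n = 400 transactions.
low, high = mean_confidence_interval(sample_mean=100.0, sample_std=20.0, n=400)
print(low, high)
```

Note that this treats my 30% as a simple random sample of all transactions, which may not hold for the clients of a single bank.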
The second part would be to get the error on the variance (or standard deviation), but I'm not there yet.
Any help would be appreciated,
Thank you
Nicolas