Search results

  1. A

    How can I find an interval estimate for the mean of a Weibull distribution?

    I have a sample of n = 75 taken from a Weibull distribution and have computed maximum likelihood estimates (MLEs) for the scale parameter a and the shape parameter b. The mean of a two-parameter Weibull distribution is given as u = a^(-1/b)*gamma(1 + 1/b), in which case I can find an estimate for u by simply plugging in the mle...
  2. A

    Is a kernel density estimation a good approach for small samples?

    If you have a relatively small sample of data points that has more than one mode, and you want to estimate the distribution of the population it came from as well as make inferences, is kernel density estimation the way to go?
  3. A

    How can I get the data-points from the cluster with smallest within-variance in kmeans?

    I have a kmeans object, km <- kmeans(data, centers = k). The values of the within-cluster variances can be found in km$withinss, and the smallest one is min(km$withinss). My question is: how can I extract the data-points from this minimum cluster? I tried data[km$withinss == min(km$withinss)]...
  4. A

    Bootstrapping with sample(), what should the size of your sub-sample be?

    Hi, I am currently bootstrapping my sample, x, using the sample function: sample(x, size = n, replace = T). My question is: how do you know what size should be? Is there a standard procedure for determining the value of size?
  5. A

    Is the mean of a kernel density estimator a valid estimator of the population mean?

    Yes, each sample point uses a standard normal distribution as its kernel.
  6. A

    Is the mean of a kernel density estimator a valid estimator of the population mean?

    Say you have an independent sample X_1, X_2, ..., X_n drawn from some population where each X_i has the same univariate density f(x). You estimate this density function using a non-parametric kernel density estimator f_kde(x). My question is: since f_kde(x) is an estimate of f(x), can you...
  7. A

    Which clustering method can I use?

    Hi. Yes, they are all associated with time to breakdown. There were several other variables in the original data-set that were removed, and these are the ones remaining.
  8. A

    Which clustering method can I use?

    I am sorry, I had to edit my question. What I meant was that I have 1 dependent continuous variable and 3 independent categorical variables. The continuous variable is the time until a factory machine breaks down. It is simply measured as the time from when a machine is put in operation until it no...
  9. A

    Which clustering method can I use?

    I am sorry, I had to edit my question. What I meant was that I have 1 dependent continuous variable and 3 independent categorical variables. Since there is a dependent (response) variable, this becomes supervised learning. Some context: The response variable is the time until a factory machine...
  10. A

    Which clustering method can I use?

    I have a data-set which consists of 1 dependent continuous variable and 3 independent categorical variables. I need to find the cluster/group of data points with the smallest within-cluster variance of the independent variable. Any suggestions as to which clustering method I can use?
  11. A

    How to check for multicollinearity between two categorical variables when their contingency table contains many zero entries?

    Problem: I have to build a multiple regression model where most of my predictor (independent) variables are categorical (nominal), but I'm running into a few problems due to some of the predictors being (perfectly) collinear. So I need to check for multicollinearity and remove redundant predictors...
  12. A

    I have a data-set with 1 continuous independent variable and several categorical variables, how can I find the most important categorical variable?

    Hi Karabiner, I'll use a generic example (since I cannot share my data-set): let Y be the lifetime of a certain type of machine. A given machine has a lot of other information associated with it, such as material, the factory that runs it, the conditions it runs in, etc. These serve as the...
  13. A

    I have a data-set with 1 continuous independent variable and several categorical variables, how can I find the most important categorical variable?

    Hi, I have a data-set which consists of 1 continuous (although it can be discrete if I choose to round up the values) dependent variable Y and several categorical and discrete data columns that may or may not have an effect on Y. Y, in this case, is not normally distributed, so to check if a...