Recent content by arkm25

  1.

    How can I check model assumptions when all my residuals are zero?

    I'm doing a 2^4 full factorial experiment. I have 1 continuous response Y and 4 factors, each with 2 levels, yielding a total of 16 level combinations. The experiment is unreplicated and I have a total of 16 observations, one for each level combination. The design matrix has only 1's and -1's...
  2.

    How to interpret the intercept in a regression model which has a categorical covariate?

    I have a regression model with some response Y and covariates height, weight, age, and sex. Sex is the only categorical covariate, and it takes the values Sex = 0 for female and Sex = 1 for male. I want to interpret the intercept of this model, that is the estimated mean value of Y when all the...
  3.

    How to choose between two regression models when one has a higher adjusted R^2 and the other has a lower BIC?

    I have two regression models with different numbers of covariates/predictors. After performing a subset selection, the remaining two choices are Model 1, which has 7 covariates and a lower BIC, and Model 2, which has 11 covariates and a higher adjusted R^2. Using the BIC criterion, you select the...
  4.

    Deriving the Mahalanobis distance formula, where is the mistake in my reasoning?

    The squared Mahalanobis distance/length of an observation vector x from its mean (assuming it's the zero vector) is given by x^T * S^-1 * x, where S is the covariance matrix for any given observation x and x^T is the transpose of x. This is my reasoning for how it's derived. The covariance matrix S is...
  5.

    In statistical learning, is the learning function a random variable or a constant?

    Hi. Consider a predictor x and a response Y, where the true relationship between them is given by Y = f(x) + e, with e a random error term. A training data set (x_1, Y_1), ..., (x_n, Y_n) is collected, and from this an estimated learning function f_hat is fitted. Then Y_hat = f_hat(x) becomes...
  6.

    How can I find an interval estimate for the mean of a Weibull distribution?

    I have a sample of n = 75 taken from a Weibull distribution and have computed maximum likelihood estimates for the scale parameter a and the shape parameter b. The mean of a (2-parameter) Weibull distribution is given as u = a^(-1/b)*gamma(1 + 1/b). In which case I can find an estimate for u by simply plugging in the mle...
  7.

    Is a kernel density estimation a good approach for small samples?

    If you have a relatively small sample of data points that has more than one mode, and you want to estimate the distribution of the population it came from as well as make inferences, is kernel density estimation the way to go?
  8.

    How can I get the data-points from the cluster with smallest within-variance in kmeans?

    I have a kmeans object, km <- kmeans(data, centers = k). The values of the within cluster variances can be found in km$withinss, and the smallest one is min(km$withinss). My question is how can I extract the data-points from this minimum cluster? I tried data[km$withinss == min(km$withinss)]...
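
    One possible approach, sketched under assumptions: `km$withinss` has one entry per cluster (it holds within-cluster sums of squares), while `km$cluster` has one label per row of the data, so the rows are selected through `km$cluster` rather than through `km$withinss`. The toy `data`, the choice `k <- 3`, and the names `tightest` and `tightest_points` are hypothetical and only there for illustration.

    ```r
    set.seed(1)

    # Hypothetical toy data: three 2-D groups with different spreads
    data <- rbind(
      matrix(rnorm(40, mean = 0,  sd = 0.2), ncol = 2),
      matrix(rnorm(40, mean = 5,  sd = 1.0), ncol = 2),
      matrix(rnorm(40, mean = 10, sd = 2.0), ncol = 2)
    )
    k <- 3
    km <- kmeans(data, centers = k, nstart = 10)

    # One value per cluster, so which.min() returns a cluster label
    tightest <- which.min(km$withinss)

    # One label per row, so this logical vector indexes the rows of data;
    # drop = FALSE keeps a matrix even if the cluster has a single point
    tightest_points <- data[km$cluster == tightest, , drop = FALSE]
    ```

    Because `km$withinss` is a sum rather than an average, a cluster with few points can come out smallest simply because of its size; dividing by `km$size` before calling `which.min()` is one way to compare clusters on a per-point basis.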
  9.

    Bootstrapping with sample(), what should the size of your sub-sample be?

    Hi. I am currently bootstrapping my sample, x, using the sample function: sample(x, size = n, replace = T). My question is how do you know what size should be? Is there a standard procedure in determining the value of size?
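
    For reference, in the usual nonparametric bootstrap each resample has the same size as the original sample, i.e. `size = length(x)`, so that the resampled statistic mimics the variability of the statistic computed on the original data. A minimal sketch, where the sample `x`, the number of resamples `B`, and the choice of the mean as the statistic are all assumptions made for illustration:

    ```r
    set.seed(1)

    x <- rexp(50)      # hypothetical sample
    n <- length(x)     # resample size equals the original sample size
    B <- 2000          # number of bootstrap resamples (assumed value)

    # Draw B resamples with replacement and record the mean of each
    boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))

    se_boot <- sd(boot_means)                      # bootstrap standard error
    ci <- quantile(boot_means, c(0.025, 0.975))    # percentile interval
    ```

    Resampling fewer than n points (the "m out of n" bootstrap) is a separate technique with its own use cases and is not what this sketch does.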
  10.

    Is the mean of a kernel density estimator a valid estimator of the population mean?

    Yes, each sample point uses a standard normal distribution as a kernel.
  11.

    Is the mean of a kernel density estimator a valid estimator of the population mean?

    Say you have an independent sample X_1, X_2, ..., X_n drawn from some population where each X_i has the same univariate density f(x). You estimate this density function using a non-parametric kernel density estimator f_kde(x). My question is: since f_kde(x) is an estimate of f(x), can you...
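
    As a worked note on the quantity in the thread title, here is a short derivation of the mean of a kernel density estimate. It assumes the standard form of the estimator with a bandwidth h (a symbol not used in the post) and a kernel K that integrates to one and has mean zero, which holds for a standard normal kernel.

    ```latex
    \hat f_{\mathrm{kde}}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)

    \int x\,\hat f_{\mathrm{kde}}(x)\,dx
      = \frac{1}{n}\sum_{i=1}^{n}\int (X_i + hu)\,K(u)\,du
      = \frac{1}{n}\sum_{i=1}^{n}\left( X_i \int K(u)\,du + h\int u\,K(u)\,du \right)
      = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar X
    ```

    The second step uses the substitution u = (x - X_i)/h. Under these assumptions the mean of the estimated density is the sample mean, whatever the bandwidth.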
  12.

    Which clustering method can I use?

    Hi. Yes, they are all associated with time to breakdown. There were several other variables in the original data-set that were removed, and these are the ones remaining.
  13.

    Which clustering method can I use?

    I am sorry, I had to edit my question. What I meant was I have 1 dependent continuous variable and 3 independent categorical variables. The continuous variable is the time until a factory machine breaks down. It is simply measured as the time from when a machine is put in operation until it no...
  14.

    Which clustering method can I use?

    I am sorry, I had to edit my question. What I meant was I have 1 dependent continuous variable and 3 independent categorical variables. Since there is a dependent/response variable, this becomes supervised learning. Some context: The response variable is the time until a factory machine...
  15.

    Which clustering method can I use?

    I have a data-set which consists of 1 dependent continuous variable and 3 independent categorical variables. I need to find the cluster/group of data points with the smallest within-cluster variance of the independent variable. Any suggestions as to which clustering method I can use?