Search results

  1. A

    Comparing two survival curves

    Hi, I have two survival curves/functions given by two sets of data points: (t1, P(T>t1)), (t2, P(T>t2)), ..., (tn, P(T>tn)) and (t1, P*(T>t1)), (t2, P*(T>t2)), ..., (tn, P*(T>tn)), where the time points are t1, t2, ..., tn and for each time point I have an estimated probability of survival...
  2. A

    How can I check model assumptions when all my residuals are zero?

    I'm doing a 2^4 full factorial experiment. I have 1 continuous response Y and 4 factors, each with 2 levels, yielding a total of 16 level combinations. The experiment is unreplicated, so I have a total of 16 observations, one for each level combination. The design matrix has only 1's and -1's...
  3. A

    How to interpret the intercept in a regression model which has a categorical covariate?

    I have a regression model with some response Y and covariates height, weight, age, and sex. Sex is the only categorical covariate, and it takes the values Sex = 0 for female and Sex = 1 for male. I want to interpret the intercept of this model, that is, the estimated mean value of Y when all the...
  4. A

    How to choose between two regression models when one has a higher adjusted R^2 and the other has a lower BIC?

    I have two regression models with different numbers of covariates/predictors. After performing subset selection, the two remaining choices are Model 1, which has 7 covariates and a lower BIC, and Model 2, which has 11 covariates and a higher adjusted R^2. Using the BIC criterion, you select the...
  5. A

    Deriving the Mahalanobis distance formula, where is the mistake in my reasoning?

    The squared Mahalanobis distance/length of an observation vector x from its mean (assuming it's the zero vector) is given by x^T * S^-1 * x, where S is the covariance matrix for any given observation x and x^T is the transpose of x. This is my reasoning for how it's derived. The covariance matrix S is...
  6. A

    In statistical learning, is the learning function a random variable or a constant?

    Hi, consider a predictor x and a response Y, where the true relationship between them is given by Y = f(x) + e, with e a random error term. A training data set (x_1, Y_1), ..., (x_n, Y_n) is collected, and from this an estimated learning function f_hat is fitted. Then Y_hat = f_hat(x) becomes...
  7. A

    How can I find an interval estimate for the mean of a Weibull distribution?

    I have a sample of n = 75 taken from a Weibull distribution and have computed maximum likelihood estimates (MLEs) for the scale parameter a and the shape parameter b. The mean of a (2-parameter) Weibull distribution is given as u = a^(-1/b)*gamma(1 + 1/b), in which case I can find an estimate for u by simply plugging in the mle...
  8. A

    Is kernel density estimation a good approach for small samples?

    If you have a relatively small sample of data points that has more than one mode, and you want to estimate the distribution of the population it came from as well as make inferences, is kernel density estimation the way to go?
  9. A

    How can I get the data-points from the cluster with smallest within-variance in kmeans?

    I have a kmeans object: km <- kmeans(data, centers = k). The within-cluster variances (reported by kmeans as within-cluster sums of squares) can be found in km$withinss, and the smallest one is min(km$withinss). My question is: how can I extract the data points from this minimum cluster? I tried data[km$withinss == min(km$withinss)]...
  10. A

    Bootstrapping with sample(), what should the size of your sub-sample be?

    Hi, I am currently bootstrapping my sample, x, using the sample function: sample(x, size = n, replace = TRUE). My question is: how do you know what size should be? Is there a standard procedure for determining the value of size?
  11. A

    Is the mean of a kernel density estimator a valid estimator of the population mean?

    Say you have an independent sample X_1, X_2, ..., X_n drawn from some population where each X_i has the same univariate density f(x). You estimate this density function using a non-parametric kernel density estimator f_kde(x). My question is: since f_kde(x) is an estimate of f(x), can you...
  12. A

    Which clustering method can I use?

    I have a data set which consists of 1 continuous dependent variable and 3 categorical independent variables. I need to find the cluster/group of data points with the smallest within-cluster variance of the dependent variable. Any suggestions as to which clustering method I can use?
  13. A

    How to check for multicollinearity between two categorical variables when their contingency table contains many zero entries?

    Problem: I have to build a multiple regression model where most of my predictor (independent) variables are categorical (nominal), but I'm running into a few problems because some of the predictors are (perfectly) collinear. So I need to check for multicollinearity and remove redundant predictors...
  14. A

    I have a data set with 1 continuous dependent variable and several categorical variables. How can I find the most important categorical variable?

    Hi, I have a data set which consists of 1 continuous (although it can be discrete if I choose to round the values) dependent variable Y and several categorical and discrete data columns that may or may not have an effect on Y. Y, in this case, is not normally distributed, so to check if a...