  1. AdrienC

    Reduce the dimension (rows) with clustering

    Hello everyone! I have a dataset with n = 100 000 rows and p = 2 variables, X and Y. There is a trend between these two variables, but it is very blurry and nothing is visible (too many points). My strategy is to use a clustering algorithm (K-Means, for example) on the 100 000 rows and to...
  2. AdrienC

    How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

    Hello, I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The goal is to quantify the degree of abnormality of the data on Y, but I have to take into account the values of the other variables (the relationship between Y and X)...
  3. AdrienC

    Hill's estimator for heavy tails

    Hello, I have a heavy-tailed distribution and I was wondering what exactly Hill's estimator does. I am not sure I fully understand this notion. I know it estimates something about the tail of the distribution. Thank you so much. Have a nice day.
  4. AdrienC

    Clustering of variables with time and grouped data

    Hello! I come to you because I have to help one of my colleagues, who is a plant biologist. The purpose of this study is to cluster 70 quantitative variables. Each of these variables represents a different protein (it is a measurement made on it; I don't know exactly how it works). But here is the...
  5. AdrienC

    Dissimilarity measure between categorical variables

    Hello, I have a problem. I have 9 categorical variables and I'm trying to cluster the variables. Could you tell me if I'm doing the right thing? 1) I calculate Cramér's V for each pair of variables. I represent those associations in a matrix, which I call X. 2) I calculate 1-X and...
  6. AdrienC

    Correlation between two sets of variables

    Hello, I have a problem with a correlation study. I have to study the linear relationship between two sets of variables, X and Y. There are 12 variables in X and 4 in Y. I applied a Canonical Correlation Analysis, and it seemed that all the variables in X were correlated with 3 of those in Y. I also did a...
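As a pointer for the Hill's estimator question (thread 3), here is a minimal sketch of what the estimator computes. It is not taken from the thread: the function name `hill_estimator`, the choice `k = 2000`, and the simulated Pareto sample are all assumptions made for illustration. The estimator averages the log-excesses of the k largest observations over the (k+1)-th largest one, which estimates the tail index gamma = 1/alpha when the tail behaves roughly like a power law x^(-alpha).

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of gamma = 1/alpha from the k largest values (hypothetical helper)."""
    x = sorted(sample, reverse=True)  # descending order statistics
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = x[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest values must be strictly positive")
    # Mean of log(X_(i) / X_(k+1)) over the k largest observations
    return sum(math.log(v / threshold) for v in x[:k]) / k


# Simulated data (an assumption): exact Pareto tail with alpha = 2, so gamma = 0.5
random.seed(0)
alpha = 2.0
sample = [random.paretovariate(alpha) for _ in range(100_000)]

gamma_hat = hill_estimator(sample, k=2000)
alpha_hat = 1.0 / gamma_hat
print(f"gamma_hat = {gamma_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```

In practice the result depends on k: a small k gives a noisy estimate, while a large k pulls in observations that are no longer in the tail. A common check is to plot the estimate against k and look for a stable region.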