1. ### Reduce the dimension (rows) with clustering

Hello everyone! I have a dataset with n = 100,000 rows and p = 2 variables, X and Y. There is a trend between these two variables, but it is very blurry and we can't see anything (too many points). My strategy is to use a clustering algorithm (K-Means, for example) on the 100,000 rows and to...
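
A minimal sketch of the row-reduction idea, assuming NumPy is available: run plain Lloyd's K-Means on the (X, Y) rows and keep only the k centroids and their cluster sizes as a compressed summary of the cloud. The helper names `kmeans_summary` and `_nearest`, the default `n_iter`, and the seed are illustrative assumptions, not a library API. K-Means places more centroids where points are dense, so the centroids summarize the shape of the cloud rather than fitting the trend directly.

```python
import numpy as np

def _nearest(points, centroids):
    """Index of the closest centroid for every row (squared Euclidean distance)."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_summary(points, k, n_iter=100, seed=0):
    """Lloyd's K-Means on an (n, 2) array of (X, Y) rows.

    Returns the k centroids and the number of rows assigned to each one.
    Hypothetical helper for illustration, not a library function.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(points, centroids)
        new = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new[j] = members.mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment against the final centroids.
    labels = _nearest(points, centroids)
    return centroids, np.bincount(labels, minlength=k)
```

For the 100,000 rows, `kmeans_summary(np.column_stack([x, y]), k=50)` would return 50 centroids that can be plotted in place of the raw points (for instance with marker size proportional to the cluster sizes). `_nearest` builds an n × k × 2 array, which is about 80 MB for n = 100,000 and k = 50.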
2. ### How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

Thank you for your answers. Indeed, there is a large field devoted to anomaly detection. I am doing my PhD on it: I work on Isolation Forest, Local Outlier Factor, etc. But all those methods, like the Mahalanobis distance, only measure an abnormality score on a dataframe X with p columns and n rows. My...
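
To make the limitation concrete: a method such as the Mahalanobis distance returns one score per row of the n × p matrix, not one score per variable. A minimal NumPy sketch, assuming roughly elliptical data; `mahalanobis_scores` is a hypothetical helper name.

```python
import numpy as np

def mahalanobis_scores(data):
    """Return one Mahalanobis distance per row of an (n, p) array."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Sample covariance of the p columns (atleast_2d handles p = 1);
    # pinv guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(data, rowvar=False)))
    # Quadratic form (x - mu)^T S^{-1} (x - mu), computed for every row at once.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(d2)
```

The output has shape (n,): the score says which rows are unusual overall, but not which variable is responsible.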
3. ### How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

Hello, thank you for your answers. Indeed, it is a dataset with n = 100,000 individuals (rows) and p = 50 columns (the first one is Y and the other 49 are the X variables). All the variables are quantitative and they are not time series. I can't go into details, but the variables in X are just...
4. ### How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

Hello, I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The goal is to quantify the degree of abnormality of the data on Y, but I have to take into account the values of the other variables (the relationship between Y and X)...
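
One common way to score Y while accounting for X (shown as an illustration, not as the approach settled on in this thread) is to model E[Y | X] and score each row by its standardized residual: a value of Y is unusual if it is far from what X predicts. A minimal NumPy sketch assuming a linear relationship; `conditional_abnormality` is a hypothetical helper, and a nonlinear regressor could replace the least-squares fit.

```python
import numpy as np

def conditional_abnormality(y, X):
    """Absolute standardized residual of y after a least-squares fit on X.

    Assumes a linear model for E[Y | X]; hypothetical helper for illustration.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    # Robust scale (median absolute deviation) so that the anomalies themselves
    # do not inflate the denominator; 1.4826 makes the MAD consistent for a Gaussian.
    mad = np.median(np.abs(residuals - np.median(residuals)))
    return np.abs(residuals) / (1.4826 * mad)
```

A row can get a high score here even when its Y value is unremarkable on its own, which is the distinction between marginal and conditional abnormality.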
5. ### Hill's estimator for heavy tails

Hello, I have a heavy-tailed distribution and I was wondering what exactly Hill's estimator does. I am not sure I fully understand this notion. I know it estimates something about the tail of the distribution. Thank you so much, and have a nice day!
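
For reference, Hill's estimator targets the tail index of a Pareto-type tail, i.e. a survival function behaving like $P(X > x) \approx C x^{-\alpha}$ for large $x$. With the order statistics sorted in decreasing order, $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$, and a chosen number $k$ of upper observations, it estimates $\xi = 1/\alpha$ as the mean log-excess over the threshold $X_{(k+1)}$:

$$\hat{\xi}_{k} = \frac{1}{k} \sum_{i=1}^{k} \ln X_{(i)} - \ln X_{(k+1)}, \qquad \hat{\alpha}_{k} = 1 / \hat{\xi}_{k}.$$

A larger $\hat{\xi}$ (smaller $\hat{\alpha}$) means a heavier tail. A minimal pure-Python sketch, assuming positive data; `hill_estimator` is a hypothetical helper, and the choice of $k$ is left to the user (it is commonly chosen by inspecting a Hill plot of $\hat{\xi}_k$ against $k$).

```python
import math

def hill_estimator(sample, k):
    """Hill estimate of xi = 1/alpha from the k largest values of `sample`.

    Assumes 1 <= k < len(sample) and a positive threshold X_(k+1).
    """
    xs = sorted(sample, reverse=True)  # X_(1) >= X_(2) >= ... >= X_(n)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[k]  # X_(k+1), the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("Hill's estimator needs positive values in the tail")
    # Mean of the log-excesses of the k largest values over the threshold.
    return sum(math.log(x) - math.log(threshold) for x in xs[:k]) / k
```

On an exact Pareto sample with $\alpha = 2$, the estimate should be close to $\xi = 0.5$.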
6. ### Clustering of variables with time and grouped data

Thank you so much! It helps me out :) I'm used to clustering rows, but never columns :D
7. ### Clustering of variables with time and grouped data

Thanks! I think you are right :D Just one thing :) Aren't there still repeated measures? Is it OK to apply a simple clustering algorithm to the variables?
8. ### Clustering of variables with time and grouped data

Hello! I'm reaching out because I have to help one of my colleagues, who is a plant biologist. The purpose of this study is to cluster 70 quantitative variables. Each of these variables represents a different protein (it is a measurement made on it; I don't know exactly how it works). But here is the...
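
Clustering columns instead of rows mostly comes down to defining a distance between variables. A minimal NumPy sketch, assuming the distance 1 - |Pearson correlation| between columns and average-linkage agglomeration; `cluster_variables` is a hypothetical helper, and it does not model any repeated-measures structure in the data.

```python
import numpy as np

def cluster_variables(data, n_clusters):
    """Group the columns of an (n, p) array by average linkage on 1 - |corr|.

    Returns a list of clusters, each a list of column indices.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    dist = 1.0 - np.abs(corr)  # strongly (anti-)correlated columns are close
    clusters = [[j] for j in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two groups.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Taking the absolute value treats negatively correlated proteins as similar; dropping it would separate them.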
9. ### dissimilarity measure between categorical variables

Hello, I have a problem. I have 9 categorical variables and I'm trying to cluster the variables. Could you tell me if I'm doing the right thing? 1) For each pair of variables, I calculate Cramér's V and store those associations in a matrix, which I call X. 2) I calculate 1 - X and...
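
Steps 1) and 2), as far as they are stated, can be sketched in pure Python. This assumes the uncorrected Cramér's V, $V = \sqrt{\chi^2 / (n(\min(r, c) - 1))}$, where $r$ and $c$ are the numbers of categories; `cramers_v` and `dissimilarity_matrix` are hypothetical helper names.

```python
import math
from collections import Counter

def cramers_v(u, v):
    """Cramér's V (no bias correction) between two equal-length label sequences."""
    n = len(u)
    row_totals = Counter(u)
    col_totals = Counter(v)
    observed = Counter(zip(u, v))
    # Pearson chi-square statistic of the r x c contingency table.
    chi2 = 0.0
    for r, nr in row_totals.items():
        for c, nc in col_totals.items():
            expected = nr * nc / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # convention chosen here: V is undefined for a constant variable
    return min(1.0, math.sqrt(chi2 / (n * (k - 1))))  # clamp rounding above 1

def dissimilarity_matrix(columns):
    """Matrix of 1 - Cramér's V for every pair of categorical columns."""
    p = len(columns)
    return [[1.0 - cramers_v(columns[i], columns[j]) for j in range(p)] for i in range(p)]
```

With 9 variables this gives a 9 × 9 symmetric matrix with zeros on the diagonal (for non-constant variables).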
10. ### Correlation between two sets of variables

Hello. The variables in X represent several optical tests done in the light, and the variables in Y are measurements we made on the eyes. My company wants to know whether there is a connection between these two sets of variables.
11. ### Correlation between two sets of variables

Hello, I have a problem with a correlation study. I have to study the linear relationship between two sets of variables, X and Y. There are 12 variables in X and 4 in Y. I applied a Canonical Correlation Analysis, and it seemed that all the variables in X were correlated with 3 of the variables in Y. I also did a...
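
For reference, the canonical correlations that CCA reports can be computed with NumPy alone: after centering, take orthonormal bases of the column spaces of X and Y (via QR), and the canonical correlations are the singular values of $Q_X^\top Q_Y$. A minimal sketch, assuming full-column-rank data with more rows than columns; `canonical_correlations` is a hypothetical helper. With 12 variables in X and 4 in Y there are at most min(12, 4) = 4 canonical correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n, p) and Y (n, q), largest first."""
    Xc = np.asarray(X, dtype=float)
    Yc = np.asarray(Y, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    Yc = Yc - Yc.mean(axis=0)
    # Orthonormal bases of the two centred column spaces (assumes full column rank).
    qx, _ = np.linalg.qr(Xc)
    qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

A single large canonical correlation only says that some linear combination of X matches some linear combination of Y; the canonical weights or loadings are needed to see which variables drive it.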