Search results

  1. AdrienC

    How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

    Thank you for your answers. Indeed, there is a big field around anomaly detection. I am doing my PhD on this: I work on Isolation Forest, Local Outlier Factor, etc., but all those methods, like the Mahalanobis distance, only measure an abnormality score on a dataframe X with p columns and n rows. My...
  2. AdrienC

    How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

    Hello, thank you for your answers. The data set has n = 100 000 individuals (rows) and p = 50 columns (where the first one is Y and the other 49 variables are X). All the variables are quantitative and they are not time series. I can't go into details, but the variables in X are just...
  3. AdrienC

    How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

    Hello, I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The goal is to quantify the degree of abnormality of the data on Y, but I have to take into account the values of the other variables (the relationship between Y and X)...
  4. AdrienC

    Hill's estimator for heavy tails

    Hello, I have a heavy-tailed distribution and I was wondering what exactly the Hill estimator does. I am not sure I fully understand this notion. I know it estimates something about the tail of the distribution. Thank you so much. Have a nice day.
  5. AdrienC

    Clustering of variables with time and grouped data

    Thank you so much! It helps me out :) I'm used to clustering rows but never columns :D
  6. AdrienC

    Clustering of variables with time and grouped data

    Thanks! I think you are right :D Just one thing :) there are still repeated measures, aren't there? Is it OK to apply a simple clustering algorithm to the variables?
  7. AdrienC

    Clustering of variables with time and grouped data

    Hello! I come to you because I have to help one of my colleagues, who is a plant biologist. The purpose of this study is to cluster 70 quantitative variables. Each of these variables represents a different protein (it is a measurement made on it; I don't know exactly how it works). But here is the...
  8. AdrienC

    dissimilarity measure between categorical variables

    Hello, I have a problem. I have 9 categorical variables and I'm trying to cluster the variables. Could you tell me if I'm doing the right thing? 1) I calculate Cramér's V for each pair of variables. I store those associations in a matrix, which I call X. 2) I calculate 1-X and...
  9. AdrienC

    Correlation between two set of variables

    Hello. The variables in X represent several optical tests in the light, and the variables in Y are measurements we've made on the eyes. My company wants to know if there is a connection between those two sets of variables.
  10. AdrienC

    Correlation between two set of variables

    Hello, I have a problem with a correlation study. I have to study the linear relationship between two sets of variables, X and Y. There are 12 variables in X and 4 in Y. I applied a Canonical Correlation Analysis and it seemed that all the variables in X were correlated with 3 of those in Y. I also did a...