Clustering of variables with time and grouped data

Hello!

I am writing because I have to help one of my colleagues, who is a plant biologist. The purpose of the study is to cluster 70 quantitative variables. Each variable represents a different protein (it is a measurement taken on that protein; I don't know exactly how it works).

But here is the difficulty:

We have n = 100 flowers, plus a variable "drug": 50 flowers have been "drugged" and the other 50 have not (control group). There is also a variable "time", which allows us to follow how the presence of the 70 proteins evolves over time (1 day, 7 days and 1 month).

Therefore, we need to cluster 70 quantitative variables under two conditions: whether or not the flower received the drug, plus a time effect.

Usually, it is quite easy to cluster variables: we use agglomerative clustering based on the correlations between the variables. But once we add conditions on the data it becomes more complicated, and I don't know how to proceed.
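
For reference, here is a minimal sketch of that usual approach in Python with NumPy and SciPy. The data is synthetic (I cannot share the real measurements), and the noise level, the choice of 1 - |correlation| as the dissimilarity, average linkage and the 5 clusters are all assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 100 rows (flowers) x 70 columns (proteins).
# Each protein follows one of 5 hidden factors plus noise (illustrative assumption).
n_flowers, n_proteins, n_factors = 100, 70, 5
latent = rng.normal(size=(n_flowers, n_factors))
loadings = np.zeros((n_factors, n_proteins))
loadings[np.arange(n_proteins) % n_factors, np.arange(n_proteins)] = 1.0
X = latent @ loadings + 0.3 * rng.normal(size=(n_flowers, n_proteins))

# Correlation between variables (columns), turned into a dissimilarity.
corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)      # remove floating-point noise on the diagonal
dist = (dist + dist.T) / 2.0     # enforce exact symmetry

# Agglomerative clustering (average linkage) on the condensed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into at most 5 clusters
```

`labels[j]` is then the cluster number assigned to protein j. Using 1 - |correlation| groups strongly negatively correlated proteins together; 1 - correlation would keep them apart.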

Thank you so much for your help (from France)

Bye !!


Active Member
I'm afraid I do not see what the problem is. Testing subjects under different conditions ensures variability in the data, which lets us see even better which variables belong together and which do not. So the accuracy of the cluster analysis is increased precisely because we have exposed the subjects to different conditions.
Thanks!
I think you are right :D

Just one thing :) there are still repeated measures. Is it OK to apply a simple clustering algorithm to the variables?


Active Member
If we have repeated measures, it is still ok to apply clustering to variables. It would not be ok to apply clustering to observations.

Variables are like people, going through all kinds of related and unrelated situations. And some people remain friends through life, and some people don't.
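
As a rough sketch of what pooling the conditions could look like in Python: the long-format layout, the column names (`flower`, `time`, `drug`, `protein_1` ... `protein_70`) and the random placeholder values are all assumptions, not the real dataset. All 300 rows (100 flowers x 3 time points) go into the correlation matrix together, and the variables are clustered from that.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)

# Hypothetical long-format layout: one row per flower and time point.
design = pd.MultiIndex.from_product(
    [np.arange(100), ["1d", "7d", "1m"]], names=["flower", "time"]
).to_frame(index=False)
design["drug"] = design["flower"] < 50   # first 50 flowers treated (assumption)

protein_cols = [f"protein_{j + 1}" for j in range(70)]
values = rng.normal(size=(len(design), 70))  # placeholder measurements, no real signal
data = pd.concat([design, pd.DataFrame(values, columns=protein_cols)], axis=1)

# Pool all rows: drug and time only add variability to the observations.
corr = data[protein_cols].corr()
dist = 1.0 - corr.abs().to_numpy()
np.fill_diagonal(dist, 0.0)

# Cluster the 70 variables, not the 300 observations.
Z = linkage(squareform(dist, checks=False), method="average")
clusters = pd.Series(fcluster(Z, t=5, criterion="maxclust"), index=protein_cols)
```

The number of clusters (5) and the average linkage are arbitrary choices here; in practice one would inspect the dendrogram before cutting it.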