Hierarchical clustering? Ward's linkage? K-means clustering? ANOVA?

#1
Hi all, I am very confused about clustering.
My problem: there are multiple clustering types and methods in hierarchical clustering, and I don't know which method is suitable for my experiment.

I want to know the differences between the linkage methods (single, group average, Ward's linkage, etc.) and also between the distance methods (e.g. range, standard deviation, Manhattan, etc.).

My data set has 614 data points in total. Someone suggested that, because this is fewer than 800, it is not suitable for hierarchical clustering and the reliability would be low. Is that true?
If so, what method can I use to do the grouping? K-means clustering? Repeated-measures ANOVA?

Moreover, my data are time-related: they were collected in 4 months of the year. Someone suggested that I detrend the data over time, but I worry that I would lose my target, since I want to group the items by their growing trends.

Can anyone please help?
 

Karabiner

TS Contributor
#2
Could you please describe the topic of this research, and the research questions?
Which variables were measured, and what did the design of the data collection look
like?

With kind regards

Karabiner
 
#3
The topic of the research is spontaneous plants on green roofs. The research questions are:
What are the growing trends of spontaneous plants on green roofs?
What is special about the species in each growing trend, and can there be further sub-groups?

Variables collected: species, coverage rate (%) and height, measured in 4 months (at intervals of 2-3 months)
 

Miner

TS Contributor
#5
I did some research on linkage and distance methods in the past. Attached is a distilled summary of that.
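To make the linkage question a bit more concrete, here is a minimal sketch (not the attached summary) of how the common linkage rules measure the distance between two clusters, using the usual textbook definitions. The two small clusters are made-up points, and linkage_distances is a hypothetical helper written for this post, not a function from any statistics package. Ward's value here is the increase in the within-cluster sum of squares caused by a merge, and it always uses squared Euclidean distance between the centroids.

```python
from itertools import product
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def linkage_distances(a, b, dist=euclidean):
    """Distance between clusters a and b under four linkage rules."""
    pairwise = [dist(p, q) for p, q in product(a, b)]
    # Ward: growth in within-cluster sum of squares if a and b were merged.
    ward = (len(a) * len(b) / (len(a) + len(b))
            * euclidean(centroid(a), centroid(b)) ** 2)
    return {
        "single": min(pairwise),                    # closest pair
        "complete": max(pairwise),                  # farthest pair
        "average": sum(pairwise) / len(pairwise),   # mean over all pairs
        "ward": ward,
    }


# Made-up example clusters (two points each).
cluster_a = [(0.0, 0.0), (0.0, 4.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

by_euclidean = linkage_distances(cluster_a, cluster_b)
by_manhattan = linkage_distances(cluster_a, cluster_b, dist=manhattan)
```

With these points the Euclidean single, complete and average distances come out as 3, 5 and 4, while Manhattan gives 3, 7 and 5, so both the linkage rule and the distance measure change which clusters look closest and therefore get merged first.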

Regarding the sample size: reliability depends on the separation between clusters. If the clusters are widely separated, a relatively small sample will suffice. As the separation between clusters decreases, a larger sample is required to maintain the same reliability.

Use hierarchical clustering when you have no prior knowledge of what the clusters may be. Use K-means when you do have prior knowledge of the clusters and you want to group your data into those clusters. For example, suppose you had data on a group of bears and pre-determined clusters of small, medium and large. You would provide a seed bear for each cluster.
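The seed-bear idea can be sketched in a few lines. This is only an illustration under my own assumptions: the bear weights are made-up numbers, there is a single variable (weight), and kmeans_1d is a hypothetical helper that runs the standard assign-then-update K-means loop starting from the seeds you supply.

```python
def kmeans_1d(values, seeds, max_iter=100):
    """K-means on one variable, started from user-supplied seed values."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = sum(members) / len(members)
    return labels, centers


# Hypothetical bear weights in kg (made up for illustration).
bear_weights = [62, 70, 75, 140, 150, 155, 160, 300, 320, 340]
# One seed bear per pre-determined cluster: small, medium, large.
seed_bears = [70, 150, 320]

labels, centers = kmeans_1d(bear_weights, seed_bears)
print(labels)   # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
print(centers)  # [69.0, 151.25, 320.0]
```

Label 0 is the cluster grown from the small seed bear, 1 from the medium one and 2 from the large one. With different seeds the final grouping can differ, which is why this approach suits cases where you already have an idea of what the clusters should be.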
 
