hclust function-cluster analysis-text/document-function creation

excelmac

New Member
Hi guys

I'm working on a text mining/clustering project and am trying to create a table with the number of clusters as rows and 6 columns representing the following 6 metrics: max.diameter, min.separation, average.within, average.between, avg.silwidth, dunn.

I need to create the tables for 3 methods - kmeans, pam and hclust.

I was able to create something for kmeans

Code:
dtm0.90Dist = dist(dtm0.90)

foreachcluster = function(k) {
  kmeans.result = kmeans(dtm0.90, k);
  kmeans.stats = cluster.stats(dtm0.90Dist, kmeans.result$cluster);
  c(kmeans.stats$min.separation, kmeans.stats$max.diameter, kmeans.stats$average.within, kmeans.stats$avearge.between, kmeans.stats$avg.silwidth, kmeans.stats$dunn)
}

rbind(foreachcluster(2), foreachcluster(3), foreachcluster(4), foreachcluster(5), foreachcluster(6), foreachcluster(7), foreachcluster(8))

OUTPUT
Code:
         [,1]     [,2]     [,3]      [,4]       [,5]
[1,] 3.162278 30.19934 5.831550 0.5403872 0.10471348
[2,] 2.236068 28.37252 5.006058 0.3923446 0.07881104
[3,] 1.000000 28.37252 4.995478 0.2496066 0.03524537
[4,] 1.000000 26.40076 4.387212 0.2633338 0.03787770
[5,] 1.000000 26.40076 4.353248 0.2681947 0.03787770
[6,] 1.000000 26.40076 4.163757 0.1633954 0.03787770
[7,] 1.000000 26.40076 4.128927 0.2676423 0.03787770
OUTPUT END

I need similar output for the hclust and pam methods, but for the life of me I can't get the same function to work for either of the two methods.

OK, so I was able to make the function for HCLUST

Code:
forhclust = function(k) {
  dfDist = dist(dtm0.90);
  hclust.result = hclust(dfDist);
  hclust.cluster = (cutree(hclust.result, k));
  cluster.stats(dfDist, hclust.cluster);
  c(cluster.stats$min.separation)
}

But I get an error when I run this:

Error in cluster.stats$min.separation : object of type 'closure' is not subsettable

What I need is for it to print the "min.separation" output and the other 5 measures, like in the kmeans code.

I would really appreciate all the help, and perhaps some guidance in understanding why my approach is failing in hclust.

Also, is there a good source that can explain the functioning and application of these methods, step by step, in detail?

Thank You

trinker

ggplot2orBust
I haven't tried but I don't think your example is reproducible. Can you provide a data set of some sort (minimal) and code that will reproduce the error?

Also...

When you're posting code, dataframes or computer output it's helpful to wrap this information in code tags by:

1. either clicking the pound (#) sign icon or
2. wrap with [NOPARSE] Code: some code [/NOPARSE]

which produces:

Code:
some code

For more see this (LINK)

Indenting code is also considered kind.

excelmac

New Member
Thank you Trinker - I'll take care of the code and formatting in my posts in the future.

I ran the code in R and it works - I'm not sure what you mean by reproducible. Basically, there is an XML file that is converted to a corpus, then converted to an R-readable dataframe, then cleaned for sparse words etc. Then a document term matrix is created, and the kmeans, hclust and PAM methods are applied. That is what I have followed so far to get that output.

Is it possible to create a function that picks up certain values from the list of values that one receives when running the "cluster.stats" command under the hclust option? For example, when I used kmeans I was able to specify

Code:
c(kmeans.stats$min.separation, kmeans.stats$max.diameter, kmeans.stats$average.within, kmeans.stats$avearge.between, kmeans.stats$avg.silwidth, kmeans.stats$dunn)

to pick the 6 options I needed from the cluster.stats output.
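
The "object of type 'closure' is not subsettable" error comes from writing cluster.stats$min.separation: cluster.stats on its own is the function (a closure), and in forhclust the value it returns is never stored. Below is a minimal sketch of one possible fix. It assumes cluster.stats comes from the fpc package, and it uses a small synthetic matrix called toy as a hypothetical stand-in for dtm0.90, which is not available here.

```r
library(fpc)  # assumed source of cluster.stats()

# Hypothetical stand-in for dtm0.90: three well-separated groups of 10 rows
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0),  ncol = 4),
             matrix(rnorm(40, mean = 5),  ncol = 4),
             matrix(rnorm(40, mean = 10), ncol = 4))
toyDist <- dist(toy)

forhclust <- function(k, d) {
  hclust.result  <- hclust(d)
  hclust.cluster <- cutree(hclust.result, k)
  # Store the returned list; the bare name cluster.stats is the function itself
  hclust.stats <- cluster.stats(d, hclust.cluster)
  c(max.diameter    = hclust.stats$max.diameter,
    min.separation  = hclust.stats$min.separation,
    average.within  = hclust.stats$average.within,
    average.between = hclust.stats$average.between,
    avg.silwidth    = hclust.stats$avg.silwidth,
    dunn            = hclust.stats$dunn)
}

forhclust(3, toyDist)
```

Naming the elements inside c() also makes a misspelled field easier to spot. In the kmeans code, avearge.between is a misspelling of average.between; `$` returns NULL for a name that is not in the list and c() silently drops it, which would explain why the posted output has five columns rather than six.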

EDIT: I have attached the XML file and a text file containing the code I have so far. Perhaps that is what you were referring to when you said 'reproducible'.
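
A reproducible example usually means code plus a small data set that anyone can run without the original files. As a rough sketch of what that could look like for this question, the code below builds the six-metric table for kmeans, pam and hclust on synthetic data. The names toy, labelers and stats_table are made up for illustration, and it assumes cluster.stats comes from the fpc package and pam from the cluster package.

```r
library(fpc)      # cluster.stats()
library(cluster)  # pam()

# Synthetic stand-in for the document-term matrix (hypothetical data)
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0),  ncol = 4),
             matrix(rnorm(40, mean = 5),  ncol = 4),
             matrix(rnorm(40, mean = 10), ncol = 4))
toyDist <- dist(toy)

metrics <- c("max.diameter", "min.separation", "average.within",
             "average.between", "avg.silwidth", "dunn")

# Each entry maps a number of clusters k to a vector of cluster labels
labelers <- list(
  kmeans = function(k) kmeans(toy, centers = k, nstart = 10)$cluster,
  pam    = function(k) pam(toyDist, k)$clustering,
  hclust = function(k) cutree(hclust(toyDist), k)
)

# One row per k, one column per metric
stats_table <- function(labeler, ks = 2:4) {
  rows <- lapply(ks, function(k) {
    unlist(cluster.stats(toyDist, labeler(k))[metrics])
  })
  tab <- do.call(rbind, rows)
  rownames(tab) <- paste0("k=", ks)
  tab
}

tables <- lapply(labelers, stats_table)
tables$hclust
```

The only part that differs between the three methods is how the cluster labels are obtained ($cluster for kmeans, $clustering for pam, cutree() for hclust), so the call to cluster.stats and the selection of the six values can be shared.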
