# Cluster analysis with conjoint analysis data

#### Thaosen

##### New Member
Hello everybody,

My team and I have an assignment to run a conjoint analysis with SPSS. So far, we have created an orthogonal design that generated 16 cards and surveyed respondents with it. We then ran the conjoint analysis in SPSS to get the utility of each attribute level.

Our issue, though, is that we also have to do a cluster analysis and a chi-square test. For now, we've put the chi-square test aside to focus on the cluster analysis. My problem is that none of the tutorials I found online use data structured like ours. For example, if they had a "color" attribute, it was a single column with the different values below it.

Our data instead has three columns (color1, color2, color3), with each respondent's utility for that level below. I ran a K-Means Cluster Analysis (k = 2) in SPSS with this data, but I am not confident about how to interpret the result. More precisely, I would like to know whether I can read, for example, the color with the highest value in cluster 1 as the color of that cluster.

For example:
Code:
Cluster     1      2
-------------------------
Color1      3      -0.2
Color2      -1.3   3
Color3      1.1    1.2
Weight1     1      -0.1
Weight2     0.2    1.3
Weight3     3      0.2
Would it be okay to read it as Color1, Weight3 = Cluster 1 and Color2, Weight2 = Cluster 2?

Thanks!

#### spunky

##### Doesn't actually exist
Cleared from moderators queue!

#### Thaosen

##### New Member
> hi,
> I think the right approach would be to tidy up your data before the analysis: https://ramnathv.github.io/pycon2014-r/explore/tidy.html
>
> regards

According to the site you linked, the data is already tidy, as it meets the three criteria (observations in rows, variables in columns, a single data sheet), no?

For example, if I swap the ID with the labels in SPSS, the result of my conjoint analysis looks like this:
Code:
Respondent     Blue     Red     Green     Light     Medium    Heavy      Very heavy
------------------------------------------------------------------------------------------
1              13.25   -15.63   2.37      19.81     1.06      -20.44      -.44
(The values are the utility estimates.)

However, the Cluster Analysis output places the variables in the rows and the clusters in the columns, which I think is normal. Our issue lies in interpreting the result: is it okay to compare the levels within each attribute, so that if a cluster's center is highest for "Blue" and "Medium", we would say this cluster is best represented by the "Blue & Medium" package?
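
To make that reading concrete, here is a minimal Python sketch that takes the Final Cluster Centers from the first post and, for each cluster, picks the level with the highest center value within each attribute. The helper names (`attribute_of`, `preferred_levels`) are made up for illustration, and grouping levels by stripping the trailing digit from the name is an assumption about how the columns are labelled.

```python
# Final cluster centers copied from the example table in the first post:
# level name -> (center in cluster 1, center in cluster 2).
centers = {
    "Color1": (3.0, -0.2),
    "Color2": (-1.3, 3.0),
    "Color3": (1.1, 1.2),
    "Weight1": (1.0, -0.1),
    "Weight2": (0.2, 1.3),
    "Weight3": (3.0, 0.2),
}


def attribute_of(level_name):
    # Assumption: the attribute is the level name without its trailing digits.
    return level_name.rstrip("0123456789")


def preferred_levels(centers, cluster_index):
    """Return {attribute: level with the highest center value} for one cluster."""
    best = {}
    for level, values in centers.items():
        attr = attribute_of(level)
        if attr not in best or values[cluster_index] > centers[best[attr]][cluster_index]:
            best[attr] = level
    return best


profiles = [preferred_levels(centers, k) for k in range(2)]
for k, profile in enumerate(profiles, start=1):
    print(f"Cluster {k}: {profile}")
```

With these numbers the sketch gives Color1 and Weight3 for cluster 1, and Color2 and Weight2 for cluster 2. It only ranks levels inside one cluster, so it says nothing about how far apart the two clusters are on a given level (Color3, for instance, is almost identical in both).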

#### rogojel

##### TS Contributor
hi,
maybe I misunderstand: is one line one observation? E.g. there was one measurement with Blue 13.25, Red -15.63..etc?

#### Thaosen

##### New Member
> hi,
> maybe I misunderstand: is one line one observation? E.g. there was one measurement with Blue 13.25, Red -15.63..etc?

Sorry for my lack of clarity, I will try to explain our process so far more clearly.

Step 1 - Orthogonal Design
Basically, we started by coming up with attributes and possible values for each (for example, Attribute = Color, Values = {Blue, Red, Green}). With these, we ran an orthogonal design in SPSS, which gave us 16 "cards" (each a combination of one value per attribute, i.e. a kind of package).

Step 2 - Survey
Next, we ran a survey in which people rated each of the 16 generated cards between 0 and 100. The survey results looked like this:
Code:
SURVEY RESULTS
-----------------------------------------------------------------------------------
Respondent          Package1          Package2          ...          Package16
-----------------------------------------------------------------------------------
1                   16                100                            23
2                   25                0                              76
...
Step 3 - Conjoint Analysis
With this, we ran a conjoint analysis in SPSS using the CONJOINT command, with the orthogonal plan from Step 1 and our data (the survey results) from Step 2.

This gave us the utility data like this:
Code:
CONJOINT ANALYSIS RESULTS
--------------------------------------------------------------------------------------------
Respondent     Blue     Red     Green     Light     Medium    Heavy      Very heavy
--------------------------------------------------------------------------------------------
1              13.25   -15.63   2.37      19.81     1.06      -20.44      -.44
...
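
As a rough illustration of where utilities like these come from, the sketch below estimates part-worths for one respondent. Everything in it is assumed for the example: the six cards are a tiny full-factorial design (3 colors x 2 weights), not the 16-card plan from Step 1, and the ratings are invented. In a balanced design like this one, least-squares part-worths constrained to sum to zero within each attribute equal the mean rating of a level minus the overall mean rating. The SPSS procedure on the real orthogonal plan may differ in its details.

```python
from statistics import mean

# Hypothetical cards (not the thread's 16-card plan): every color x weight combination.
cards = [
    ("Blue", "Light"), ("Blue", "Heavy"),
    ("Red", "Light"), ("Red", "Heavy"),
    ("Green", "Light"), ("Green", "Heavy"),
]
# Invented 0-100 ratings from a single respondent, one per card.
ratings = [90, 60, 40, 10, 80, 20]


def part_worths(cards, ratings, attribute_index):
    """Level mean minus grand mean, valid as least squares only for balanced designs."""
    grand_mean = mean(ratings)
    levels = sorted({card[attribute_index] for card in cards})
    return {
        level: mean(r for card, r in zip(cards, ratings)
                    if card[attribute_index] == level) - grand_mean
        for level in levels
    }


color_utils = part_worths(cards, ratings, 0)
weight_utils = part_worths(cards, ratings, 1)
print(color_utils)
print(weight_utils)
```

The estimates sum to zero within each attribute, which matches the pattern in the row above (the color utilities and the weight utilities each add up to roughly zero).
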
Step 4 - Cluster Analysis
With these utilities, I ran a K-Means Cluster Analysis with K = 2 to look for two clusters and find out which combination of attribute levels would be best. The result looked like this:
Code:
K-MEANS CLUSTER ANALYSIS RESULTS (FINAL CLUSTER CENTERS)
------------------------------------------------------------------------------
Cluster     1      2
------------------------------------------------------------------------------
Color1      3      -0.2
Color2      -1.3   3
Color3      1.1    1.2
Weight1     1      -0.1
Weight2     0.2    1.3
Weight3     3      0.2
...
So, as you can see, up until the cluster analysis, one line held the values associated with one respondent. In the cluster analysis output, though, one line shows one attribute level.
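
The change of orientation can be seen in a small Python sketch of k-means. It is only an illustration under stated assumptions: the four utility rows are made up (chosen so that the centers come out like the table above), the first k rows are used as starting centers, and the loop is plain Lloyd's algorithm, which is not necessarily how SPSS initialises or iterates.

```python
from statistics import mean

# Made-up utility rows, one per respondent; columns follow `levels`.
levels = ["Color1", "Color2", "Color3", "Weight1", "Weight2", "Weight3"]
utilities = [
    [3.2, -1.5, 1.0, 1.1, 0.1, 3.1],
    [-0.3, 3.1, 1.3, -0.2, 1.4, 0.1],
    [2.8, -1.1, 1.2, 0.9, 0.3, 2.9],
    [-0.1, 2.9, 1.1, 0.0, 1.2, 0.3],
]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, max_iter=100):
    """Plain Lloyd's algorithm; starting centers are simply the first k rows."""
    centers = [list(row) for row in rows[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every respondent to the nearest current center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # no respondent changed cluster
        assignment = new_assignment
        # Each center becomes the column-wise mean of its members.
        for c in range(k):
            members = [row for row, a in zip(rows, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [mean(col) for col in zip(*members)]
    return assignment, centers


assignment, centers = k_means(utilities, k=2)

# Print centers the way the output above does: levels in rows, clusters in columns.
for j, level in enumerate(levels):
    print(f"{level:<8}", *(f"{center[j]:6.1f}" for center in centers))
```

Respondents go in as rows, and each final center is the column-wise mean utility of the respondents assigned to that cluster, which is why the output lists one level per line.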

If this is still confusing, please let me know and I will try to explain our situation better.

Thank you again for the time you are spending trying to help us!

#### rogojel

##### TS Contributor
hi,
thanks for the explanation, I think I understand now what you did. However, it seems a bit complex if your goal is to find out which factors, or combinations of factors, contribute most to the ratings. Did you consider simply analysing the experiment as a DoE?

regards

#### Thaosen

##### New Member
Hello,

Hmm, I am not too sure about DoE. Looking it up online brought up "Design of Experiments"; however, it is a bit confusing to me right now, so I will have to read more about it.

Otherwise, we have mainly been following the assignment up to the cluster analysis, where we hit our biggest issue so far. In essence, the assignment has us use conjoint analysis to see a decomposition model in action and how it could benefit us in a new-product setting. However, while the conjoint analysis gave us the importance of each attribute and its different values, we were given a hint to run a cluster analysis to see whether we could detect additional packages that could be profitable in our simulation.

P.S. I forgot to mention it, but the numbers below the Cluster 1 and Cluster 2 columns are the "Final Cluster Centers". I will add that to my previous post.

#### rogojel

##### TS Contributor
hi,
I think you can break down each package as a list of attribute values, so you have data in the form of:

resp attr1 attr2 rating
1 x1 x2 v1
2 x1 x2 v2
1 x3 x4 v3
2 x3 x4 v4

So, if you are interested in the effect of attr1 and attr2 on the ratings, this can be analysed directly.
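
A minimal Python sketch of that reshaping, assuming hypothetical card definitions and invented ratings (four packages and two respondents rather than the real 16 cards): it turns the wide survey table into one row per respondent and package, then compares the mean rating per attribute level. The names `packages`, `survey` and `mean_rating_by` are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical card definitions: package name -> (color, weight).
packages = {
    "Package1": ("Blue", "Light"),
    "Package2": ("Red", "Heavy"),
    "Package3": ("Blue", "Heavy"),
    "Package4": ("Red", "Light"),
}
# Invented wide survey data: respondent -> rating per package.
survey = {
    1: {"Package1": 80, "Package2": 20, "Package3": 60, "Package4": 40},
    2: {"Package1": 70, "Package2": 10, "Package3": 30, "Package4": 50},
}

# Long format: one row per (respondent, package) with the package's attribute values.
long_rows = [
    {"resp": resp, "color": packages[p][0], "weight": packages[p][1], "rating": rating}
    for resp, row in survey.items()
    for p, rating in row.items()
]


def mean_rating_by(rows, attribute):
    """Average rating for each level of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["rating"])
    return {level: mean(values) for level, values in groups.items()}


color_means = mean_rating_by(long_rows, "color")
weight_means = mean_rating_by(long_rows, "weight")
print(color_means)
print(weight_means)
```

Comparing level means is only a first look at the main effects. A fuller DoE-style analysis would fit a model to the long-format rows, for example a regression or ANOVA with the attributes as factors.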

regards