Distinct counts / How many different products did a client buy?

#1
Hi all,

I'm still pretty new to SPSS and trying to find my way around the different analyses that are possible. I'm a bit stuck on replicating one analysis I tend to do in Excel, involving distinct counts for a pair of string variables.

Imagine that I have the following data set:

Client - Product - Price
Adam - Apples - $2.00
Berta - Apples - $1.80
Doug - Apples - $1.50
Doug - Apples - $1.80
Carl - Bananas - $1.50
Adam - Oranges - $2.20
Doug - Oranges - $2.50


I would like to know:
How many different products did each client buy?
So in this case, I would expect the following results:

Client - Distinct Product Count
Adam - 2
Berta - 1
Carl - 1
Doug - 2
(Note that for Doug it should be 2, not 3, because he bought Oranges once and Apples twice but I am interested in distinct products)

In Excel, this is possible by using pivot tables and distinct counts, but I have no idea how to implement this in SPSS (or if it is even possible).

Any pointers will be highly appreciated :)
 
#3
Hi Berley,

Thanks for your reply. I have tried to use crosstabs, but it does not seem to be able to do distinct counts or at least I did not manage to find the correct options. Please see below for the output that I am getting.

The problem is that for Doug, it is showing "3" in the total column, while I would like to see "2", as in he bought two different products, although he did make three transactions.

It would be great if you could help me achieve that. I have pasted the syntax as well, because maybe that is where I am going wrong.

Thanks again :)

Edit: I spelt Oranges wrong in one of the data lines, which is why it shows twice in the attached image, sorry about that
 
Last edited:

Karabiner

TS Contributor
#4
[untested:]

First sort by client and product.

Use Aggregate with break variables client and product
and check the "count number of cases" box. This will give you a new
variable containing the frequency for each combination of client
and product.

Then create a new variable:
IF(name ne LAG(name) | product ne LAG(product)) flag = 1.

You can then select or filter by flag.

With kind regards

K.
 
#5
Hi Karabiner,

Thanks for your suggestion, seems to work perfectly well. Very elegant approach to the solution!

In case that anyone else is stuck with a similar problem, this syntax following Karabiner's hints got me to the point where Crosstabs is showing the desired results for the sample data mentioned above:

SORT CASES BY Client(A) Product(A).

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=Client Product
/N_BREAK=N.

IF (Client NE LAG(Client) OR Product NE LAG(PRODUCT)) flag=1.
EXECUTE.

USE ALL.
COMPUTE filter_$=(flag=1).
VARIABLE LABELS filter_$ 'flag=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

CROSSTABS
/TABLES=Client BY Product
/FORMAT=AVALUE TABLES
/CELLS=COUNT
/COUNT ROUND CELL.