Chi-Square Triangle Beer Test with weird answers...

I am trying to help develop a test for deciding if two different beer recipes taste statistically different under a blind triangle test. I created an equation in Excel to calculate the chi-square value and it works most of the time (which is worrying, because it should work all of the time). Upon further inspection, I've managed to confuse myself thoroughly by trying to double check my work.

If I test 50 people, I need 23 of them to pick the different beer for it to be considered statistically significant at 95% (or .05 risk, which gives me a chi-squared value of 3.841 to beat). When I use my equation it works correctly from 15 correct responses and upward. When I tell it that there are fewer than 15 correct responses, the chi-squared value starts to increase, and at around 8 people it begins to say those are statistically significant answers, which makes no sense.

Here is the equation (A22 is the total number of participants (50) and C22 is the number of correct responses, which should be 23 or higher to be significant):

I tried to double check my work below and now this says that 23 people are not enough to be statistically significant (even though I found that number in a table in a math book).

Outcome class   Observed outcomes   Probability of each class   Expected occurrences   (obs - exp)^2 / exp
correct beer    23                  0.333333333                 16.66666667            2.406666667
wrong beers     27                  0.666666667                 33.33333333            1.203333333
Sum             50                  1                           50                     3.61

3.61 is less than 3.84 yet the table says 23 people is the correct number for a sample of 50 participants...

Can someone please help me make sense of this? Sorry if it's an obvious answer...
Thank you for your help!


Super Moderator
Can you explain what actually occurred in your experiment?

Also, where do you get this 23/50 cutoff for significance from? That doesn't sound at all right.
Sorry for the confusion. I have not run the experiment yet; I'm just trying to design it properly so we don't end up with a bunch of useless data.

This is the equation that I copied from an online book to get the 23 out of 50 significance level. =ROUNDUP((0.4714*(NORMSINV(.95))*SQRT(A22))+(((2*A22)+3)/6), 0) where A22 is the total number of participants (at this point 50). The only thing I didn't understand was the constant of .4714 they used to devise the table (the link is here on page 131 : triangle test table calculation&f=false ).

I calculated the Chi-Square value to be 3.841 by entering =CHIINV((1-.95), 1) for a risk level of .05.
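For anyone puzzling over the same constant: the book formula looks like a normal approximation to the binomial distribution. Below is a minimal R sketch, assuming the guessing probability is p = 1/3. Under that assumption, 0.4714 matches sqrt(p*(1-p)) = sqrt(2)/3, and (2n+3)/6 is n/3 + 0.5, i.e. the expected count plus a half-unit continuity correction. The name `book_cutoff` is made up for illustration; it is not from the book.

```r
# Hypothetical helper: the spreadsheet formula rewritten in R.
# ceiling() plays the role of Excel's ROUNDUP(x, 0) for positive x.
book_cutoff <- function(n, conf = 0.95) {
  ceiling(0.4714 * qnorm(conf) * sqrt(n) + (2 * n + 3) / 6)
}

p <- 1/3
sqrt(p * (1 - p))      # about 0.4714, i.e. sqrt(2)/3
book_cutoff(50)        # 23
qchisq(0.95, df = 1)   # about 3.841, the R counterpart of CHIINV(0.05, 1)
```

Note that the formula uses NORMSINV(.95), a one-sided cutoff, while the chi-square value of 3.841 comes from a different reference distribution, so the two numbers need not agree on the same count.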

I can upload the excel document too if that helps!

Thanks again everyone for your help!


Super Moderator
Even if you haven't run the experiment, describing what will actually happen in it could be useful - a triangle test is not a concept that will be familiar to most of us. From there we can offer guidance on statistical testing.
Sure! I just didn't want to bog down the thread with info that people might not have needed. It's an experiment to see if we can brew the same batch of beer with cheaper hops without a noticeable change in taste or smell. Here is the experiment in full detail:

How To Conduct This Study:
Have each participant try 3 samples of beer without knowing the point of the experiment or how it affects the beer in any way. Two of the three samples will be the batch made with normal hops, the third will be the new hops. For consistency, it would be best to create one mash and then split it into different batches for the hop additions. Ask them to identify the different beer. Once they have picked one, have them describe how it is different (taste, smell, malt, hops, spices/additives).

How To Avoid Experimental Bias In This Study:
Each beer must be in an opaque container so color or clarity won't affect their choice. The 3 samples should be placed in a random order, as people have a bias to pick the middle cup (the six possible orders are GBR, GRB, RGB, RBG, BGR, BRG). Mark the bottom of each cup with a different colored sticker and have two different colors be the same beer (e.g. beer #1 is red, beer #2 is blue, and beer #3 is green; red and blue are the same beer poured into different cups). This is important because human psychology can create a bias for samples labeled "A" or "1". Each participant should be alone for the duration of the experiment to avoid outside influence from others. Each sample should be large enough for the participant to taste it multiple times (1-2 oz recommended).
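The random cup order could be generated ahead of time rather than improvised at the table. This is only a sketch in R: the panel size of 50, the seed, and the column names are assumptions for illustration, not part of the protocol above.

```r
set.seed(1)            # arbitrary seed, only so the sheet can be reproduced
n_participants <- 50   # assumed panel size
cup_colors <- c("Green", "Blue", "Red")

# One row per participant: a random left-to-right order of the three cups.
serving_order <- t(replicate(n_participants, sample(cup_colors)))
colnames(serving_order) <- c("left", "middle", "right")
head(serving_order)

# Rough check that no color ends up in the middle far more often than the others.
table(serving_order[, "middle"])
```

Printing `serving_order` next to the survey numbers gives the person pouring a fixed sheet to follow.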

Experiment Details:
Beers 1 and 2 are normal (null) and Beer 3 is made with the new hops (hypothesis).
The point of this experiment is to prove that the beer made with different hops is not discernible from the other beers with 95% accuracy (p=.05).
The experiment could be conducted with random people or with BJCP-certified judges. It would be interesting to see if one group could tell a difference the other could not.
For this experiment to be valid we need at least 15 people. The more people you can test, the more accurate the data will be. A statistical penalty is paid for having fewer samples.
The other tab in this document should be printed off and filled out by the participant for data collection. Number the sheets so you don't lose track of your place or double enter the data.

The dynamic calculator I'm trying to build for it needs to calculate:
  • # of people that must choose Beer 3 to be considered different than random chance: =ROUNDUP((0.4714*(NORMSINV(D22))*SQRT(A22))+(((2*A22)+3)/6), 0) where D22 is .95 and A22 is the total number of participants
  • Chi-squared value to beat: =CHIINV((1-.95), 1)
  • Calculated chi-squared value derived from the test: =(((INT((A22-C22)-(A22*(2/3)))-0.5)^2)/(A22/3))+(((INT((A22-C22)-(A22*(2/3)))-0.5)^2)/(A22*(2/3))) where C22 is the # of participants that correctly identified the different beer
  • Probability of the beer being different (not really necessary, it's just for info's sake; based on the chi-squared value derived from the test): =1-(CHIDIST(F22, 1)) where F22 is the chi-squared value.
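One possible reading of the odd behaviour below 15 correct responses: a chi-square statistic squares the deviation from the expected count, so far too few correct answers produce a large value just as far too many do. The R sketch below uses the plain Pearson statistic (no INT rounding and no 0.5 correction, so it is not identical to the Excel formula above). The name `pearson_stat` is hypothetical.

```r
# Plain Pearson chi-square statistic for correct vs. wrong answers.
pearson_stat <- function(correct, n, p = 1/3) {
  observed <- c(correct, n - correct)
  expected <- c(n * p, n * (1 - p))
  sum((observed - expected)^2 / expected)
}

n <- 50
counts <- c(8, 12, 17, 23, 25)
stats <- sapply(counts, pearson_stat, n = n)

# Counts far below n/3 exceed the cutoff just like counts far above it.
data.frame(correct = counts,
           chi_sq = round(stats, 2),
           above_cutoff = stats > qchisq(0.95, df = 1))
```

If only "more correct than chance" should count, the direction of the deviation has to be checked separately, because the statistic itself does not carry a sign.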


Super Moderator
"Sure! I just didn't want to bog down the thread with info that people might not have needed. It's an experiment to see if we can brew the same batch of beer with cheaper hops without a noticeable change in taste or smell."
On this forum, you can never say too much about beer :)

Can I just clarify a couple of things:
1) Are you just trying to work out how to do the statistical analysis once the data is collected? Or are you trying to use statistical methods (e.g., power analysis) to figure out the appropriate sample size for your study?
2) Is there a compelling reason that you need to use Excel? There are much easier and more reliable ways to go about doing your analysis.
Haha awesome! I'm basically trying to replicate the test companies use to see if the same food produced at different factories tastes the same, or to compare a new cheaper recipe against an older and more expensive one.

I am trying to discern a statistically sound sample size and I'm trying to automate it within Excel. I'm using Excel because A) I already own it B) I can automate certain parts of it and C) I lost my student version of MiniTab after college.

I may have to get another program here soon when I need to run a multi-level, multi-variable experiment but for now I wanted to start with something easier (especially since I haven't exercised this part of my brain since I graduated college 6 years ago haha.)
"I am trying to help develop a test for deciding if two different beer recipes taste statistically different under a blind triangle test."
A triangle test (or a duo-trio test) is the name used in the food and beer industry for what statisticians call the non-parametric binomial test.

Each test has three glasses of beer. Two of them are identical and the third glass is possibly different. The judge is asked to pick the glass with the different beer. The null hypothesis is that there is no difference between the beers and that there is just a labeling difference. The judge can choose "correct" or "wrong", "success" or "failure". This makes each triangle of three glasses a Bernoulli experiment. And thus the sum of "correct" answers will be binomially distributed.

But it is only under the null hypothesis of no difference that the sum will be binomially distributed. Each test, each triangle of three glasses, has the success probability p = 1/3, the guessing frequency.

If you have 50 triangles then n is 50. The expected number of correct answers for the binomial distribution is n*p = 50*(1/3), which is about 16.67, so you would expect about 17 correct answers under the null. The variance is n*p*(1-p). The binomial distribution is well approximated by the normal distribution (for large n*p), so a simple one-sided upper bound can be constructed as n*p + 1.64*sqrt(n*p*(1-p)) = 16.67 + 1.64*sqrt(50*(1/3)*(1-1/3)) = 16.67 + 5.47 = 22.13. So the table with 23 correct answers for significance seems correct.

But the above calculations are based on the normal approximation. The square of a standard normal variable is chi-square distributed, so that can be used for tests.
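To make that link concrete, here is a small R sketch, assuming the normal approximation with p = 1/3 and no continuity correction. It suggests why 3.61 falls short of 3.841 even though 23 correct answers pass the one-sided normal cutoff: the z statistic for 23 out of 50 squares to the same 3.61 as the chi-square table in the first post, while 3.841 is the square of the two-sided 5% cutoff for z.

```r
n <- 50
p <- 1/3
correct <- 23

# z statistic under the null hypothesis of pure guessing
z <- (correct - n * p) / sqrt(n * p * (1 - p))
z      # 1.9
z^2    # 3.61, the same value as the chi-square sum

qnorm(0.95)       # one-sided 5% cutoff for z, about 1.645
qnorm(0.975)^2    # about 3.841, equal to qchisq(0.95, df = 1)
qnorm(0.95)^2     # about 2.706, equal to qchisq(0.90, df = 1)
```

So on the chi-square scale, a one-sided 5% test corresponds to a cutoff of about 2.706 (combined with a check that the count is above n*p), not 3.841.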

But that just makes some unnecessary complications. It is better to base the calculations directly on the binomial distribution.

The R program below shows that 23 or more correct answers are needed for the result to be statistically significant at the 5% level.

# set the size of the experiment
n <- 50

# under the triangle test set p = 1/3
p <- 1/3

# under the duo-trio test set p = 1/2
# p <- 1/2

# plot the probabilities under the null hypothesis
plot(0:n, dbinom(0:n, size=n, prob=p ))

# It looks very much like the normal distribution

# plot the distribution function - the accumulated probabilities (under the null hypothesis)
plot(0:n, pbinom(0:n, size=n, prob=p ))
abline(h=0.95, col="red")

# compute probability of 23 or fewer correct answers under the null hypothesis
pbinom(23, size=n, prob=p, lower.tail = TRUE )
# [1] 0.977811

# compute probability of more than 23 correct answers under the null hypothesis
# i.e. 24, 25 or 26 etc
pbinom(23, size=n, prob=p, lower.tail = FALSE )
# [1] 0.02218901

# compute probability of 22 or fewer correct answers under the null hypothesis
pbinom(22, size=n, prob=p, lower.tail = TRUE )
# [1] 0.9576113

# compute probability of more than 22 correct answers under the null hypothesis
# i.e. 23, 24, 25 or 26 etc
pbinom(22, size=n, prob=p, lower.tail = FALSE )
# [1] 0.04238871

# compute probability of 21 or fewer correct answers under the null hypothesis
pbinom(21, size=n, prob=p, lower.tail = TRUE  )
# [1] 0.9244261

# compute probability of more than 21 correct answers under the null hypothesis
# i.e. 22, 23, 24, 25 or 26 etc
pbinom(21, size=n, prob=p, lower.tail = FALSE )
# [1] 0.07557392

Install R. It is free and the best statistical program. (You can also install the free program RStudio, but that will take another 15 minutes.)

Copy each line of code above into R to rerun the program. Change the "n" and see what happens.
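For trying different values of n without reading the cutoff off the output by hand, the same exact binomial logic can be wrapped in two small functions. This is a sketch, not a vetted sample-size tool: the function names are made up, and `p_true` (the proportion of tasters who would pick correctly if the beers really differ) is an assumption you have to supply yourself.

```r
# Smallest number of correct answers that is significant at level alpha
# (one-sided exact binomial test against guessing with probability p0).
triangle_critical <- function(n, alpha = 0.05, p0 = 1/3) {
  qbinom(1 - alpha, size = n, prob = p0) + 1
}

# Chance of reaching that count if the true proportion of correct
# answers is p_true.
triangle_power <- function(n, p_true, alpha = 0.05, p0 = 1/3) {
  crit <- triangle_critical(n, alpha, p0)
  pbinom(crit - 1, size = n, prob = p_true, lower.tail = FALSE)
}

triangle_critical(50)                          # 23
sapply(c(15, 30, 50, 100), triangle_critical)  # cutoffs for other panel sizes
triangle_power(50, p_true = 0.5)               # 0.5 is an arbitrary illustration

# The built-in exact test gives the same tail probability as pbinom above.
binom.test(23, 50, p = 1/3, alternative = "greater")$p.value  # 0.04238871
```

A low value from `triangle_power` for a plausible `p_true` would suggest the panel is too small to detect a difference of that size.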
Ah, thank you very much! Now it's all coming back to me! I'll have to try that new program you suggested but for now I got it to work in Excel.

Thanks again for all of your help!
In post 1 the question was how to reject the null hypothesis and conclude that the new product is different.

"It's an experiment to see if we can brew the same batch of beer with cheaper hops without a noticeable change in taste or smell."
In post 5 the purpose is to accept the null hypothesis and "prove" that the products are the same. Of course, that is not possible.

It is possible to show that two products are "essentially the same" (to use a fuzzy expression) by the methods of "bioequivalence". But that is another story.
Whoops I must have mixed myself up. I understand what you mean (it has been quite a while since I did this; thank you for your help and double checking me!)

Now I'm starting to second-guess myself haha... would you mind checking the survey that I'm planning on distributing to each participant?

Survey #___________
Beer Experiment

Please take a few minutes to taste and smell all three beers.

1. Choose the beer that is different than the other two:

Green Blue Red

2. How certain are you that you correctly identified the different beer?

1) Absolutely 2) Somewhat 3) Not At All

3. Does the different beer taste better than the other beers?

1) It Tastes Better 2) No Difference 3) It Tastes Worse

4. Does the different beer smell better than the other beers?

1) It Smells Better 2) No Difference 3) It Smells Worse

5. How does the different beer vary in taste compared to the other two?

6. How does the different beer vary in smell compared to the other two?

Thank you again for all of your help!