Determining which data set is most like another

#1
I am new to this forum, so let me apologize in advance if this is the wrong location. Given a specific multivariate data set and numerous other data sets, I want to find the one most "like" the original. For example (simplified for the sake of space), which set, B or C, is most like Set A? I am having trouble finding any information on how to do this, but I also feel like it should be straightforward. I have run a chi-squared test on (A, B) and on (A, C) and have the p-values from both, but I'm unclear on how to compare the two, or whether they can be compared at all. Any help would be greatly appreciated!

Set A
50 hot dogs
30 hamburgers
20 veggies

Set B
55 hot dogs
25 hamburgers
20 veggies

Set C
110 hot dogs
50 hamburgers
40 veggies
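
For reference, here is a rough sketch of the kind of chi-squared comparison described above, treating each pair of sets as a 2 x 3 table of counts. The function names are made up for illustration, and the p-value shortcut relies on the fact that with three categories and two sets there are exactly 2 degrees of freedom, where the chi-squared tail probability is exp(-x/2). This may not match how the original test was actually run.

```python
import math

A = [50, 30, 20]
B = [55, 25, 20]
C = [110, 50, 40]

def chi2_homogeneity(x, y):
    """Pearson chi-squared statistic for a 2 x k table built from two count vectors."""
    rows = [x, y]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(x, y)]
    grand = sum(row_totals)
    stat = 0.0
    for r, rt in zip(rows, row_totals):
        for obs, ct in zip(r, col_totals):
            expected = rt * ct / grand  # expected count if both sets share one distribution
            stat += (obs - expected) ** 2 / expected
    return stat

def p_value_df2(stat):
    """Upper-tail p-value, valid ONLY for 2 degrees of freedom (2 sets x 3 categories)."""
    return math.exp(-stat / 2)

for name, other in (("B", B), ("C", C)):
    stat = chi2_homogeneity(A, other)
    print(f"A vs {name}: chi2 = {stat:.4f}, p = {p_value_df2(stat):.4f}")
```

Note that the p-value depends on the sample sizes as well as on how different the proportions are, which is part of why the two p-values are hard to compare directly.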
 
#3
Of course I wasn't thinking simply enough! Thank you! But squared Euclidean distances don't take into account the different sizes of the data sets.
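
To illustrate the size problem with the example sets, here is a quick sketch of squared Euclidean distance on the raw counts (the function name is just for illustration):

```python
A = [50, 30, 20]
B = [55, 25, 20]
C = [110, 50, 40]

def squared_euclidean(x, y):
    """Sum of squared differences between two equal-length count vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

print(squared_euclidean(A, B))  # 5^2 + 5^2 + 0^2 = 50
print(squared_euclidean(A, C))  # 60^2 + 20^2 + 20^2 = 4400
```

C comes out far more distant than B, even though C is exactly B doubled, so most of that distance is just the difference in totals.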
 
#4
I'm still struggling with this problem because of the differing total sizes of the data sets. Is there no way to say "A is an 80% match, B is a 77% match," and so on? I really feel like I need to include a statistical comparison.
 

Dason

Ambassador to the humans
#5
To deal with the sample size issue, you could just normalize all of your data so you're looking at the proportion of the total instead of the raw counts.

So instead of having A = {50, 30, 20}, convert it to {50/100, 30/100, 20/100} = {0.5, 0.3, 0.2}, so that all the elements sum to 1.
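
A minimal sketch of that normalization, applied to the three example sets and followed by a squared Euclidean distance on the proportions (the helper names here are only illustrative):

```python
A = [50, 30, 20]
B = [55, 25, 20]
C = [110, 50, 40]

def proportions(counts):
    """Divide each count by the total so the elements sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def squared_euclidean(x, y):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

pA, pB, pC = proportions(A), proportions(B), proportions(C)

print(pA)                         # [0.5, 0.3, 0.2]
print(pB == pC)                   # True: C is exactly B doubled
print(squared_euclidean(pA, pB))  # about 0.005
print(squared_euclidean(pA, pC))  # same value as for B
```

After normalizing, B and C are indistinguishable, so this measure deliberately ignores how large each data set is.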
 
#6
I started out using percentages, but then you lose the fact that one data set is much larger than the others, which in this case counts.
 

Dason

Ambassador to the humans
#7
OK, counts for what, though? Ideally, what would you see as the distance between these three data sets? Can you describe some properties you would want this distance measure to have for this data? It's hard to suggest anything else without knowing what you want.