1. There are 20 independent groups. (A1, A2, ..., A20).

2. Each group has 20000 samples. (s1, s2, ...., s20000).

3. Each sample has 7 features. (x1, x2, ..., x7).

If two samples from different groups have similar features, i.e. |x1-x1'|<0.01 ... |x7-x7'|<0.01, we consider that these two groups have similar behavior at that point.

After study some basic statistic text books, I know that if each sample has only one feature, then I can use hypothesis test to compare two groups. But I don't know how to do the test on the sample that has 7 features. I can only come up the idea that compare each pairs. For example, if A1 has 4000 samples are similar with the samples in A2, then I will say the similarity between A1 and A2 is 4000/20000= 0.2.

I know some people in my field use the mean values of the 20000 samples, so each group has only 7 mean values. Then they apply PCA and K-NN clustering algorithm. If two groups are in the same cluster, they will think these two groups are similar. But it is a very rough result. I think similarity can be defined more precisely in this problem. I need some suggestions about how to define and compute the similarity.

Thanks for your help