# Comparing means of frequencies

#### everylittlestep

##### New Member
Hi all,

Doing a comparison of keyword search frequencies and need some help determining which test to use to compare means:

I was given a list of 200 keywords and the average number of times each was searched in every month of 2014 and 2015. I took the yearly average for each of the 200 keywords and want to compare the annual means.

Which test do I use for this? It's unclear to me, as I don't have a base size (presumably all internet users?). My first thought was that my 'n' would be the number of keywords (200) and I would use a paired t-test, but that doesn't seem right... Any help would be much appreciated!

Thanks--

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, if you are looking at how often a word is searched, that is count data and could follow the Poisson distribution. With high counts and large sample sizes, though, the Poisson approximates the normal distribution, so t-tests may be applicable. Are the two datasets dependent or related in any way?
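To illustrate the approximation point, here is a minimal sketch that measures the largest gap between the exact Poisson CDF and a normal CDF with the same mean and variance (with a continuity correction). The helper name `max_gap` and the two example rates are made up for illustration; they are not from the data in this thread.

```python
import math
from statistics import NormalDist

def max_gap(lam):
    """Largest |Poisson CDF - normal CDF| over k = 0 .. lam + 6 standard deviations."""
    normal = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 6 * math.sqrt(lam))
    cdf, worst = 0.0, 0.0
    for k in range(upper + 1):
        # Poisson pmf computed in log space so large lam does not overflow
        cdf += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        # k + 0.5 is the usual continuity correction for a discrete count
        worst = max(worst, abs(cdf - normal.cdf(k + 0.5)))
    return worst

# Hypothetical mean monthly counts: a rarely searched keyword vs. a popular one
for lam in (3, 1000):
    print(f"mean count {lam}: largest CDF gap = {max_gap(lam):.4f}")
```

The gap shrinks as the mean count grows, which is why a normal-theory test is less of a stretch for heavily searched keywords than for rare ones.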

#### everylittlestep

##### New Member
The datasets are the same list of 200 keywords; basically, the mean number of searches for any of the keywords was higher in 2015 than in 2014, so I'm trying to figure out how to tell whether the growth was significant.
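For reference, since the same keywords appear in both years, a paired comparison works on the per-keyword differences. Below is a rough sketch of that calculation using only the standard library. The data are randomly generated placeholders, not the real keyword counts, and the function name `paired_t` is just for illustration. The p-value uses a normal approximation, which is an assumption that is reasonable only because there are 199 degrees of freedom.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1

# Placeholder data: 200 keywords with invented yearly averages
random.seed(0)
avg_2014 = [random.uniform(100, 10_000) for _ in range(200)]
avg_2015 = [x * random.uniform(0.9, 1.3) for x in avg_2014]

t_stat, df = paired_t(avg_2014, avg_2015)
# With df = 199 the t distribution is very close to the standard normal
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"t = {t_stat:.2f}, df = {df}, approximate two-sided p = {p_approx:.4g}")
```

Here 'n' is the number of keyword pairs, not the number of people searching, so the result speaks to growth across this keyword list only.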

Thank you!!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Can you look at them as rates per all searches?

#### everylittlestep

##### New Member
> Can you look at them as rates per all searches?

Oh that's interesting; so 'n' would be the total number of searches that year, and my 'data' would be the average quantity. In that case, would it be a paired t-test, then?
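To make the rate idea concrete, here is a small sketch that divides each keyword's count by the total search volume for its year. The totals, keyword names, and counts are all invented placeholders, since none of those figures are available in this thread.

```python
def rate_per_million(count, total_searches):
    """Searches for one keyword per million searches overall."""
    return count / total_searches * 1_000_000

# Hypothetical yearly totals across all searches (not real figures)
total_2014 = 2_000_000_000
total_2015 = 2_400_000_000

# Hypothetical (2014 count, 2015 count) per keyword
keyword_counts = {"keyword_a": (12_000, 15_000), "keyword_b": (800, 900)}

for keyword, (count_2014, count_2015) in keyword_counts.items():
    r14 = rate_per_million(count_2014, total_2014)
    r15 = rate_per_million(count_2015, total_2015)
    print(f"{keyword}: {r14:.3f} -> {r15:.3f} per million searches")
```

In this made-up example the raw count for `keyword_b` goes up while its rate goes down, because overall search volume grew faster than searches for that keyword.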

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Why do you reference "paired"? Back to my question: are the observations related, or can you match a person's searches?

#### everylittlestep

##### New Member
That's the trouble I'm having; I don't have any data about the people searching, all I have is the list of keywords and the number of times those words were searched per month.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
So you could have a scenario where some dude shows up every day, all day, searching for a topic. I really wanted to put a crude example here from a billion-dollar online enterprise, but refrained. Now you are unable to control for this dude and his predilections, or to weight his disproportionate searches in the dataset. Would that be an issue for you? Most statistical models assume that observations are independent; it is a common assumption.
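One rough way to look for this kind of problem without any user-level data is the variance-to-mean ratio of the monthly counts. For Poisson counts the variance equals the mean, so a ratio far above 1 hints at overdispersion, which clustered or non-independent searches can produce. The sketch below uses invented monthly counts, and the function name `dispersion_index` is only illustrative. A high ratio does not identify the cause (seasonality or a news event would also raise it).

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    return variance(counts) / mean(counts)

# Hypothetical monthly counts for one keyword over a year
steady = [98, 103, 95, 110, 101, 97, 105, 99, 102, 94, 108, 100]
# Same keyword, but two months inflated by one very active searcher
with_heavy_user = [98, 103, 95, 410, 101, 97, 105, 399, 102, 94, 108, 100]

print(f"steady: {dispersion_index(steady):.2f}")
print(f"with heavy user: {dispersion_index(with_heavy_user):.2f}")
```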