Undergrad Research Ideas

I'm an undergrad studying Linguistics and Computer Science, and I'm hoping to do some stats-related research this quarter.

The problem is, my stats knowledge is very limited. I've only taken AP Stats (way back in high school), and I just finished an upper-division Intro to Probability Theory course. I find the subject really interesting, though, and would like to find a way to apply what I've learned.

I was thinking of doing something with WordNet, or some other language-related data set, but I'm not sure what research questions would be feasible given my background. Any ideas? Or is it not possible to complete a good, thorough, and interesting research project with so little stats experience?

Thank you in advance.


Probably A Mammal
From an applied perspective, you're going to need at least two things: data and methods. A third would be tools, but that mostly follows from how you approach your methods and handle your data. Since you have a computer science background, I'd recommend the free, open-source R statistical software (an interactive programming language, like Python, with an emphasis on data manipulation, analysis, and visualization). Linguistics isn't my area, but Trinker should chime in here about word analysis, since he's developing an R package that does just that.

A basic research question in statistics might involve variation, error, and frequency. For instance, how often does a certain word appear in a transcript, publication, speech, or Twitter feed? These sorts of analyses feed into interesting research questions I've watched lectures on, such as how frequently "god" gets invoked in political or presidential speeches. Alternatively, you might record how often a commercial topic, phrase, or word comes up, the channel (or radio broadcast) it appeared on, the time it occurred, and so on. That data could feed a model that predicts how likely the word is to occur. If you also have data on purchasing frequency, you could then analyze marketing efficiency. Of course, I don't know what data you can get access to or how big you want the project to become. I'm just spitballing some ideas here.
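
To make the word-frequency idea concrete, here is a minimal sketch in Python (I recommended R above, but the same idea carries over). It assumes a very simple tokenizer (lowercase letters and apostrophes only) and uses a made-up sample sentence rather than real speech data. It counts each word and reports a rate per 1,000 words, which is what lets you compare texts of different lengths.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text, split it into simple word tokens, and count each one.

    The regex is a deliberately naive tokenizer (an assumption for this sketch);
    real transcripts may need better handling of numbers, hyphens, etc.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens), len(tokens)

def rate_per_thousand(counts, total, word):
    """Occurrences of `word` per 1,000 tokens, so texts of different lengths compare."""
    if total == 0:
        return 0.0
    return 1000 * counts[word.lower()] / total

# Made-up example sentence, not real data.
speech = "God bless you, and God bless America. Thank you."
counts, total = word_frequencies(speech)
print(counts.most_common(3))  # → [('god', 2), ('bless', 2), ('you', 2)]
print(round(rate_per_thousand(counts, total, "God"), 1))  # → 222.2
```

Run over a collection of speeches, rates like these become the observations you would then model, for example by comparing them across speakers or years.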

Your hypothesis should drive your choice of data and methods. That raises the question, however, of what your hypothesis should be! Some exploratory analyses should feed into that, so go out and see what data is out there. Choosing your methods will test the extent of your knowledge and show you what you might have to learn. We can always help you with your methods here, but the grunt work is of course on your end.