Correlation based on multiple measures / observations

stmaurice

Hello everyone!

I am hoping someone could offer me some wisdom and a few R tricks.

I was lucky enough to get a pretty interesting data set for a study that I am doing. There are more than 5,000 rows of data spanning 5 years for 100 different salespeople. The data is structured as follows.

User || Month || NumberOfClientsSeen || AverageSatisfactionScore || Sales || Leads

I want to see if there are correlations between NumberOfClientsSeen, AverageSatisfactionScore, Sales and/or Leads. My first inclination would be to load a matrix into R and run rcorr from the Hmisc library.

> library(Hmisc)
> # keep only the four numeric measures, selecting columns by name
> cordata <- data[, c("NumberOfClientsSeen", "AverageSatisfactionScore", "Sales", "Leads")]
> results <- rcorr(as.matrix(cordata), type="pearson")
## hooray for r, n and P-values! ##

I realize that this probably isn't correct. I know my users vary a lot and that there is month-to-month seasonality in my data. The correlations I'm seeing could simply reflect differences between months or between my unusual users, because these groupings / clusterings aren't taken into account. As it stands, my analysis effectively assumes that all months are equal (which they aren't) and that all users are pretty much the same (which they aren't). I need to take users and months into consideration in my analysis.

My hypothesis / goal still relates to investigating interactions / correlations between NumberOfClientsSeen, AverageSatisfactionScore, Sales and/or Leads.

Is there a way to run a correlation on my measures that takes the clustering / variance of users and months into consideration?

Should I be adjusting my data before running rcorr, and removing monthly / user variances?
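
To make that concrete, here is a rough sketch of what I mean by "removing" user and month variance: regress each measure on User and Month as factors, keep the residuals, and correlate those. The data below is simulated (none of it is my real data), and treating the effects as additive with Month as a 12-level factor is my own assumption. I don't know whether this is statistically the right way to go.

```r
# Simulated stand-in for the real data; every value here is made up
set.seed(1)
dat <- expand.grid(User = factor(1:20), Month = factor(1:12))
user_eff  <- rnorm(20)[dat$User]   # per-user baseline
month_eff <- rnorm(12)[dat$Month]  # seasonal shift
dat$NumberOfClientsSeen <- 50 + 10 * user_eff + 5 * month_eff + rnorm(nrow(dat))
dat$Sales               <- 100 + 20 * user_eff + 8 * month_eff + rnorm(nrow(dat))

# Raw correlation, ignoring the user / month grouping
raw_r <- cor(dat$NumberOfClientsSeen, dat$Sales)

# Residuals of each measure after removing additive user and month effects
measures <- c("NumberOfClientsSeen", "Sales")
adjusted <- sapply(measures, function(v) {
  resid(lm(dat[[v]] ~ User + Month, data = dat))
})

# Correlation of what is left over once user and month are accounted for
adj_r <- cor(adjusted)[1, 2]
print(c(raw = raw_r, adjusted = adj_r))
```

In this toy example the two measures are only related through the shared user and month effects, so the raw correlation should come out much larger than the adjusted one, which is exactly the kind of artifact I'm worried about in my real data.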

Perhaps I should be averaging my data somehow to group my data points?
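
For the averaging idea, I picture something like the sketch below: collapse the data to one row per user, then correlate the user-level means. Again, the data is simulated and only two of the four measures are shown. My worry is that this only describes differences between users and throws away all the month-to-month information.

```r
# Simulated stand-in for the real data; every value here is made up
set.seed(2)
dat <- expand.grid(User = factor(1:20), Month = factor(1:12))
dat$NumberOfClientsSeen <- rpois(nrow(dat), 30)
dat$Leads               <- rpois(nrow(dat), 10)

# One row per user: the mean of each measure across that user's months
user_means <- aggregate(cbind(NumberOfClientsSeen, Leads) ~ User,
                        data = dat, FUN = mean)

# Between-user correlation of the averaged measures
between_r <- cor(user_means$NumberOfClientsSeen, user_means$Leads)
print(between_r)
```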

Any insights are welcome, and appreciated.

Thanks!

- JSM