Help: Correlation between independent variables; general comments on methodology and approaches

scrubfest

New Member
Hey!

I am trying to assess win percentages in League of Legends based only on creep score (CS) differences. If you don't know the game or what I'm talking about, it's probably fine. My hypothesis is that a large positive CS difference increases your chances of winning, which is intuitively true.

My regression looks like this: WinPct = α + β*CSD0-10 + γ*CSD10-20 + δ*CSD20-30 + ζ*CSD30-end + ε, where CSDa-b is the CS difference gained between minutes a and b.
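For anyone who wants to reproduce the setup, here is a minimal sketch of fitting that equation by ordinary least squares. The data are simulated and every number in it is made up; it assumes one row per game and a 0/1 win indicator as the outcome, which may not match how WinPct is actually measured.

```python
import numpy as np

rng = np.random.default_rng(0)
n_games = 500

# Hypothetical per-game CS differences for the four time windows (made-up data).
csd = rng.normal(0.0, 15.0, size=(n_games, 4))

# Hypothetical outcome: 1 = win, 0 = loss, more likely with a larger total lead.
p_win = 1.0 / (1.0 + np.exp(-csd.sum(axis=1) / 30.0))
win = (rng.random(n_games) < p_win).astype(float)

# Design matrix: a column of ones for alpha, then the four CSD columns.
X = np.column_stack([np.ones(n_games), csd])
coef, *_ = np.linalg.lstsq(X, win, rcond=None)

names = ["alpha", "beta (0-10)", "gamma (10-20)", "delta (20-30)", "zeta (30-end)"]
for name, value in zip(names, coef):
    print(f"{name:>14}: {value: .4f}")
```

With a 0/1 outcome this is a linear probability model, so each slope reads as the change in win probability per extra point of CS difference in that window.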

Problem 1: The variables are correlated because of the game's snowball effect. That is, the CS difference in minutes 0-10 is positively correlated with the difference in minutes 10-20, and so on. Is this autocorrelation, or am I misunderstanding the term?
The coefficient δ on CSD20-30 is negative in the full regression, but it is positive when CSD20-30 is the only regressor. I assume this has to do with the correlation?
What should I do about that?
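The sign flip can be reproduced on simulated data. The sketch below is only an illustration under made-up numbers: the later window is built to inherit most of the earlier lead, and the outcome is built so that the earlier window carries the positive effect. It is not a claim about what happens in the real data. (As far as the usual terminology goes, correlation among predictors is called multicollinearity; autocorrelation normally refers to a series being correlated with its own earlier values.)

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Simulated (made-up) CS differences: the later window inherits most of the earlier lead.
csd_10_20 = rng.normal(0.0, 10.0, n)
csd_20_30 = 0.8 * csd_10_20 + rng.normal(0.0, 3.0, n)

# Made-up outcome in which the earlier window carries the positive effect.
outcome = 0.5 + 0.02 * csd_10_20 - 0.01 * csd_20_30 + rng.normal(0.0, 0.1, n)

def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

corr = np.corrcoef(csd_10_20, csd_20_30)[0, 1]
alone = ols(outcome, csd_20_30)              # CSD20-30 as the only regressor
joint = ols(outcome, csd_10_20, csd_20_30)   # both windows together

print(f"correlation between windows: {corr:.2f}")
print(f"CSD20-30 slope, fitted alone:    {alone[1]: .4f}")
print(f"CSD20-30 slope, fitted together: {joint[2]: .4f}")
```

Fitted alone, CSD20-30 acts as a stand-in for the earlier lead and its slope comes out positive. Fitted together with CSD10-20, its slope is the effect with the earlier window held fixed, which in this construction is negative.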

Problem 2: Some games end before 20 minutes. Some end before 30. Some end after 40 minutes. What value should I assign to the later windows for games that end early? I cannot assign 0, as that would mean the difference is gone. It is quite possible that the game ends early _because_ the lead generated in 0-10 and 10-20 was so significant.

* I have focused on <20 minutes because of these issues and look at accumulated CSD at 20 minutes: SUM(CSD0-10+CSD10-20). Is this smart? Are there some immediate issues that you recognize?
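As a minimal sketch of that accumulated measure, assuming (hypothetically) that the data sit in an array with one row per game and NaN marking a window the game never reached. The values below are made up.

```python
import numpy as np

# Made-up example: rows are games, columns are CSD0-10, CSD10-20, CSD20-30, CSD30-end.
# NaN marks a window with no data because the game had already ended (an assumption).
csd = np.array([
    [12.0,  8.0,    5.0,   -3.0],
    [-4.0, -9.0, np.nan, np.nan],
    [20.0, np.nan, np.nan, np.nan],
    [ 3.0, -1.0,   10.0, np.nan],
])

# Keep only games that have a recorded 10-20 window, then add the first two windows.
has_10_20 = ~np.isnan(csd[:, 1])
csd_at_20 = csd[has_10_20, 0] + csd[has_10_20, 1]
n_dropped = int((~has_10_20).sum())

print(csd_at_20)
print(f"{n_dropped} game(s) had no 10-20 window and were left out")
```

Note that the games left out this way are exactly the very short ones, so the fitted relationship only describes games that lasted long enough to have a 10-20 window.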

PS: I have taken no courses on Regressions. Thank you for your help and sorry if this is horrible work.