Sports data set


New Member
I am looking at sports team level data (summarized by average in each season) over several seasons and would like to predict/classify the winner of the championship. In a single season, the data has many more variables than observations (for example, 30 variables but only 10 teams). Would it be reasonable to treat the same team across two seasons as two different teams?
How many observations do you really have though? If you have 10 teams and stats on each of those teams over several seasons, you should have a row for each of the teams each year, correct?


New Member
Currently I have data on 20 teams over four seasons. Each team has a separate row for each season (total of 80 rows). Since it is the same team at a different time point (season), is it statistically appropriate to treat the same team across the four seasons as independent observations?


You can't treat the same team in different seasons as "different" teams: those rows are correlated, and ignoring that will distort your analysis. You'd need some type of longitudinal modelling, like time series or mixed-effects models, to account for the repeated observations over time.
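
To get a feel for how strongly rows from the same team are correlated, one rough check is the intraclass correlation (ICC) from a one-way ANOVA with team as the grouping factor. The sketch below is only an illustration: the data are simulated (20 teams, 4 seasons, a made-up per-team effect), and the function name `icc_oneway` is hypothetical, not something from a library. It assumes a balanced layout, meaning the same number of seasons for every team.

```python
import random
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a balanced one-way ANOVA.

    groups: list of equal-length lists, one inner list per team,
    holding that team's value of a single statistic in each season.
    """
    n = len(groups)       # number of teams
    k = len(groups[0])    # seasons per team (assumed equal for all teams)
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-team and within-team mean squares.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Simulated stand-in for one team statistic (NOT real data):
# each team gets a persistent effect plus season-to-season noise.
rng = random.Random(0)
teams = []
for _ in range(20):
    team_effect = rng.gauss(0, 1)
    teams.append([team_effect + rng.gauss(0, 1) for _ in range(4)])

print(round(icc_oneway(teams), 2))
```

An ICC near 0 would suggest the team-season rows behave roughly like independent observations for that statistic. A clearly positive value means seasons from the same team cluster together, which is the situation where something like a random intercept per team in a mixed-effects model is usually recommended.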