I'm very new to the world of statistics.
Could someone explain to me in layman's terms the following:
I have complete data on a population of ~37,000. Across the entire population, I can show that there are relationships between certain variables (e.g. age, length of tenure, etc.) and the amount of debt.
What I'm trying to do is establish whether the effect of these independent variables on debt varies significantly between geographical locations.
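For illustration, one way to ask "does the effect of age on debt differ between location A and location B?" is a permutation test on the gap between the two locations' regression slopes. The sketch below is an assumption-laden illustration, not an established recipe for this dataset: the function names are hypothetical, it uses a simple one-variable least-squares slope, and by shuffling whole (age, debt) pairs between locations it tests the null that both locations' data come from the same distribution, with the slope gap as the test statistic. A more standard alternative is a regression with an interaction term, e.g. the statsmodels formula `debt ~ age * C(location)`.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (e.g. debt regressed on age)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slope_difference_test(xa, ya, xb, yb, n_perm=5000, seed=0):
    """Hypothetical helper: permutation test for a slope gap between two groups.

    Pools the (x, y) pairs from both locations, randomly reassigns them to
    groups of the original sizes, and counts how often the shuffled slope gap
    is at least as large as the observed one.
    Returns (observed absolute slope gap, p-value).
    """
    rng = random.Random(seed)
    observed = abs(ols_slope(xa, ya) - ols_slope(xb, yb))
    pairs = list(zip(xa, ya)) + list(zip(xb, yb))
    n_a = len(xa)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)
        group_a, group_b = pairs[:n_a], pairs[n_a:]
        # zip(*group) splits the list of (x, y) pairs back into xs and ys
        diff = abs(ols_slope(*zip(*group_a)) - ols_slope(*zip(*group_b)))
        if diff >= observed:
            extreme += 1
    # +1 counts the observed split itself, so the p-value is never exactly 0
    return observed, (extreme + 1) / (n_perm + 1)
```

A small p-value suggests the age/debt slope gap between the two locations is larger than random reshuffling of the records would usually produce.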
Partitioning the data by geographical location results in sample sizes of between 300 and 3,000 records. Further dividing these data into age ranges (e.g. 25 to 35, 36 to 45 years old, etc.) reduces the size of each sample even more.
I've produced a number of graphs using seaborn/matplotlib (e.g. regression/scatter plots and grouped column graphs), and the results fluctuate, especially for smaller sample sizes (e.g. smaller geographical locations).
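The extra fluctuation in small groups is expected: the standard error of a group mean is roughly the standard deviation divided by the square root of the group size, so a group of 300 has means about three times noisier than a group of 3,000. The sketch below illustrates this by drawing many random groups of each size from a synthetic population; the debt values are made up (an assumption for the demo, not real data), and the helper names are hypothetical.

```python
import math
import random
import statistics


def standard_error(values):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def spread_of_group_means(population, group_size, n_draws=1000, seed=0):
    """Draw many random groups of one size and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, group_size))
        for _ in range(n_draws)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    # Synthetic, made-up "debt" figures for a population of 37,000
    rng = random.Random(1)
    population = [rng.expovariate(1 / 5000) for _ in range(37_000)]
    for size in (300, 3000):
        print(size, round(spread_of_group_means(population, size, n_draws=300), 1))
```

Running it should show the spread of group means shrinking as the group size grows, which mirrors the jumpier plots for smaller locations.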
What I'm trying to establish is whether there is a statistical test I can carry out that indicates whether a difference between geographical locations (e.g. average debt for 25 to 35 year olds in location A vs. location B) is statistically significant, or just due to random variation arising from the small size of an individual sample.
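For a single comparison like "average debt for 25 to 35 year olds in location A vs. B", a common choice is Welch's t-test (`scipy.stats.ttest_ind(a, b, equal_var=False)`). A distribution-free alternative is a permutation test, sketched below with only the standard library; the function name and parameters are hypothetical, and the test assumes the records in each group are comparable apart from their location label.

```python
import random
import statistics


def permutation_test_means(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Shuffles the location labels many times and counts how often the shuffled
    difference in means is at least as extreme as the observed one.
    Returns (observed difference A - B, p-value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 counts the observed split itself, so the p-value is never exactly 0
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical usage: debts_a / debts_b would be the debt values for
# 25 to 35 year olds in locations A and B.
# diff, p = permutation_test_means(debts_a, debts_b)
```

If this is run for many location/age-band pairs, some will look significant by chance alone, so a multiple-comparison correction (e.g. Bonferroni) is worth considering.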
I hope that makes sense. Any advice would be gratefully received.
Many thanks