Predicting population composition

Bormon

New Member
#1
I feel like I should already know how to do this, but I can't find anything on the internet, so I am hoping this forum can help. What I am trying to do is predict the ratios of a population's composition (all independent) given a history of that composition.

To simplify the problem, I will use shirts as an example. I have an unknown (though predictable) group of people. They each wear a shirt in one of eight colors. I have historical information on the percentage wearing each color for each group (let's say 6 groups).

What is the best method of predicting the composition of the next group?
 

hlsmith

Not a robit
#2
So you have historical sample data that you want to use. Crudely, you can just use those proportions. Do you have secondary characteristic data (age, sex, etc.)? If so, you can take your prior data and sample from it using the characteristics of the new sample, or just sample randomly, to get the variability for a new sample. In other words, bootstrap to get percentile confidence intervals.
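
A minimal sketch of the bootstrap idea in Python, in case it helps. The shares below are made up (6 past groups, 8 colors), and resampling whole groups with replacement is only one reasonable choice; with only 6 groups the intervals will be rough.

```python
import random

# Hypothetical history: one row per past group, one column per shirt color.
# Each row holds the share of each color and sums to 1. Made-up numbers.
history = [
    [0.22, 0.18, 0.15, 0.12, 0.10, 0.09, 0.08, 0.06],
    [0.20, 0.19, 0.14, 0.13, 0.11, 0.09, 0.08, 0.06],
    [0.24, 0.17, 0.15, 0.11, 0.10, 0.10, 0.07, 0.06],
    [0.21, 0.18, 0.16, 0.12, 0.09, 0.09, 0.08, 0.07],
    [0.23, 0.16, 0.15, 0.13, 0.10, 0.08, 0.09, 0.06],
    [0.25, 0.17, 0.13, 0.12, 0.11, 0.09, 0.07, 0.06],
]

def mean_shares(groups):
    """Average share of each color across the given groups."""
    n = len(groups)
    return [sum(col) / n for col in zip(*groups)]

def bootstrap_intervals(groups, n_boot=2000, alpha=0.05, seed=0):
    """Percentile intervals for the mean share of each color."""
    rng = random.Random(seed)
    draws = [[] for _ in groups[0]]
    for _ in range(n_boot):
        # Resample whole groups with replacement, then average them.
        resample = [rng.choice(groups) for _ in groups]
        for j, share in enumerate(mean_shares(resample)):
            draws[j].append(share)
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    intervals = []
    for d in draws:
        d.sort()
        intervals.append((d[lo_idx], d[hi_idx]))
    return intervals

point = mean_shares(history)              # crude prediction: historical mean
intervals = bootstrap_intervals(history)  # rough 95% percentile intervals

for j, (p, (lo, hi)) in enumerate(zip(point, intervals)):
    print(f"color {j + 1}: {p:.3f} ({lo:.3f} to {hi:.3f})")
```

If you do have secondary characteristics, the same loop could resample within strata that match the new group instead of resampling groups at random.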
 

Bormon

New Member
#3
I'm not sure I follow, or maybe I am making it more complex than it really is. I have data for 6 previous groups. The groups are of different sizes (specifically, declining in total number). Each group has a different mix of "colored shirts". The choice of shirt is independent of any other variables. (That is not completely true, but to simplify the predictive model I am assuming it is.) I am trying to predict the "most likely" mix of "colored shirts" for the next group.

My simple example is a substitute for what I am really working on, which is a tuition and fees revenue prediction model for a community college. I have 20 points of data. I have a mix of In-County, Out-of-County, Out-of-State, and International, crossed by part-time and full-time assurance rates for 4 years. So, for example, I have a percentage for each of the previous six years of In-County Part-Time, In-County Full-Time Current Students, In-County Full-Time retained 1-year students, ... , International Full-Time retained 3-year students.

The administration wants to give me the number of credits per term and I provide them with a revenue prediction. There is a lot to it (which I have mostly figured out). The main thing I am having a problem with is divining a good way of predicting the mix of each category.

My current prediction method seems crude to me, and I am hoping for a more elegant solution.

What I am currently doing is predicting the most likely number of students in each group by taking the median value and trending it by one standard deviation in the direction it is currently moving. I use this "predicted" number to calculate a total and a "predicted" ratio for each group, then apply that ratio to the enrollment totals that administration is providing me.
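
A rough Python sketch of the procedure described above. The category names, headcounts, and the total of 800 are made up for illustration. Reading "the direction it is currently moving" as the sign of the last year-over-year change, and using the sample standard deviation, are both assumptions.

```python
import statistics

# Hypothetical headcounts for the last six years, per category (made-up data).
history = {
    "In-County Part-Time": [520, 510, 495, 480, 470, 455],
    "In-County Full-Time": [310, 305, 300, 290, 285, 280],
    "Out-of-County Part-Time": [90, 92, 95, 97, 96, 99],
    "International Full-Time": [20, 22, 21, 19, 18, 18],
}

def predict_count(values):
    """Median shifted by one sample standard deviation toward the latest move."""
    step = values[-1] - values[-2]       # assumed meaning of "current direction"
    direction = (step > 0) - (step < 0)  # +1, -1, or 0 when flat
    predicted = statistics.median(values) + direction * statistics.stdev(values)
    return max(0.0, predicted)           # a headcount cannot be negative

def predict_mix(history):
    """Turn the predicted counts into shares that sum to 1."""
    counts = {name: predict_count(vals) for name, vals in history.items()}
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

mix = predict_mix(history)

# Apply the predicted mix to a total supplied by administration (hypothetical).
enrollment_total = 800
allocation = {name: share * enrollment_total for name, share in mix.items()}

for name in history:
    print(f"{name}: {mix[name]:.1%} -> {allocation[name]:.0f} students")
```

Laying it out this way makes the weak spot visible: the shift is always a full standard deviation, no matter how strong or weak the recent trend is.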