Winning Boxes Statistics


I am looking at the statistics for wins as a proportion of starts from the 8 start boxes (numbered 1-8) in greyhound races. I am comparing 2 distances, so I want to know whether running at 385m versus 390m makes any significant difference to the proportion of wins per start, both for individual boxes and for box combinations.

The problem is that there are 8 start boxes in every race, but not every race starts with 8 dogs; some start with as few as 4. I have not been given data on box fill structure or patterns.

For individual boxes, I thought I could use a chi-squared test to compare wins/starts from Box 1 in the season run at 385m against the season run at 390m, then repeat the test 7 more times for the remaining boxes?
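
A minimal sketch of that per-box test, assuming you have (wins, starts) for each box in each season. Each box gives a 2x2 table (distance by win/non-win), tested with Pearson's chi-squared without continuity correction. The counts and the names `chi_square_2x2`, `counts_385` and `counts_390` are hypothetical, and the Bonferroni threshold is only one optional way to allow for running 8 tests:

```python
import math


def chi_square_2x2(wins_a, starts_a, wins_b, starts_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two distances; columns are wins and non-wins (starts - wins).
    Assumes every expected cell count is positive (rule of thumb: at least 5).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = [[wins_a, starts_a - wins_a],
             [wins_b, starts_b - wins_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal,
    # so P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical (wins, starts) per box for each season; replace with real counts.
counts_385 = {1: (40, 200), 2: (30, 200), 3: (25, 200)}
counts_390 = {1: (30, 180), 2: (28, 180), 3: (24, 180)}

# One test per box; 0.05 / 8 is a Bonferroni threshold for all 8 boxes.
alpha = 0.05 / 8
for box in sorted(counts_385):
    stat, p = chi_square_2x2(*counts_385[box], *counts_390[box])
    print(f"box {box}: chi2 = {stat:.3f}, p = {p:.4f}, significant = {p < alpha}")
```

The same 2x2 test is what `scipy.stats.chi2_contingency` runs, except that SciPy applies Yates' continuity correction to 2x2 tables by default.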

Combinations are trickier. I have been asked to compare wins/starts for boxes 1-4 at 385 v 390, boxes 5-8 at 385 v 390, the outside boxes (1, 2, 7, 8) over the two distances, and the inside boxes (3, 4, 5, 6) over the two distances.
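
One way to set up those four comparisons, sketched under the assumption that per-box (wins, starts) are available: pool the counts over each group of boxes, then compare the pooled proportions. This uses a two-proportion z-test, whose z squared equals the 2x2 Pearson chi-squared statistic. Pooling treats every start as an independent trial, which is the assumption the question goes on to doubt. All names and counts are hypothetical:

```python
import math

# The four comparisons from the question, as sets of boxes.
GROUPS = {
    "boxes 1-4": (1, 2, 3, 4),
    "boxes 5-8": (5, 6, 7, 8),
    "outside (1, 2, 7, 8)": (1, 2, 7, 8),
    "inside (3, 4, 5, 6)": (3, 4, 5, 6),
}


def pool(counts, boxes):
    """Sum (wins, starts) over a group of boxes; counts maps box -> (wins, starts)."""
    return (sum(counts[b][0] for b in boxes), sum(counts[b][1] for b in boxes))


def two_proportion_z(w1, n1, w2, n2):
    """Two-proportion z-test with pooled variance; z**2 equals the 2x2 Pearson chi-squared."""
    pooled = (w1 + w2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (w1 / n1 - w2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Hypothetical full-field seasons: (wins, starts) per box; replace with real counts.
counts_385 = {b: (w, 200) for b, w in zip(range(1, 9), [40, 30, 25, 22, 20, 21, 24, 18])}
counts_390 = {b: (w, 180) for b, w in zip(range(1, 9), [30, 28, 24, 20, 19, 20, 21, 18])}

for name, boxes in GROUPS.items():
    z, p = two_proportion_z(*pool(counts_385, boxes), *pool(counts_390, boxes))
    print(f"{name}: z = {z:.3f}, p = {p:.4f}")
```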

If I were to calculate the number of wins in a box combination as a proportion of the number of races, I would not be taking into account that box fill may have been distributed unevenly between the 385 and the 390 (i.e., if at 385 all 4 of boxes 1-4 were filled in every race, I would expect more winners from that group than if at 390 only 1 dog started in boxes 1-4 in each race).
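
To make that concern concrete, here is a small calculation under a labelled assumption: every starter in an n-dog race has the same 1/n chance of winning, whatever its box. The box fills below are hypothetical, chosen to mirror the example in the text. Even with no box effect at all, the expected group wins per race differ between distances:

```python
# Equal-chance assumption (hypothetical, for illustration only): in a race with
# n runners every starter wins with probability 1/n, whatever its box.


def expected_group_wins(races):
    """races: list of (group_starters, field_size), one tuple per race."""
    return sum(g / n for g, n in races)


def summarize(races):
    """Return (expected group wins per race, expected group wins per start)."""
    exp_wins = expected_group_wins(races)
    starts = sum(g for g, _ in races)
    return exp_wins / len(races), exp_wins / starts


# Hypothetical box fills for boxes 1-4 over 10 races at each distance.
races_385 = [(4, 8)] * 10  # all of boxes 1-4 filled, 8 runners
races_390 = [(1, 5)] * 10  # one dog in boxes 1-4, 5 runners

print(summarize(races_385))  # expected wins: 0.5 per race, 0.125 per start
print(summarize(races_390))  # expected wins: about 0.2 per race and per start
```

Under this assumption, wins per start equals 1/n, so it also shifts with field size (0.125 versus 0.2 here). That is one reason to compare races of the same field size.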

So I would prefer to calculate wins/starts for the box combination. But because only 1 dog can win each race, counting the total number of starters in a box combination without knowing how many of them ran in each race means the outcomes are not independent (i.e., 8 starts in boxes 1-4 spread over 2 races allow at most 2 winners, over 3 races at most 3, and over 4 races at most 4).
Basically, when a dog in a "box group" wins, every other dog from its group in that race only makes the proportion of winners per start smaller. I cannot guarantee that the box fill pattern is the same at both distances.
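
The ceiling described above can be computed directly. A box group can win at most once in each race where it has a starter, so the maximum wins/starts depends on how the starts are spread across races. The function name is hypothetical, and the layouts reproduce the 8-starts example from the question:

```python
def max_group_wins(starters_per_race):
    """Only one dog wins each race, so a box group can win at most once per race
    in which it has at least one starter."""
    return sum(1 for g in starters_per_race if g > 0)


# The question's example: 8 starts in boxes 1-4 spread over 2, 3 or 4 races.
for layout in ([4, 4], [3, 3, 2], [2, 2, 2, 2]):
    starts = sum(layout)
    cap = max_group_wins(layout)
    print(f"{len(layout)} races, {starts} starts: at most {cap} wins, "
          f"wins/starts <= {cap / starts}")
```

With the same 8 starts, the cap on wins/starts is 0.25, 0.375 or 0.5, depending only on how the starts are spread across races.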

Am I overcomplicating things, or is there a way around this without knowing the composition of the box starts and the number of races? My suggestion was to exclude all races without full fields; would that be the easiest way to overcome this issue?
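
A sketch of the full-field suggestion, assuming each race record at least gives the distance, the number of runners and the winning box. The field names and records are hypothetical. Once only 8-dog races are kept, every box has exactly one starter per race, so wins/starts per box is simply wins/races:

```python
from collections import Counter

# Hypothetical race records; the field names are assumptions.
races = [
    {"distance": 385, "field_size": 8, "winning_box": 1},
    {"distance": 385, "field_size": 8, "winning_box": 1},
    {"distance": 385, "field_size": 6, "winning_box": 2},
    {"distance": 390, "field_size": 8, "winning_box": 8},
    {"distance": 385, "field_size": 8, "winning_box": 3},
]


def full_field_counts(races, distance, field_size=8):
    """Return box -> (wins, starts) using only races with a full field.

    In a full field every box has exactly one starter per race, so starts per
    box equal the number of races kept.
    """
    kept = [r for r in races
            if r["distance"] == distance and r["field_size"] == field_size]
    wins = Counter(r["winning_box"] for r in kept)  # missing boxes count as 0
    return {box: (wins[box], len(kept)) for box in range(1, field_size + 1)}


print(full_field_counts(races, 385))
```

The trade-off is that short-field races are discarded. If full fields are rarer at one distance, that comparison rests on fewer races.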
Races are run with 8, 7, 6, 5, or 4 runners. Number the boxes starting from 1. For each field size, count the wins for each box: with 8 runners, box 1 had n1 wins, box 2 had n2 wins, and so on, and n1 + ... + n8 equals the number of races (tries).
Then I'd use a chi-squared test where, with 8 boxes, Expected = 1/8 × tries. If tries = 96, the expected number of wins per box is 96/8 = 12, which is compared with the observed count for each box.
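
The arithmetic for the 8-runner case can be sketched as a chi-squared goodness-of-fit test against equal chances per box. The observed counts below are hypothetical and sum to 96, so the expected count is 12 per box:

```python
def chi_square_uniform(observed):
    """Goodness-of-fit of per-box win counts against equal chances per box.

    observed[i] is the number of wins from box i + 1; their sum is the number
    of races ("tries"). Returns (expected per box, statistic, degrees of freedom).
    """
    tries = sum(observed)
    expected = tries / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return expected, stat, len(observed) - 1


# Hypothetical wins from boxes 1-8 over 96 eight-dog races.
expected, stat, df = chi_square_uniform([18, 14, 12, 11, 9, 10, 11, 11])
print(expected, round(stat, 3), df)  # 12.0 4.667 7
```

The 5% critical value for 7 degrees of freedom is 14.067, so these made-up counts would not indicate a box effect.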
So field sizes of 8, 7, 6, 5, and 4 give 5 tables and 5 chi-squared critical values, and you learn a lot about box number and wins. Then make these tables separately for the 2 distances. My guess is that the number of boxes filled per race follows a Pareto distribution.
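
Extending that to every field size and both distances could look like the sketch below. It follows the numbering above, where a k-dog field occupies boxes 1..k. The critical values are the standard upper-5% chi-squared points for k - 1 degrees of freedom. The data layout (`wins_by` keyed by distance and field size) and the counts are hypothetical:

```python
# Upper 5% critical values of chi-squared, keyed by field size (df = field size - 1).
CRITICAL_05 = {8: 14.067, 7: 12.592, 6: 11.070, 5: 9.488, 4: 7.815}


def box_tables(wins_by):
    """wins_by maps (distance, field_size) -> wins per box for boxes 1..field_size.

    Assumes a k-dog field uses boxes 1..k, as in the numbering above.
    Returns (distance, field_size) -> (statistic, critical value, reject at 5%).
    """
    results = {}
    for (distance, size), observed in sorted(wins_by.items()):
        expected = sum(observed) / size
        stat = sum((o - expected) ** 2 / expected for o in observed)
        crit = CRITICAL_05[size]
        results[(distance, size)] = (stat, crit, stat > crit)
    return results


# Hypothetical counts; real data could fill up to 5 field sizes x 2 distances.
wins_by = {
    (385, 8): [18, 14, 12, 11, 9, 10, 11, 11],
    (385, 4): [9, 7, 6, 2],
    (390, 5): [20, 5, 5, 5, 5],
}
for key, (stat, crit, reject) in box_tables(wins_by).items():
    print(key, round(stat, 3), crit, reject)
```

Stratifying by field size keeps "1/k of the wins per box" a sensible expectation within each table, which a pooled table across mixed field sizes would not.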
If you don't know how many boxes were filled in each race, I'm stumped.