Can a ratio of performance be used to measure dependence and assign weights in a weighted average?

#1
With only a modest grasp of inferential stats and maths in general, I've been wondering how one might derive the weights for a weighted average from a variable that isn't as straightforward as a frequency score. Consider the following scenario.

Let's say I have measurements of effectiveness for a set of wheat fertiliser products (such as a growth rate), along with measurements of various associated behaviours in the crop (improved absorption levels, nutrient conversion rates, etc.). Observations are made at many farms across the globe and throughout the year, so conditions vary, hence the need for historical averaging in an analysis of the products. Also, notably, the effectiveness of a product and its behaviour profile often vary a lot from one wheat variety to the next, with some products working better on variety A, others on variety B or variety C, and so on.

Therefore, if I want averages for the various readings of a particular product on variety A, I could simply exclude all observations made on other varieties from the analysis. However, let's assume the amount of data is limited so that excluding those observations would be too costly. In that case I wouldn't want to treat all of the observations for the product in question with equal relevance since observations on non-A varieties would carry less 'weight'. Taking weighted averages would of course address this obstacle, but how to go about calculating the weights here?

My first idea would be to simply use the ratio of a given product's average effectiveness between variety pairs. For example, if a product's average effectiveness is 50 units on variety A and 40 units on variety B, and I want averages of the behaviour readings of this product on variety A, could the weight assigned to observations on variety B be calculated as follows: 40 / 50 = 0.80 (where observations on variety A are assigned a weight of 1)? But then what if the scores were reversed (40 units on variety A and 50 units on variety B) - would the following work:

50 / 40 = 1.25,
1.25 - 1 = 0.25,
1 - 0.25 = 0.75 (the weight)?
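To make the two cases concrete, here is a minimal Python sketch of the arithmetic above. The function names and the behaviour readings are hypothetical, made up purely for illustration. One thing it shows is that the two calculations are not symmetric: the same 10-unit gap gives a weight of 0.80 one way round and 0.75 the other, whereas always dividing the smaller average by the larger would give 0.80 in both cases.

```python
def ratio_weight(target_mean, other_mean):
    """Smaller average effectiveness divided by the larger one (always <= 1)."""
    return min(target_mean, other_mean) / max(target_mean, other_mean)


def subtractive_weight(target_mean, other_mean):
    """The reversed-case arithmetic from the post: 1 - (larger / smaller - 1)."""
    ratio = max(target_mean, other_mean) / min(target_mean, other_mean)
    return 1 - (ratio - 1)


def weighted_average(values, weights):
    """Standard weighted average: sum(w * x) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Average effectiveness of 50 on variety A and 40 on variety B.
w_first_case = ratio_weight(50, 40)            # 40 / 50
# Scores reversed: 40 on variety A and 50 on variety B.
w_reversed_ratio = ratio_weight(40, 50)        # still 40 / 50
w_reversed_subtractive = subtractive_weight(40, 50)  # 1 - (50 / 40 - 1)

# Hypothetical behaviour readings: two on variety A, one on variety B.
readings = [10.0, 12.0, 8.0]
weights = [1.0, 1.0, w_first_case]
average_for_variety_a = weighted_average(readings, weights)

print(w_first_case, w_reversed_ratio, w_reversed_subtractive, average_for_variety_a)
```

Neither function answers the question of whether a 20% gap in effectiveness really means 20% less relevance; the sketch only pins down what each rule computes.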

My problem (confusion) with this method is in understanding how a difference in effectiveness (as presented here) can be directly used as a measure of dependence between datasets. If a product is 20% more effective on variety A than on variety B, is there any mathematical reasoning that leads to the conclusion that observations on variety B are 20% less relevant in an analysis of the product on variety A?

Compare this with an alternative approach. I believe I could find the correlation between the average effectiveness scores for each pair of varieties across all of the products (assuming there are enough products) and use the correlation scores directly as the weights, since correlation certainly is a measure of dependence. But wouldn't this technique gloss over the individual differences between the products in a way that the former approach would not, leading to less accurate weighted averages for a given product?
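A rough sketch of the correlation idea, assuming NumPy and a small made-up table of average effectiveness (one row per product, one column per variety). All of the numbers are hypothetical. Clipping negative correlations to zero is my own assumption, since a negative weight would not make sense in a weighted average.

```python
import numpy as np

# Hypothetical average effectiveness: rows are products, columns are varieties A, B, C.
varieties = ["A", "B", "C"]
effectiveness = np.array([
    [50.0, 40.0, 45.0],
    [30.0, 35.0, 20.0],
    [60.0, 52.0, 58.0],
    [42.0, 44.0, 30.0],
    [55.0, 47.0, 50.0],
])

# rowvar=False treats each column (variety) as a variable,
# giving a 3x3 matrix of Pearson correlations between varieties.
corr = np.corrcoef(effectiveness, rowvar=False)

# Weights for an analysis targeting variety A: its row of the correlation matrix.
# Assumption: negative correlations are clipped to 0 so no weight is negative.
target = varieties.index("A")
weights_for_a = {
    name: float(np.clip(corr[target, i], 0.0, None))
    for i, name in enumerate(varieties)
}

print(weights_for_a)
```

Note that `weights_for_a` is computed once across all products, so every product gets the same weight for variety B, which is exactly the loss of per-product detail raised above.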

Sorry if some major rookie errors are overcomplicating things here but my head really hurts! Any pointers would be appreciated.
 

noetsi

Fortran must die
#2
I think the classical way to do this is not through weights but through block designs, although I am guessing the data were not collected that way. I have not seen weights used like this to control for variables that are not in the model.
 
#3
Thanks for your input, I hadn't come across 'blocking' before.

Please correct me if I'm wrong but from what I gather this would involve splitting the data. If so, the purpose of assigning weights in the scenario in the OP would be to make use of a larger amount of data as splitting into subgroups by wheat variety would be too costly (i.e. insufficient observations leading to averages that are less accurate than weighted averages using all available data). Can you think of any fundamental problem with weighting the data for this purpose?
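To put numbers on the trade-off I have in mind, here is a small sketch under strong simplifying assumptions: observations are independent, every observation has the same standard deviation `sigma`, variety A observations get weight 1 and variety B observations get a single weight `w_b`. The function name and all inputs are hypothetical. It uses the standard results that the variance of a weighted mean is `sigma**2 * sum(w**2) / sum(w)**2` and that mean squared error is bias squared plus variance.

```python
def weighted_mean_properties(n_a, n_b, mu_a, mu_b, sigma, w_b):
    """Bias, variance and MSE of a weighted mean used to estimate mu_a.

    Assumes n_a observations with true mean mu_a (weight 1) and n_b observations
    with true mean mu_b (weight w_b), all independent with standard deviation sigma.
    """
    total_weight = n_a + w_b * n_b
    expected = (n_a * mu_a + w_b * n_b * mu_b) / total_weight
    bias = expected - mu_a
    variance = sigma ** 2 * (n_a + w_b ** 2 * n_b) / total_weight ** 2
    return {"bias": bias, "variance": variance, "mse": bias ** 2 + variance}


# Hypothetical case: 5 observations on variety A, 20 on variety B.
only_a = weighted_mean_properties(n_a=5, n_b=20, mu_a=10.0, mu_b=9.0, sigma=3.0, w_b=0.0)
weighted = weighted_mean_properties(n_a=5, n_b=20, mu_a=10.0, mu_b=9.0, sigma=3.0, w_b=0.8)

print(only_a)
print(weighted)
```

With `w_b=0.0` the variety B observations are excluded, so there is no bias but the variance is at its largest. Any positive `w_b` lowers the variance but pulls the average towards variety B's true mean, so whether weighting helps depends on how far apart the two varieties really are.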

I've tried to find examples of various weighting techniques but with little luck, which seems odd to me because I would've thought that weighted averages are far more versatile and useful than this would suggest.
 

noetsi

Fortran must die
#4
Blocking is a design common in ANOVA and used a lot with crops. An example is growing something in different types of soil, where the differences between soils are the basis for the blocks. I guess that would be splitting the data, since it deals with a source of variation in the data. I am not enough of an expert in such designs to comment on them.

https://www.statsdirect.com/help/analysis_of_variance/randomized_blocks.htm

I have never seen what you propose discussed in the literature so I do not feel competent to comment on it.
 
#5
I see - thanks for the info and I appreciate the help, although I'm afraid the parallels are purely coincidental! I don't have much of a statistical background so I'm unfamiliar with a lot of the common knowledge. On the plus side, I'm learning a lot from my self-help stats books.

It seems to me that averages are more or less what statistical analysis is all about, and I'm finding it hard to move on without getting the weighting business in the OP straight, even if it's just a theoretical exercise. If I can clarify anything for anybody please let me know.