Weightings Problem

I have been given an EXCEL model and my assigned task is (only) to fix it for problems that I might call "unintended consequences" of the underlying math. Currently, the model tries to predict how good a manager is likely to be, based on a weighted average of total scores from 5 categories. The 5 categories, in turn, contain multiple variables each which are related to goodness. The weights (‘importance factors’) of the categories have been pre-selected based on intuition as to what should produce a good manager, having nothing to do with how well the variable can discriminate between managers, for example if it has low variance among managers. Also, the variables within the categories (sometimes 5 variables, sometimes 7, 10, etc) have been pre-selected and are not to be second guessed. This is the way they want to do it for now, so please let that lie!

As will become obvious below, the numerical value of the resulting weighted average scores are pretty meaningless. But, in my mind, if one manager scores 60 and another 50, it is OK to conclude that the former is (likely to be) 20% better than the latter. This is what my end user cares about: who in the population is better (even if they are all bad, he will choose the better ones) and by how much (‘betterness’). Still, we should recognize that the betterness is a derived quantity, based on variables that measure only goodness. So that we may focus on ‘unintended consequences’ due to poor math only, I hope you can take all the above as a given! So, let’s assume that the weights (which have nothing to do with ability to discriminate, mind you) and the variables have been surmised divinely and are perfect!

The scores in the five categories are summed from the scores of multiple variables. The latter are always non-negative and a higher score is always better than a lower score. Each is intended to represent some characteristic that is likely to produce a 'good manager'. So, none of these variables have explicit weighting or ‘importance’. Obviously, points from one variable are just as helpful as points from another, within a category. Currently, each variable can have a score of 0 or 1, 0, 1, or 2, or 0, 1, 2 or 3, actually sometimes only 0 or 2, or 0 or 3. They think that the maximum amount (not average, mind you) of points allowed for each variable can account for the implied 'importance' of each variable. I wish they could all be put on the same scale (say from 1 to 10), but my end user also, at least for now, is inflexible on this. Some of these variables are subjective things, and, in some cases, he only sees two results (hence 0 or 1, or 0 and 2, or 0 and 3) and in other cases, he thinks there could be 4 results (0, 1, 2 and 3). The model’s creators felt that, by allowing a maximum (not average, mind you) of 10 points in each category, intuitively spread across the variables, it would be as if they had pre-specified the importance of the variables as well. And, if this worked, the category weightings, which represent ‘importance’ at the higher level would produce the intended results as well.

My qualms with this model, as it exists, are that, though the max points in each category is the same, the average points may not be. Even, if the averages were the same, the standard deviations may not be. Also, the ranges for the variables differ from variable to variable, some 0 to 1, some 0 to 3.

I have concluded, perhaps erroneously, that, it would be better if they would pre-assign the weightings of each variable, just as they have already pre-assigned the weightings of each of the 5 categories. Then, if we want them to truly represent importance, we need to divide each variable by its average value. But, even if I can’t get them to switch to explicit weightings, if I divided these variables by their average value, I figure that they will still get the ‘importance’ expected (which, right now, they think they are getting by specifying maximum points for each variable). What I don't like about this approach is that, if I change the score of one manager, it will slightly change the score of all managers and my end user may not like that.

I have also concluded, perhaps erroneously, that I should NOT attempt to control for standard deviation. I figure that, if a highly important variable (in terms of producing a good manager) had almost zero variance, then that variable, however important, would contribute almost nothing toward the ultimate goal of deciding who is the best manager (and by how much). Nevertheless, I have concluded that this is how it should be, since we (only) DERIVE the decisions as to who is a better manager (and, how much better in %) from assumptions about what produces a good manager. For example, if all managers score the same on a variable that is 99% important in predicting the goodness of a manager, the conclusion should be that all managers are virtually the same, with small distinctions being made from the other variable that is 1% important (and presumably has positive variance).

In order to power through this confusion, I figured there must be some analogy to how this would be done via linear regression, Also, someday, I hope to convince my end user to do this rigorously, rather than intuitively. I googled a bit and found stuff about standardized regression. It suggested that, with standardized regression, your multipliers really do represent relative importance, which would be great, though that importance may also reflect how good a discriminator the variable is, not just how important the variable is toward being a good manager. One URL talked about dividing by sigma, in another, it talked about dividing by sigma after first subtracting the mean, a familiar concept. Are these two different things or do you think the first was just being lazy with their wording? Even if not, they are both different from my approach which divided by the mean value but does not normalize for standard deviation.

Can anyone help me through this confusion, please? What I am permitted to do, behind the scenes, is cause the model to normalize, scale, or translate things. Sorry, this post was so long. Please ask questions if it is confusing.

Last edited:
Comments of Reply

The adjusted rank is not a crazy idea at all. I had thought about it myself. My reason for not selecting it is that ranking (as compared to scoring) sacrifices some precision. However, given that most of the scores are merely points, I'm not sure there is that much precision to begin with! Still, I don't see that it is any better than dividing by the mean ...

More importantly, this approach will be no less confusing (to the typically uninformed user) in that changing one manager's score will also change another manager's score. If it didn't, it couldn't change the score of the manager in question. Once again, no better than dividing by the average.

Substracting the mean or not have no influance on that 'multipliers really do represent relative importance'.
I'm not sure if you've inadvertently left out a "not" after the word "do" in this above quote. Nor am I sure if you're talking about standardized regressions only. I know that, with regression, the multipliers do not represent importance, except perhaps in one of these standardized regressions. I know that in the existing system, the chosen multipliers (at category level) are thought to be the relative importance (of that factor toward being a good manager, but not a good discriminator) but that, if the mean values are different, I think it is flawed thinking.

I'm still hoping for answers to the following questions:

(1) Do people agree, given that the variables measure goodness (not betterness) only, that I should not try to 'control for' the standard deviations?

(2) Do people agree that dividing 'importance' by the average value is a good way to ensure that the desired importance finds it's way to the final scores which are supposed to measure 'goodness' of the managers? Stat3's adjusted ranking would work too.

(3) If somehow I wanted to recast a (standardized, I think)correctly-done regression into the kind of formulation I have, could that be done? If so, would that answer all my questions above? Would the regression insist on choosing variables that were good discriminators, not just good indicators? For example, if a potential variable, intuitively, were a good predictor of manager goodness but, as it turned out, every manager had the same score on that variable, would it be tossed out of a (well-done) regression?

Thanks for any help!
Last edited:
In what follows, assume that a variable can have choices of 0 or 1 if there are only two logical scores (red hair which is bad, not red hair which is good); 0, 1, or 2 if there are three logical choices (bad, medium, good) and 0, 1, 2, and 3 (worst, bad, medium, good) if there are 4 natural choices. So while more points is always better, the points of one variable may need to be weighted to reflect the true importance of the variable. The score choices really only matter in terms of the natural divisions of the variable in question. Getting a 3 in one vairbale ins't necessarily better than getting a 2 in another, if both are the top scores.

<< The multipliers represent importance when you divide variables values by std. deviation regardless of mean values. >>

Can you explain how/why this is true? What you really are saying is that if variable 1's value is 3 times its standard deviation that's just as good as variable 2's value being 3 times ITS standard deviation, assuming these are to be equally weighted. I would think that one would need to first subtract the mean before this would have any chance of making sense. Don't you agree?

Similarly, for dividing by mean values, I am saying that, if variable 1's value is 3 times its mean that is just as good as variable 2's value being 3 times its mean, assuming they are to be equally weighted. Clearly, if variable 2 has a low variance, this is next to impossible. So this may be a bad system. Yet, as I noted before, if all managers score about the same in a variable (even an important one), the variable won't discriminate much after turning the crank, but perhaps that is the way it should be. What dividing by the mean does assure me is that the total number of points awarded in each variable equals its weight ('importance'). This seems to make sense when all one is doing is collecting merit points toward goodness. I will ultimately use goodness scores to DERIVE how much better one manager is than another, so I don't think the choice of the best variables needs to worry about how good they will be as discriminators.

Regarding your example, I'm glad you suggested it as it underscored the difference between my ideal system and normalizing by average values. My intuition is that, if all variables were scored on a scale of 0 to 10 and all had an average score of 5 (possibly it would be sufficient that their averages all be the same, though not necessarily 5), this would be the perfect ('ideal') system for THEN applying weights. This is what I really have in mind. However, my client doesn't find 0 to 10 ranges intuitively workable for some variables (e.g., ones that are of a binary nature), so I can't do that. But what I can do is rescale his raw scores behind the scenes, and then weight those!

Using raw scores, when they are on different scales (I'm assuming that such is the case in your example, as opposed to it just being some odd small sample results) makes no sense whatsoever to me, so I reject that solution easily.

I figured that, by dividing by the mean, it should be similar to my ideal solution above but, since the average for each variable is still different (in this case x1's average would be 2 if on a scale of 0 to 10, x2's average would be 8), this is not a good proxy for that, I now realize; unless I can force the inputters to think about keeping their averages close to the mid-points and, the way they are choosing some of these variables, I doubt I can do that.

If I assume that 1 is the highest score you can get on variable 1; and 4 is the highest score one can get on variable 2 in your example, then if I scaled them both to a common scale of 0 to 10, I would conclude that every manager was the worst in one variable (score of 0) and the best (score of 10) in the other and, since they are equally weighted, all managers are equal, That's what my intuition tells me should be the right answer.

Surely there is some way to suggest that various characteristics (variables) contribute (never negative) points toward goodness and, if we somehow normalize the scores, we can then apply weightings to suggest which are more important toward determining goodness and which are less important. The question is then one simply of choosing the best normalization scheme. I just can't seem to figure out the most analytically sensible way to do this. It seems like there should be a "right answer" (or one agreed upon by 9 out of 10 experts) to this question, without even having to ask what the variables are, as opposed to telling me to decide which makes the most sense to me. What am I missing? Is a weighted average scoring system just not a well-defined structure that any experts have studied?

What are we talking about here?:) coefficients in linear regression? 'multipliers' DO NOT DEPEND ON MEANS! proof yourself using simple example.
I'm just making the observation that an equation of x1/sd1 +x2/sd2 = score (here, I'm assuming equal weighting to the 2 variables, x1 and x2), in effect says that, if x1 is, say, 3 times its standard deviation sd1, then it contributes the same amount of score points as if x2 is three times its standard deviation sd2. I then ponder why this formulation makes any sense if one variable ranges from 0 to 10 and the other ranges from, say, 0 to 100.

Adjusted ranks will give 'right' result here.
Yes, they would. And my heart of hearts tells me that this is the way to do it. Then I have to grapple with the sd issue. If one variable has a tiny sd, do I assign scores from 0 to 10 (using the tiny differences), or do I confine my scores to, say, only 5 and 6, so that the variable is almost never an ultimate discriminator? Also, do I try to make sure the average score (if the entire population was used) is about 5, in each variable, or is that unimportant.

Maybe someone who designed this ranking method did the same what you want, with 100 experts.
My experience tells me that this kind of system is used often because it has rudimentary appeal, and perhaps it is often done poorly. I think that, if you score every variable on the same scale, and perhaps use your weightings to compensate for low standard deviations (something I'm trying to leave out of my weightings), you can sidestep many of the nasty issues I've raised.

Does anyone else out there want to give their thoughts? Stat3 shouldn't have to bear the brunt of all my insanity!

I'm not sure my standard deviation query was that far off the mark after finding the following link:


From this and some other similar links, it appears that the naysayer experts believe that the comparative predictive importance of various variables cannot be well-defined out of context, that you can't really consider a one sd change in one variable to be equivalent to a one sd change in another. The standardized regression coefficient advocates apparently aren't bothered by this, and it is acknowledged that most people aren't either.

Last edited: