# Normalizing traits/statistics based on trait count for each trait type

#### wenigi4535

My goal is to create an app that calculates rarity scores of traits.
The rarity score is given by the formula:

Rarity score = 1 / (% chance of occurrence)

Let's say I have a trait that has a 10% chance of occurring.

The rarity score for this trait will be:

1 / (10%) = 1 / 0.10 = 10

This is the score without trait normalization.
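
As a quick check of the formula, here is a minimal Python sketch. The function name `rarity_score` is just an illustrative choice, and it assumes the chance is passed as a fraction (0.10 for 10%) rather than as a percentage.

```python
def rarity_score(chance: float) -> float:
    """Unnormalized rarity score: the reciprocal of the chance of occurrence.

    `chance` is a fraction in (0, 1], e.g. 0.10 for a 10% trait.
    """
    if not 0 < chance <= 1:
        raise ValueError("chance must be a fraction in (0, 1]")
    return 1 / chance


ten_percent_score = rarity_score(0.10)
print(ten_percent_score)  # 10.0
```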

What I am trying to find out is how the process of trait normalization (or rarity normalization) is done.

From my research, normalization takes into account the number of traits in a specific trait type.

Let's say we have two trait types:

Trait_Type: Hair-Color
Value: Green, 1%, Score: 100
Value: Blue, 99%, Score: ~1.01

Trait_Type: Shirt-Color
100 traits, each with a 1% chance of occurrence (Score: 100 each).

With the rarity formula above, every shirt color gets the same score of 100 as green hair color.
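
To make the problem concrete, the sketch below scores both trait types with the unnormalized formula. The shirt color names (`Shirt-1` to `Shirt-100`) are placeholders made up for illustration; only the percentages come from the example.

```python
# Chance of each value, as fractions. Shirt names are placeholders.
hair_color = {"Green": 0.01, "Blue": 0.99}
shirt_color = {f"Shirt-{i}": 0.01 for i in range(1, 101)}


def score_all(chances: dict[str, float]) -> dict[str, float]:
    """Apply score = 1 / chance to every value of one trait type."""
    return {value: 1 / chance for value, chance in chances.items()}


hair_scores = score_all(hair_color)
shirt_scores = score_all(shirt_color)

print(round(hair_scores["Green"], 2))     # 100.0
print(round(hair_scores["Blue"], 2))      # 1.01
print(round(shirt_scores["Shirt-1"], 2))  # 100.0

# Every shirt color ties with green hair, although green hair is the only
# value that is rare relative to the other values of its own trait type.
all_shirts_tie_with_green = all(
    abs(score - hair_scores["Green"]) < 1e-9 for score in shirt_scores.values()
)
print(all_shirts_tie_with_green)  # True
```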

This is not accurate: when a trait type has 100 traits (or any large number), each trait necessarily has a lower percentage, which inflates every trait's score.

In reality, no single shirt color is worth much, because they all have the same 1% chance of occurring.

The green hair color, on the other hand, is genuinely rare within its trait type.

My goal is to take the trait count of each trait_type into account, so that when these traits are scored, green hair ranks far higher than any shirt color.

The information I have is:

The chance of each trait occurring.
The rarity score of each trait.
All the trait-count data (number of trait types, number of traits within each trait type, etc.)

The farthest I got is:

Vanilla_score = 1 / (% chance of trait occurring)
Normalized_score = (Vanilla_score * Avg number of traits per trait_type) / (Number of traits in this trait_type)

This will not result in an accurate enough score.
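
Here is a minimal Python sketch of the two formulas, applied to the Hair-Color and Shirt-Color example from earlier. It assumes the whole collection consists of just those two trait types (so the average is (2 + 100) / 2 = 51); the function and variable names are illustrative.

```python
def vanilla_score(chance: float) -> float:
    """Unnormalized score: 1 / chance, with chance given as a fraction."""
    return 1 / chance


def normalized_score(
    chance: float, traits_in_type: int, avg_traits_per_type: float
) -> float:
    """Vanilla score scaled by (avg traits per trait type) / (traits in this type)."""
    return vanilla_score(chance) * avg_traits_per_type / traits_in_type


# Assumption: the collection has only these two trait types.
trait_counts = {"Hair-Color": 2, "Shirt-Color": 100}
avg_traits = sum(trait_counts.values()) / len(trait_counts)  # 51.0

green_hair = normalized_score(0.01, trait_counts["Hair-Color"], avg_traits)
blue_hair = normalized_score(0.99, trait_counts["Hair-Color"], avg_traits)
any_shirt = normalized_score(0.01, trait_counts["Shirt-Color"], avg_traits)

print(round(green_hair, 2))  # 2550.0
print(round(blue_hair, 2))   # 25.76
print(round(any_shirt, 2))   # 51.0
```

With this scaling, green hair ranks well above any shirt color, so the formula does at least separate the two cases.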

Take a trait_type called Flair:
Value: hijab
Average number of traits per trait_type: 13.1875
Number of trait_types: 16
Number of traits in the Flair trait_type: 40

The trait has approximately a 0.44% chance of occurring.
Its vanilla score is 243.87.

With the formula above, the normalized score is 243.87 * 13.1875 / 40 ≈ 80.4.

On the site I want to replicate, the score is 35.87.
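
For reference, the arithmetic for the Flair example can be reproduced with the short sketch below. It plugs in the quoted vanilla score of 243.87 directly instead of recomputing it from the rounded 0.44% figure (1 / 0.0044 would give about 227.27). Variable names are illustrative.

```python
vanilla = 243.87             # vanilla score quoted for Flair: hijab
avg_traits_per_type = 13.1875
traits_in_flair = 40
site_score = 35.87           # score shown on the site I want to replicate

# Normalized_score = Vanilla_score * avg traits per type / traits in this type
normalized = vanilla * avg_traits_per_type / traits_in_flair
print(round(normalized, 2))  # 80.4

# Ratio between my result and the site's score, to show how far off it is.
ratio_to_site = normalized / site_score
print(round(ratio_to_site, 2))
```

My result is more than twice the site's score, so the site must be applying a different or additional normalization step.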

What other calculations could be used to take the number of traits per trait_type into account?

(If any data is missing let me know and I will add it.)