Hi. I'm working with data that may get too large.
I'm wondering if there is a way to keep the data relatively accurate and manageable by deleting "insignificant" data.
For simplicity, let's say it's an all-time top 100 list (counted from when our list was started).
Users enter their favorite song.
The vote is ongoing.
Saving multiple data sets (such as by year) is not an option.
My first thought was capping the total unique songs at a certain number.
Yet, once the cap is reached, new unique songs would never make it onto the list.
So I wondered about cutting the lowest-voted songs to make room whenever the cap is reached.
For example, when unique songs reach 1000 (cap), the lowest 500 (cut) would be removed.
This cycle would repeat every time the cap is reached.
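To make the idea concrete, here is a minimal Python sketch of the cap-and-cut cycle I have in mind (the cap of 1000 and cut of 500 are just the example numbers above):

```python
from collections import Counter

CAP = 1000  # example: max unique songs before a purge
CUT = 500   # example: how many of the lowest-voted songs to drop

votes = Counter()  # song title -> running vote count

def add_vote(song):
    """Record one vote; purge the lowest-voted songs when the cap is hit."""
    votes[song] += 1
    if len(votes) >= CAP:
        # keep only the CAP - CUT highest-voted songs;
        # most_common() sorts by count descending, so the slice
        # past index CAP - CUT is the low-vote tail to delete
        for title, _ in votes.most_common()[CAP - CUT:]:
            del votes[title]
```

The top 100 would then just be `votes.most_common(100)` at any point in time.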
Yet I do not know whether this gives a nearly insurmountable advantage to the surviving songs, making the list statistically worthless.
Is there a formula for choosing the "cap" and "cut" values so that new entries still have a fair chance to climb their way to the top?
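To illustrate the worry, here is a toy run of the same scheme (tiny made-up cap, cut, and vote stream): a song that keeps re-entering after every purge never accumulates votes, because each purge wipes its count back to zero while the incumbents keep theirs.

```python
from collections import Counter

def run(stream, cap, cut):
    """Apply the cap-and-cut scheme to a sequence of votes."""
    votes = Counter()
    for song in stream:
        votes[song] += 1
        if len(votes) >= cap:
            # purge: keep only the cap - cut highest-voted songs
            for title, _ in votes.most_common()[cap - cut:]:
                del votes[title]
    return votes

# toy scenario: 5 incumbents with 10 votes each, then a recurring
# "newcomer" that gets one vote before each purge, plus one-off
# filler songs that push the unique count up to the cap
stream = [f"old{i}" for i in range(5) for _ in range(10)]
for purge in range(3):
    stream.append("newcomer")  # one more vote for the newcomer
    stream += [f"oneoff{purge}_{i}" for i in range(5)]

result = run(stream, cap=10, cut=5)
# "newcomer" voted in every round yet never survives a purge,
# while the incumbents are untouched
```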
Thanks for any insights on how to tackle this.