# MLE for the number of samples given k largest values

#### Xodarap

##### New Member
I have the view counts for the top 100 videos under a tag on TikTok and want to estimate the total number of videos with that tag. I know the distribution of views for other tags, so I can make a guess as to what it is for this one, but I don't know for sure.

One formal model of this: I'm given the k largest of n i.i.d. samples, and I wish to estimate n.

Let F(v) = P(X ≤ v). Then I believe the number of the n samples that are greater than v is binomially distributed: each sample exceeds v with probability 1 − F(v), so the probability of exactly k exceedances is B(k; n, 1 − F(v)).
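A minimal sketch of that likelihood as a function of n, assuming the exceedance probability p = 1 − F(v) is known. The values of `k` and `p` are made up for illustration, and `binomial_mle_n` is a hypothetical helper name. For integer n the binomial likelihood increases while n ≤ k/p and decreases afterwards, so a grid search should land on floor(k/p):

```python
import numpy as np
from scipy.stats import binom

def binomial_mle_n(k, p, n_max=None):
    """Integer n that maximizes the binomial pmf of k successes at success probability p."""
    if n_max is None:
        # The maximizer is at most k / p, so twice that is a safe upper bound.
        n_max = int(2 * k / p) + 2
    candidates = np.arange(k, n_max)
    log_lik = binom.logpmf(k, candidates, p)
    return int(candidates[np.argmax(log_lik)])

# Illustrative values only: 10 exceedances, each sample exceeding v with probability 0.013.
k, p = 10, 0.013
n_hat = binomial_mle_n(k, p)
print(n_hat, int(np.floor(k / p)))
```

Because the maximizer has this closed form, the numerical search mainly serves as a sanity check on the model.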

So I think one way I could solve this is to set v to be the smallest of the k samples, and then find the value of n which maximizes the likelihood. Is this the best way to solve the problem? I feel like I'm not using all of the relevant information.
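One way to check whether information is being thrown away is to write down the likelihood of all k observed values at once. A sketch, assuming F is known and continuous with density f, and writing x_(1) ≥ … ≥ x_(k) for the k largest values (this notation is introduced here for illustration):

```latex
L(n) \;=\; \frac{n!}{(n-k)!}\; F\!\left(x_{(k)}\right)^{\,n-k}\; \prod_{i=1}^{k} f\!\left(x_{(i)}\right)
```

The product of densities does not involve n, so under these assumptions n enters only through the smallest of the k values, x_(k). The ratio L(n)/L(n−1) = n F(x_(k)) / (n − k) is at least 1 exactly when n ≤ k / (1 − F(x_(k))), which is the same maximizer the binomial likelihood gives. The other k − 1 values would mainly help if F itself had to be estimated.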

Also, empirically it doesn't seem to give a very good estimate: in a scatter plot of true size against estimated size there is some correlation, but it's pretty weak. Code to produce the plot:
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom, norm

def run_test(size, k=10):
    # Draw `size` standard-normal samples and use the k-th largest as the threshold v.
    data = norm.rvs(size=size)
    threshold = np.sort(data)[-k]
    # Probability that a single sample exceeds the threshold: 1 - F(v).
    p_exceed = norm.sf(threshold)

    # n must be an integer in the binomial pmf, so scan integer candidates
    # rather than running a continuous optimizer over n.
    candidates = np.arange(k, int(2 * k / p_exceed) + 2)
    log_lik = binom.logpmf(k, candidates, p_exceed)
    return candidates[np.argmax(log_lik)]

sizes = np.random.randint(low=100, high=10000, size=500)
estimates = [run_test(s) for s in sizes]
plt.scatter(sizes, estimates)
plt.xlabel('true size')
plt.ylabel('estimated size')
plt.show()
```
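How much scatter to expect from an estimate of this kind can be checked without drawing full samples. A rough sketch, assuming F is continuous and known: the exceedance probability at the k-th largest of n samples then follows a Beta(k, n − k + 1) distribution regardless of F, so the estimate n_hat = k / p can be simulated directly. The function name `relative_spread` and the chosen n, k, and trial count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_spread(n, k, trials=20_000):
    """Mean and standard deviation of n_hat / n, where n_hat = k / p."""
    # p = 1 - F(k-th largest of n samples) is Beta(k, n - k + 1) distributed
    # for any continuous F, so there is no need to simulate the samples themselves.
    p = rng.beta(k, n - k + 1, size=trials)
    ratio = (k / p) / n
    return ratio.mean(), ratio.std()

for k in (10, 100):
    mean, sd = relative_spread(n=5000, k=k)
    print(f"k={k}: mean n_hat/n = {mean:.2f}, sd = {sd:.2f}")
```

Under these assumptions the relative spread of the estimate is on the order of 1/√k, so with k = 10 individual estimates can easily be off by a third or more, and k / p is biased upward by a factor of about k / (k − 1). A larger k should tighten the scatter.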