If I have a sparse data set and want to bin the data points so that each bin contains enough points, is there a theory or journal article that establishes, statistically, the best way to choose the bin size?

The left figure shows the sparse data points. I initially required at least 7 data points to form one bin, as shown in the right figure. After binning, I computed the variance of each bin, which is shown in the colour plot. However, the minimum of 7 points per bin was chosen arbitrarily. I would like to know how I could determine the optimum number of data points per bin, so that the variance computed for each bin is reliable enough to be trusted.
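To make the procedure concrete, here is a minimal sketch of roughly what I am doing. It assumes a fixed square grid and simply skips bins with fewer than the minimum number of points, which may not match my actual binning exactly. The function name `binned_variance`, the parameters `bin_size` and `min_count`, and the synthetic data are all made up for illustration.

```python
import random
import statistics
from collections import defaultdict


def binned_variance(points, bin_size, min_count=7):
    """Group (x, y, value) points into square bins of side bin_size.

    Returns {bin_index: (count, sample_variance)} for every bin that
    holds at least min_count points. Bins with fewer points are skipped.
    """
    if min_count < 2:
        raise ValueError("min_count must be at least 2 to compute a sample variance")

    bins = defaultdict(list)
    for x, y, value in points:
        # Integer grid index of the bin this point falls into
        key = (int(x // bin_size), int(y // bin_size))
        bins[key].append(value)

    return {
        key: (len(values), statistics.variance(values))
        for key, values in bins.items()
        if len(values) >= min_count
    }


# Synthetic sparse data, purely illustrative (not my real measurements)
rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10), rng.gauss(0, 1))
          for _ in range(200)]

result = binned_variance(points, bin_size=2.5, min_count=7)
for key, (count, var) in sorted(result.items()):
    print(key, count, round(var, 3))
```

The value `min_count=7` here is exactly the arbitrary choice I am asking about: changing it (or `bin_size`) changes which bins survive and how noisy each bin's variance is.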

Thank you!