Consider each x,y as an individual data point. Not each x,{y}, which was your original idea, but each x,y. So if you have 10 different x-values, each with 100 y-values, you will have 1000 data points, not 10.
Suppose your parameterized distribution is \(p(x,\theta; y)\), where x is the x-value, y is the y-value, and \(\theta\) represents the unknown regression parameter(s). The probability (density) of each data point is \(p(x_i, \theta; y_i)\). The log-likelihood function for your whole data set is then
\(
\ln L = \sum_{i} \ln p(x_i, \theta; y_i)
\)
Regression consists of finding \(\theta\) to maximize this function.
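In code, this is just a sum of per-point log-densities. Here is a minimal sketch in Python (the function names are mine, not from your setup; `log_p` stands in for whatever log-density your model defines):

```python
def log_likelihood(theta, xs, ys, log_p):
    """Sum log p(x_i, theta; y_i) over all (x, y) data points."""
    return sum(log_p(x, theta, y) for x, y in zip(xs, ys))
```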
Here is a concrete example. Suppose your model is that your data is normally distributed, with the variance proportional to the mean. For various values of the mean (x), you have taken different samples (y), and you want to do a regression to determine the proportionality constant (a).
\(
p(x,a;y) = \frac{1}{\sqrt{2\pi a x}}
\exp \left\{ -\frac{1}{2} \left( \frac{y - x}{\sqrt{a x}} \right)^2 \right\}
\)
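Working with the logarithm of this density directly avoids numerical underflow when many small densities would otherwise be multiplied. A sketch, assuming NumPy (the function name is mine):

```python
import numpy as np

def log_p(x, a, y):
    """Log-density of a normal with mean x and variance a*x, evaluated at y."""
    var = a * x
    return -0.5 * np.log(2 * np.pi * var) - (y - x) ** 2 / (2 * var)
```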
What I am suggesting you do is: (i) for a given assumed value of a, compute p for each x,y data point; (ii) construct a log-likelihood function by summing ln(p) over all data points; (iii) adjust a to maximize the function. A sketch of all three steps follows.
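Here is a minimal end-to-end sketch using SciPy's scalar minimizer (the helper name and the toy data are mine; I assume a > 0 and all x > 0 so the variance stays positive):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(a, xs, ys):
    # Steps (i) and (ii): log p for each (x, y) point, summed.
    var = a * xs
    log_p = -0.5 * np.log(2 * np.pi * var) - (ys - xs) ** 2 / (2 * var)
    return -np.sum(log_p)  # negate so a minimizer performs step (iii)

# Toy data: 10 x-values, 100 y-samples each -> 1000 (x, y) data points.
rng = np.random.default_rng(0)
a_true = 2.5
xs = np.repeat(np.linspace(1.0, 10.0, 10), 100)
ys = rng.normal(loc=xs, scale=np.sqrt(a_true * xs))

# Step (iii): maximize the log-likelihood by minimizing its negative over a.
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0),
                      args=(xs, ys), method='bounded')
print(res.x)  # should land near a_true
```

For this particular model you can check the numerical answer analytically: setting the derivative of the log-likelihood with respect to a to zero gives \(\hat{a} = \frac{1}{n}\sum_i (y_i - x_i)^2 / x_i\).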