- Thread starter JeffTheGreen

Ideally I'd like to use SPSS, but I'm also familiar enough with R that I could use it if I need to. I'd really appreciate it if anyone could help me.

Is that right? If it is, I can tell you how to solve your problem.

Suppose your parameterized distribution is \(p(x,\theta; y)\), where x is the x-value, y is the y-value, and \(\theta\) represents the unknown regression parameter(s). The probability (density) of each data point is \(p(x_i, \theta; y_i)\). The log-likelihood function for your whole data set is then

\( \ln L = \sum_{i} \ln p(x_i, \theta; y_i) \)

Regression consists of finding \(\theta\) to maximize this function.
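
To connect this with the regression you already know, here is a familiar special case (my assumption for illustration, not something stated in the question). If p is a normal density with mean \(\theta x_i\) and a fixed variance \(\sigma^2\), then for n data points

\( \ln L = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i} (y_i - \theta x_i)^2 \)

The first term does not depend on \(\theta\), so maximizing \(\ln L\) over \(\theta\) is the same as minimizing the sum of squared residuals. In other words, ordinary least squares is this procedure with one particular choice of p.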

Here is a concrete example. Suppose your model is that your data are normally distributed, with the variance proportional to the mean. For various values of the mean (x), you have taken different samples (y), and you want to do a regression to determine the proportionality constant (a).

\( p(x,a;y) = \frac{1}{\sqrt{2\pi a x}} \exp \left\{ -\frac{1}{2} \left( \frac{y - x}{\sqrt{a x}} \right)^2 \right\} \)

What I am suggesting you do is: (i) for a given assumed value of a, compute p for each (x, y) data point; (ii) construct a log-likelihood function by summing ln(p) over all data points; (iii) adjust a to maximize the function.
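
Steps (i) to (iii) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the data are made up, the names `log_likelihood` and `maximize_a` are hypothetical, all x values must be positive, and the golden-section search is just one simple way to do the "adjust a" step (any one-dimensional optimizer would do). For this particular model, setting the derivative of the log-likelihood to zero gives a = (1/n) Σ (y_i - x_i)² / x_i, so the sketch also computes that as a sanity check.

```python
import math

def log_likelihood(a, xs, ys):
    """Steps (i) and (ii): sum ln p(x, a; y) for a normal density
    with mean x and variance a*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        var = a * x  # variance proportional to the mean (requires x > 0)
        total += -0.5 * math.log(2.0 * math.pi * var) - (y - x) ** 2 / (2.0 * var)
    return total

def maximize_a(xs, ys, lo=1e-6, hi=100.0, tol=1e-10):
    """Step (iii): golden-section search for the a in [lo, hi] that
    maximizes the log-likelihood. The bracket [lo, hi] is an assumption."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if log_likelihood(c, xs, ys) > log_likelihood(d, xs, ys):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

# Made-up data, purely for illustration: x is the mean, y the observed sample.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [1.5, 1.0, 5.0, 6.0, 19.0]

a_hat = maximize_a(xs, ys)

# Closed-form maximizer for this specific model, used as a cross-check.
a_closed = sum((y - x) ** 2 / x for x, y in zip(xs, ys)) / len(xs)

print(a_hat, a_closed)
```

For models with more than one parameter, or without a closed form, the same log-likelihood function can be handed to a general-purpose numerical optimizer instead of the hand-written search.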

How exactly is y representing a pdf? Is it a collection of parameters? Or is it just a random sample and you're considering the empirical distribution to be the pdf?

ichbin, you'll have to excuse my ignorance, but I'm not sure I understand what you're saying. Most of my experience with statistics is just with data analysis and experiment design, not with the mathematics.

It seems like what you're suggesting is that I could do a least-squares regression--as I might in Excel or SPSS or R--but calculate a log-likelihood score instead of using the p-value the program calculates. Is that correct? How would I then turn that into a p-value? (Or is that even possible?) What if the model is y = mx + b, rather than just y = mx?