Regression analysis research project help.


I have a project I have to do for my regression analysis course. Basically, I need to gather a bunch of data, plot it, and find a regression line using software.

For my idea, I chose to test whether average movie gross revenue can be determined by these factors:

x1 = amount spent on marketing/promotion
x2 = amount spent on making the movie
x3 = if the movie is a sequel (binary variable)
x4 = who is directing the movie
x5 = if it got largely positive reviews (binary variable)
x6 = who is starring in the movie
x7 = how many theaters the movie was released in worldwide

Now, I'm an avid moviegoer and I have a general idea of where I can get the data for most of these.

However, I have a question regarding x4 and x6.
I don't know if those variables should be binary, because there are many different possibilities, and I haven't really learned this in my course yet. If they should be binary, should I just define them as:

whether the director is well known?
whether the stars are well known?

So what I was thinking of doing for my B4 and B6 coefficients was getting the average revenue for all the well-known directors' movies and well-known actors' movies and just putting those in.


Ninja say what!?!
For x4 and x6, you're really opening a huge can of worms if you want to account for the stars and directors specifically. You'd need a huge data set with plenty of observations for each star and director. In addition, you'd need to create a dummy indicator for each of them, since these are categorical variables.
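
To give a rough idea of why this gets big fast, here is a minimal sketch of dummy coding in plain Python. The director labels "A" to "D" are made-up placeholders, and `dummy_encode` is a hypothetical helper written for this illustration, not something from a stats package. It treats the first category as the reference level, which is one common convention.

```python
# Hypothetical director labels for six movies (placeholders, not real data).
directors = ["A", "B", "A", "C", "B", "D"]

def dummy_encode(labels):
    """Return (column_names, rows) with one 0/1 column per category except the first.

    The first category in sorted order is the reference level, so k distinct
    categories produce k - 1 dummy columns.
    """
    levels = sorted(set(labels))
    kept = levels[1:]  # drop the reference level so the dummies are not collinear with the intercept
    rows = [[1 if label == level else 0 for level in kept] for label in labels]
    return kept, rows

columns, rows = dummy_encode(directors)
print(columns)
for label, row in zip(directors, rows):
    print(label, row)
```

With four directors you already get three extra columns. With every director and star in a real data set you would get one column per person minus one, and each column needs enough movies behind it to be estimated.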

My suggestion is to stick with what you suggested. Just have a binary indicator. Of course, you'd have to clearly define what is meant by "well known" to the point that if another person were answering, they'd choose the same value as you.
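
One more point on B4 and B6: in a regression the coefficients are estimated from the data by the fit, not entered by hand, so your software will produce them once the 0/1 columns are in the data set. Here is a small sketch of that in plain Python. Every number is made up: the gross figures are generated from a known formula (10 + 2 * budget + 5 * well_known) only so you can see the fit recover it. The `solve` and `ols` helpers are hypothetical names for this illustration and use the textbook normal equations; your course software may compute the fit differently.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up data: budget in $ millions and a 0/1 "well-known director" flag.
budget = [10, 20, 30, 40, 50, 60]
well_known = [0, 1, 0, 1, 1, 0]
# Synthetic gross built from a known formula, purely for illustration.
gross = [10 + 2 * b + 5 * w for b, w in zip(budget, well_known)]

# Design matrix: intercept column, budget, binary indicator.
X = [[1, b, w] for b, w in zip(budget, well_known)]
coefficients = ols(X, gross)
print(coefficients)  # intercept, budget slope, well-known-director effect
```

The coefficient on the 0/1 column is read as the estimated difference in gross between movies with and without a well-known director, holding budget fixed.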