Multiple Regression to receive cost for a single item of an order

#1
Hi everybody,

I appreciate any kind of help and tips:

Problem:
- I have a dataset with 1600 orders
- For each of these orders I know the total costs & products included (e.g. 7975€ & A, C, D, H)
- I don't know the individual prices for the range of products (in total 80), which would, for instance, be helpful to calcuate new offers .
--> please see the attached pic

Potential Solultion: Multiple regression
- no of observations = no of orders
- independent variables = all products (= ~80)
- dependent variable = total cost (per order)

Questions:
- Is a multiple regression a suitable statistical tool?
- If not: Is there an alternative?
- If yes: Are there any specialities I have to consider (e.g. multicollinearity --> how can I reduce it, etc.)?

Many thanks for your help! :)
Chris
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Yeah, I could see multiple linear regression working or getting you in the ballpark, then you confirm results based on a few future orders. Are you going to treat products as binary variables (1/0)? That is what I would do. I don't think MC should impact it.


This also seems solvable in matrix algebra. Though I do not know MA that well. Which regression uses MA.
 
#3
Can you de-duplicate the rows and find the inverse matrix of the Product Matrix?

Linear algebra would be as follows:
1) AB = C where A = Product Matrix, B = Price Vector, and C = Order Price Vector
2) Inv(A)AB = Inv(A)C where Inv(A)A = Identity Matrix
3) IB = Inv(A)C
4) B = Inv(A)C

If you cannot invert matrix A then the regression solution might be good enough. But if it works, then B should yield a vector of the prices for each Product.
 
#4
Thanks a lot for your instant answer, hlsmith & Oberon!
1) Linear Algebra:
- Based on my current state of knowledge, I think I cannot build a quadratic (= invertible) product matrix, which is necessary to use the approach of Oberon, right?

2) Multiple regression
- Sounds good that a multiple linear regression is a valid approach
- @hlsmith: yes, I want to treat products as binary variables
- When I conduct a multiple regression with Stata, I receive improper values, e.g.
- negative coefficients (= cost per product)
- low R^2 (0.72)
- high standard error (e.g. 638 for coefficient 1108)
- 30% of coefficients not statistically significant
- omitted variables because of multicollinearity​
- a potential reasons might be: costs for individual product calculated within an order vary strongly from order to order
- what are possible countermeasures: clean up data? My first approaches would be:
- combine correlated variables to factor?
- de-duplicate rows (orders)?
- exclude those orders that contain products with strongly varying cost per product over different orders?​
- Do you have more ideas?

Thanks a lot!
Cheers
Chris​
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
Can you just solve the easier products first? Find some one who only bought a couple of product and different combinations of those? Or if a person who just bought a single product - kind of like a word decipher!
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Yes, that would be another approach. If there are 80 possible products, for observations, what is the fewest products selected.
 
#8
Hello again,
I have worked on my data record and regression for the last few days, but unfortunately I have not received the results I expected. Please check out the excel file attached:

  • Some orders seem to be calculated accurately (e.g. order 156 - 172); Those orders contain mostly the same products)
  • For other orders the deviation (regression – actual, column BB) is partly pretty high; those orders contain often a variety of different products (e.g. order 103 -155).
  • Some products have negative costs (e.g. ca, cc,..)

-->Considering the results and the structure of the data record, is a multiple regression the right tool?

-->Will the results be more accurate if I drop more datasets (with outliers) or focus on those with more reasonable results first?
Many thanks!!