Database Assistance

I need to create a customer profiling programme around a database which we have already created and will now be populating.

The database will comprise several categories (retailers, products, media, etc.) and, for each entry in each category, there will be an expected profile: age, gender, race, language, LSM (a lifestyle measurement rated between 1 and 10), etc.

A sample database excerpt follows. The descriptors and probabilities will be provided by the media, manufacturers and retailers.

Media Category
               Age             Gender           LSM
               Range   Prob.           Prob.    Range   Prob.
Vogue          35-60   80%     Female  90%      8-10    90%
Playboy        20-40   80%     Male    90%      6-8     80%
NY Times       35-65   70%     Male    65%      8-10    85%
People         25-60   70%     Female  70%      5-7     90%

Product Category
               Age             Gender           LSM
               Range   Prob.           Prob.    Range   Prob.
Levi           15-40   80%     Female  50%      3-7     80%
Armani         35-60   80%     Female  60%      10      90%
Deodorant A    15-50   70%     Male    90%      5-8     85%
Deodorant B    15-60   70%     Female  95%      5-8     90%
Max Cosmetics  20-40   65%     Female  95%      6-8     90%

Retailer Category
               Age             Gender           LSM
               Range   Prob.           Prob.    Range   Prob.
Bloomingdales  35-70   80%     Female  65%      8-10    90%
Sports Shop    20-50   70%     Male    80%      7-10    80%
Neiman Marcus  40-65   90%     Female  90%      10      95%
Gap            20-60   80%     Female  60%      5-8     90%
Wal Mart       30-70   65%     Female  65%      3-8     90%

The product we are developing will track a consumer's behaviour in each category and must then create an expected profile. Note that the database will be dynamic, both in interactions and in category participants: new media and retailers may be added over time, and a consumer may purchase new products and media in the future. Both kinds of change will amend the profile.

Assume that a consumer (i) purchases Vogue and buys Armani jeans at Neiman Marcus; then (ii) purchases the New York Times and buys Levi jeans, also at Neiman Marcus; and finally (iii) buys Max Cosmetics at Bloomingdales.
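To make the example concrete, here is a rough sketch of one way the gender figures for these purchases could be pooled. It is only an assumption on my part, not a settled method: it treats every purchase as independent evidence, assumes a 50/50 split of genders in the population, and multiplies the odds (naive-Bayes style). The names P_FEMALE and combine_gender are made up for illustration.

```python
# Probability that the buyer is female, read from the sample tables.
# Entries listed as "Male x%" are stored as 1 - x.
P_FEMALE = {
    "Vogue": 0.90, "NY Times": 0.35,
    "Armani": 0.60, "Levi": 0.50, "Max Cosmetics": 0.95,
    "Neiman Marcus": 0.90, "Bloomingdales": 0.65,
}

def combine_gender(items):
    """Pool the per-item probabilities by multiplying their odds.

    Assumes independent observations and a 50/50 population prior
    (both are assumptions, not something the database guarantees).
    Table probabilities must be strictly below 100%.
    """
    odds = 1.0  # odds of female vs. male under a 50/50 prior
    for item in items:
        p = P_FEMALE[item]
        odds *= p / (1.0 - p)
    return odds / (1.0 + odds)

# The three transactions from the example, item by item.
purchases = [
    "Vogue", "Armani", "Neiman Marcus",
    "NY Times", "Levi", "Neiman Marcus",
    "Max Cosmetics", "Bloomingdales",
]
print(f"P(female) = {combine_gender(purchases):.4f}")
```

Because the independence assumption counts Neiman Marcus twice and treats the magazine, product and shop in one transaction as separate evidence, the pooled figure will probably be overconfident; some down-weighting may be needed.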

Based on this activity, I require a formula that gives me an expected range for age and LSM, each with a probability, as well as the probability of gender and the probability of language. There will be some complications. For example, if a standard distribution model is to be applied, the total possible LSM range must be considered. The scale tops out at 10, so if the probability is 80% that the LSM is 9 to 10, then the other 20% must fall between 1 and 8, whereas if the LSM range is 4 to 6 with a 60% probability, then 20% can be expected to fall between 1 and 3 and 20% between 7 and 10.
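The LSM reasoning above could be sketched roughly as follows. This is only one possible reading, under assumptions of my own: each "range with probability" entry is spread over the LSM values 1 to 10, with the leftover probability split equally between the values below and above the range (all of it on one side when the range touches the end of the scale), and the entries are then multiplied together as if independent. The function names spread, combine and range_prob are hypothetical.

```python
LSM_SCALE = range(1, 11)  # LSM values 1..10

def spread(low, high, prob):
    """Turn 'LSM is low..high with probability prob' into a per-value distribution."""
    inside = [v for v in LSM_SCALE if low <= v <= high]
    below = [v for v in LSM_SCALE if v < low]
    above = [v for v in LSM_SCALE if v > high]
    sides = [s for s in (below, above) if s]
    if not sides:
        prob = 1.0  # range covers the whole scale, nothing is left over
    dist = {v: prob / len(inside) for v in inside}
    for side in sides:
        # Leftover mass is shared equally between the sides that exist,
        # then evenly across the values on that side.
        for v in side:
            dist[v] = (1.0 - prob) / len(sides) / len(side)
    return dist

def combine(dists):
    """Multiply the distributions value by value and renormalise.

    Assumes the observations are independent (an assumption).
    """
    combined = {v: 1.0 for v in LSM_SCALE}
    for d in dists:
        for v in LSM_SCALE:
            combined[v] *= d[v]
    total = sum(combined.values())
    return {v: p / total for v, p in combined.items()}

def range_prob(dist, low, high):
    """Probability that the LSM falls in low..high under dist."""
    return sum(p for v, p in dist.items() if low <= v <= high)

# First transaction from the example: Vogue, Armani, Neiman Marcus.
profile = combine([spread(8, 10, 0.90), spread(10, 10, 0.90), spread(10, 10, 0.95)])
print(f"P(LSM 9-10) = {range_prob(profile, 9, 10):.4f}")
```

The same approach would presumably carry over to age by swapping the 1 to 10 scale for a range of ages, although the choice of minimum and maximum age then affects how the leftover probability is spread.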

Can anyone help?

I am also willing to change the database if necessary to ease the problem.

Thank you in advance

Jonny Fenster
fencorp@global.co.za