Combining two datasets into a single file

#1
Hi,
I am trying to combine two datasets. Each data file contains data for two countries, and the two files contain different observations. I wish to combine them into a single file in order to run a pooled model for the four countries.
I found this website helpful: http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/example9
The Stata manual (p. 388) also describes one-to-one merges, but I am unsure how to proceed.


I would be grateful for any advice you could give. Thank you.
 
#2
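Do the two datasets share any common variables? If you want country as a variable in a model (with, say, 4 levels), you may actually want to append the datasets rather than merge them. Appending stacks the observations of one dataset below those of the other, whereas merging adds the variables of one dataset alongside those of the other.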
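To make the append/merge distinction concrete, here is a minimal sketch in Python rather than Stata (in Stata itself, this step is `append using` followed by the name of the second file). It stacks two hypothetical datasets, each holding two countries, into one pooled table. The variable names and values are invented for illustration.

```python
def append_datasets(first, second):
    """Stack the observations (rows) of `second` below those of `first`.

    A variable present in only one dataset is filled with None for the rows
    that come from the other dataset (Stata fills such cells with missing values).
    """
    all_vars = []
    for row in first + second:
        for var in row:
            if var not in all_vars:
                all_vars.append(var)
    return [{var: row.get(var) for var in all_vars} for row in first + second]


# Hypothetical stand-ins for the two data files (names and values invented).
file_a = [  # countries 1 and 3
    {"country": 1, "wave": 2000, "y": 3},
    {"country": 3, "wave": 2000, "y": 1},
]
file_b = [  # countries 2 and 4
    {"country": 2, "wave": 2008, "y": 2},
    {"country": 4, "wave": 2008, "y": 4},
]

pooled = append_datasets(file_a, file_b)
print(len(pooled))                         # 4
print([row["country"] for row in pooled])  # [1, 3, 2, 4]
```

A merge would instead match observations on a key, such as a respondent ID, and add the second file's variables as new columns, which is not what a pooled four-country model needs.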
 
#3
Thank you for your help. Yes, the two datasets do share common variables, and I want country as a variable in the model. I appended the datasets and checked that the number of observations in the appended dataset equals the sum of those in the original datasets. However, when I ran the regressions, Stata reported that there were no observations for certain countries in certain waves (e.g. country 4 in wave 2008). I ran a crosstab of the country and wave variables:

                   wave
country      2      3      4   2008    Total
      1  1,500  1,000  1,200      0    3,700
      2      0      0      0  1,509    1,509
      3  1,001  1,996  2,022      0    5,019
      4      0      0      0  1,013    1,013

Might the fact that the variables were coded differently in the original datasets be the problem here? Thank you very much.
 
#4
If you are going to combine two datasets, you need to make sure the variables are coded the same way, i.e. a value of 1 should mean the same thing in both datasets if the two variables are going to be combined. If a variable appears in both datasets with the same meaning, you need to give it the same name in both so that Stata knows to stack its values on top of each other (assuming you are using append rather than merge).
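The renaming and recoding described above can be sketched as follows, again in Python as an illustration rather than Stata (where the same work is done with `rename` and `recode` before `append`). The variable names `lifesat` and `life_sat` are hypothetical. The mapping of wave 4 to the year 2000 is only an inference from the matching counts for countries 1 and 3 in the two tables in this thread.

```python
def harmonize(rows, rename=None, recode=None):
    """Return new rows with variables renamed and values recoded to a common scheme."""
    rename = rename or {}
    recode = recode or {}
    out = []
    for row in rows:
        # Rename so the same concept has the same variable name in both files.
        new_row = {rename.get(var, var): value for var, value in row.items()}
        # Recode so a value means the same thing in both files; unmapped values are kept.
        for var, mapping in recode.items():
            if var in new_row:
                new_row[var] = mapping.get(new_row[var], new_row[var])
        out.append(new_row)
    return out


# Hypothetical files: file A codes wave as a wave number, file B as a year,
# and the two files use different (invented) names for the same outcome.
file_a = [
    {"country": 1, "wave": 4, "lifesat": 3},
    {"country": 3, "wave": 2, "lifesat": 1},
]
file_b = [
    {"country": 2, "wave": 2008, "life_sat": 2},
    {"country": 4, "wave": 2008, "life_sat": 4},
]

a = harmonize(file_a, recode={"wave": {4: 2000}})  # assumed: wave 4 = year 2000
b = harmonize(file_b, rename={"life_sat": "lifesat"})

# Keep only the waves used in the study, then stack the two files.
pooled = [row for row in a + b if row["wave"] in (2000, 2008)]
for row in pooled:
    print(row)
```

Harmonizing each file before stacking keeps the pooled file consistent, so every variable has one name and one coding scheme.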
 
#5
I simplified things a bit by dropping the earlier waves (which are not relevant to the study) and recoding the wave variable in one of the datasets from a wave number to a year. When I ran a crosstab of country and wave, I got:
            wave
country   2000   2008
      1  1,200      0
      2      0  1,509
      3  2,022      0
      4      0  1,013

But when I run the regressions, Stata still says there are no observations for 2008, even though the table shows 2,522 observations for that year (1,509 + 1,013). Do you know what is going on? Thank you.
 
#6
The formatting makes your table hard to read, but the way I read it, country 1 has no observations for 2008. Is that correct? The error you are getting is possibly because, although you have observations in 2008, some countries have no observations in that year. What kind of regression are you running?
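One way to find the problem described above is to list every country-wave combination that has no observations. The sketch below does this in Python (in Stata, `tabulate country wave` displays the same cells). It rebuilds the data from the counts in the second table of this thread, reducing each observation to its country and wave.

```python
from collections import Counter


def crosstab(rows, row_var, col_var):
    """Count the observations in every (row_var, col_var) combination."""
    return Counter((row[row_var], row[col_var]) for row in rows)


def empty_cells(rows, row_var, col_var):
    """Return the (row_var, col_var) combinations that have no observations."""
    counts = crosstab(rows, row_var, col_var)
    row_levels = sorted({row[row_var] for row in rows})
    col_levels = sorted({row[col_var] for row in rows})
    # Counter returns 0 for combinations that never occurred.
    return [(r, c) for r in row_levels for c in col_levels if counts[(r, c)] == 0]


# Cell counts taken from the country-by-wave table posted in this thread.
thread_counts = {(1, 2000): 1200, (2, 2008): 1509, (3, 2000): 2022, (4, 2008): 1013}
rows = [
    {"country": country, "wave": wave}
    for (country, wave), n in thread_counts.items()
    for _ in range(n)
]

table = crosstab(rows, "country", "wave")
print(sum(n for (_, wave), n in table.items() if wave == 2008))  # 2522
print(empty_cells(rows, "country", "wave"))
# [(1, 2008), (2, 2000), (3, 2008), (4, 2000)]
```

In this data each country appears in only one wave, so any model or subgroup that needs, for example, country 1 in 2008 has no observations to work with.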
 
#7
Thank you very much. Yes, country 1 has no observations for 2008; only countries 2 and 4 have observations for 2008. I am running ologit regressions, sorting by country and wave. But even when I run the regressions only for the countries that have observations, Stata reports no observations. I will go back and check that the variable names are consistent across countries.
 
#8
I don't know about ordered logistic regression, but in ordinary logistic regression this could be due to an issue known as separation, which you might want to look up. A zero cell (a combination of predictor and outcome values with no observations) will cause problems.
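For ordinary (binary) logistic regression, the zero-cell check mentioned above can be sketched like this: for each level of a categorical predictor, see whether the outcome ever varies. A level at which the outcome is always 0 or always 1 gives quasi-complete separation, so the coefficient for that level has no finite maximum-likelihood estimate. This is a Python illustration on invented data; it covers the binary case only, not ordered logit.

```python
from collections import defaultdict


def separated_levels(rows, predictor, outcome):
    """Return the levels of a categorical predictor at which a binary outcome never varies.

    Each such level corresponds to a zero cell in the outcome-by-predictor
    crosstab; entered as a dummy in a logistic regression, its coefficient
    has no finite maximum-likelihood estimate (quasi-complete separation).
    """
    values_seen = defaultdict(set)
    for row in rows:
        values_seen[row[predictor]].add(row[outcome])
    return sorted(level for level, values in values_seen.items() if len(values) == 1)


# Invented example: in country 2 every respondent has y == 1.
rows = [
    {"country": 1, "y": 0}, {"country": 1, "y": 1},
    {"country": 2, "y": 1}, {"country": 2, "y": 1},
    {"country": 3, "y": 0}, {"country": 3, "y": 1},
]
print(separated_levels(rows, "country", "y"))  # [2]
```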
 
#9
I checked the coding and naming of the variables for consistency across the original and appended files, and the regressions run fine now. Thank you for your time and effort; I appreciate it a lot.