I reshaped long and now there are more categories in my encoded variables!

#1
Hi :)
I reshaped my dataset and now my encoded variables have more categories than they had before. What could be the cause? (PERS_ID identifies the person, while STU_ID is the student identification number, which changes if the student switches majors within my country's university system.)

Syntax

reshape long facolt_ cod_cds_ des_pds_ cod_classe_ normativa_ anno_corso_ tipo_iscr_ PT_ f_ eso_cod_ data_AL_ tasse_ Iratapagata_ Iratanonpagata_ IIratapagata_ IIratanonpagata_ IIIratapagata_ IIIratanonpagata_ dataIratapagata_ dataIIratapagata_ e_c_ f_en_ cod_en_ des_en_ cod_cl_en_ normativa_en_ iscr_en_ , i( PERS_ID STU_ID) j(year)


Mystery tables

**Before**

tipo_iscr_2 |
        008 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |        105        1.42        1.42
         IC |      7,268       98.58      100.00
------------+-----------------------------------
      Total |      7,373      100.00

**After reshaping**
ta iscr_en_

   iscr_en_ |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |      4,059       10.79       10.79
         IC |     28,058       74.62       85.41
          3 |      5,485       14.59      100.00
------------+-----------------------------------
      Total |     37,602      100.00

Thanks very much for your attention!
 
#3
Hi Bukharin, thanks for your answer :)
Before reshaping I had six variables, one for each year (iscr_en_2006 to iscr_en_2011), while after reshaping I have a single one (iscr_en_). The problem is that about 15% of the observations now fall in a category 3, which is new and seems to come out of nowhere.
 

bukharin

RoboStataRaptor
#4
Okay, in that case what do you get from:
tab iscr_en_2006
tab iscr_en_2007
tab iscr_en_2008
tab iscr_en_2009
tab iscr_en_2010
tab iscr_en_2011
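
A further check that may help here, sketched on made-up data: tabulating with the nolabel option shows the stored numeric codes, and label list shows which text each code stands for. The toy values, the id variable, and the label names lab2006 and lab2007 are assumptions for illustration only, not taken from the real dataset.

```stata
* Toy data (hypothetical values): two yearly variables coded differently
clear
input id iscr_en_2006 iscr_en_2007
1 1 1
2 2 2
3 2 3
4 1 3
end

* Hypothetical value labels, as encode might build them one year at a time
label define lab2006 1 "FC" 2 "IC"
label define lab2007 1 "999" 2 "FC" 3 "IC"
label values iscr_en_2006 lab2006
label values iscr_en_2007 lab2007

foreach v of varlist iscr_en_2006 iscr_en_2007 {
    * name of the value label attached to this variable
    local lbl : value label `v'
    display "`v' uses value label `lbl'"
    * numeric codes behind the labelled categories
    tabulate `v', nolabel
    * code-to-text mapping for this variable
    label list `lbl'
}
```

If the same text (say IC) sits behind different codes in different years, a plain tab of each year will look fine while the codes themselves disagree.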
 
#5
Yeah, I tried. No sign of this mysterious 3. I'm so confused :(

ta iscr_en_2006

tipo_iscr_2 |
        006 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |         52        1.59        1.59
         IC |      3,220       98.41      100.00
------------+-----------------------------------
      Total |      3,272      100.00

ta iscr_en_2007

tipo_iscr_2 |
        007 |      Freq.     Percent        Cum.
------------+-----------------------------------
        999 |          1        0.02        0.02
         FC |         56        1.01        1.03
         IC |      5,485       98.97      100.00
------------+-----------------------------------
      Total |      5,542      100.00

ta iscr_en_2008

tipo_iscr_2 |
        008 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |        105        1.42        1.42
         IC |      7,268       98.58      100.00
------------+-----------------------------------
      Total |      7,373      100.00

ta iscr_en_2009

tipo_iscr_2 |
        009 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |        767        8.34        8.34
         IC |      8,434       91.66      100.00
------------+-----------------------------------
      Total |      9,201      100.00

ta iscr_en_2010

tipo_iscr_2 |
        010 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |      1,289       18.30       18.30
         IC |      5,753       81.70      100.00
------------+-----------------------------------
      Total |      7,042      100.00

ta iscr_en_2011

tipo_iscr_2 |
        011 |      Freq.     Percent        Cum.
------------+-----------------------------------
         FC |      1,845       35.67       35.67
         IC |      3,327       64.33      100.00
------------+-----------------------------------
      Total |      5,172      100.00
 
#6
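NOW I see it! Thanks!! In iscr_en_2007 (checked with label list iscr_en_2007), IC is coded 3, because the stray 999 takes code 1 and pushes FC to 2. The reshaped variable uses a value label that has no entry for 3, so those observations show up as a bare number. That 3 is nothing but the IC of 2007.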
\(◎o◎)/!
Many thanks Bukharin!! :)
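
If the yearly variables were encoded one at a time, one way to avoid this clash might be to decode them back to text before reshaping and encode once afterwards, so that a single mapping covers every year. A minimal sketch on made-up data; the id variable, the txt_ stub, and the label names lab2006 and lab2007 are assumptions, not names from the real dataset.

```stata
* Toy data (hypothetical values): the same text has different codes by year
clear
input id iscr_en_2006 iscr_en_2007
1 1 1
2 2 2
3 2 3
4 1 3
end
label define lab2006 1 "FC" 2 "IC"
label define lab2007 1 "999" 2 "FC" 3 "IC"
label values iscr_en_2006 lab2006
label values iscr_en_2007 lab2007

* Step 1: turn each labelled variable back into its text
foreach y in 2006 2007 {
    decode iscr_en_`y', generate(txt_`y')
    drop iscr_en_`y'
}

* Step 2: reshape the text variables, which carry no numeric codes to clash
reshape long txt_, i(id) j(year)

* Step 3: encode once, so every year shares one code-to-text mapping
encode txt_, generate(iscr_en_)
tabulate iscr_en_
label list iscr_en_
```

With real data the same decode loop would have to run over every encoded stub before the reshape, and any stray category such as 999 would still be there as its own labelled value.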
 
#8
Hi :)
Yesterday I realized that I had the same problem with the whole set of variables, and I found this solution, which seems to work just fine for the entire dataset :)

I used this syntax before reshaping in order to copy all the labels.

* store each variable label in a local macro named l<varname>
foreach v of var * {
    local l`v' : variable label `v'
}


Then I used this one, after reshaping, to retrieve them.

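* restore the stored label for any variable that is now unlabelled
* (locals are matched by variable name; run both loops in one go so they survive)
foreach v of var * {
    local L`v' : variable label `v'
    if `"`L`v''"' == "" & `"`l`v''"' != "" {
        label var `v' `"`l`v''"'
    }
}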

as explained here: http://www.stata.com/statalist/archive/2003-01/msg00711.html

best regards ^_^