I'm not sure how to use "egen sum by rows"

#1
Hello
i have the following problem. I need to sum by rows. For example:
Code:
v1	v2	v3
1	1	1
2	5	12
2	6	12
2	1	12
3	2	9
3	7	9
1	2	21
1	9	21
1	8	21
1	2	21
So, i need to sum variable v2 by grouping variable 1 and therefore i obtain v3.
I ran egen v3=sum(v2), by(v1), but stata summed all rows from the database. What i did wrong? Thanks for your time and help.
 

bukharin

RoboStataRaptor
#2
Generally speaking you should use -egen, total- rather than -egen, sum- which is not documented. The reason that -egen, sum- is not documented is to prevent confusion with the function sum() which generates a running sum.

Why shouldn't v3 be 22 for v1==1? Does the order of v1 matter? If not, just use -egen, total- and you should get the correct result. If this doesn't work, you need to post an example (like the above) with your output.

If the order does matter then you could create a new grouping variable prior to running -egen-:
Code:
gen group=sum(v1!=v1[_n-1])
egen v3=total(v2), by(group)
list, clean noobs

    v1   v2   group   v3  
     1    1       1    1  
     2    5       2   12  
     2    6       2   12  
     2    1       2   12  
     3    2       3    9  
     3    7       3    9  
     1    2       4   21  
     1    9       4   21  
     1    8       4   21  
     1    2       4   21
This is a "trick" in the sense that it's a running sum of the true/false (1/0) comparison of v1 with the preceding v1 - so it goes up by 1 whenever v1 changes, creating your groups.
 
#4
Now it emerged another problem. Here is my data
Code:
v1	v2	v3
1	150	150
3	137	350
3	103	350
3	110	350
3	117	364
3	127	364
3	110	364
2	139	289
2	150	289
Now i need to obtain v3, which is the sum of v2 according to the rows indicated by v1. Look when v1 is 3. I must count from that row 2 rows down and sum the total. If appears another 3, that's other sum, like i did in the example posted above. i don't have any variable to differentiate than going down arrays and jump depending on the first value found in the row. Did i myself clear? Again, thanks for your time and patience.
 

bukharin

RoboStataRaptor
#5
That's a bit trickier. Here's one solution (not very well tested):
Code:
clear
input v1	v2
1	150
3	137
3	103
3	110
3	117
3	127
3	110
2	139
2	150
end

gen n=_n
gen group1=sum(v1!=v1[_n-1])
bysort group1 (n): gen byte group2=mod(_n-1, v1[1])==0
gen group3=sum(group2)
egen v3=sum(v2), by(group3)
Result:
Code:
    v1    v2   n   group1   group2   group3    v3  
     1   150   1        1        1        1   150  
     3   137   2        2        1        2   350  
     3   103   3        2        0        2   350  
     3   110   4        2        0        2   350  
     3   117   5        2        1        3   354  
     3   127   6        2        0        3   354  
     3   110   7        2        0        3   354  
     2   139   8        3        1        4   289  
     2   150   9        3        0        4   289
 
Last edited: