right join with dplyr make rows columns

#1
I would like to do right join data1 and data2 by ProductCode and I need to get desired output table below

Code:
data1=data.frame(ProductCode=c(1,1,1,2,2,3),region=c("A","A","A","B","B","C"))
data1
ProductCode region
1      A
1      A
1      A
2      B
2      B
3      C


data2=data.frame(ProductCode=c(1,1,1,2,2,3),Period=c("promo1","promo2"
,"promo3","promo2","promo3","promo1"),promosales=c(15,12,7,18,20,2))
data2

ProductCode Period promosales
1     promo1         15
1     promo2         12
1     promo3          7
2     promo2         18
2     promo3         20
3     promo1          2
Desired output table below
Code:
ProdcutCode region     number_ofpromo Promo1_sales Promo2_sales Promo3_sales
       1        A                3            15       12             7
       2        B                2            18       20             0
       3        C                1            2         0             0
If I do it with sql, I have to group by each row and it needs a lot of work. Is there another way to do this such as dplyr or something else.
Thank you.
 
Last edited:
#2
This can work.

Code:
library(tidyr)
library(dplyr)

my_output = data1 %>%
                     group_by(productcode) %>% 
                     mutate(n_promo=n()) %>% 
                     right_join(data2) %>% 
                     distinct(productcode, region, period) %>% 
                     spread(period, promosales, fill=0)
 
Last edited:
#3
Actually this throws me an error. Below is my error message.
Code:
my_output = data1 %>%
             right_join(data2) %>%
             distinct(productcode, region, period) %>%
             spread(period, promosales, fill=0)


Joining, by = "ProductCode"
Error: `var` must evaluate to a single number or a column name, not a function
Can you also show me to add columsn that counts number of promosales. Thank you
 
Last edited:
#4
I've updated the previous post with that bit.

cheers,


I'm not sure why it's throwing that error. Can you check that your variable names are correctly named and of proper case?

I'm using R 3.1.2 (Pumpkin Helmet) with dplyr 0.4.1 and tidyr 0.2.0 and it works fine. Granted my system here at work is very out of date, so if you're using a newer version of those packages it might not be compatible?
 
Last edited:
#5
Actully thanks but still getting an error
Code:
my_output = data1 %>%
  group_by(ProductCode) %>%
  mutate(n_promo=n()) %>%
  right_join(data2) %>%
  distinct(Productcode, region, Period) %>%
  spread(Period, promosales, fill=0)
Joining, by = "ProductCode"
Error in eval_tidy(enquo(var), var_env) : object 'promosales' not found
 
#6
Code:
my_output = data1 %>%
  group_by(ProductCode) %>% 
  mutate(n_promo=n()) %>% 
  right_join(data2) %>% 
  distinct(ProductCode, region, Period, promosales, n_promo) %>% 
  spread(Period, promosales, fill=0)
Will get the desired output. Looks like the distinct() function changed from the iteration at my work to this (implicit select?)