Today I Learned: ____

So far I've just been trying to scrape basic things to get a hang of it. I've tried scrapping the number of youtube views for a particular video, ratings from Rotten Tomatoes/Meta Critic and about any site that has a real time counter (e.g. http://www.nationaldebtclocks.org/debtclock/canada).


When you install rvest, be sure to to execute vignette("selectorgadget") . It will show you how to identify the exact elements you wish to scrape from a web page.

Anyway, here is a toy example I did with Rotten Tomatoes. I just wanted to scrape the data for the movies opening this week (name and scores). If you head over to www.rottentomatoes.com you can view them first on the top left.


Code:
library(rvest)
library(ggplot2)


#Load site of interest
rt <- read_html("https://www.rottentomatoes.com/")


#Scrape the names of the movies opening this week
allnames <- rt %>%
          html_nodes(".right a , .middle_col a") %>%
		  html_text() %>%
		  as.character()


#note the SelectorGadget helped identify the key ".right a , .middle_col a" was what I needed to scrape.

#The scrape actually extracts a lot of extras, must inspect the vector to identify the elements of interest
top5_names <- allnames[c(18,20,22,24,26)]


#Scrape the corresponding scores for the movies opening this week
allscores <- rt %>%
          html_nodes(".tMeterScore") %>%
		  html_text() %>%
		  as.character()

		  
 
#Again, the scrape has a lot of extra info. Identify and select just the scores for those movies opening
#remove "%" and make numeric so i can plot them
top5_scores <- as.numeric(gsub("%","",allscores[19:23]))




#store to a data frame for use in ggplot
top5_ratings <- data.frame(title=top5_names, ratings=top5_scores)
top5_ratings


#make a plot
p <- ggplot(top5_ratings, aes(weight=ratings, x=title, fill=title, label=ratings)) + 
geom_bar() +
geom_text(aes(y=ratings/2)) +
guides(fill=FALSE) +
theme_bw()

p
 
Last edited:
Here is the extension of the above to a self contained shiny application.

Every 10 seconds (10000 milliseconds):
-read the data in, store it
-record the system time/date for the observation
-append it to an existing data frame
-plot it using GGPLOT and facet_grid by the time variable
-this example won't be interesting (i.e. doesn't change at all since yesterday) and maybe every 12 - 24 hours might be a better interval to refresh for this particular site.

It's a bit rough, but it does in concept what I wanted to do.

Code:
library(shiny)
library(magrittr)
library(ggplot2)
library(rvest)


ui <- shinyServer(fluidPage(
  plotOutput("plot")
))

server <- shinyServer(function(input, output, session){
  # Function to get new observations
  get_new_data <- function(){

  #Load site of interest and store the time you retrieved the data
  rt <- read_html("https://www.rottentomatoes.com/")
  systime <- Sys.time()

#Scrape the names of the movies opening this week
  allnames <- rt %>%
          html_nodes(".right a , .middle_col a") %>%
		  html_text() %>%
		  as.character()


#The scrape actually extracts a lot of extras, must inspect the vector to identify the elements of interest
   top5_names <- allnames[c(18,20,22,24,26)]


#Scrape the corresponding scores for the movies opening this week
   allscores <- rt %>%
          html_nodes(".tMeterScore") %>%
		  html_text() %>%
		  as.character()

		  
 
#Again, the scrape has a lot of extra info. Identify and select just the scores for those movies opening
#remove "%" and make numeric so i can plot them
top5_scores <- as.numeric(gsub("%","",allscores[19:23]))




#store to a data frame for use in ggplot
data <- data.frame(title=top5_names, ratings=top5_scores,time=systime)
  
  
    return(data)
  }

  # Initialize my_data
  my_data <<- get_new_data()

  # Function to update my_data
  update_data <- function(){
    my_data <<- rbind(get_new_data(), my_data)
  }

  # Plot the 30 most recent values
  output$plot <- renderPlot({
    invalidateLater(10000, session)
    update_data()
   
   p <- ggplot(my_data, aes(weight=ratings, x=title, fill=title, label=ratings)) + 
        geom_bar() +
		facet_grid(.~time) +
        geom_text(aes(y=ratings/2)) +
        guides(fill=FALSE) +
        theme_bw() +
		theme(axis.text.x = element_text(angle = 45, hjust = 1))

   print(p)
   
   
  })
})

shinyApp(ui=ui,server=server)
 
Last edited:

bryangoodrich

Probably A Mammal
So uh ... yeah TIL there is a toString function in R (seriously, when did this get added?!)

Code:
toString(letters)  # "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
Basically it's syntactic sugar for paste(x, collapse = ", "), though it says it's a helper function for format. It does have a nice width parameter so it will give a shorter representation (kind of goofy, though)

Code:
toString(letters, width = 5)  # "a,...."
In any case, I had a use-case of efficiently unpacking a vector into components of a list for a SQL statement and this makes it a lot easier

Code:
x <- sample(letters, 3)
sql <- "SELECT * FROM foobar WHERE foo IN (%s)"
sprintf(sql, toString(x))
Only problem here is it won't work for characters because it'd produce something like "IN (a, b, c)" when you really need "IN ('a', 'b', 'c')". So there is still some work to be done!

Another approach is to do something for an expression like

Code:
x <- sample(LETTERS, 3)
sql <- "INSERT INTO foobar VALUES (%s, %s, %s)" 
sprintf(sql, x[1], x[2], x[3])
It would be nice to have some way to just say sprintf(sql, x) and "unpack" x into the remaining arguments, but I'll probably just have to create my own wrapper that uses paste and collapse and quotes appropriately (I already have something like this). I just wanted a base R way. Maybe stringr has something? Would also be nice if one of the database libraries has a way to just say "convert this data.frame into a series of sql INSERT statements for each row" because in the insert case, that's ultimately what I'm trying to have a wrapper for.
 

Dason

Ambassador to the humans
Code:
> sQuote(letters)
 [1] "‘a’" "‘b’" "‘c’" "‘d’" "‘e’" "‘f’" "‘g’" "‘h’" "‘i’" "‘j’" "‘k’" "‘l’"
[13] "‘m’" "‘n’" "‘o’" "‘p’" "‘q’" "‘r’" "‘s’" "‘t’" "‘u’" "‘v’" "‘w’" "‘x’"
[25] "‘y’" "‘z’"
> toString(sQuote(letters))
[1] "‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’, ‘k’, ‘l’, ‘m’, ‘n’, ‘o’, ‘p’, ‘q’, ‘r’, ‘s’, ‘t’, ‘u’, ‘v’, ‘w’, ‘x’, ‘y’, ‘z’"
> # Stupid 'fancy' quotes...
> options(useFancyQuotes=FALSE)
> toString(sQuote(letters))
[1] "'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'"
>
 

bryangoodrich

Probably A Mammal
Found some interesting behavior

Code:
insert_values <- function(x, table = "tbl", quote = TRUE)
{
    sql <- "INSERT INTO %s VALUES (%s)"
    sprintf(sql, table, ifelse(quote, toString(sQuote(x)), toString(x)))
}
Had to do that because this doesn't work (just quotes the first element of the vector ('a')

Code:
insert_values <- function(x, table = "tbl", quote = TRUE)
{
    sql <- "INSERT INTO %s VALUES (%s)"
    sprintf(sql, table, toString(ifelse(quote, sQuote(x), x)))
}
Code:
x <- c('a', 'b', 'c')
insert_values(x)  # "INSERT INTO tbl VALUES ('a', 'b', 'c')"
insert_values(x, "foobar", FALSE)  # "INSERT INTO foobar VALUES (a, b, c)"
This works for simple all or none use cases, but a more complex solution will need to provide quoted mappings for each record data type--e.g., quote strings, factors, and dates, unquote numerics and be able to supply a complex row tuple into the values list. Fun exercise nonetheless.
 

Dason

Ambassador to the humans
ifelse will always produce a result that has the same length as the first input. So if you put in a single logical value you'll get something of length 1 as the output. It's kind of annoying but it makes sense for it's particular use case. You could just use if/else directly to avoid that issue.
 

bryangoodrich

Probably A Mammal
Good point. True, I can just use if else, but it's like nails on a chalkboard to see that you can return a value from a control flow block! I think the approach I used is sufficient and falls within the paradigm of what ifelse is doing (one return for one input). I was just trying to localizing the conditional logic to the exact part that is being tested.
 
TIL: Reordering of a factor within Shiny using the 'shinyjqui' package. This was motivated by a common question (I myself had earlier this year too) of how to change the order of say the x-axis categories on a graph.

I have a self-contained Shiny App that demonstrates that here.

Code:
library(shiny)
library(shinyjqui)
library(ggplot2)			 

### CREATE A DATA FRAME OF SAMPLE DATA
mydat <- data.frame(group=c("Apples","Bananas","Oranges","Peaches"),
                     value=c(3,2,5,1)
				 )
				 
### CREATE SERVER.R	
		 
server <- function(input, output) {

  #Text Output
  output$txt <- renderPrint({input$foo_order})
  
    
  #Plot Output
  output$graph <- renderPlot({
  
    mydat$newp <- factor(mydat$group, levels=input$foo_order)
	
	g<-ggplot(mydat, aes(y=value,value,x=newp,color=newp)) + geom_point() + theme_bw() 
  
	print(g)
  
  
							})
}


###CREATE UI.R

ui <- fluidPage(
  orderInput(inputId = 'foo', label = 'change ordering by moving the squares', items =   levels((mydat$group))),
  verbatimTextOutput('txt'),
  plotOutput('graph')
)

shinyApp(ui, server)
 

trinker

ggplot2orBust
TIL `cat` has a `sep` argument. All these years of using `cat(paste(x, collapse = '\n'))` could have been `cat(x, sep = '\n')`:

Code:
x <- c('I like it', 'Me too')
cat(x, sep = '\n')
 
TIL

Today is 2 fold:

(1) integrate an e-mail submission form in R-Shiny using the sendmailR package.
(2) modify and manipulate existing html objects from an html template (which relies on js/css) in shiny.

Example: https://html5up.net/prologue

I've essentially brought to life the 'contact form' without having to specify my own shiny objects (i.e. textInput, textAreaInput). The sendmailR package paired nicely with this for me which was a huge help, because I don't know php or how it works.
 
@jamesmartinn Nice. Do you have the source code for this available that you'd be willing to share?
For sure! I will post an example using that exact html template and give an overview/annotation of what I did. I will upload the entire app as a zip later today.

I have it working using google's g-mail service. It would be helpful if you can register a g-mail account to test out the app yourself (even if you have a primary g-mail account, register a separate test account for the sake of this)

Cheers!
 
Hey guys,

Here is a bare bones example that will work both locally and on shinyapps.io.

Just a few things:

-The g-mail settings work as is. The only thing you need to change is the actual account and password for the test account you setup.

-On the g-mail website, you may first need to enable the account to be used by 'less secure apps'.

-I have the app setup so that it sends an e-mail to the dummy account from the dummy account.

-I'm not sure how secure this is. I felt a bit uncomfortable hard coding my password for my primary gmail account into the app, so I just made a test account. When I deployed this to shinyapps.io and did a right click->view source, I didn't see any of these details displayed (they are obviously done on the server side), but still went the cautious route. As an alternative, the send.mail function has a CC parameter, so you can always request a copy of an e-mail be sent to your primary account.


Code:
#Load Packages
library(shiny)
library(mailR)

#UI
ui <- fluidPage(
textInput('name','Your name',''),
textInput('contact','How can I reach you?',''),
textAreaInput('message','Enter your message here',''),
actionButton('send','Submit Message')
)

#Server Logic
server <- function(input, output) {
  
  observeEvent(input$send, {
    send.mail(from = "blahblahblah@gmail.com",
              to = "blahblahblah@gmail.com",
              subject = "New Consult!",
              body = paste("Name:",input$name, '\n', "Contact:" ,input$contact, '\n', "Messsage", input$message),
              smtp = list(host.name = "smtp.gmail.com", port = 465, user.name = "blahblahblah@gmail.com", passwd = "password", ssl = TRUE),
              authenticate = TRUE,
              send = TRUE)
   
  })
}

#Run App
shinyApp(ui = ui, server = server)
 

Attachments

Last edited:
TIL a little about standard and non-standard evaluation when attempting to create a function using dplyr.

Code:
library(dplyr)

#Make some data
dats <- data.frame(type=rep(c("a","b"),c(4,5)),
                   category=rep(c("apples","oranges"),c(5,4)),
                   amount=round(runif(9,min=1,max=10),1))


#Manual approach to doing some summaries
man = dats %>% group_by(type) %>% mutate(total=sum(amount))  %>% distinct(type, total)


#Function approach
fun <- function(groupby, var) {

  groupby <- enquo(groupby)
  var <- enquo(var)

  dats %>% group_by(!!groupby) %>% mutate(total=sum(!!var))  %>% distinct(!!groupby, total)


    }

#Function approach results in identical results to manual approach
auto <- fun(groupby=category, var=amount)
auto

#Function approach is flexible
auto2 <- fun(groupby=type, var=amount)
auto2
Bonus: Using multiple variables (here, grouping by both category and type)

Code:
fun <- function(..., var) {
 
  groupby <- quos(...)
  var <- enquo(var)
 
  dats %>% group_by(!!!groupby) %>% mutate(total=sum(!!var))  %>% distinct(!!!groupby, total)
 
 
}

auto <- fun(type, category,  var=amount)
auto
 
Last edited: