Creating an Expression Tree

hlsmith

Less is more. Stay pure. Stay poor.
#1
I am trying to get this not to look like garbage. Any suggestions? I stole most of the code from the internet (partially from some raptor sympathizer named @trinker).

Code:
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='blue')                              # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
  title(string, cex.main=2.5)
}

Code:
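library(igraph)  # parseTree() also needs parser2graph(), which is defined in post #6

string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"
string <- "(a/{5})+(2*b+c)"  # overrides the assignment above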

parseTree(string,  # plus some graphing stuff
          vertex.color="#FCFDBFFF", vertex.frame.color=NA,
          vertex.label.font=2, vertex.label.cex=2,
          vertex.label.color="darkred", vertex.size=25,
          asp=.8, edge.width=2, margin= -0.9)
1629318366164.png
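
For anyone following along, here is a minimal sketch of the parse table that parseTree() is built on. It only uses base R; the column selection is my own choice for readability. One assumption worth flagging: I pass keep.source = TRUE explicitly, because getParseData() has nothing to return when the source references are not kept, which is the default outside interactive sessions.

```r
# Inspect the token table behind the tree (base R only)
string <- "(a/{5})+(2*b+c)"

# keep.source = TRUE so the parse data is available in non-interactive runs too
dat <- utils::getParseData(parse(text = string, keep.source = TRUE))

# One row per token or sub-expression; 'parent' links each row to its parent id
print(dat[, c("id", "parent", "token", "text")])

# Terminal rows are the actual tokens; brackets are the ones parseTree() ignores
print(dat$text[dat$terminal])
```

Rows with an empty text are the intermediate expression nodes, which is why the function bumps the leaf labels up a level before plotting.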
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I created one of these using dagitty as well, but that isn't what it is for, and I couldn't use the same text in more than one node, so two '/' would cause issues.

P.S. The code from Post #1 obviously does some weird size formatting that I don't understand. That output comes from me shaping the plot window in RStudio like a long rectangle.

P.P.S. This is actually a tame version of some of the other expressions I would like to display.
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
A little better:

1629320386118.png


Code:
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='black')                             # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
}


string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"


parseTree(string,  # plus some graphing stuff
          vertex.color="white", 
          vertex.frame.color="black",
          vertex.label.font=1, 
          vertex.label.cex=1.3,
          vertex.label.color="black", 
          vertex.size=40,
          asp=1, 
          edge.width=2, 
          margin= -0.46,
          main=NULL)
 


hlsmith

Less is more. Stay pure. Stay poor.
#4
Any suggestions on how to turn Na into Na^+ or Na**ion, with the latter parts obviously being superscripts? The above code treats any mathematical operator in the string as an operator node in the tree.
 

Dason

Ambassador to the humans
#6
I think I tracked down where you were pulling from. I modified the parseTree function to take in a named vector of expressions which will do some replacements. There's probably a better way to attack the problem but this seems like a simple enough solution.

Code:
library(igraph)

parser2graph <- function(y, ...){
  y$new.id <- seq_along(y$id)                       # row position doubles as the vertex index
  h <- graph.tree(0) + vertices(id = y$id, label= y$text)   # empty graph, one vertex per row
  for(i in seq_len(nrow(y))){
    if(y[i, 'parent'] != 0)                         # parent id 0 marks a top-level row
      h <- h + edge(c(y[y$id == y[i, 'parent'], 'new.id'], y[i, 'new.id']))
  }
  h <- set_edge_attr(h, 'color', value='black')
  return(h)
}

parseTree <- function(string, ignore=c('(',')','{','}'), replace_values = NULL, ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
  
  #### This if statement is what I added. 
  #### It's not particularly clean and there's probably a better way 
  #### to do the replace. Just not thinking great today
  if(!is.null(replace_values)){
    lab <- vertex_attr(g, 'label', preds)
    idx <- match(lab, names(replace_values))
    replace_id <- which(!is.na(idx))
    lab[replace_id] <- replace_values[idx[replace_id]]
    vertex_attr(g, 'label', preds) <- lab
  }
  
  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='black')                             # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
}


string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"
#### vals is the named vector of expressions.  The name in the vector will
#### be replaced with whatever expression is given.
vals <- c(Na = expression("Na"**"ion"), Glucose = expression("Glucose"[2]))


parseTree(string,  # plus some graphing stuff
          replace_values = vals,
          vertex.color="white", 
          vertex.frame.color="black",
          vertex.label.font=1, 
          vertex.label.cex=1.3,
          vertex.label.color="black", 
          vertex.size=40,
          asp=1, 
          edge.width=2, 
          margin= -0.46,
          main=NULL)
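
In case the replacement step is hard to follow inside the function, here is a stripped-down sketch of the same match() lookup on a plain character vector. The labels and the string replacements below are made up for illustration (the real code pulls the labels from the graph and swaps in plotmath expressions rather than strings). The last line just checks that `**` and `^` are the same thing to R's parser, which is why "Na"**"ion" comes out as a superscript.

```r
# Hypothetical labels, standing in for vertex_attr(g, 'label', preds)
lab <- c("2", "Na", "Glucose", "18", "BUN", "2.8")

# Plain strings instead of plotmath expressions, to keep the sketch simple
replace_values <- c(Na = "Na+", Glucose = "Glc")

idx <- match(lab, names(replace_values))  # NA where a label has no replacement
hit <- !is.na(idx)                        # labels that do have one
lab[hit] <- replace_values[idx[hit]]      # swap only those labels
print(lab)

# R's parser translates `**` into `^`, so both spellings build the same call
same_call <- identical(quote("Na" ** "ion"), quote("Na" ^ "ion"))
print(same_call)
```

Labels without an entry in the named vector are left alone, so you only have to list the nodes you want to change.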
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Yeah, I think I was using and defining the function myself because the package would not install. I'll check this out this afternoon. I went to a network analysis workshop a while back and remember kind of playing with igraph.
 

hlsmith

Less is more. Stay pure. Stay poor.
#8
P.S. I got the code to work in Python when using a server account. Originally I couldn't get a package added in Python because of admin rights, but I was able to do what the above was a workaround for.
 

spunky

Can't make spagetti
#9
Sure. Here is the function and a brief snippet of how it works and what it used for:

\(Y(g,h) = \frac{e^{gZ}-1}{g}e^{\frac{hZ^{2}}{2}}\)

The function above is the stochastic representation of a family of probability distributions discovered by Tukey in 1977 known as the g-and-h distributions. They're a very flexible, powerful family of distributions that can approximate other (difficult) distributions very efficiently. They are also ideal for computer simulations (which, as you know, is kind of my thing). They exist in 3 varieties: g distributions (when h=0), h distributions (when g=0) and the general form posted above. As you can imagine, for g or h distributions, the definition changes a little bit to bypass the awkwardness of dividing by 0. But I'm sharing the most general form.

The g parameter can be any real number, but h is constrained to be non-negative (h=0 gives the g distributions). And, as expected, \(Z \sim N(0,1)\).
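
For what it's worth, a rough R sketch of the transformation, assuming scalar g and h. The function name gh_transform is made up for this post. The g = 0 branch uses the limit of (e^{gZ} - 1)/g as g goes to 0, which is just Z, so the h-only form becomes Z * exp(h * Z^2 / 2).

```r
# Hypothetical helper (not from any package): g-and-h transform of standard normal draws.
# Uses the limiting form Z * exp(h * Z^2 / 2) when g is exactly 0.
gh_transform <- function(z, g, h) {
  stopifnot(length(g) == 1, length(h) == 1, h >= 0)
  core <- if (g == 0) z else (exp(g * z) - 1) / g
  core * exp(h * z^2 / 2)
}

set.seed(1)  # arbitrary seed, only for reproducibility
z <- rnorm(1000)

y_normal <- gh_transform(z, g = 0,    h = 0)     # g = h = 0: Z comes back unchanged
y_g      <- gh_transform(z, g = 0.25, h = 0)     # g only: skewed
y_gh     <- gh_transform(z, g = 0.25, h = 0.05)  # g and h: skewed with heavier tails
print(summary(y_gh))
```

Note that g and h are fixed numbers here, and only Z is random.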

Hope...that...helps?
 

hlsmith

Less is more. Stay pure. Stay poor.
#10
@spunky - did I get this right? The hist(Y) looks lackluster, but obs=41 is pretty large.
Code:
set.seed(261)
n=100
g=rnorm(n, 15, 1)
h=rgamma(n, shape=50, rate=3)
hist(h)
Z=rnorm(n, 0, 1)
Y = ((exp(g*Z) - 1) / g)*(exp((h*(Z**2))/2))
summary(Y)
Y
hist(Y)
Se = data.frame(g,h,Z,Y)
Se
1639152909667.png
 

hlsmith

Less is more. Stay pure. Stay poor.
#11
@spunky What do you think of this? I had to change the seed to make sure I didn't get that huge "outlier", which I ain't proud of.

Code:
set.seed(26128)
n=1000
g=rnorm(n, 0.12, 0.001)
g
hist(g)
h=rgamma(n, shape=0.4, rate=10)
hist(h)
Z=rnorm(n, 0, 1)
Y = ((exp(g*Z) - 1) /g)*(exp((h*(Z**2))/2))
summary(Y)
Y
hist(Y)
TS = data.frame(g,h,Z,Y)
TS

Output (rows 1-6):
 ob         g            h          Z          Y
  1 0.1208209 1.940753e-03  0.7383277  0.7726692
  2 0.1185718 2.007186e-02  0.9144758  0.9740368
  3 0.1183718 9.014531e-03 -0.4701954 -0.4578057
  4 0.1192093 8.113346e-09  0.2867066  0.2916624
  5 0.1183407 8.035741e-02 -0.5705224 -0.5589513
  6 0.1200523 2.445499e-02 -0.6304769 -0.6101657

1639586206372.png
 

hlsmith

Less is more. Stay pure. Stay poor.
#12
@Dason @spunky

Thank you, I understand better now. I have seen such things, but had no experience in this area!
Code:
n=10000
# g = h = 0 would give a normal, but g = 0 divides by zero in the formula below
g=0.25 
h=0.05 # when h = 0 (and g != 0), shifted log-normal
Z=rnorm(n, 0, 1)
b=0.5 #sigma
a=1.0 #mu
Y = a+(b*((exp(g*Z) - 1) /g)*(exp((h*(Z**2))/2)))
summary(Y)
Y
hist(Y)
TS = data.frame(a,b,g,h,Z,Y)
TS
1639615390777.png
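
A quick sanity check on the code above, as a sketch: for h >= 0 the transformation is strictly increasing in Z, so the quantiles of Y are just the transformed quantiles of Z. In particular Z = 0 maps to a, so the median of Y should sit at a. The helper name gh_loc_scale and the seed are my own additions; the parameter values are the ones from the post.

```r
# Hypothetical helper: the formula from the post, written as a function of z
gh_loc_scale <- function(z, a, b, g, h) {
  a + b * ((exp(g * z) - 1) / g) * exp(h * z^2 / 2)
}

a <- 1.0; b <- 0.5; g <- 0.25; h <- 0.05   # values used in the post
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)

# Theoretical quantiles: push the normal quantiles through the transform
q_theory <- gh_loc_scale(qnorm(p), a, b, g, h)

set.seed(1)  # arbitrary seed, only for reproducibility
Y <- gh_loc_scale(rnorm(10000), a, b, g, h)
q_sample <- quantile(Y, p)

# Side-by-side comparison of theoretical and simulated quantiles
print(round(cbind(p, q_theory, q_sample), 3))
```

If the two columns agree closely, the simulation is doing what the formula says; the gap between the upper and lower quantiles around the median shows the skew that g adds.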
 