Creating an Expression Tree

hlsmith

Less is more. Stay pure. Stay poor.
I am trying to get this not to look like garbage. Any suggestions? Stole most of the code from the internet (partially from some raptor sympathizer named @trinker )

Code:
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='blue')                              # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
  title(string, cex.main=2.5)
}

Code:
string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"
library(igraph)
string <- "(a/{5})+(2*b+c)"
parseTree(string,  # plus some graphing stuff
          vertex.color="#FCFDBFFF",
          vertex.frame.color=NA,
          vertex.label.font=2,
          vertex.label.cex=2,
          vertex.label.color="darkred",
          vertex.size=25,
          asp=.8,
          edge.width=2,
          margin= -0.9)

hlsmith

Less is more. Stay pure. Stay poor.
I created one of these using dagitty as well, but that isn't what it is for and I couldn't use the same text in more than one node, so two '/' would cause issues.

P.S., obviously the code does some weird sizing formatting that I don't understand in the code from Post #1. And that output is from me making the plot window in RStudio shaped like a long rectangle.

P.S.S., This is actually a tame version of some of the other expressions I would like to display.

hlsmith

Less is more. Stay pure. Stay poor.
a little better

Code:
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='black')                             # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
}

string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"

parseTree(string,  # plus some graphing stuff
          vertex.color="white",
          vertex.frame.color="black",
          vertex.label.font=1,
          vertex.label.cex=1.3,
          vertex.label.color="black",
          vertex.size=40,
          asp=1,
          edge.width=2,
          margin= -0.46,
          main=NULL)


hlsmith

Less is more. Stay pure. Stay poor.
Any suggestions on how to turn Na into Na^+ or Na**ion, with the latter parts obviously being superscripts? The above code treats any mathematical operator in the string as an operator in the tree.

Dason

I think I tracked down where you were pulling from. I modified the parseTree function to take in a named vector of expressions which will do some replacements. There's probably a better way to attack the problem but this seems like a simple enough solution.

Code:
library(igraph)

parser2graph <- function(y, ...){
  y$new.id <- seq_along(y$id)
  h <- graph.tree(0) + vertices(id = y$id, label= y$text)
  for(i in 1:nrow(y)){
    if(y[i, 'parent'])
      h <- h + edge(c(y[y$id == y[i, 'parent'], 'new.id'], y[i, 'new.id']))
  }
  h <- set_edge_attr(h, 'color', value='black')
  return(h)
}

parseTree <- function(string, ignore=c('(',')','{','}'), replace_values = NULL, ...) {
  dat <- utils::getParseData(parse(text=string))
  g <- parser2graph(dat[!(dat$text %in% ignore), ])
  leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
  preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
  vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level

  #### This if statement is what I added.
  #### It's not particularly clean and there's probably a better way
  #### to do the replace. Just not thinking great today
  if(!is.null(replace_values)){
    lab <- vertex_attr(g, 'label', preds)
    idx <- match(lab, names(replace_values))
    replace_id <- which(!is.na(idx))
    lab[replace_id] <- replace_values[idx[replace_id]]
    vertex_attr(g, 'label', preds) <- lab
  }

  g <- g - leaves                                                    # remove the leaves
  gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
  nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
  g <- add_edges(g, nebs, color='black')                             # edges around the gaps
  g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
  plot(g, layout=layout.reingold.tilford, ...)
}

string = "(2*Na) + (Glucose / 18) + (BUN / 2.8)"
#### vals is the named vector of expressions.  The name in the vector will
#### be replaced with whatever expression is given.
vals <- c(Na = expression("Na"**"ion"), Glucose = expression("Glucose"[2]))

parseTree(string,  # plus some graphing stuff
          replace_values = vals,
          vertex.color="white",
          vertex.frame.color="black",
          vertex.label.font=1,
          vertex.label.cex=1.3,
          vertex.label.color="black",
          vertex.size=40,
          asp=1,
          edge.width=2,
          margin= -0.46,
          main=NULL)
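
For anyone puzzling over the replacement step, here is a minimal sketch of the same match-based lookup on a plain character vector, with no graph involved. The label values and the replacement strings are made up for illustration; plain strings stand in for the expressions used above.

```r
# Hypothetical labels and replacements, for illustration only.
labels <- c("2", "Na", "Glucose", "18", "BUN")
vals   <- c(Na = "Na+", Glucose = "Glc")

idx <- match(labels, names(vals))   # position in vals, or NA when there is no entry
hit <- which(!is.na(idx))           # indices of labels that have a replacement
labels[hit] <- unname(vals[idx[hit]])

print(labels)
# "2"  "Na+"  "Glc"  "18"  "BUN"
```

Labels without an entry in the named vector are left untouched, which is why only Na and Glucose change in the tree.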

hlsmith

Less is more. Stay pure. Stay poor.
Yeah, I think I was defining the function myself because the package would not install. I'll check this out this afternoon. I went to a network analysis workshop a while back and remember kind of playing with igraph.

hlsmith

Less is more. Stay pure. Stay poor.
P.S., I got the code to work in Python when using a server account. Originally I couldn't add a package in Python because I lacked admin rights, but I was able to do what the above was a workaround for.

spunky

Can't make spagetti
Sure. Here is the function and a brief snippet of how it works and what it is used for:

$$Y(g,h) = \frac{e^{gZ}-1}{g}e^{\frac{hZ^{2}}{2}}$$

The function above is the stochastic representation of a family of probability distributions introduced by Tukey in 1977, known as the g-and-h distributions. They're a very flexible, powerful family that can approximate other (difficult) distributions very efficiently, and they are ideal for computer simulations (which, as you know, is kind of my thing). They come in 3 varieties: g distributions (when h=0), h distributions (when g=0), and the general form posted above. As you can imagine, for g or h distributions the definition changes a little bit to bypass the awkwardness of dividing by 0, but I'm sharing the most general form.
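
For reference, a sketch of what the two special cases look like, obtained by setting h=0 directly and by taking the limit as g goes to 0 (using the fact that $$\frac{e^{gZ}-1}{g} \to Z$$ as $$g \to 0$$). The notation Y(g,0) and Y(0,h) simply follows the general form above:

$$Y(g,0) = \frac{e^{gZ}-1}{g}, \qquad Y(0,h) = \lim_{g \to 0} Y(g,h) = Z\,e^{\frac{hZ^{2}}{2}}$$

When both g and h are 0 this collapses to Y = Z.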

The g parameter can be any real number, but h is constrained to be positive. And, as expected, $$Z \sim N(0,1)$$.
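
A rough sketch of how draws from this family could be generated in R, treating g and h as fixed scalars for the whole sample. The name `rgh` is hypothetical (not from any package), and allowing h = 0 is an assumption made here so that the pure g case can be sampled too.

```r
# Sketch of a g-and-h sampler; `rgh` is a hypothetical name, not a package function.
rgh <- function(n, g, h) {
  stopifnot(length(g) == 1, length(h) == 1, h >= 0)  # assumption: h = 0 allowed
  z <- rnorm(n)                                      # Z ~ N(0, 1)
  # (exp(g*z) - 1)/g tends to z as g -> 0, so use z itself when g == 0
  core <- if (g == 0) z else expm1(g * z) / g
  core * exp(h * z^2 / 2)
}

set.seed(1)
y_normal <- rgh(1000, g = 0, h = 0)        # reduces to plain N(0, 1) draws
y_skewed <- rgh(1000, g = 0.25, h = 0.05)  # skewed, heavier-tailed draws
summary(y_skewed)
```

`expm1(x)` is base R's numerically careful version of `exp(x) - 1`, which helps when g is very close to 0.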

Hope...that...helps?

hlsmith

Less is more. Stay pure. Stay poor.
@spunky - did I get this right? The hist(Y) looks lackluster, but obs=41 is pretty large.
Code:
set.seed(261)
n=100
g=rnorm(n, 15, 1)
h=rgamma(n, shape=50, rate=3)
hist(h)
Z=rnorm(n, 0, 1)
Y = ((exp(g*Z) - 1) / g)*(exp((h*(Z**2))/2))
summary(Y)
Y
hist(Y)
Se = data.frame(g,h,Z,Y)
Se

hlsmith

Less is more. Stay pure. Stay poor.
@spunky What do you think of this? I had to change the seed to make sure I didn't get that huge "outlier", which I ain't proud of.

Code:
set.seed(26128)
n=1000
g=rnorm(n, 0.12, 0.001)
g
hist(g)
h=rgamma(n, shape=0.4, rate=10)
hist(h)
Z=rnorm(n, 0, 1)
Y = ((exp(g*Z) - 1) /g)*(exp((h*(Z**2))/2))
summary(Y)
Y
hist(Y)
TS = data.frame(g,h,Z,Y)
TS
ob          g            h          Z          Y
 1  0.1208209 1.940753e-03  0.7383277  0.7726692
 2  0.1185718 2.007186e-02  0.9144758  0.9740368
 3  0.1183718 9.014531e-03 -0.4701954 -0.4578057
 4  0.1192093 8.113346e-09  0.2867066  0.2916624
 5  0.1183407 8.035741e-02 -0.5705224 -0.5589513
 6  0.1200523 2.445499e-02 -0.6304769 -0.6101657

hlsmith

Less is more. Stay pure. Stay poor.
@Dason @spunky

Thank you, I understand better now. I have seen such things, but had no experience in this area!
Code:
n=10000
#if g = h = 0, normal
g=0.25
h=0.05 #when equal 0, log-normal
Z=rnorm(n, 0, 1)
b=0.5 #sigma
a=1.0 #mu
Y = a+(b*((exp(g*Z) - 1) /g)*(exp((h*(Z**2))/2)))
summary(Y)
Y
hist(Y)
TS = data.frame(a,b,g,h,Z,Y)
TS
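
The comment about h = 0 giving a log-normal can be checked numerically. This is only a sketch under the same a, b, g values as above, with h set to 0 and an arbitrary seed: if the claim holds, 1 + g*(Y - a)/b should equal exp(g*Z), so taking its log should give back g*Z.

```r
set.seed(42)       # arbitrary seed, just for reproducibility
n <- 10000
g <- 0.25
h <- 0             # h = 0 for this check
a <- 1.0           # mu (location)
b <- 0.5           # sigma (scale)
Z <- rnorm(n, 0, 1)
Y <- a + b * ((exp(g * Z) - 1) / g) * exp(h * Z^2 / 2)

# With h = 0, 1 + g * (Y - a) / b equals exp(g * Z), which is log-normal,
# so its log should recover g * Z up to floating-point error.
recovered <- log(1 + g * (Y - a) / b)
max(abs(recovered - g * Z))
```

So with h = 0, Y is a shifted and rescaled log-normal. Note that plugging g = 0 straight into this formula gives 0/0 (NaN in R), so the normal case has to be handled through the limit rather than by setting g = 0 in the code.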
