Error bars on log-transformed plots

#1
Hello - I am a genetics researcher. I have a series of data points with errors (standard errors) that I wish to plot as a graph:

GENE, AVG FOLD CHANGE, SE

Gene1, 2193.10, 1200.74
Gene2, 96.28, 9.08
Gene3, 39.02, 22.51
Gene4, 5.88, 0.82
Gene5, -0.68, 0.33
Gene6, 1.14, 0.02
Gene7, -1.46, 0.16
Gene8, -1.56, 0.50
Gene9, -1.58, 0.10
Gene10, -1.88, 0.45
Gene11, -2.04, 0.45
Gene12, -6828.82, 975.41

Positive values are up-regulated genes; negative values are down-regulated genes (re: gene expression levels).

I wish to plot this as a column plot on a log scale (y-axis) with negative values below the zero baseline, positive values above, and with the errors indicated.

Something like:

1000
100
10 *
1 *
0-------------------
-1 *
-10 *
-100
-1000

but with bars instead of the asterisks - you get the idea. I can do this easily enough in MS Excel by taking the log of the absolute value and multiplying the result by +1 or -1 (to restore the original "directionality", i.e. up- or down-regulated).
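A minimal Python sketch of the Excel approach just described, assuming base-10 logs (Q1 is still open at this point); `signed_log10` is a hypothetical helper name. It also exposes a pitfall: for |x| < 1 the log itself is negative, so multiplying by -1 flips Gene5 (-0.68) to about +0.17, i.e. above the zero baseline even though it is down-regulated.

```python
import math

# Average fold changes from the table above (GENE, AVG FOLD CHANGE).
fold_changes = {
    "Gene1": 2193.10, "Gene2": 96.28, "Gene3": 39.02, "Gene4": 5.88,
    "Gene5": -0.68, "Gene6": 1.14, "Gene7": -1.46, "Gene8": -1.56,
    "Gene9": -1.58, "Gene10": -1.88, "Gene11": -2.04, "Gene12": -6828.82,
}

def signed_log10(x):
    """log10(|x|), multiplied by +1 or -1 according to the sign of x."""
    return math.copysign(1.0, x) * math.log10(abs(x))

for gene, fc in fold_changes.items():
    # Note: for |fc| < 1 the result has the opposite sign to fc.
    print(f"{gene:7s} {fc:+10.2f} -> {signed_log10(fc):+.3f}")
```

This sign flip for values between -1 and 1 is one reason to add 1 before taking the log, as in post #2.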

A couple of questions:

(Q1) Is it "better" to use log (base 10) or ln (natural) log transformations?

(Q2) How should I present the error bars? Would I log- (or ln-) transform the standard errors, for example, and plot these [or their absolute values, since the log of a number < 1 is negative; e.g. log(0.5) = -0.301]?

I tried searching Google for answers to these questions, but I wasn't very successful. ...

I would very much appreciate any comments regarding the log-transformation of data and plots of log-transformed data, particularly regarding error bars!

Thank you!

Sincerely,

Greg :)
 
#2
Hello - I think I have this right ...

Referring to the attached MS Excel spreadsheet, I first log-transformed my data,

y = log( |x| + 1 )

using the absolute values (to avoid taking the log of negative numbers) and adding 1 (to avoid taking the log of zero).

Next, I multiplied these log-transformed values by +1 (to indicate up-regulated genes) or -1 (to indicate down-regulated genes).

Last, I calculated the mean and standard error of these log-transformed data, and plotted the results.
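The three steps above can be sketched in Python roughly as follows. The replicate values are HYPOTHETICAL (the raw replicates are in the attached spreadsheet, which isn't reproduced here), base 10 is assumed, and `signed_log1p10` / `mean_and_se` are illustrative helper names. The SE is taken as the sample standard deviation divided by sqrt(n).

```python
import math
import statistics

def signed_log1p10(x):
    """sign(x) * log10(|x| + 1): adding 1 keeps the log non-negative,
    so the restored sign always matches the sign of x (and 0 maps to 0)."""
    return math.copysign(1.0, x) * math.log10(abs(x) + 1)

def mean_and_se(values):
    """Sample mean and standard error (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# HYPOTHETICAL replicate fold changes for one gene (not from the thread).
replicates = [-1.5, -0.4, 1.2]

# Steps 1-2: transform each replicate and restore its direction.
transformed = [signed_log1p10(x) for x in replicates]
# Step 3: mean and SE of the transformed values.
mean_t, se_t = mean_and_se(transformed)
print(f"mean of transform = {mean_t:+.3f}, SE of transform = {se_t:.3f}")
```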

I think that this is correct - please comment if I am mistaken.

Thank you! Greg :)
 

Mean Joe

TS Contributor
#3
I don't really know about error bars on log-transformed data, but your reasoning is very sound. I would, however, do the transformation a little differently.

First off, to answer a question in your first post: pick the base for the logarithm that makes your data fit more nicely on a graph. If you have large values like 10 trillion, you'd probably want a log10 transformation rather than a natural-log transformation. If your numbers are in the range 6-7, a log10 transformation would turn them into numbers < 1, which may be hard to visualize on a graph.
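A quick check of why the choice of base is largely cosmetic: ln(x) = ln(10) · log10(x) for every x > 0, so switching bases rescales every plotted value (and every error-bar length) by the same constant, about 2.303, without changing the shape of the plot. The sample values are 6 and 7 from the paragraph above plus two fold changes from post #1.

```python
import math

LN10 = math.log(10)  # about 2.302585

values = [6, 7, 96.28, 2193.10]
for x in values:
    # The ratio ln(x) / log10(x) is the same constant for every x.
    print(f"x = {x:>8}: log10 = {math.log10(x):.3f}, ln = {math.log(x):.3f}, "
          f"ln/log10 = {math.log(x) / math.log10(x):.4f}")
```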

As for your plot: instead of transforming the values and then taking the mean (which is what you did), you could transform the mean itself. Look at gene2 in your spreadsheet: the untransformed mean is -0.150 and the standard error is 1.21, so the untransformed interval is -0.150 +/- 1.21 = -1.36 to +1.06.

I'm suggesting you transform this interval instead, restoring the sign of each endpoint: -log(|-1.36|+1) = -0.373 to +log(|1.06|+1) = +0.314; and your transformed mean would be -log(|-0.150|+1) = -0.061.
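The arithmetic in the two paragraphs above, sketched in Python assuming base-10 logs (consistent with the 10^ back-transforms in this post); `signed_log1p10` is an illustrative name for sign(x) · log10(|x| + 1).

```python
import math

def signed_log1p10(x):
    """sign(x) * log10(|x| + 1), the transform used in this thread."""
    return math.copysign(1.0, x) * math.log10(abs(x) + 1)

# Untransformed mean and SE quoted for gene2 (from the spreadsheet).
mean, se = -0.150, 1.21
lower, upper = mean - se, mean + se  # -1.36 and +1.06

# Transform the interval endpoints and the mean separately.
print(f"interval:    {lower:+.2f} to {upper:+.2f}")
print(f"transformed: {signed_log1p10(lower):+.3f} to {signed_log1p10(upper):+.3f}")
print(f"mean:        {signed_log1p10(mean):+.3f}")
```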

You'll notice that the bars are now not symmetric about the transformed mean (which you may find undesirable), but you can back-transform this interval to recover the untransformed interval. Likewise, the back-transform for the mean is -(10^0.061 - 1) = -0.15 [note the signs: the exponent uses the absolute value 0.061, and the minus sign is applied to the answer; it's the same as how you made the up/down-regulation correction].
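A sketch of that back-transform: raise 10 to the power |y|, subtract 1, then reattach the sign of y. Both function names are illustrative, and base 10 is assumed; the loop round-trips the gene2 mean and interval endpoints quoted above.

```python
import math

def signed_log1p10(x):
    """Forward transform: sign(x) * log10(|x| + 1)."""
    return math.copysign(1.0, x) * math.log10(abs(x) + 1)

def inverse_signed_log1p10(y):
    """Back-transform: sign(y) * (10**|y| - 1)."""
    return math.copysign(1.0, y) * (10 ** abs(y) - 1)

for x in [-0.150, -1.36, 1.06]:
    y = signed_log1p10(x)
    print(f"{x:+.2f} -> {y:+.3f} -> {inverse_signed_log1p10(y):+.3f}")
```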

The way you've done it (mean of transform = 0.144, stdev of transform = 0.170), you cannot back-transform: (10^0.144) - 1 = 0.393, not -0.150, and the interval does not back-transform exactly either.
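Why the mean of the transforms does not back-transform to the raw mean: the transform is nonlinear, so averaging before it and averaging after it give different answers. The sketch reproduces the 0.393 figure, then uses HYPOTHETICAL replicates (not the spreadsheet's data) to show the general effect; helper names are illustrative and base 10 is assumed.

```python
import math
import statistics

def signed_log1p10(x):
    return math.copysign(1.0, x) * math.log10(abs(x) + 1)

def inverse_signed_log1p10(y):
    return math.copysign(1.0, y) * (10 ** abs(y) - 1)

# The figure quoted above: back-transforming a mean of transforms of 0.144.
print(f"{inverse_signed_log1p10(0.144):.3f}")  # 10^0.144 - 1

# HYPOTHETICAL replicates, to compare the two orders of operations.
replicates = [-1.5, -0.4, 1.2]
raw_mean = statistics.mean(replicates)
mean_of_transforms = statistics.mean([signed_log1p10(x) for x in replicates])
back = inverse_signed_log1p10(mean_of_transforms)
print(f"raw mean = {raw_mean:+.3f}, back-transformed mean of transforms = {back:+.3f}")
```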
 
#4
Is this correct (refer to updated attachment - MS Excel sheet)?

I'm pretty certain that I am doing this correctly, but I am troubled by the wide dispersion of the error bars in the log plot - it looks like none of my data points are significantly different from one another?

Thanks, Greg :)
 
#5
Hello - I realized what my "error" was ... The log-transformed SE endpoints that I calculated, as directed, define the SE *range* (the interval endpoints), not the amount of error above and below the mean.

Furthermore, the signs (positive or negative) of both the log-transformed means and the standard-error interval endpoints depend on the signs of the means and interval endpoints in the untransformed (raw; linear) dataset. Hence, I needed to keep track of these to assign the correct sign (+ or -) to the final values (the means and the upper and lower SEs of the log-transformed data).
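Putting posts #3 and #5 together, here is a sketch that applies the interval-endpoint method to the table from post #1. For each gene it returns the transformed mean plus separate error-bar amounts below and above it, the form expected by error-bar tools that take separate "plus" and "minus" amounts rather than endpoints. Assumptions: base-10 logs, bars at mean +/- 1 SE, and the SE column treated as the SE of the raw fold changes; the output is illustrative, not the spreadsheet's results.

```python
import math

def signed_log1p10(x):
    """sign(x) * log10(|x| + 1); copysign carries each value's sign through."""
    return math.copysign(1.0, x) * math.log10(abs(x) + 1)

# GENE, AVG FOLD CHANGE, SE from post #1.
table = [
    ("Gene1", 2193.10, 1200.74), ("Gene2", 96.28, 9.08),
    ("Gene3", 39.02, 22.51),     ("Gene4", 5.88, 0.82),
    ("Gene5", -0.68, 0.33),      ("Gene6", 1.14, 0.02),
    ("Gene7", -1.46, 0.16),      ("Gene8", -1.56, 0.50),
    ("Gene9", -1.58, 0.10),      ("Gene10", -1.88, 0.45),
    ("Gene11", -2.04, 0.45),     ("Gene12", -6828.82, 975.41),
]

rows = []
for gene, mean, se in table:
    m = signed_log1p10(mean)         # transformed mean (bar height)
    lo = signed_log1p10(mean - se)   # transformed lower endpoint
    hi = signed_log1p10(mean + se)   # transformed upper endpoint
    # Amounts below and above the bar; the transform is increasing,
    # so both are non-negative, but they are generally unequal.
    rows.append((gene, m, m - lo, hi - m))

print(f"{'gene':7s} {'mean':>7s} {'minus':>7s} {'plus':>7s}")
for gene, m, minus, plus in rows:
    print(f"{gene:7s} {m:+7.3f} {minus:7.3f} {plus:7.3f}")
```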

I updated and annotated my Excel spreadsheet (uploaded), which I'm now satisfied with, in case anyone wants to comment on it or for future reference.

Thanks for your help, appreciated!

Sincerely, Greg :)
 
#6
Your numbers seem weird to me, though my experience is limited to one bioinformatics class.

A 2000-fold increase is like saying that if there was 1 before, now I have more than the number of atoms in the universe. Genes 1, 2, and 12 have average fold changes that do not make sense. Gene 3 probably should be on the list too.