Today I Learned: ____

bryangoodrich

Probably A Mammal
unlist(res) would throw all the different columns into one vector to sample from. I specifically wanted separate samples, so I can see an example of what sort of data items are in each field meeting my criteria. If I'm going to use an anonymous function, it defeats the purpose; I'd just wrap sample myself. What I would need to do is take each element of res and unlist it to make it a vector. Then it would feed into sample just fine. This would be avoided if I could run sqlQuery (RODBC) and return a single field as a vector. I could have tweaked or pre-processed res so that each item was converted to a vector before running lapply. However, I wanted to work with what I had, so

Code:
Vectorize(sample)(data.frame(A = letters), 5)
Another advantage is that the returned object is not a bare vector but keeps one column per field, which suits this case: table in, table out.

I must have misread your solution. The lapply(lapply(res, unlist), sample, 5) would be an appropriate (and more efficient) approach. It's just embedding the pre-processing required to supply a vector input.
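For anyone following along, here is a minimal sketch of that nested lapply approach. The `res` below is a made-up stand-in (a list of single-column data frames); the real one came from sqlQuery, and its fields are not shown in this thread.

```r
# Hypothetical stand-in for `res`: a list of single-column query results.
# The field names and values are invented for illustration only.
res <- list(
  name = data.frame(name = c("ann", "bob", "cy", "dee", "eve", "fay"),
                    stringsAsFactors = FALSE),
  code = data.frame(code = c(11L, 12L, 13L, 14L, 15L, 16L))
)

set.seed(1)  # only so the sketch is repeatable

# Inner lapply: flatten each one-column data frame to a plain vector.
# Outer lapply: draw 5 values from each vector, field by field.
samples <- lapply(lapply(res, unlist), sample, 5)
str(samples)
```

Each field keeps its own sample, so nothing is pooled across columns the way a single unlist(res) would.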
 

Jake

Cookie Scientist
I must have misread your solution. The lapply(lapply(res, unlist), sample, 5) would be an appropriate (and more efficient) approach.
It appears so. But again, efficiency is not the issue I'm interested in. The issue is that the use of Vectorize() that you showed is strange and slightly "magical", and so IMO should be avoided.
 

trinker

ggplot2orBust
TIL why 1:length(x) is bad and seq_along(x) is good. Consider:

Code:
> x <- LETTERS
> 
> 1:length(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
> seq_along(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
> 
> x <- NULL
> 
> 1:length(x)
[1] 1 0
> seq_along(x)
integer(0)
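
Where this tends to bite is in loops. A small sketch, with counters made up purely to show how many times each loop body runs on an empty input:

```r
x <- NULL

# 1:length(x) is c(1, 0) here, so the loop body runs twice on an empty input.
bad <- 0
for (i in 1:length(x)) bad <- bad + 1

# seq_along(x) is integer(0), so the loop body is skipped entirely.
good <- 0
for (i in seq_along(x)) good <- good + 1

c(bad = bad, good = good)
```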
 

Lazar

Phineas Packard
TIL: You should not use equality tests for floats but rather a small tolerance
Code:
> (3.14e0 + 1e10) - 1e10 == 3.14
[1] FALSE
> abs((3.14e0 + 1e10) - 1e10 - 3.14) < .0001
[1] TRUE
 

TheEcologist

Global Moderator
TIL: You should not use equality tests for floats but rather a small tolerance
Code:
> (3.14e0 + 1e10) - 1e10 == 3.14
[1] FALSE
> abs((3.14e0 + 1e10) - 1e10 - 3.14) < .0001
[1] TRUE
IMO a better way would be to use R's formal results-checking functions:

Code:
all.equal((3.14e0 + 1e10) - 1e10,3.14,tolerance = .Machine$double.eps)
## [1] "Mean relative difference: 1.943795e-07"
This reports the error, but you can also set the tolerance like this:

Code:
all.equal((3.14e0 + 1e10) - 1e10,3.14,tolerance = 1e-6)
## [1] TRUE
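
To see why the two calls disagree, here is a rough sketch that compares the relative difference against the tolerances by hand. The variable names are mine; the default tolerance of all.equal() for numerics is sqrt(.Machine$double.eps), roughly 1.5e-8.

```r
x <- (3.14e0 + 1e10) - 1e10

# Default tolerance used by all.equal() for numeric input.
default_tol <- sqrt(.Machine$double.eps)

# Relative difference for two scalars: |target - current| / |target|.
rel_diff <- abs(x - 3.14) / abs(x)

rel_diff > default_tol                        # too large for the default
rel_diff < 1e-6                               # small enough for 1e-6

isTRUE(all.equal(x, 3.14))                    # default tolerance
isTRUE(all.equal(x, 3.14, tolerance = 1e-6))  # looser tolerance
```

The relative difference (about 1.9e-07 in the output above) sits between the two tolerances, which is why only the looser call returns TRUE.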
 

trinker

ggplot2orBust
@TE nice. I use all.equal but never knew about the tolerance argument. Also, are the results a character string? On my phone so can't check.
 

TheEcologist

Global Moderator
@TE nice. I use all.equal but never knew about the tolerance argument. Also, are the results a character string? On my phone so can't check.
Here you go!
:)

Code:
str(all.equal((3.14e0 + 1e10) - 1e10,3.14,tolerance = .Machine$double.eps))
## chr "Mean relative difference: 1.943795e-07"
 

Dason

Ambassador to the humans
Which is why wrapping it in isTRUE() is the preferred approach when using all.equal

Code:
> isTRUE(all.equal(3,3))
[1] TRUE
> isTRUE(all.equal(3,2))
[1] FALSE
> all.equal(3,2)
[1] "Mean relative difference: 0.3333333"
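
A short sketch of why this matters inside if(): the character string that all.equal() returns on a mismatch is not a valid condition. The values and variable names here are just for illustration.

```r
a <- 0.1 + 0.2
b <- 0.3

# Exact comparison fails because of floating-point rounding.
exact <- a == b

# all.equal() returns TRUE or a character string, so guard it with isTRUE()
# before using it as a condition.
if (isTRUE(all.equal(a, b))) {
  verdict <- "equal within tolerance"
} else {
  verdict <- "different"
}

# Without isTRUE(), a mismatch hands if() a character string, which errors.
err <- tryCatch(if (all.equal(3, 2)) "same", error = function(e) "error")

c(exact = exact, verdict = verdict, unguarded = err)
```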
 

TheEcologist

Global Moderator
Which is why wrapping it in isTRUE() is the preferred approach when using all.equal
There is no one truth here. Use isTRUE when you want a TRUE or FALSE response. Don't if, like me, you are actually interested in the "mean relative difference".

That is especially useful when working with flat files ("chunk-based iteration"), where statistics converge toward the "TRUE" value but NEVER get there.
 

Dason

Ambassador to the humans
I guess. But if you actually want to use that value I would think it would make more sense to calculate it directly rather than rely on all.equal which might give TRUE or it might give a character output. Seems like if you're interested in the mean relative difference all.equal isn't the cleanest way to get it.
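
Something like this is what I mean by calculating it directly. It is only a sketch: mean_rel_diff is a made-up helper, not a base R function, and it skips the special cases all.equal handles (for example, falling back to absolute differences when the target is near zero).

```r
# Hypothetical helper: always returns a number, never TRUE or a string.
mean_rel_diff <- function(target, current) {
  mean(abs(target - current)) / mean(abs(target))
}

x <- (3.14e0 + 1e10) - 1e10
d <- mean_rel_diff(x, 3.14)
d

mean_rel_diff(3, 2)  # same quantity all.equal(3, 2) reports as text
mean_rel_diff(3, 3)  # identical inputs give 0
```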
 

TheEcologist

Global Moderator
I guess. But if you actually want to use that value I would think it would make more sense to calculate it directly rather than rely on all.equal which might give TRUE or it might give a character output. Seems like if you're interested in the mean relative difference all.equal isn't the cleanest way to get it.
Code:
isTRUE(all.equal(TRUTH,ESTIMATE, tolerance=MySpecificThreshold))
I get your point, and the above would actually be a decent way to gauge convergence automatically.

I was referring to casual convergence checks in the console while prototyping scripts, and there it's pretty clean. I think the fact that it returns the "mean relative difference" is pretty neat, and the "FALSE" may not truly be a FALSE in many cases, or at least it may be meaningless.
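
For the automatic version, a rough sketch of what such a convergence gauge could look like. Everything here is assumed for illustration: the chunks are simulated rather than read from a flat file, the truth (5) is known only because I simulated it, and threshold stands in for MySpecificThreshold.

```r
set.seed(42)

# Stand-in for reading a flat file in chunks: 100 chunks of 100 values each.
chunks <- split(rnorm(10000, mean = 5), rep(1:100, each = 100))

truth <- 5          # known only because the data are simulated
threshold <- 1e-2   # hypothetical relative tolerance

total <- 0
n <- 0
converged_at <- NA_integer_

for (k in seq_along(chunks)) {
  # Update the running mean with the next chunk.
  total <- total + sum(chunks[[k]])
  n <- n + length(chunks[[k]])
  estimate <- total / n

  # Stop as soon as the estimate is within the tolerance of the truth.
  if (isTRUE(all.equal(truth, estimate, tolerance = threshold))) {
    converged_at <- k
    break
  }
}

c(converged_at = converged_at, estimate = estimate)
```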
 

Dason

Ambassador to the humans
I guess I would think it would be cleaner if it returned a consistent result. Maybe a list with TRUE/FALSE as the first element and the mean relative difference as the second element and then give it a class and a specific print method. But if we could easily grab the TRUE/FALSE and easily grab the mean relative difference I think it would be much better.
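
A sketch of the kind of thing I have in mind. compare_num and the num_comparison class are invented names, not anything in base R, and the comparison is a simplified version of what all.equal does for numerics.

```r
# Hypothetical wrapper: always returns the same structure.
compare_num <- function(target, current,
                        tolerance = sqrt(.Machine$double.eps)) {
  diff <- mean(abs(target - current)) / mean(abs(target))
  structure(
    list(equal = diff < tolerance, mean_rel_diff = diff),
    class = "num_comparison"
  )
}

# Print method so the console output stays short.
print.num_comparison <- function(x, ...) {
  cat("Equal:", x$equal,
      "| mean relative difference:", format(x$mean_rel_diff), "\n")
  invisible(x)
}

cmp <- compare_num(3, 2)
print(cmp)
cmp$equal          # easy to grab the TRUE/FALSE
cmp$mean_rel_diff  # easy to grab the number
```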
 

noetsi

Fortran must die
That you really can load 5000+ R programs at one time. That spunky is more popular than a poster who vanished before I came to this board.
 

TheEcologist

Global Moderator
I guess I would think it would be cleaner if it returned a consistent result. Maybe a list with TRUE/FALSE as the first element and the mean relative difference as the second element and then give it a class and a specific print method. But if we could easily grab the TRUE/FALSE and easily grab the mean relative difference I think it would be much better.
Agreed........ [but my response must be at least 10 characters]
 

Dason

Ambassador to the humans
It's actually more work to do that, but I find that sometimes you just want the message to be concise and don't want to have to type gibberish just so it can process.
 

Dason

Ambassador to the humans
Yeah, but it looks weird. I'd rather it just be the short message, which is why I add the \(\hspace{1cm}\) stuff instead of using that other hack.