Simplifying for loops in R

#1
Hi all,

I am new to both programming and R. While some components of R I've found
easy to navigate (R seems to be the best software in terms of a
self-teaching program and I've enjoyed learning it), I've really been
struggling to understand how to code for-loops, even in tutorials. For
example, one I've seen within a tutorial is

$\sum_{i=1}^{25} i^2$
sum <- 0
for(i in 1:25)
sum <- sum + i^2
sum

but this example doesn't fit into using the assignment operator and then
creating loops from there, which I think I would better understand (if this
is possible to do). Other examples I have found also seem very complex.

I was hoping to share a scenario in which a for-loop would be incredibly
helpful for the analyses I am running so that I may better understand how to
write a loop going forward.

I have a huge data set ("studydata") in which each row belongs to one of
three categories and the columns are continuous variables. Let's call the
three categories PatientGroup1, PatientGroup2, and ControlGroup. Then there
are over 30 measured continuous variables I need to run t.tests on (let's
say three of the variables, for example, are A, B, and C), so that it looks
like this:

               A   B   C   ... D, E, F, etc.
PatientGroup1  4   6   8
PatientGroup2  5   7   9
ControlGroup  10  15  20
*repeats*

I've managed to go ahead and assign the values of each variable to the
respective categorical variable using the assignment operator so that I have
"PatientGroup1" "PatientGroup2" and "ControlGroup." I know how to run a
t.test by setting up the following code (though it may be amateur):

Variable_A_PatientGroup1=PatientGroup1$'A'
Variable_A_PatientGroup2=PatientGroup2$'A'
t.test(Variable_A_PatientGroup1, Variable_A_PatientGroup2)

However, is it feasible to create a loop so that this same code runs for
variables B, C, etc.? Or is there another, easier method to implement? I
appreciate any feedback and apologize in advance for my amateur question.
 
#2
Update: this is the closest I have gotten to getting a for loop to work, but it just spits out one column of t-test results for ValTest and one column for ValControl, based on the number of subjects in each group. So I believe that a t-test was run collectively across all columns, rather than giving individual results per column? I am not exactly sure, but if anyone has feedback on the code below it would be incredibly helpful.

for (colnum in 2:30) {
print(col)
ValTest=PatientGroup1[, colnum]
ValControl=ControlGroup[, colnum]
t.test(ValTest, ValControl)
}
 
#3
Just reading your post briefly here, I think you need to use the 'lapply' type functions. These are preferred to for loops in R. In particular, try using the 'plyr' package, which contains a number of functions useful for 'split apply combine', which is what you are doing here. Probably ddply() is the function for you.

You will probably:
1) melt the data to one row per variable/patient. This will feel weird at first, but roll with it.
2) use ddply to apply t.test to each variable
3) ddply will combine the results and squirt back a dataset.

I'd say what is written above accounts for 99.99% of statistical programming, possibly more.

good luck and may god have mercy on this analysis.
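
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the grouping column is assumed to be called Group, and only variables A, B and C are included. The real "studydata" may be laid out differently.

```r
library(reshape2)  # provides melt()
library(plyr)      # provides ddply()

set.seed(1)
# Hypothetical stand-in for "studydata": one grouping column plus
# three simulated measurement columns (assumed names, made-up values)
studydata <- data.frame(
  Group = rep(c("PatientGroup1", "PatientGroup2", "ControlGroup"), each = 10),
  A = rnorm(30), B = rnorm(30), C = rnorm(30)
)

# 1) melt to long format: one row per subject/variable combination
long <- melt(studydata, id.vars = "Group",
             variable.name = "variable", value.name = "value")

# Keep only the two groups being compared in this example
two_groups <- subset(long, Group %in% c("PatientGroup1", "ControlGroup"))

# 2) run one t-test per variable; 3) ddply stacks the per-variable
# one-row data frames into a single result table
results <- ddply(two_groups, "variable", function(d) {
  tt <- t.test(value ~ Group, data = d)
  data.frame(t = unname(tt$statistic),
             df = unname(tt$parameter),
             p.value = tt$p.value)
})
print(results)
```

The result has one row per measured variable, so the same call should scale to 30+ columns without extra code. With that many tests, a multiple-comparison adjustment such as p.adjust() is worth considering.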
 
#4
Haha, appreciate it. I will look into those packages. Will this be helpful even if everything is part of the same dataset?
 
#5
Yes, it should be. The 'melt' function is in the 'reshape2' package, I think. If you post a data fragment, I'm sure someone can help you write the code.
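
For what it's worth, the loop from post #2 can probably also be made to work in base R without extra packages. Two things seem to go wrong there: print(col) prints the built-in function col rather than the loop counter colnum, and inside a for loop the value of t.test() is not printed automatically, so it has to be printed or stored explicitly. A minimal sketch, assuming the group data frames have an ID in column 1 and measurements after it (the data here are simulated and the column names are placeholders):

```r
set.seed(1)
# Hypothetical stand-ins for the poster's per-group data frames
PatientGroup1 <- data.frame(id = 1:10,  A = rnorm(10), B = rnorm(10), C = rnorm(10))
ControlGroup  <- data.frame(id = 11:20, A = rnorm(10), B = rnorm(10), C = rnorm(10))

# Names of the measurement columns (everything except the ID column)
measure_cols <- names(PatientGroup1)[-1]

# for loop version: store each t-test result in a named list
results <- list()
for (v in measure_cols) {
  results[[v]] <- t.test(PatientGroup1[[v]], ControlGroup[[v]])
}

# lapply version: same computation, no explicit loop bookkeeping
results2 <- lapply(setNames(measure_cols, measure_cols), function(v) {
  t.test(PatientGroup1[[v]], ControlGroup[[v]])
})

# Pull one number out of each result, here the p-value
p_values <- sapply(results, function(tt) tt$p.value)
print(p_values)
```

Looping over column names rather than column numbers keeps each result labelled, and results[["A"]] prints the full t-test output for a single variable.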