efficient R code

Marwah Soliman

I have the following R code to create the .ped and .map files used by PLINK (an "open-source whole genome association analysis toolset").

I need more efficient R code, since my data has 905460 rows and 1120 columns
and the code takes forever to run: after 2 days of running I still had no results.

Here is the code, written for part of my data:

Code:
# raw data
myRaw <- read.table(text = "
rs987435        C       G       1       1       1       0       2
rs345783        C       G       0       0       1       0       0
rs955894        G       T       1       1       2       2       1
rs6088791       A       G       1       2       0       0       1
rs11180435      C       T       1       0       1       1       1
rs17571465      A       T       1       2       2       2       2
rs17011450      C       T       2       2       2       2       2
rs6919430       A       C       2       1       2       2       2
rs2342723       C       T       0       2       0       0       0
rs11992567      C       T       2       2       2       2       2")

nIndividuals <- ncol(myRaw) - 3
nSNPs <- nrow(myRaw)

# make map, easy
MAP <- data.frame(
  CHR = 1,
  SNP = myRaw$V1,
  CM = 0,
  BP = seq(nSNPs))

# get first 6 columns of PED, easy
PED6 <- data.frame(
  FID = seq(nIndividuals),
  IID = seq(nIndividuals),
  FatherID = 0,
  MotherID = 0,
  Sex = 1,
  Phenotype = 1)

# convert 0,1,2 to genotypes, a bit tricky
# make helper dataframe for matching alleles
myAlleles <- data.frame(
  AA = paste(myRaw$V2, myRaw$V2),
  AB = paste(myRaw$V2, myRaw$V3),
  BB = paste(myRaw$V3, myRaw$V3))

# make index to match with alleles
PEDsnps <- myRaw[, 4:ncol(myRaw)] + 1

# convert
PEDsnpsAB <-
sapply(seq(nSNPs), function(snp)
sapply(PEDsnps[snp, ], function(ind) myAlleles[snp, ind]))

# column bind first 6 cols with genotypes
PED <- cbind(PED6, PEDsnpsAB)

#output PED and MAP
write.table(PED, "gwas.ped", quote = FALSE, col.names = FALSE, row.names = FALSE, sep = "\t")
write.table(MAP, "gwas.map", quote = FALSE, col.names = FALSE, row.names = FALSE, sep = "\t")

The .ped file has the following columns:

The PED file is a white-space (space or tab) delimited file: the first six columns are mandatory:
Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype

The expected result looks like:
Code:
FAM001  1  0 0  1  2  A A  G G  A C
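
One direction that might help is to replace the nested sapply() calls with a single matrix-indexing lookup. The following is only a sketch on a toy subset of the data above, not something tested at full size. The names geno, alleles, idx and genoAB are made up for this illustration, and the colClasses argument is an assumption added so that an allele column containing only "T" is not read as logical TRUE.

```r
# Toy input in the same layout as above:
# SNP id, first allele, second allele, then one 0/1/2 column per individual.
myRaw <- read.table(text = "
rs987435   C G 1 1 1 0 2
rs345783   C G 0 0 1 0 0
rs955894   G T 1 1 2 2 1",
  colClasses = c(rep("character", 3), rep("integer", 5)))

nSNPs <- nrow(myRaw)
nIndividuals <- ncol(myRaw) - 3

# Genotype codes as an integer matrix (SNPs in rows, individuals in columns)
geno <- as.matrix(myRaw[, -(1:3)])

# Lookup table: one row per SNP, columns hold the strings for codes 0, 1, 2
alleles <- cbind(paste(myRaw$V2, myRaw$V2),
                 paste(myRaw$V2, myRaw$V3),
                 paste(myRaw$V3, myRaw$V3))

# Two-column index (row = SNP, column = code + 1), so every cell is
# looked up in one vectorized step instead of one call per cell
idx <- cbind(as.vector(row(geno)), as.vector(geno) + 1L)
genoAB <- matrix(alleles[idx], nrow = nSNPs, ncol = nIndividuals)

# PED needs individuals in rows, so transpose before binding the 6 fixed columns
PED <- cbind(FID = seq_len(nIndividuals), IID = seq_len(nIndividuals),
             FatherID = 0, MotherID = 0, Sex = 1, Phenotype = 1,
             t(genoAB))

# write.table(PED, "gwas.ped", quote = FALSE, col.names = FALSE,
#             row.names = FALSE, sep = "\t")
```

With 905460 SNPs and about 1117 individuals this still builds a character matrix of roughly a billion cells, so it may be necessary to process the individuals in blocks and append each block to the .ped file.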