Hello,
I would like to create another data frame that keeps only the SNP rows whose minor allele frequency (MAF, here meaning the frequency of the value 2 across the genotype columns) is > 5%, and drops the rows with MAF < 5%.
For example, rs2342723 should be removed, since its proportion of 2s is 0.03 (see the example output below).
I know this should be easy, but I can't work it out.
Below is a small sample of my data; the full data set is 900,000 rows × 1000 columns.
Code:
           V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1    rs987435  C  G  0  2  1  2  2  1   1
2    rs345783  C  G  0  0  0  0  0  0   0
3    rs955894  G  T  2  1  2  2  2  2   2
4   rs6088791  A  G  0  1  1  0  1  2   2
5  rs11180435  C  T  1  1  1  1  0  2   0
6  rs17571465  A  T  2  2  2  2  2  2   2
7  rs17011450  C  T  2  2  2  2  2  2   2
8   rs6919430  A  C  2  2  2  2  2  2   2
9   rs2342723  C  T  0  0  0  0  0  0   1
10 rs11992567  C  T  2  2  2  2  2  2   2
Basically, I want output like the following, for example, giving the frequency (count) of 0, 1, and 2 for each SNP:
Code:
            0  1  2
rs2342723  20 10  1
rs11992567 35 20 10
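To make the target concrete, here is a minimal sketch of how such a count table could be built. It assumes that V1 holds the SNP id, V2 and V3 hold the alleles, and every column from V4 onward is a genotype coded 0/1/2; the toy `datasnp` below is a made-up three-row subset of the sample (not my real data), and the name `counts` is only illustrative.

```r
# Toy data in the same layout as the sample: V1 = SNP id, V2/V3 = alleles,
# V4..V10 = genotypes coded 0/1/2 (assumed layout, three rows only).
datasnp <- data.frame(
  V1 = c("rs987435", "rs345783", "rs2342723"),
  V2 = c("C", "C", "C"),
  V3 = c("G", "G", "T"),
  V4 = c(0, 0, 0), V5 = c(2, 0, 0), V6 = c(1, 0, 0), V7 = c(2, 0, 0),
  V8 = c(2, 0, 0), V9 = c(1, 0, 0), V10 = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Keep only the genotype columns (drop the three annotation columns).
geno <- as.matrix(datasnp[, -(1:3)])

# Count how many samples carry 0, 1 and 2 for each SNP (row).
counts <- cbind(
  "0" = rowSums(geno == 0, na.rm = TRUE),
  "1" = rowSums(geno == 1, na.rm = TRUE),
  "2" = rowSums(geno == 2, na.rm = TRUE)
)
rownames(counts) <- datasnp$V1
counts
```

Dividing `counts` by `rowSums(counts)` would turn the counts into proportions per SNP.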
I used the following code:
data <- datasnp[rowSums(datasnp==2)/ncol(datasnp) > 0.05, ]
Is this correct?
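One thing I am unsure about in that one-liner: `ncol(datasnp)` also counts the three annotation columns (V1 to V3), so the denominator looks too large by three, and any `NA` genotype would make `rowSums` return `NA`. Below is a minimal sketch of a variant restricted to the genotype columns. It assumes the genotypes start at V4, uses a made-up three-row `datasnp`, and follows my definition above (frequency of the value 2); the names `geno` and `freq2` are only illustrative.

```r
# Toy data in the same layout as the sample (assumed: V4 onward are genotypes).
datasnp <- data.frame(
  V1 = c("rs987435", "rs345783", "rs2342723"),
  V2 = c("C", "C", "C"),
  V3 = c("G", "G", "T"),
  V4 = c(0, 0, 0), V5 = c(2, 0, 0), V6 = c(1, 0, 0), V7 = c(2, 0, 0),
  V8 = c(2, 0, 0), V9 = c(1, 0, 0), V10 = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Genotype columns only, so annotation columns do not inflate the denominator.
geno <- as.matrix(datasnp[, -(1:3)])

# Proportion of samples with genotype 2 per SNP, ignoring missing values.
freq2 <- rowMeans(geno == 2, na.rm = TRUE)

# Keep the rows whose proportion of 2s exceeds 5%.
data <- datasnp[freq2 > 0.05, ]

# Alternative (an assumption, not what I asked for above): an allele-based MAF,
# treating each genotype as the number of copies of one allele.
p   <- rowMeans(geno, na.rm = TRUE) / 2
maf <- pmin(p, 1 - p)
```

With the toy data, only rs987435 passes the 5% filter on the proportion of 2s.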