I have the following data frame:
ID<-1:5 #patient ID
snp1<-c("A","T","A","A","T")
snp2<-c("C","C","0","C","C")
snp3<-c("A","G","A","A","G")
snp4<-c("T","0","C","G","T")
snp5<-c("G","G","G","G","A")
dat<-data.frame(ID,snp1,snp2,snp3,snp4,snp5)
print(dat)
which gives:
ID snp1 snp2 snp3 snp4 snp5
1 1 A C A T G
2 2 T C G 0 G
3 3 A 0 A C G
4 4 A C A G G
5 5 T C G T A
I am trying to use a nested for loop to count the number of occurrences of each possible value in each column of dat. To start, I create an empty data frame where the columns are snp1 to snp5 and the rows are the possible values each column can take in dat:
results<- data.frame(matrix(0,ncol = 5, nrow = 5))
colnames(results)=c("snp1","snp2","snp3","snp4","snp5")
rownames(results)=c("A","T","C","G","0")
To make sure the code I want to incorporate in my loop works, I do the following:
results["A","snp1"]<-nrow(subset(dat,subset= snp1=="A"))
print(results)
which correctly gives 3, since snp1 in dat contains A three times:
snp1 snp2 snp3 snp4 snp5
A 3 0 0 0 0
T 0 0 0 0 0
C 0 0 0 0 0
G 0 0 0 0 0
0 0 0 0 0 0
I then use the following nested for loop to do the same for each column (first for loop) but repeat the process for each of the possible values a column in dat can take (second for loop):
for(i in colnames(results)){for(j in c("A","T","C","G","0")){
snp<-as.name(i)
results[j,i]=nrow(subset(dat,subset= snp==j))
results
}}
print(results)
which gives a data frame completely filled with 0's:
snp1 snp2 snp3 snp4 snp5
A 0 0 0 0 0
T 0 0 0 0 0
C 0 0 0 0 0
G 0 0 0 0 0
0 0 0 0 0 0
I've spent hours online trying to determine what the problem is but am at a loss for an explanation. I was originally hoping to do this separately for each value of a phenotype column added to dat, so that I get counts for cases and controls, but I cannot get past this point. Any suggestions would be greatly appreciated. Thank you.
Write a function that does the right thing for one column, e.g.,
fun = function(x)
    table(factor(x, levels = c("A", "C", "G", "T", "0")))
Then apply it to all SNP columns (dropping the ID column):
apply(dat[,-1], 2, fun)
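As for why the loop in the question fills results with zeros: as.name(i) creates a symbol, and inside subset() the expression snp == j compares that symbol (deparsed to the string "snp1", "snp2", ...) with j. That comparison is always FALSE, so no rows are selected. A minimal sketch of a working loop, rebuilding the dat and results objects from the question and indexing the column by name with dat[[i]] instead:

```r
# Example data from the question.
dat <- data.frame(
  ID   = 1:5,
  snp1 = c("A", "T", "A", "A", "T"),
  snp2 = c("C", "C", "0", "C", "C"),
  snp3 = c("A", "G", "A", "A", "G"),
  snp4 = c("T", "0", "C", "G", "T"),
  snp5 = c("G", "G", "G", "G", "A")
)

# Empty results table, built the same way as in the question.
results <- data.frame(matrix(0, ncol = 5, nrow = 5))
colnames(results) <- c("snp1", "snp2", "snp3", "snp4", "snp5")
rownames(results) <- c("A", "T", "C", "G", "0")

# Why the original loop fails: a symbol is deparsed before comparison,
# so this compares the strings "snp1" and "A".
symbol_vs_value <- as.name("snp1") == "A"   # FALSE

# Working loop: dat[[i]] extracts the column whose name is stored in i.
for (i in colnames(results)) {
  for (j in rownames(results)) {
    results[j, i] <- sum(dat[[i]] == j)
  }
}
print(results)
```

The loop is shown only to explain the zeros; the table()-based function above does the same job with less code.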
It is probably much better to use NA rather than 0 to represent missing values; in that case, adjust the function as follows:
fun = function(x)
    table(factor(x, levels = c("A", "C", "G", "T")), useNA = "always")
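For the case/control counts mentioned in the question, one possible approach is to split the SNP columns by the phenotype and apply the same per-column function within each group. The sketch below is only illustrative: the phenotype column and its values are hypothetical (they are not part of the original data), and it uses the first version of fun, where "0" is counted as its own level.

```r
# Example data from the question.
dat <- data.frame(
  ID   = 1:5,
  snp1 = c("A", "T", "A", "A", "T"),
  snp2 = c("C", "C", "0", "C", "C"),
  snp3 = c("A", "G", "A", "A", "G"),
  snp4 = c("T", "0", "C", "G", "T"),
  snp5 = c("G", "G", "G", "G", "A")
)

# Hypothetical phenotype column, made up for illustration only.
dat$phenotype <- c("case", "control", "case", "control", "case")

# Per-column counting function (fixed levels, so every column
# returns counts in the same order).
fun <- function(x)
    table(factor(x, levels = c("A", "C", "G", "T", "0")))

# Names of the SNP columns (everything starting with "snp").
snp_cols <- grep("^snp", names(dat), value = TRUE)

# Split the SNP columns by phenotype, then count within each group.
# The result is a list with one allele-by-SNP matrix per phenotype value.
counts_by_group <- lapply(
  split(dat[snp_cols], dat$phenotype),
  function(d) sapply(d, fun)
)
print(counts_by_group)
```

Because fun fixes the factor levels, each element of counts_by_group has the same rows and columns, so the case and control matrices can be compared cell by cell.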