Data reorganization in R


I have the following type of data:

Person <- c("A", "B", "C", "AB", "BC", "AC",  "D", "E")
Father <- c(NA,  NA,  NA,   "A", "B", "C",    NA, "D")
Mother <- c(NA,  NA,  NA, "B",   "C", "A", "C",    NA)
var1 <- c(  1,   2,   3,     4,   2,   1,     6, 9)
var2 <- c(1.4, 2.3, 4.3,  3.4, 4.2, 6.1,   2.6, 8.2)
myd <- data.frame (Person, Father, Mother, var1, var2)

  Person Father Mother var1 var2
1      A   <NA>   <NA>    1  1.4
2      B   <NA>   <NA>    2  2.3
3      C   <NA>   <NA>    3  4.3
4     AB      A      B    4  3.4
5     BC      B      C    2  4.2
6     AC      C      A    1  6.1
7      D   <NA>      C    6  2.6
8      E      D   <NA>    9  8.2

Here NA stands for missing (unknown). I want to reorganize the data into trios (an individual plus its father and mother). For example, the trio for individual AB includes the data from its father A and mother B:

  Person Father Mother var1 var2
1      A   <NA>   <NA>    1  1.4
2      B   <NA>   <NA>    2  2.3
4     AB      A      B    4  3.4

A, B and C cannot form trios because their parents are unknown. Some individuals have only one known parent: E has only a known father (D), and D has only a known mother (C). In that case the trio has just two members:

  7      D   <NA>      C    6  2.6
  3      C   <NA>   <NA>    3  4.3

When a mother or father appears in more than one trio, the same row is repeated (recycled) in each.

Thus expected complete output would be:

   Person Father Mother var1 var2 Trio
1       A   <NA>   <NA>    1  1.4    1
2       B   <NA>   <NA>    2  2.3    1
4      AB      A      B    4  3.4    1

2       B   <NA>   <NA>    2  2.3    2
3       C   <NA>   <NA>    3  4.3    2
5      BC      B      C    2  4.2    2

1       A   <NA>   <NA>    1  1.4    3
3       C   <NA>   <NA>    3  4.3    3
6      AC      C      A    1  6.1    3

NA   <NA>   <NA>   <NA>   NA   NA    4
3       C   <NA>   <NA>    3  4.3    4
7       D   <NA>      C    6  2.6    4

NA   <NA>   <NA>   <NA>   NA   NA    5
7       D   <NA>      C    6  2.6    5
8       E      D   <NA>    9  8.2    5

Solution

  • This may be roughly what you want:

    Person <- c("A", "B", "C", "AB", "BC", "AC",  "D", "E")
    Father <- c(NA,  NA,  NA,   "A", "B", "C",    NA, "D")
    Mother <- c(NA,  NA,  NA, "B",   "C", "A", "C",    NA)
    var1 <- c(  1,   2,   3,     4,   2,   1,     6, 9)
    var2 <- c(1.4, 2.3, 4.3,  3.4, 4.2, 6.1,   2.6, 8.2)
    myd <- data.frame(Person, Father, Mother, var1, var2, stringsAsFactors = FALSE)
    

    Note the slight change in the definition of myd: stringsAsFactors = FALSE keeps Person, Father and Mother as character vectors instead of factors.

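    # Return row x of myd followed by the rows of its known parents,
    # tagged with the row index x as the Trio id.
    parentage <- function(x, myd) {
        y <- myd[x, ]
        p1 <- y$Father   # NA when the father is unknown
        p2 <- y$Mother   # NA when the mother is unknown
        out <- y
        if (!is.na(p1)) {
            out <- rbind(out, myd[myd$Person == p1, ])
        }
        if (!is.na(p2)) {
            out <- rbind(out, myd[myd$Person == p2, ])
        }
        out$Trio <- x
        out
    }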
    
    ans<-lapply(seq_along(myd$Person),parentage,myd)
    
     > ans
    [[1]]
      Person Father Mother var1 var2 Trio
    1      A   <NA>   <NA>    1  1.4    1
    
    [[2]]
      Person Father Mother var1 var2 Trio
    2      B   <NA>   <NA>    2  2.3    2
    
    [[3]]
      Person Father Mother var1 var2 Trio
    3      C   <NA>   <NA>    3  4.3    3
    
    [[4]]
       Person Father Mother var1 var2 Trio
    4      AB      A      B    4  3.4    4
    2       A   <NA>   <NA>    1  1.4    4
    21      B   <NA>   <NA>    2  2.3    4
    
    [[5]]
      Person Father Mother var1 var2 Trio
    5     BC      B      C    2  4.2    5
    2      B   <NA>   <NA>    2  2.3    5
    3      C   <NA>   <NA>    3  4.3    5
    
    [[6]]
       Person Father Mother var1 var2 Trio
    6      AC      C      A    1  6.1    6
    3       C   <NA>   <NA>    3  4.3    6
    31      A   <NA>   <NA>    1  1.4    6
    
    [[7]]
      Person Father Mother var1 var2 Trio
    7      D   <NA>      C    6  2.6    7
    3      C   <NA>   <NA>    3  4.3    7
    
    [[8]]
      Person Father Mother var1 var2 Trio
    8      E      D   <NA>    9  8.2    8
    7      D   <NA>      C    6  2.6    8
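
The list above differs from the layout asked for in the question: founders (A, B, C) get their own one-row entries, the child comes first, and Trio is the row index rather than a running number. A minimal base R sketch that gets closer to the requested table is shown here. The names `children`, `make_trio` and `trios` are hypothetical, and the row order within a trio (father, mother, child) is an assumption that may differ slightly from the hand-written table.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps names as character.
myd <- data.frame(
  Person = c("A", "B", "C", "AB", "BC", "AC", "D", "E"),
  Father = c(NA, NA, NA, "A", "B", "C", NA, "D"),
  Mother = c(NA, NA, NA, "B", "C", "A", "C", NA),
  var1   = c(1, 2, 3, 4, 2, 1, 6, 9),
  var2   = c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2),
  stringsAsFactors = FALSE
)

# Rows with at least one known parent; founders (A, B, C) form no trio.
children <- which(!is.na(myd$Father) | !is.na(myd$Mother))

# make_trio is a hypothetical helper: father row, mother row, then child row.
# match() returns NA for an unknown parent, and indexing a data frame
# by NA yields an all-NA placeholder row.
make_trio <- function(k) {
  i <- children[k]
  rows <- c(match(myd$Father[i], myd$Person),
            match(myd$Mother[i], myd$Person),
            i)
  trio <- myd[rows, ]
  trio$Trio <- k          # running trio number, 1..length(children)
  trio
}

# Stack all trios into one data frame and drop the recycled row names.
trios <- do.call(rbind, lapply(seq_along(children), make_trio))
rownames(trios) <- NULL
print(trios)
```

Each trio always has three rows here, with an all-NA row standing in for an unknown parent, as in trios 4 and 5 of the question.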
    

    If you want a single data frame instead of a list, you can use the plyr package:

    library(plyr)
    ans<-adply(seq_along(myd$Person),1,parentage,myd)
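
If an extra package is not wanted, base R can also stack a list of data frames that share the same columns, using `do.call(rbind, ...)`. The sketch below uses a small hypothetical list, `parts`, standing in for two elements of the list returned by `lapply` above; it is an illustration of the idiom, not a drop-in replacement tested against plyr's output.

```r
# Hypothetical stand-in for two elements of the list returned by lapply().
parts <- list(
  data.frame(Person = c("AB", "A", "B"), Trio = 4, stringsAsFactors = FALSE),
  data.frame(Person = c("E", "D"), Trio = 8, stringsAsFactors = FALSE)
)

# do.call() passes the list elements as separate arguments to rbind(),
# which stacks them row-wise into a single data frame.
combined <- do.call(rbind, parts)
print(combined)
```

The same call applied to the full list would give one data frame with a Trio column identifying each group.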