Search code examples
rreshapeflat-file

Creating a true flat file from a semi-flat file


I have the following dataset (this is the first 10 lines, there are >500). Each row depicts a unique location in space and the date and crew that set up the site. The NUMGB column is how many bears were caught at that site, and the GBIDSEX1, to GBIDSEX8 provides the individual identifier for those bears that were caught at the site

df <- read.table(text= 'CELL SITE INSTCREW INSTDATE NUMGB GBIDSEX1 GBIDSEX2 GBIDSEX3 GBIDSEX4 GBIDSEX5 GBIDSEX6 GBIDSEX7 GBIDSEX8
1 1 DF/EJ 2000-06-02          
2 1 DF/EJ 2000-06-02  3 F6514-001 M6514-012 F6514-023     
3 1 DF/EJ 2000-06-02  1 SPTEST       
4 1 DF/EJ 2000-06-02          
5 1 DF/EJ 2000-06-02  1 M6643-015       
6 1 DF/EJ 2000-06-02          
7 1 DF/EJ 2000-06-02  1 M6514-012       
8 1 DF/EJ 2000-06-02  2 F6566-001 M6509-005      
9 1 DF/EJ 2000-06-02          
10 1 DF/KP 2000-06-01  5 M6505-001  M6505-010 F6514-001 M6514-012 F6514-02   ' ,header=TRUE, sep=" ")

For the analysis I would like to do, I need to have each bear ID as a unique row with the site, crew and date info also. Here is an example of how I would like the data to look:

CELL    SITE    INSTCREW    INSTDATE    NUMGB   GBIDSEX
2   1   DF/EJ   2000-06-02  3   F6514-001
2   1   DF/EJ   2000-06-02  3   M6514-012
2   1   DF/EJ   2000-06-02  3   F6514-023
3   1   DF/EJ   2000-06-02  1   SPTEST
5   1   DF/EJ   2000-06-02  1   M6643-015
7   1   DF/EJ   2000-06-02  1   M6514-012
8   1   DF/EJ   2000-06-02  2   F6566-001
8   1   DF/EJ   2000-06-02  2   M6509-005
10  1   DF/KP   2000-06-01  5   M6505-001
10  1   DF/KP   2000-06-01  5   M6505-010
10  1   DF/KP   2000-06-01  5   F6514-001
10  1   DF/KP   2000-06-01  5   M6514-012
10  1   DF/KP   2000-06-01  5   F6514-023

I'd like to cull rows where no bears were caught, which is easily done with a subset function:

dfcull<-subset(df, NUMGB>0)

But I'm unsure how to create a unique row for each GBIDSEX. Any thoughts?


Solution

  • You can get this quite easily with melt from "reshape2":

    library(reshape2)
    df[df == ""] <- NA
    melt(df, na.rm=TRUE, 
         id.vars=c("CELL", "SITE", "INSTCREW", "INSTDATE", "NUMGB"))
    #    CELL SITE INSTCREW   INSTDATE NUMGB variable     value
    # 2     2    1    DF/EJ 2000-06-02     3 GBIDSEX1 F6514-001
    # 3     3    1    DF/EJ 2000-06-02     1 GBIDSEX1    SPTEST
    # 5     5    1    DF/EJ 2000-06-02     1 GBIDSEX1 M6643-015
    # 7     7    1    DF/EJ 2000-06-02     1 GBIDSEX1 M6514-012
    # 8     8    1    DF/EJ 2000-06-02     2 GBIDSEX1 F6566-001
    # 10   10    1    DF/KP 2000-06-01     5 GBIDSEX1 M6505-001
    # 12    2    1    DF/EJ 2000-06-02     3 GBIDSEX2 M6514-012
    # 18    8    1    DF/EJ 2000-06-02     2 GBIDSEX2 M6509-005
    # 20   10    1    DF/KP 2000-06-01     5 GBIDSEX2 M6505-010
    # 22    2    1    DF/EJ 2000-06-02     3 GBIDSEX3 F6514-023
    # 30   10    1    DF/KP 2000-06-01     5 GBIDSEX3 F6514-001
    # 40   10    1    DF/KP 2000-06-01     5 GBIDSEX4 M6514-012
    # 50   10    1    DF/KP 2000-06-01     5 GBIDSEX5  F6514-02