Search code examples
rdplyrspssr-haven

R - Haven - SPSS (.sav): Iterate over all columns and replace name and label of columns


for a project I needed to merge an Excel and an SPSS file with R.
Not sure if this was my best idea. I got the merge done, however in the process I had to use the attribute(col)$label as the name in order to work.

My final merged data.frame has thus way to long column names including special characters (e.g. :). Here are the first few examples how the colnames of the current merged df look like

colnames(combined_retro)
  [1] "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)"                                                             
  [2] "Studiencode: [01]"                                                                                                       
  [3] "Format"                                                                                                                  
  [4] "Geschlecht"                                                                                                              
  [5] "Alter (direkt): Ich bin   ... Jahre"                                                                                     
  [6] "Staatsangehörigkeit"   

So I created another data.frame Naming_Back in which I have two columns: Name Label

Naming_Back
   Name     Label 
 1 CASE     Interview-Nummer (fortlaufend)                            
 2 SERIAL   Seriennummer (sofern verwendet)                           
 3 REF      Referenz (sofern im Link angegeben)                       
 4 QUESTNNR Fragebogen, der im Interview verwendet wurde              
 5 MODE     Interview-Modus  

So now I would like to iterate over the Columns of my merged data.frame combined_retro and check if the current name of the column (e.g. "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)" is available in the Label column of the second (Naming_Back) data.frame. If it is I would like to exchange the current column name with the one provided by the Name column.

My Current approach is the following loop:

for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    colnames(retro[ , i]) <- new_name_buffer$Name
    print(colnames(retro[ , i]))
    print(new_name_buffer$Name)
  }
}

Examples for the print commands from the loop

[1] "Geschlecht"
[1] "SD02"
[1] "Staatsangehörigkeit"
[1] "SD04"
[1] "Staatsangehörigkeit: Anders"
[1] "SD04_04"

So obviously the problem is this line colnames(retro[ , i]) <- new_name_buffer$Name as it does not change the column name. Has anyone a quick idea how to fix it?

EDIT: Found a solution, by creating a character vector and stepwise filling it with either the abbreviated name when available or the old name if not

new_col_names <- c()
for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    colnames(retro[ , i]) <- new_name_buffer$Name
    new_col_names <- c(new_col_names, new_name_buffer$Name)
  }
  else{
    new_col_names <- c(new_col_names, colnames(retro[ , i]))
  }
}
colnames(retro) <- new_col_names

EDIT 2: Just found an alternative solution to overwrite a column name while iterating over the columns with a for loop, you can just do names(dataframe)[index] and then just assign a new value with <- "newColName"

  for(i in 1:ncol(retro)) {       # for-loop over columns
  new_name_buffer <- Naming_Back %>% 
    filter(Label == colnames(retro[ , i])) 
  if(!(is_empty(new_name_buffer$Name))){
    names(retro)[i] <- new_name_buffer$Name
    print(colnames(retro[ , i]))
    print(new_name_buffer$Name)
  }
}

Solution

  • The problem (as correctly identified in the comments to the OP by @IRTFM) with the original code was, that the assignment of the new column name: colnames(retro[ , i]) <- new_name_buffer$Name was not working properly, as colnames does not work on an atomic vector.

    I found work-around to overwrite a column, name while iterating over the columns of a data.frame with a for loop. One can just do call names(dataframe)[index] and then just assign a new column name with <- "newColName"in my example the important line would thus look like this:

    Assigning a new column name

    names(retro)[i] <- new_name_buffer$Name
    

    The Complete solution with the for loop

    for(i in 1:ncol(retro)) {       # for-loop over columns
          # Check if a row with the label is available in the Naming_Back dataframe
          new_name_buffer <- Naming_Back %>% 
             filter(Label == colnames(retro[ , I])) 
    
      # When a Name matching the label is found, replace the old name 
      if(!(is_empty(new_name_buffer$Name))){
        names(retro)[i] <- new_name_buffer$Name
      }
    }