R question: I modified two categorical columns; how do I put them back into the original data frame?


I'm fairly new to R, and after taking a two-hour free course on YouTube, I feel no better. I'm trying to learn, so I hope someone can help me out! I feel close to the answer, but here I am :D I have a dataset, and I've modified two of its columns by editing them as strings (characters). They consist of the first names (1st column) and last names (2nd column) of people, and I was told to remove punctuation, so I had to edit them as strings. Now I'm unsure how to add them back into the data frame. Here is where I'm at.

    # FILE: Vaccine_CSV
    # INSTALL AND LOAD PACKAGES 
    library(datasets)  # Load base packages manually

    # Use pacman to load add-on packages as desired
    pacman::p_load(pacman, rio) 

    # Importing CSV from desktop
    Vaccine_CSV <- import("~/Desktop/Vaccine CSV.csv")

    # Summary 
    summary(Vaccine_CSV)

    # Transform lowercases in data into upper case
    Vaccine_CSV = as.data.frame(sapply(Vaccine_CSV, toupper))


    Vaccine_CSV$FirstName
    Vaccine_CSV$LastName

    # Trim the spaces between the names
    trimws(Vaccine_CSV$FirstName) 
    trimws(Vaccine_CSV$LastName) 

    # Extract the first and last name columns
    FirstNameFixed <- Vaccine_CSV[, 3]
    LastNFixed <- Vaccine_CSV[, 4]

    # Remove hyphens, whitespace, and apostrophes from the first names
    FirstNameFixed <- gsub("\\-", "", FirstNameFixed)
    FirstNameFixed <- gsub("\\s", "", FirstNameFixed)
    FirstNameFixed <- gsub("\\'", "", FirstNameFixed)

    # Remove hyphens, whitespace, and apostrophes from the last names
    LastNFixed <- gsub("\\-", "", LastNFixed)
    LastNFixed <- gsub("\\s", "", LastNFixed)
    LastNFixed <- gsub("\\'", "", LastNFixed)

Solution

  • I think the dplyr package will be a friend here.

    Once you have applied toupper, your code can be written as shown:

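    library(dplyr)  # re-exports the %>% pipe from magrittr
    # Start the pipe from the column itself; "." is the piped value
    Vaccine_CSV$FirstName <- Vaccine_CSV$FirstName %>% trimws() %>%
      gsub("-", "", .) %>% gsub("\\s", "", .) %>% gsub("'", "", .)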
    

    This changes the data frame column in place; the same pattern applies to Vaccine_CSV$LastName.

    On the other hand, if you want to work with vectors rather than the data frame, once FirstNameFixed and LastNFixed have had all operations applied, you can combine them:

    # data.frame() returns a data frame; cbind() on two character vectors returns a matrix
    new_df <- data.frame(FirstNameFixed, LastNFixed)
    

    And if you want to put them back into the original data frame:

    Vaccine_CSV$FirstName <- FirstNameFixed
    Vaccine_CSV$LastName <- LastNFixed
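
As a rough end-to-end sketch of the same idea, the cleaning steps can be wrapped in one function and assigned straight back to both columns. The toy data below is made up for illustration, and `clean_name` is a hypothetical helper name; only the column names `FirstName` and `LastName` come from the question.

```r
# Toy data standing in for the real CSV (values are invented for illustration)
Vaccine_CSV <- data.frame(
  ID        = 1:3,
  FirstName = c(" Mary-Ann ", "D'Arcy", "Jo Beth"),
  LastName  = c("O'Neil", " Smith-Jones", "De La Cruz "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case the names, then drop hyphens,
# apostrophes, and all whitespace in a single gsub() call
clean_name <- function(x) {
  gsub("[-'[:space:]]", "", toupper(x))
}

# Apply the helper to both name columns and write the results back
name_cols <- c("FirstName", "LastName")
Vaccine_CSV[name_cols] <- lapply(Vaccine_CSV[name_cols], clean_name)

print(Vaccine_CSV)
```

Because only the two name columns are reassigned, the other columns keep their original types, whereas `sapply(Vaccine_CSV, toupper)` converts every column to character. If all punctuation should go rather than just hyphens and apostrophes, the character class could be widened to `[[:punct:][:space:]]`.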