Search and format the string


Here is my data:

Data

My Activity description column contains many different charges.

Some strings contain "charge", some contain "charges", and some contain neither.

I need to do the following:

  1. Find the pattern "charge" and replace it with "charges".

  2. Two of the charges, container charges and store charges, should use "charge" instead of "charges". For example, "Container charge", not "Container charges".

  3. If the string does not contain "charge" at all, append "charges" to the end of it.

For question 1, I tried the code below in R:

    df$`Activity description` <- gsub("*charge", "charges", df$`Activity description`)

But it adds an extra "s" in the output, e.g. "Chargess". I don't know why.
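The doubled "s" is consistent with the pattern also matching the "charge" inside values that are already plural. gsub() replaces every match, so the "Charge" in "Charges" becomes "Charges", which gives "Chargess". Below is a minimal sketch with made-up sample values, since the original data is not reproduced here, and it assumes capitalised "Charge" as in the answer below. Anchoring the pattern to the end of the string with `$` means only a trailing singular "Charge" is pluralised:

```r
# Hypothetical sample values; the question's real data is not shown.
x <- c("Handling Charge", "Handling Charges", "Freight")

# Unanchored: the "Charge" inside "Charges" matches too, so it gains a second "s".
naive <- gsub("Charge", "Charges", x)
# naive: "Handling Charges" "Handling Chargess" "Freight"

# Anchored with $: only a "Charge" at the very end of the string is replaced.
fixed <- sub("Charge$", "Charges", x)
# fixed: "Handling Charges" "Handling Charges" "Freight"
```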

For questions 2 and 3, I don't know how to start.

Can anyone help me with this?


Solution

  • First, I highly recommend using column names without spaces (e.g. Activity_description), so that you can write df$Activity_description instead of wrapping the name in backticks.

    Next, you can handle each case with a series of if-else statements:

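    new_column <- c()
    for (line in df$Activity_description){
        # the two exceptions always end up with the singular "Charge",
        # whether they start out as "Charge" or "Charges"
        if (line %in% c("Container Tracking Charges", "Container Tracking Charge")){
            new_column <- c(new_column, "Container Tracking Charge")
        } else if (line %in% c("Store Tracking Charges", "Store Tracking Charge")){
            new_column <- c(new_column, "Store Tracking Charge")
        } else if (grepl("Charge$", line)){
            # ends in the singular "Charge": add the "s"
            new_column <- c(new_column, paste0(line, "s"))
        } else if (!grepl("Charge", line)){
            # no "Charge" anywhere: append " Charges"
            new_column <- c(new_column, paste(line, "Charges"))
        } else {
            # already plural (or "Charge" appears mid-string): keep unchanged
            new_column <- c(new_column, line)
        }
    }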
    

    You can then overwrite the original column with the new character vector:

    df$Activity_description <- new_column
    

    This is a simple base R approach, but it should at least get you started.
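The same three rules can also be applied to the whole column at once with vectorised base R functions, which avoids growing a vector inside a loop. This is a minimal sketch, not a tested drop-in. It assumes the values are capitalised as "Charge"/"Charges" and that the two exceptions are exactly "Container Tracking Charges" and "Store Tracking Charges". The function name fix_charges() and the sample data frame are hypothetical:

```r
# Hypothetical helper applying the question's three rules to a character vector.
fix_charges <- function(x) {
  # Rule 3: no "Charge" anywhere -> append " Charges"
  no_charge <- !grepl("Charge", x, fixed = TRUE)
  x[no_charge] <- paste(x[no_charge], "Charges")

  # Rule 1: a trailing singular "Charge" becomes "Charges";
  # the $ anchor stops an existing "Charges" from turning into "Chargess"
  x <- sub("Charge$", "Charges", x)

  # Rule 2: the two exceptions go back to the singular form
  exception <- x %in% c("Container Tracking Charges", "Store Tracking Charges")
  x[exception] <- sub("Charges$", "Charge", x[exception])
  x
}

# Hypothetical sample data; the question's real data is not shown.
df <- data.frame(
  Activity_description = c("Container Tracking Charges", "Store Tracking Charge",
                           "Handling Charge", "Freight Charges", "Demurrage"),
  stringsAsFactors = FALSE
)
df$Activity_description <- fix_charges(df$Activity_description)
print(df$Activity_description)
```

With these sample values the column becomes "Container Tracking Charge", "Store Tracking Charge", "Handling Charges", "Freight Charges" and "Demurrage Charges". The order of the steps matters: appending " Charges" first means the anchored sub() never pluralises a value twice, and the exceptions are handled last so they end up singular whichever form they started in.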