Search code examples
dplyrtidyverse

R Data cleaning Row Renaming with wildcards


I am new and trying to use R and very used to using excel for cleaning and visualizing. I cannot seem to find anywhere on how to Rename Row variables using wild cards, like excel. If at all possible I would like to keep the functions in the Tidyverse.

I can go through my data for each "area" but I was hoping to find one of two solutions.
I've seen using like % around the web but cannot find a package this works with. I'm trying to rename the variables in several rows at a time by either the first 3 characters or before a dash. My data looks like...

Destination| Cost
500-1 | $40
500-2 | $100
500-3 | $40
AL-1 | 10
FL-2 | $15
FL-3 | $7

And I would like to rename these rows to a short hand...
500
AL
FL

Using Excel I don't need to rename anything and get my end results, but my data is getting to large and it's taking more time then Ii would like.
My excel formulas look about like...
=coutnif(A:A, "500*") =sumif(A:A,"500*")

So my end result is similar to...

Area | Count | Sum
500 | 3 | $180
AL | 1 | $10
FL | 2 | $7


Solution

  • This is Juani.

    It's nice to see new people starting with R :)

    It's very easy to solve what you're trying to do within the tidyverse.

    Assuming that your dataset is called "df" in your environment, you can do something like this:

    library(tidyverse)
    
    df %>%
       mutate(Destination = gsub("-.*","", Destination))%>%
       group_by(Destination) %>%
       summarize(Count = n(), Cost = sum(Cost))
    

    You can rename Destination to Area in the first part of the mutate statement if you want.

    Also, you may want to remove the "$" of the Cost column and convert it to numeric to avoid some errors. If this is your case, you can directly jump in to this code:

    df %>%
       mutate(Destination = gsub("-.*","", Destination),
              Cost = as.numeric(gsub("\\$", "", Cost))) %>%
       group_by(Destination) %>%
       summarize(Count = n(), Cost = sum(Cost))