I have a column populated with strings that all follow the same pattern, *.stage1. I want to copy every string to another column as a bullet point, then trim ".stage1" so that the first column holds only the characters before ".stage1".
This will save a lot of time. Can you suggest a package that can help me write this script?
Thanks, Mago
Copying the column should not be an issue. You can make the altered version with sub.
## Some sample data
df = data.frame(x = paste0("A", 1:9, ".stage1"))
> df
x
1 A1.stage1
2 A2.stage1
3 A3.stage1
4 A4.stage1
5 A5.stage1
6 A6.stage1
7 A7.stage1
8 A8.stage1
9 A9.stage1
## Keep a copy of the original strings, then strip the suffix from x
df$x2 = df$x
df$x = sub("(.*)\\.stage1", "\\1", df$x)
df
x x2
1 A1 A1.stage1
2 A2 A2.stage1
3 A3 A3.stage1
4 A4 A4.stage1
5 A5 A5.stage1
6 A6 A6.stage1
7 A7 A7.stage1
8 A8 A8.stage1
9 A9 A9.stage1
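The question also asks for the copied strings to appear "as a bullet point", which the example above does not cover. Here is a minimal sketch in base R (no extra package needed), assuming that a bullet point simply means a leading "- " marker. The marker and the three-row sample data are assumptions for illustration, not something specified in the question.

```r
# Small sample in the same shape as the example above (illustrative only)
df <- data.frame(x = paste0("A", 1:3, ".stage1"), stringsAsFactors = FALSE)

# Assumption: a "bullet point" is the original string prefixed with "- "
df$x2 <- paste0("- ", df$x)

# Strip the trailing ".stage1" from the original column
df$x <- sub("\\.stage1$", "", df$x)

df
```

If the bullets are meant for some other output format, only the prefix passed to paste0 would need to change.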
Some extra detail on the sub statement.
In each string, sub replaces the first match of the first expression (the pattern) with the second one (the replacement). What are those expressions?
First expression: "(.*)\\.stage1"
. matches any character.
.* matches any number of characters.
Because .* is in parentheses, whatever it matches is stored in a capture group that the replacement can refer to as \1.
\\. matches a literal dot; an unescaped . would match any character.
So "(.*)\\.stage1" will match the string ".stage1" and everything before it, storing the characters before .stage1 in \1.
Second expression: "\\1"
We want to replace the whole match with just the characters before .stage1, so the replacement string is "\\1".
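To see the pattern and replacement at work outside a data frame, here is a small sketch on a plain character vector. The sample values are made up for illustration. The second call is an alternative rather than the method used above: it anchors the pattern with $ so that only a ".stage1" at the very end of the string is removed, and it needs no capture group because the suffix is replaced with an empty string.

```r
# Hypothetical sample values, including one string without the suffix
x <- c("A1.stage1", "B2.stage1", "no_suffix")

# Capture everything before the literal ".stage1" and keep only that part;
# strings that do not match the pattern are returned unchanged
captured <- sub("(.*)\\.stage1", "\\1", x)

# Alternative: delete the suffix directly, anchored to the end of the string
trimmed <- sub("\\.stage1$", "", x)

captured
trimmed
```

Both calls give the same result on this sample; the anchored form is a little stricter about where the suffix may appear.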