I have a file with 10,1900 lines with Delimiter as 5 ('|') [obviously 6 columns now] , and I have statement in sixth column like "Dropped 12 (0.01%)" !! I am longing to extract the number after Dropped within brackets;
Actual -- Dropped 12 (0.01%)
Expected -- 0.01
I need a solution using Apache pig.
You are looking for the REGEX_EXTRACT
function.
Let's say you have a table A
that looks like:
+--------------------+
| col1 |
+--------------------+
| Dropped 12 (0.01%) |
| Dropped 24 (0.02%) |
+--------------------+
You can extract the number in parenthesis with the following:
B = FOREACH A GENERATE REGEX_EXTRACT(col6, '.*\\((.*)%\\)', 1);
+---------+
| percent |
+---------+
| 0.01 |
| 0.02 |
+---------+
I'm specifying a regex capture group for whatever characters are between (
and %)
. Notice that I'm using \\
as the escape character so that I match the opening and closing parenthesis.