I have a JSON file in which I put my SQL query and it will later on be run automatically on Hive.
Structure of the JSON is as-follow:
{
"name": "query1",
"query": "select regexp_extract(column, '(.*)\\s\\|', 1) as column_one from data"
}
The idea is to extract everything until the first space + vertical bar.
For a given example, when I try in Hive, it works as expected:
select regexp_extract('First part | Second Part', '(.*)\\s\\|', 1) as column_one;
First part
As you can see you already need one more backslash for \s
and \|
otherwise is does not work in Hive.
However when run automatically using the JSON file, I get the following:
Fir
Then I understood you need another backslash to escape it in JSON so I went with the following:
{
"name": "query1",
"query": "select regexp_extract(column, '(.*)\\\s\\\|', 1) as column_one from data"
}
But still, it gives me the Fir
instead of First part
.
You can use
{
"name": "query1",
"query": "select regexp_extract(column, '([^|]*?)[[:space:]]*[|]', 1) as column_one from data"
}
Note that
([^|]*?)
- Captures any zero or more chars other than |
as few as possible[[:space:]]*
- zero or more whitespace chars[|]
- a literal |
char (inside a character class, [...]
, the |
char is treated as a literal pipe symbol).