I'm trying to write a regex that can match ONLY those lines where every white-space separated token occurs exactly twice, regardless of order.
For example, the entirety of the following lines should match:
1 1 2 2
100 10 10 100
A B B A
HELLO HELLO
The following lines should NOT match:
hello hello hello
1 1 22
1001
Although I'm able to match individual groups of repetitions in a given line, using the regex (\d+)(?=.*(\1))
, I am struggling to match entire lines using ^$
. My guess is that when I use the lookahead, this is creating an infinite loop where we constantly look at every token (including the repetitions) and expect a repeat later in the string, although I am not sure how to resolve this. Any ideas?
Thanks!
[EDIT]: To add some details, based on questions from the comments: Obviously it is fairly trivial to implement this as a function in most programming languages. However,I was originally looking to implement this as a regex, because I was trying to match certain records in a database. Thus, this regex was intended to be embedded as a CASE statement in a SQL query, which in my mind would've been a nice way make the selection.
Given the apparent complexity of such a regex, it seems that creating a function is the way to go, and thus pretty much any of the following answers would be fine solutions, circumstances depending.
You could create a function to split the string and count the occurrences of each word in it, returning true
if all the words have a count of 2:
create function all_pairs(v text) returns bool as $$
with counts as (
select count(*) as c
from unnest(string_to_array(v, ' ')) as vals(val)
group by val
),
arr as (
select array_agg(c) as cc
from counts
)
select 2 = all(cc)
from arr
$$ language sql
Then you can simply call the function to test your string. For example:
with values as (
select '1 1 2 2' as v union all
select '100 10 10 100' union all
select 'A B B A' union all
select 'HELLO HELLO' union all
select 'hello hello hello' union all
select '1 1 22' union all
select '1001'
)
select *
from values
where all_pairs(v)
Output:
v
1 1 2 2
100 10 10 100
A B B A
HELLO HELLO
Demo on dbfiddle.uk