TL;DR:
I want to transform a string (representing a regex) like "\\." into "\." in a clean and resilient way (something akin to sed 's/\\\\/\\/g', though I don't know whether that could break on edge cases).
val.decode('string-escape') is not an option since I'm using Python 3.
What I tried so far:
val.replace('\\\\', '\\')
val.encode().decode('unicode-escape')
I am sure that I missed a relevant part, because string escaping (and unescaping) seems like a fairly common and basic problem, but I haven't found a solution yet =/
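For reference, here is a minimal sketch of what those two attempts do to a string that contains only single backslashes. The pattern r'\bfoo\b' and the test text are made up for illustration and are not part of the scheme described below.

```python
import re

# Hypothetical pattern with single backslashes (word boundaries around "foo").
val = r'\bfoo\b'

# Attempt 1: only touches *doubled* backslashes, so this string is unchanged.
replaced = val.replace('\\\\', '\\')

# Attempt 2: interprets the escapes itself, so \b becomes a backspace character.
decoded = val.encode().decode('unicode-escape')

print(replaced == val)   # True
print(repr(decoded))     # '\x08foo\x08'

# The original still works as a regex; the decoded version no longer does.
print(bool(re.search(val, 'a foo b')))      # True
print(bool(re.search(decoded, 'a foo b')))  # False
```

So on a string that is not actually double-escaped, the replace call is a no-op, while the unicode-escape round trip can silently change the meaning of the regex.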
Full Story:
I have a YAML file like so:
- !Scheme
  barcode: _([ACGTacgt]+)[_.]
  lane: _L(\d\d\d)[_.]
  name: RKI
  read: _R(\d)+[_.]
  sample_name: ^(.+)(?:_.+){5}
  set: _S(\d+)[_.]
  user: _U([a-zA-Z0-9\-]+)[_.]
  validation: .*/(?:[a-zA-Z0-9\-]+_)+(?:[a-zA-Z0-9])+\.fastq.*
...
that describes a "Scheme" object. The 'name' key is an identifier and the rest are regexes.
I want to be able to parse an object from that YAML, so I wrote a from_yaml class method:
scheme = Scheme()
loaded_mapping = loader.construct_mapping(node) # load yaml-node as dictionary WARNING! loads str escaped
# re.compile all keys except name, adding name as regular string and
# unescaping escaped sequences (like '\') in the process
for key, val in loaded_mapping.items():
    if key == 'name':
        processed_val = val
    else:
        processed_val = re.compile(val)  # backslashes in val are escaped
    scheme.__dict__[key] = processed_val
The problem is that loader.construct_mapping(node) loads the strings with backslashes escaped, so the regex is not correct anymore.
I tried several variations of val.encode().decode('unicode-escape') and val.replace('\\\\', '\\'), but had no luck with them.
If anyone has an idea how to handle this I'd appreciate it very much! I am not married to this specific way of doing things and open to alternative approaches.
Kind Regards!
Assuming I have this super simple YAML file
lane: _L(\d\d\d)[_.]
and load it with PyYAML like this:
import yaml
import re
with open('test.yaml', 'rb') as stream:
    data = yaml.safe_load(stream)
lane_pattern = data['lane']
print(lane_pattern)
lane_expr = re.compile(data['lane'])
print(lane_expr)
Then the result is exactly as one would expect:
_L(\d\d\d)[_.]
re.compile('_L(\\d\\d\\d)[_.]')
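There is no double escaping of strings going on when YAML is parsed, so there is nothing for you to unescape. The doubled backslashes in the second line of output come from the repr of the compiled pattern, which displays the pattern as a Python string literal; the string itself contains single backslashes.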
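This can be checked without YAML at all. The following is a minimal, stdlib-only sketch: the raw string literal stands in for the value the loader returns, and the file name 'sample_L001_R1.fastq' is a made-up example.

```python
import re

# Stand-in for the value loaded from YAML: one backslash per escape.
lane_pattern = r'_L(\d\d\d)[_.]'

print(lane_pattern)        # the actual characters: _L(\d\d\d)[_.]
print(repr(lane_pattern))  # a Python literal, so each backslash is shown doubled

# Count the real backslashes in the string versus those in its repr.
backslash_count = lane_pattern.count('\\')
repr_backslash_count = repr(lane_pattern).count('\\')
print(backslash_count, repr_backslash_count)  # 3 6

# The pattern compiles and matches as intended, with no unescaping step.
lane_expr = re.compile(lane_pattern)
match = lane_expr.search('sample_L001_R1.fastq')
lane = match.group(1) if match else None
print(lane)  # 001
```

If print shows single backslashes for your loaded values, the strings are already correct and can be passed straight to re.compile.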