Tags: python, glob, xonsh

My code is confusing an input file name for a regular expression


My regular expression does not explicitly include a dash in a character range, but my code fails when the input file name is like this:

Rage Against The Machine - 1996 - Bulls On Parade [Maxi-Single]

Here is my code:

import os
from glob import glob

def find_cue_files(path):
  found_files = []
  for root, dirs, files in os.walk(path):
    if files:
      fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]')) # this is line 81 in my source file (mentioned in the traceback)
      # do a few other things...
  return found_files

It seems obvious that this part of the filename is the issue: [Maxi-Single]

How do I handle filenames like that so that they are treated as fixed strings, not as part of the pattern?

(Not my main question, but in case it is related, I am open to trying an alternative method of making a case-insensitive search. I have looked at several Stack Overflow questions on that topic, and so far I have not found any solutions that seemed to fit this case.)

Here is my error:

Traceback (most recent call last):

  File "/usr/bin/xonsh", line 33, in <module>
    sys.exit(load_entry_point('xonsh==0.10.0', 'console_scripts', 'xonsh')())
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21336, in main
    _failback_to_other_shells(args, err)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21283, in _failback_to_other_shells
    raise err
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21334, in main
    sys.exit(main_xonsh(args))
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21388, in main_xonsh
    run_script_with_cache(
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3285, in run_script_with_cache
    run_compiled_code(ccode, glb, loc, mode)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3190, in run_compiled_code
    func(code, glb, loc)
  File "process_audio_files.xsh", line 160, in <module>
    cue_files = find_cue_files(dest_path)
  File "process_audio_files.xsh", line 81, in find_cue_files
    fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))
  File "/usr/lib/python3.9/glob.py", line 22, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.9/glob.py", line 74, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.9/glob.py", line 75, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.9/glob.py", line 86, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.9/fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.9/fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range i-S at position 70

EDIT: I tried using re.escape, which is documented here: https://docs.python.org/3/library/re.html

def find_cue_files(path):
  found_files = []
  for root, dirs, files in os.walk(path):
    if files:
      root2 = re.escape(root)
      fcue = glob(os.path.join(root2, '*.[Cc][Uu][Ee]')) 
      # do a few other things...
  return found_files

It processed the earlier filename, but it now fails on the input filename Aerosmith - Aerosmith (2014) [24-96 HD], producing the same error at the same point in my revised code.
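For comparison: re.escape quotes regular-expression metacharacters, but the string passed to glob is a glob pattern, which has its own syntax. The standard library's glob counterpart is glob.escape. The snippet below is only a minimal sketch; the directory name is made up for illustration, and fnmatch.fnmatchcase stands in for the matching that glob does internally.

```python
import fnmatch
import glob
import re

# Hypothetical directory name containing glob metacharacters.
dirname = "Album [24-96 HD]"

# re.escape adds backslashes for regex syntax; glob patterns do not
# treat a backslash as an escape, so the brackets stay "magic".
regex_escaped = re.escape(dirname)

# glob.escape wraps each of *, ? and [ in its own bracket expression.
glob_escaped = glob.escape(dirname)  # 'Album [[]24-96 HD]'

# The glob-escaped pattern matches the literal directory name.
literal_match = fnmatch.fnmatchcase(dirname, glob_escaped)

print(regex_escaped)
print(glob_escaped)
print(literal_match)
```

Under that assumption, a call such as glob(os.path.join(glob.escape(root), '*.[Cc][Uu][Ee]')) would escape only the directory part and leave the extension pattern active.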


Solution

  • Rather than calling glob with a pattern built from root, whose special characters glob then interprets, you are better off filtering the file names that os.walk already gives you and then prepending the root. One possible one-liner:

    fcue = [os.path.join(root, f) for f in files if f.lower().endswith('.cue')]
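Put together, the whole function might look like the sketch below. It assumes that found_files is meant to collect the matching paths (the original elides that part), and the temporary directory and file names in the demo are invented for illustration.

```python
import os
import tempfile


def find_cue_files(path):
    """Return the full paths of all .cue files under path (any letter case)."""
    found_files = []
    for root, dirs, files in os.walk(path):
        # Plain string comparison: no pattern is ever built from root,
        # so brackets and dashes in directory names are harmless.
        found_files.extend(
            os.path.join(root, name)
            for name in files
            if name.lower().endswith(".cue")
        )
    return found_files


# Demo with a made-up directory whose name contains brackets and a dash.
with tempfile.TemporaryDirectory() as tmp:
    album = os.path.join(tmp, "Artist - 1996 - Title [Maxi-Single]")
    os.makedirs(album)
    for name in ("disc.CUE", "notes.txt"):
        with open(os.path.join(album, name), "w"):
            pass
    result = find_cue_files(tmp)
    print(result)
```

Because the extension check lowercases each name, this also covers the case-insensitive search without any character classes.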