
Adding space before capital letters in yaml file (flexget config)


I have some problems with the FlexGet Configuration.

I want to rename and move some movies.

Example

For example, the movie "ElPatriota" (which currently cannot be renamed) is not found in The Movie Database (tmdb) when I search for this title without spaces.

So I need to rename it to "El Patriota" first, before I can look it up on tmdb and move it to its correct directory.

What I researched

I saw this function using a regular expression, but I don't know how to use it in my config, or whether it's the right solution for me.

re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

FlexGet Config YAML

This is a part of the related config:

move movies:
    priority: 3
    template:
      - movies-metainfo
      - telegram
    filesystem:
      path: /downloads/
      recursive: yes
      retrieve: files
      regexp: '.*\.(avi|mkv|mp4)$'
    seen: local
    regexp:
      reject:
        - \b(duo|tri|quadri|tetra|penta)logy\b: {from: title}
        - s\d{2}(e\d{2,})?: {from: title} 
    require_field: 
      - tmdb_name
      - movie_name
    accept_all: yes
    tmdb_lookup:
      language: es
    set:
      title: "{{title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')}}"  
    move:
      to: "/media/Peliculas/"
      rename: "{{tmdb_name|replace('/','_')|replace(':',' -')|replace(',','')|replace('?','')}}"
      along:
        extensions:
          - sub
          - srt
        subdirs:
          - Subs
      clean_source: 50

Solution

  • Assumptions on construction of search-terms

    From your comment, I assume this is the step that rewrites the file name before it is used as the search term:

        set:
          title: "{{title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')}}"  
    

    Note that | here is the Jinja2 filter operator, not a boolean OR: each replace filter is applied in turn to the result of the previous one, so all three tags are stripped from the same title:

    title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')
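
As a rough plain-Python equivalent of the expression above: Jinja2 applies each | filter to the result of the previous one, so the chain behaves like repeated str.replace calls. This is only a sketch for illustration, using the sample file name from this answer; it does not call FlexGet or Jinja2.

```python
# Plain-Python approximation of the chained Jinja2 replace filters.
title = "ElPatriotaM1080.www.url.com.mkv"  # sample file name from this answer

cleaned = title
for marker in ("4K", "BD1080", "M1080"):
    # Each replacement works on the output of the previous one.
    cleaned = cleaned.replace(marker, "[]")

print(cleaned)  # ElPatriota[].www.url.com.mkv
```

The camel-cased part "ElPatriota" is left untouched by these filters, which is why an extra regex step is needed.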
    

    See also FlexGet docs:

    Regex as solution

    Assume further that you can use a regular-expression to substitute the title. Then a regex-substitution adding a space between lower-case and upper-case letters will do:

    Step    Value
    Input   ElPatriotaM1080.www.url.com.mkv
    Wanted  El Patriota M1080.www.url.com.mkv
    Regex   substitute ([a-z])([A-Z]) by \1 \2
    Output  El Patriota M1080.www.url.com.mkv

    Manipulate and replace by regex

    The manipulate plugin with its replace action seems appropriate, as shown in Example 4:

    You can control how the regex hits are output using \1, \2, etc in format.

    manipulate:
      - title:
          replace:            
            regexp: '(.*)/(.*)/(.*)'
            format: '\2.\1.\3'
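
To see what such a regexp/format pair does, here is a minimal sketch in plain Python. It assumes that the plugin hands both values to re.sub more or less unchanged (with ignore-case switched on); the sample value is hypothetical.

```python
import re

# Assumption: FlexGet's replace behaves like re.sub(regexp, format, value).
regexp = r"(.*)/(.*)/(.*)"
fmt = r"\2.\1.\3"

sample = "first/second/third"  # hypothetical field value
result = re.sub(regexp, fmt, sample, flags=re.IGNORECASE)

print(result)  # second.first.third
```

The three captured groups are re-emitted in the order given by the format string, joined by dots instead of slashes.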
    

    ⚠️ Caution: regex matches ignore case by default. Because this regex is case-sensitive (it relies on the difference between upper-case and lower-case characters), the default flags of manipulate's replace-by-regex (IGNORECASE and UNICODE) get in the way. The ignore-case flag must be switched off explicitly by wrapping the regex in a group with the inline flag i disabled: (?-i:<regex>).
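
The effect of the inline flag group can be checked in isolation with Python's re module (Python 3.6 or later). The two-character sample string is made up for illustration:

```python
import re

text = "aB"  # made-up sample: one lower-case and one upper-case letter

# With ignore-case on, [A-Z] also matches the lower-case "a".
with_ignore_case = re.findall(r"[A-Z]", text, flags=re.IGNORECASE)

# (?-i:...) switches ignore-case off inside the group only.
scoped_off = re.findall(r"(?-i:[A-Z])", text, flags=re.IGNORECASE)

print(with_ignore_case)  # ['a', 'B']
print(scoped_off)        # ['B']
```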

    Config snippets

    In this case, the regex captures the lower-case letter (first group ([a-z]), re-inserted via \1) and the upper-case letter (second group ([A-Z]), re-inserted via \2), and the format puts a space between them.

    With the i flag additionally disabled, the regex becomes (?-i:([a-z])([A-Z])).

    manipulate:
      - title:
          replace:            
            regexp: '(?-i:([a-z])([A-Z]))'
            format: '\1 \2'
    

    Alternatively, without capturing groups, match the position before each upper-case letter with a positive look-ahead (?=[A-Z]) and insert a space there (again with the ignore-case flag switched off):

    manipulate:
      - title:
          replace:            
            regexp: '(?-i:(?=[A-Z]))'
            format: ' '
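
One caveat with the look-ahead alone: it also matches at the very start of a title that begins with an upper-case letter, which yields a leading space. A possible variation (not part of the original answer, so treat it as a suggestion) adds a look-behind for a lower-case letter. A sketch in plain Python:

```python
import re

title = "ElPatriotaM1080.www.url.com.mkv"

# Look-ahead only: a space is also inserted before the leading "E".
lookahead_only = re.sub(r"(?=[A-Z])", " ", title)

# Look-behind plus look-ahead: only positions between lower- and upper-case.
bounded = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", title)

# Same pattern wrapped in (?-i:...) as it would be needed when ignore-case is on.
bounded_scoped = re.sub(r"(?-i:(?<=[a-z])(?=[A-Z]))", " ", title, flags=re.IGNORECASE)

print(repr(lookahead_only))  # ' El Patriota M1080.www.url.com.mkv'
print(repr(bounded))         # 'El Patriota M1080.www.url.com.mkv'
```

In a FlexGet config this would presumably correspond to regexp: '(?-i:(?<=[a-z])(?=[A-Z]))' with format: ' ', under the same assumption that the plugin passes the pattern to Python's re module.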
    

    Demo in pure Python

    A working demo in pure Python shows how to replace file-names. It was adapted from How to replace camelCasing in all files in a folder using Python or c#?:

    import re
    
    old_name = 'ElPatriotaM1080.www.url.com.mkv'
    print(f"Given:           '{old_name}'")
    
    flags=re.I  # default for FlexGet's replace-plugin: ignore-case
    
    regex_1           = '(?=[A-Z])'
    regex_1_no_ignore = '(?-i:(?=[A-Z]))'
    
    new_name = re.sub(regex_1, ' ', old_name, flags=flags)
    print(f"Regex 1 (I on ): '{new_name}'")
    new_name = re.sub(regex_1_no_ignore, ' ', old_name, flags=flags)
    print(f"Regex 1 (I off): '{new_name}'")
    
    
    regex_2           = r'([a-z])([A-Z])'
    regex_2_no_ignore = r'(?-i:([a-z])([A-Z]))'
    
    new_name = re.sub(regex_2, r'\1 \2', old_name, flags=flags)
    print(f"Regex 2 (I on ): '{new_name}'")
    new_name = re.sub(regex_2_no_ignore, r'\1 \2', old_name, flags=flags)
    print(f"Regex 2 (I off): '{new_name}'")
    

    Prints:

    Given:           'ElPatriotaM1080.www.url.com.mkv'
    Regex 1 (I on ): ' E l P a t r i o t a M1080. w w w. u r l. c o m. m k v'
    Regex 1 (I off): ' El Patriota M1080.www.url.com.mkv'
    Regex 2 (I on ): 'E lP at ri ot aM1080.w ww.u rl.c om.m kv'
    Regex 2 (I off): 'El Patriota M1080.www.url.com.mkv'
    

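    Both regex approaches (1 and 2) have almost the same effect: a space is inserted before upper-case letters. The difference is that approach 1 also inserts a space at the very start when the name begins with an upper-case letter. The ignore-case flag changes the result considerably, though: with "I on", [A-Z] also matches lower-case letters, so spaces end up between ordinary lower-case letters as well.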