
ValueError: too many values to unpack (expected 2) when I try to extract only 2 substrings from a regex pattern


This is the code. The error occurs in the part that extracts the substrings after the regex pattern has matched:

import re
from unicodedata import normalize

def name_and_img_identificator(input_text, text):
    # Strip accents (NFD + drop combining marks), but keep the tilde of "ñ"
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1", normalize("NFD", input_text), count=0, flags=re.I)
    input_text = normalize('NFC', input_text)  # -> NFC
    input_text_to_check = input_text.lower()  # Convert everything to lowercase

    # The alternatives are Spanish phrases, e.g. "dime los" = "tell me the",
    # "parecidos a" = "similar to", "llamada" = "called", "cuyo nombre es" = "whose name is"
    regex_patron_01 = r"\s*\¿?(?:dime los|dime las|dime unos|dime unas|dime|di|cuales son los|cuales son las|cuales son|cuales|que animes|que|top)\s*((?:\w+\s*)+)\s*(?:de series anime|de anime series|de animes|de anime|animes|anime)\s*(?:similares al|similares a|similar al|similar a|parecidos al|parecidos a|parecido al|parecido a)\s*(?:la serie de anime|series de anime|la serie anime|la serie|anime|)\s*(llamada|conocida como|cuyo nombre es|la cual se llama|)\s*((?:\w+\s*)+)\s*\??"

    m = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE)  # Check whether the regex matches, i.e. whether we enter the block below

    if m:
        num, anime_name = m.groups()[2]

        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)

    return text

input_text_str = input("ingrese: ")  # "ingrese" = "enter"
text = ""

print(name_and_img_identificator(input_text_str, text))

It gives me the following error, and honestly I don't know how to structure this regex pattern so that it extracts only those two values (substrings) from the input:

Traceback (most recent call last):
  File "serie_recommendarion_for_chatbot.py", line 154, in <module>
    print(serie_and_img_identificator(input_text_str, text))
  File "anime_recommendarion_for_chatbot.py", line 142, in name_and_img_identificator
    num, anime_name = m.groups()
ValueError: too many values to unpack (expected 2)

If I enter an input like 'Dame el top 8 de animes parecidos a Gundam' ('Give me the top 8 anime similar to Gundam'),

I need it to extract:

num = '8'
anime_name = 'Gundam'

How should I fix my regex pattern in this case?


Solution

  • Errors in the regex pattern and in how the groups are unpacked

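    1. You forgot to add ?: to make this group non-capturing. As written, the pattern has three capturing groups, so m.groups() returns three values and unpacking them into two variables fails. Change: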
    regex_patron_01 = r"...(llamada|conocida como|cuyo nombre es|la cual se llama|)..."
    

    To:

    regex_patron_01 = r"...(?:llamada|conocida como|cuyo nombre es|la cual se llama|)..."
    
    2. To avoid capturing extra spaces or words, make the repetition inside the num group lazy (non-greedy) so that it doesn't swallow words like "de" and instead leaves them for the following patterns to match. Change:
    regex_patron_01 = r"...((?:\w+\s*)+)..."
    

    To:

    regex_patron_01 = r"...((?:\w+?\s*?)+)..."
    
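    3. .groups() already returns a tuple of the matched strings, so indexing it with [2] gives you a single string. Unpacking one string into two variables iterates over its characters, which raises the same ValueError for any string longer than two characters. Change: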
    num, anime_name = m.groups()[2]
    

    To:

    num, anime_name = m.groups()
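To see each of the three problems in isolation, here is a small sketch using shortened stand-in patterns. They are simplified for illustration only and are not the full regex from the question:

```python
import re

text = "top 8 de animes parecidos a gundam"

# Fix 1: a capturing (llamada|...) group adds a third item to m.groups()
capturing = r"top\s*(\d+).*?parecidos a\s*(llamada|conocida como|)\s*(\w+)"
non_capturing = r"top\s*(\d+).*?parecidos a\s*(?:llamada|conocida como|)\s*(\w+)"
print(re.search(capturing, text).groups())      # ('8', '', 'gundam')
print(re.search(non_capturing, text).groups())  # ('8', 'gundam')

# Fix 2: greedy repetition keeps "de" in the number group; lazy repetition stops after "8"
greedy = r"top\s*((?:\w+\s*)+)\s*(?:de animes|animes)"
lazy = r"top\s*((?:\w+?\s*?)+)\s*(?:de animes|animes)"
print(repr(re.search(greedy, text).group(1)))  # '8 de '
print(repr(re.search(lazy, text).group(1)))    # '8'

# Fix 3: indexing groups() yields one string; unpacking it iterates over its characters
groups = re.search(non_capturing, text).groups()
num, anime_name = groups  # OK: exactly two items
try:
    num, anime_name = groups[1]  # 'gundam' has 6 characters
except ValueError as exc:
    print("ValueError:", exc)
```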
    

    With the changes above, it works and prints:

    8
    gundam
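For reference, here is a minimal sketch of the question's code with all three fixes applied. The helper name extract_num_and_name and the constant REGEX_PATRON_01 are hypothetical. The helper returns the two values instead of printing them so the result is easy to check, and it returns None when the input does not match:

```python
import re
from unicodedata import normalize

# The question's Spanish pattern with the fixes applied: lazy ((?:\w+?\s*?)+)
# captures and a non-capturing (?:llamada|...) group, so exactly two groups remain.
REGEX_PATRON_01 = (
    r"\s*\¿?(?:dime los|dime las|dime unos|dime unas|dime|di|cuales son los|"
    r"cuales son las|cuales son|cuales|que animes|que|top)\s*((?:\w+?\s*?)+)"
    r"\s*(?:de series anime|de anime series|de animes|de anime|animes|anime)"
    r"\s*(?:similares al|similares a|similar al|similar a|parecidos al|parecidos a|"
    r"parecido al|parecido a)\s*(?:la serie de anime|series de anime|la serie anime|"
    r"la serie|anime|)\s*(?:llamada|conocida como|cuyo nombre es|la cual se llama|)"
    r"\s*((?:\w+?\s*?)+)\s*\??"
)


def extract_num_and_name(input_text):
    """Hypothetical helper: return (num, anime_name), or None if there is no match."""
    # Strip accents but keep the tilde of "ñ", as in the question's code
    input_text = re.sub(
        r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+",
        r"\1", normalize("NFD", input_text), flags=re.I,
    )
    input_text_to_check = normalize("NFC", input_text).lower()

    m = re.search(REGEX_PATRON_01, input_text_to_check, re.IGNORECASE)
    if m is None:
        return None
    num, anime_name = m.groups()  # exactly two groups now
    return num.strip(), anime_name.strip()


print(extract_num_and_name("Dame el top 8 de animes parecidos a Gundam"))  # ('8', 'gundam')
```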
    

    Improvement

    Your regex is overly complicated and contains many hard-coded words that would differ from language to language. My suggestion is to standardize the format of the strings it accepts to:

    Any text here (num) any text here (anime_name)
    

    This is already the format of your input:

    Dame el top 8 de animes parecidos a Gundam
    

    So you can replace that long regex with this one, and the output will be the same:

    regex_patron_01 = r"^.*?(\d+).*\s(.+)$"
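Here is a quick sketch of how the simplified pattern behaves (SIMPLE_PATTERN is just a name given to the regex above). ^.*? lazily skips to the first digit, (\d+) captures the number, the greedy .* runs up to the last whitespace, and (.+) takes whatever follows it. The second sample shows that only the last word of a multi-word name is captured:

```python
import re

# Name assumed for illustration; same regex as above
SIMPLE_PATTERN = r"^.*?(\d+).*\s(.+)$"

samples = [
    "dame el top 8 de animes parecidos a gundam",
    "dame el top 8 de animes parecidos a gundam x",  # multi-word name
]

results = {}
for text in samples:
    m = re.search(SIMPLE_PATTERN, text)
    results[text] = m.groups()  # (num, anime_name)
    print(results[text])
# ('8', 'gundam')
# ('8', 'x')
```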
    

    Note that this requires anime_name to be a single word. To support multi-word names, we need a special character that marks the start of the anime name, such as a colon:

    Dame el top 8 de animes parecidos a: Gundam X
    

    Then the regex would be:

    regex_patron_01 = r"^.*?(\d+).*:\s(.+)$"
    

    Output

    8
    gundam x
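Here is a minimal sketch of the colon-based variant wrapped in a helper. parse_request is a hypothetical name, and returning None for inputs without ": " is an assumed convention:

```python
import re

# Colon-delimited variant: everything after ": " is taken as the anime name
COLON_PATTERN = r"^.*?(\d+).*:\s(.+)$"


def parse_request(text):
    """Hypothetical helper: return (num, anime_name), or None if there is no ': '."""
    m = re.search(COLON_PATTERN, text.lower())
    if m is None:
        return None
    num, anime_name = m.groups()
    return num, anime_name.strip()


print(parse_request("Dame el top 8 de animes parecidos a: Gundam X"))  # ('8', 'gundam x')
print(parse_request("Dame el top 8 de animes parecidos a Gundam X"))   # None
```

Because .* is greedy, if the anime name itself contained ": ", the split would happen at the last occurrence. Using a lazy .*? before the colon would split at the first one instead.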