python, text, nlp

ValueError: Cannot use a compiled regex as replacement pattern with regex=False


I'm working on a project in Google Colab, where I install the following versions:

    !pip install "gensim==4.2.0"
    !pip install "texthero==1.0.5"

Until recently, I received the following warning:

    FutureWarning: The default value of regex will change from True to False in a future version.
      return input.str.replace(r"^\d+\s|\s\d+\s|\s\d+$", " ")

Despite the warning, the code ran normally. Now I'm getting the following error:

    ValueError: Cannot use a compiled regex as replacement pattern with regex=False

How should I proceed?

I tried different versions, but the problem persists.


Solution

  • This is a texthero bug triggering a pandas error.

    Since pandas 2.0, str.replace defaults to regex=False, so a pattern is treated as a literal string unless regex=True is passed.

    Texthero's replace_digits function hasn't been updated in two years and doesn't explicitly pass regex=True:

        if only_blocks:
            pattern = r"\b\d+\b"
            return s.str.replace(pattern, symbols)
        else:
            return s.str.replace(r"\d+", symbols)
    

    You should file a bug report with texthero; there are probably several other occurrences of str.replace to fix.

    In the meantime, you can patch the library by changing the code to:

        if only_blocks:
            pattern = r"\b\d+\b"
            return s.str.replace(pattern, symbols, regex=True)
        else:
            return s.str.replace(r"\d+", symbols, regex=True)
    

    Alternatively, use a pandas version earlier than 2.0 (e.g. 1.5.2).
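
If you want to check the behaviour on your own install, here is a minimal sketch. The sample Series and the "X" replacement string are made up for illustration and are not taken from texthero. It shows that an explicit regex=True behaves the same across pandas versions, that regex=False treats the pattern as a literal string, and that a compiled pattern combined with regex=False should raise the ValueError from the question.

```python
import re

import pandas as pd

# Hypothetical sample data, only for illustration.
s = pd.Series(["call 555 now", "room 42", "no digits"])

# Explicit regex=True: the pattern is interpreted as a regular expression,
# regardless of what the default is in the installed pandas version.
masked = s.str.replace(r"\b\d+\b", "X", regex=True)
print(masked.tolist())

# Explicit regex=False: the pattern is matched as a literal string,
# so none of these rows change.
literal = s.str.replace(r"\b\d+\b", "X", regex=False)
print(literal.tolist())

# A compiled pattern together with regex=False is rejected by pandas.
try:
    s.str.replace(re.compile(r"\d+"), "X", regex=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Passing regex=True everywhere a pattern is meant as a regular expression keeps the code independent of the pandas default, which is why the patch above only adds that one argument.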