python huggingface-transformers pii presidio

presidio transformers package not available, despite being installed


I'm trying to run the example code from this Microsoft documentation, but I get a package-not-found error. I'm on a Mac, and my friend had the same problem on his machine. I'm sure I have installed the transformers package, and importing it works without errors. I'm working in a virtual environment, in a Jupyter notebook in VS Code.
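Since the notebook runs in VS Code, one thing worth ruling out is that the Jupyter kernel uses a different interpreter than the virtual environment where the packages were installed. The stdlib-only check below, run in a notebook cell, prints the interpreter path and whether each module can be found. The module list is an assumption: `spacy_huggingface_pipelines` is included because, as far as I can tell, Presidio's transformers engine needs it in addition to `transformers`. The helper name `module_status` is hypothetical.

```python
import importlib.util
import sys


def module_status(names):
    """Return {module_name: bool} telling whether each top-level module can be
    found by the interpreter running this code (i.e. the notebook's kernel)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Judging from the traceback, this should point inside the venv,
    # e.g. ~/Projects/pii/bin/python, not a system or Homebrew Python.
    print("Interpreter:", sys.executable)
    # Assumed list of modules relevant to Presidio's transformers engine.
    modules = ["presidio_analyzer", "spacy", "transformers", "spacy_huggingface_pipelines"]
    for name, found in module_status(modules).items():
        print(f"{name:30} {'found' if found else 'MISSING'}")
```

If `transformers` shows as found but `spacy_huggingface_pipelines` is missing, the package you imported is present, yet the engine can still be reported as unavailable.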

If I remove the config.yaml file, the code runs without errors, so the problem may be something in that file. However, it is essentially the same as the version in the documentation.

Code:

from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
from presidio_analyzer.nlp_engine import NlpEngineProvider

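# Path to the NLP engine configuration (shown below)
conf_file = 'config.yaml'

# Create NLP engine based on configuration
provider = NlpEngineProvider(conf_file=conf_file)
nlp_engine = provider.create_engine()

# Pass the created NLP engine and supported_languages to the AnalyzerEngine
analyzer = AnalyzerEngine(
    nlp_engine=nlp_engine,
    supported_languages=["en"]
)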

results_english = analyzer.analyze(text="My name is Morris", language="en")
print(results_english)

Error stack:

ValueError                                Traceback (most recent call last)
Cell In[3], line 6
      4 # Create NLP engine based on configuration
      5 provider = NlpEngineProvider(conf_file=conf_file)
----> 6 nlp_engine = provider.create_engine()
      8 # Pass the created NLP engine and supported_languages to the AnalyzerEngine
      9 analyzer = AnalyzerEngine(
     10     nlp_engine=nlp_engine, 
     11     supported_languages=["en"]
     12 )

File ~/Projects/pii/lib/python3.12/site-packages/presidio_analyzer/nlp_engine/nlp_engine_provider.py:81, in NlpEngineProvider.create_engine(self)
     79 nlp_engine_name = self.nlp_configuration["nlp_engine_name"]
     80 if nlp_engine_name not in self.nlp_engines:
---> 81     raise ValueError(
     82         f"NLP engine '{nlp_engine_name}' is not available. "
     83         "Make sure you have all required packages installed"
     84     )
     85 try:
     86     nlp_engine_class = self.nlp_engines[nlp_engine_name]

ValueError: NLP engine 'transformers' is not available. Make sure you have all required packages installed
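The failing check is visible in the traceback: `create_engine()` looks up `nlp_engine_name` from the configuration in `self.nlp_engines` and raises if it is missing. As far as I understand Presidio's source, that dictionary only holds engines whose optional dependencies import successfully. That would explain why `import transformers` working is not enough, and why the default spaCy configuration (used when config.yaml is removed) runs fine. The sketch below imitates that behaviour with the standard library; the registry and the per-engine module requirements are simplified assumptions, not Presidio's actual code.

```python
import importlib.util

# Hypothetical mapping of engine name -> modules it needs. Real Presidio
# engines decide their own availability; these lists are assumptions.
ENGINE_REQUIREMENTS = {
    "spacy": ["spacy"],
    "transformers": ["transformers", "spacy_huggingface_pipelines"],
}


def available_engines(requirements=ENGINE_REQUIREMENTS):
    """Keep only the engines whose required modules can all be found."""
    return {
        name: modules
        for name, modules in requirements.items()
        if all(importlib.util.find_spec(m) is not None for m in modules)
    }


def create_engine(nlp_configuration, requirements=ENGINE_REQUIREMENTS):
    """Mimic the lookup from the traceback: fail if the configured engine
    is not in the availability-filtered registry."""
    engines = available_engines(requirements)
    nlp_engine_name = nlp_configuration["nlp_engine_name"]
    if nlp_engine_name not in engines:
        raise ValueError(
            f"NLP engine '{nlp_engine_name}' is not available. "
            "Make sure you have all required packages installed"
        )
    return nlp_engine_name
```

In this sketch, if `transformers` is installed but `spacy_huggingface_pipelines` is not, `create_engine({"nlp_engine_name": "transformers"})` raises the same message as in the question even though importing `transformers` succeeds.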

My config.yaml:

nlp_engine_name: transformers
models:
  -
    lang_code: en
    model_name:
      spacy: en_core_web_sm
      transformers: StanfordAIMI/stanford-deidentifier-base

ner_model_configuration:
  labels_to_ignore:
  - O
  aggregation_strategy: simple # "simple", "first", "average", "max"
  stride: 16
  alignment_mode: strict # "strict", "contract", "expand"
  model_to_presidio_entity_mapping:
    PER: PERSON
    LOC: LOCATION
    EMAIL: EMAIL
    PHONE: PHONE_NUMBER

  low_confidence_score_multiplier: 0.4
  low_score_entity_names:
  - ID
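To separate a problem in `config.yaml` from a problem in the environment, the same settings can be passed inline through `NlpEngineProvider`'s `nlp_configuration` argument instead of `conf_file`. This is a sketch assuming a Presidio version whose `NlpEngineProvider` accepts `nlp_configuration`; the helper `build_analyzer` is hypothetical. If it raises the same `ValueError`, the YAML file is not the culprit.

```python
# Python equivalent of the config.yaml above.
NLP_CONFIGURATION = {
    "nlp_engine_name": "transformers",
    "models": [
        {
            "lang_code": "en",
            "model_name": {
                "spacy": "en_core_web_sm",
                "transformers": "StanfordAIMI/stanford-deidentifier-base",
            },
        }
    ],
    "ner_model_configuration": {
        "labels_to_ignore": ["O"],
        "aggregation_strategy": "simple",
        "stride": 16,
        "alignment_mode": "strict",
        "model_to_presidio_entity_mapping": {
            "PER": "PERSON",
            "LOC": "LOCATION",
            "EMAIL": "EMAIL",
            "PHONE": "PHONE_NUMBER",
        },
        "low_confidence_score_multiplier": 0.4,
        "low_score_entity_names": ["ID"],
    },
}


def build_analyzer(nlp_configuration=NLP_CONFIGURATION):
    """Hypothetical helper: build an AnalyzerEngine from an in-memory config."""
    # Imported lazily so the dictionary can be inspected without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=nlp_configuration)
    nlp_engine = provider.create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])
```

Usage in a notebook cell: `analyzer = build_analyzer()` followed by `print(analyzer.analyze(text="My name is Morris", language="en"))`.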

Solution

  • You will need to install the Presidio Analyzer package with the transformers extra dependency specifier:

    pip install "presidio-analyzer[transformers]"
    

    This installs the extra dependencies that Presidio's transformers-based NLP engine requires; having the transformers package installed on its own is not enough.
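Because the code runs in a Jupyter kernel inside VS Code, the extra must go into the same interpreter the kernel uses. One common way to ensure this from Python is to invoke pip through `sys.executable`. The helper names below are hypothetical, and the kernel should be restarted afterwards so the new modules are picked up.

```python
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command bound to the interpreter running this code."""
    # sys.executable is the Python the kernel is actually running, so pip
    # installs into that environment rather than some other Python on PATH.
    return [sys.executable, "-m", "pip", "install", requirement]


def install_into_current_interpreter(requirement):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(requirement))
```

Usage in a cell: `install_into_current_interpreter("presidio-analyzer[transformers]")`. No extra quoting is needed because no shell is involved. The IPython `%pip install` magic serves the same purpose of targeting the current kernel.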