Tags: python, loader, langchain, large-language-model, libmagic

LangChain UnstructuredURLLoader shows "libmagic is unavailable"


I am trying to use UnstructuredURLLoader, but I get a 'libmagic is unavailable' message and none of the URLs are loaded.

What I have tried:

  • Installed langchain.
  • Installed unstructured, libmagic, python-magic, and python-magic-bin.
  • Installed python-magic-bin==0.4.13.
  • Installed python_magic-0.4.13-py2.py3-none-any.whl (I even tried other versions). I am on an AMD64 Windows machine.
  • Uninstalled and reinstalled.
  • Searched Google, ChatGPT, and similar issues on Stack Overflow for answers.

Code:

from langchain.document_loaders import UnstructuredURLLoader
loader = UnstructuredURLLoader(
    urls = [
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)

Error:

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
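
The warning suggests that the Python `magic` module could not load the native libmagic library. A minimal sketch for checking this outside of LangChain follows, assuming the `magic` module in the environment is the one from python-magic (which provides `from_buffer`). The helper name `libmagic_status` is hypothetical and used only for illustration.

```python
def libmagic_status():
    """Report whether the `magic` module can load libmagic and detect a type."""
    try:
        # Assumption: this resolves to python-magic; importing it fails
        # when the native libmagic library cannot be located.
        import magic

        mime = magic.from_buffer(b"<html><body>hello</body></html>", mime=True)
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return f"unavailable: {type(exc).__name__}: {exc}"
    return f"available: detected {mime}"


print(libmagic_status())
```

If this reports "unavailable", the problem is likely in how libmagic is located on the machine rather than in the loader code itself.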

Solution

  • Resolution: The folder inside the venv that contains libmagic.dll has to be added to the Windows system environment variables (Path).

    In my instance: D:\ds_projects\code-basic-LLM-finance-domain.venv\Lib\site-packages\magic\libmagic

    For others, it will likely be: your_path\.venv\Lib\site-packages\magic\libmagic
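
As an alternative to editing the system variables by hand, the same idea can be sketched in Python by prepending the folder to `PATH` for the current process before anything imports `magic`. This is only a sketch under assumptions: the helper `prepend_to_path` is hypothetical, `sys.prefix` is assumed to point at the active venv, and whether a process-level `PATH` change is sufficient on a given setup is not confirmed by the answer above, which sets the variable system-wide.

```python
import os
import sys
from pathlib import Path


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    directory = str(directory)
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
        env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


# Assumed layout: the active environment's root (sys.prefix) followed by the
# relative folder named in the answer above.
libmagic_dir = Path(sys.prefix) / "Lib" / "site-packages" / "magic" / "libmagic"

if libmagic_dir.is_dir():
    # Only touch PATH when the folder actually exists in this environment.
    prepend_to_path(libmagic_dir)
    # Import langchain / unstructured only after this point.
```

Running this at the top of the script, before the `UnstructuredURLLoader` import, keeps the change local to that process and leaves the system settings untouched.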