
LDA Mallet Gensim CalledProcessError


Seems like many people are having issues with Mallet.

import os
from gensim.models.wrappers import LdaMallet

os.environ.update({'MALLET_HOME':r'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8'})

mallet_path = r'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet' 

model = LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word)

Getting the following errors:

/bin/sh: C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet.bat: No such file or directory

CalledProcessError: Command 'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet.bat import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /var/folders/ml/lxzrtxwn02vbvq65c80g1b640000gn/T/c52cdc_corpus.txt --output /var/folders/ml/lxzrtxwn02vbvq65c80g1b640000gn/T/c52cdc_corpus.mallet' returned non-zero exit status 127.

I downloaded MALLET from http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip and unzipped it in my directory. I tried running the command from the error message in the terminal and got the same 'No such file or directory' error, even though the file is there in my directory.

I've also followed this: https://ps.au.dk/fileadmin/ingen_mappe_valgt/installing_mallet.pdf

When I go to the directory on the command line and run ./bin/mallet, it prints a list of MALLET commands, which according to the instructions means it has been installed correctly.

I'm running the following on macOS:

  • Python==3.9.6
  • gensim==3.8.3

Anyone have any ideas?


Solution

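  • As silly as this sounds, I resolved this by switching to relative paths:

    os.environ.update({'MALLET_HOME':r'mallet-2.0.8'})

    mallet_path = r'mallet-2.0.8/bin/mallet'

    Relative paths are resolved against the current working directory, so this works as long as the mallet-2.0.8 directory sits in the directory you run your code from (usually the one containing your script). The original C:/Users/... path is Windows-style, which is likely why the shell could not find it on macOS.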
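
If you would rather not depend on the working directory, one option is to build absolute paths and check that the launcher exists before handing them to gensim. The following is only a minimal sketch: the helper name `resolve_mallet` and its `dirname` parameter are made up for illustration and are not part of gensim or MALLET.

```python
import os
from pathlib import Path


def resolve_mallet(base_dir, dirname="mallet-2.0.8"):
    """Return (mallet_home, mallet_path) as absolute path strings.

    Hypothetical helper: raises FileNotFoundError early, with a readable
    message, instead of letting the shell fail later with exit status 127.
    """
    home = Path(base_dir).expanduser().resolve() / dirname
    binary = home / "bin" / "mallet"
    if not binary.is_file():
        raise FileNotFoundError(f"MALLET launcher not found at {binary}")
    return str(home), str(binary)


# Usage, commented out because it needs a local MALLET download and the
# corpus/id2word objects from the question:
# mallet_home, mallet_path = resolve_mallet(Path.cwd())
# os.environ["MALLET_HOME"] = mallet_home
# model = LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word)
```

Passing the directory that contains your script (for example `Path(__file__).parent`) instead of `Path.cwd()` should make the lookup independent of where you launch Python from.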