Install ipadic on Ubuntu 16.04 for the mecab Japanese tokenizer


I am trying to install mecab and the ipadic dictionary as outlined here: http://taku910.github.io/mecab/#install-unix

I was able to download and install mecab, and I also downloaded ipadic successfully, but I get stuck on the second line of the instructions below:

% tar zxfv mecab-ipadic-2.7.0-XXXX.tar.gz
% mecab-ipadic-2.7.0-XXXX
% ./configure
% make
% su
# make install

I am getting:

mecab-ipadic-2.7.0-20070801: command not found

I ran chmod -x on it and tried again, but got the same result.

Any help is appreciated.

Edit (result of cat /etc/mecabrc)

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /usr/local/lib/mecab/dic/mecab-ipadic-neologd

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n

Solution

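  • There is no reason to compile from source on Ubuntu 16.04, because mecab and the ipadic dictionary are both available as packages. (The "command not found" error itself occurs because mecab-ipadic-2.7.0-XXXX is the directory unpacked by tar, not a command; the upstream instructions enter it with cd before running ./configure.)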

    Simply do:

    $ sudo apt-get update
    $ sudo apt install mecab mecab-ipadic-utf8
    

    Then test it with

    $ echo "日本語です" | mecab
    日本  ニッポン    ニッポン    日本  名詞-固有名詞-地名-国        
    語   ゴ   ゴ   語   名詞-普通名詞-一般      
    です  デス  デス  です  助動詞 助動詞-デス  終止形-一般
    EOS
    

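    If this does not work, mecab may be looking for its dictionary in the wrong place. In that case, edit /etc/mecabrc so that it points to the installed dictionary, by setting dicdir = SOMEPATH_TO_IPADIC (where SOMEPATH_TO_IPADIC is the ipadic directory on your system).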
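
To see which dictionary directory a mecabrc file actually points to, and whether that directory exists, something like the following may help. It is only a minimal sketch: the function names mecabrc_dicdir and check_dicdir are made up for illustration, and it assumes the file uses the plain "dicdir = PATH" form with ";" comment lines, as in the mecabrc shown in the question.

```shell
# Print the dicdir value from a mecabrc-style file (default: /etc/mecabrc).
# Lines starting with ';' are comments in mecabrc, so they never match.
# (Function names here are hypothetical helpers, not part of mecab.)
mecabrc_dicdir() {
    sed -n '/^[[:space:]]*dicdir[[:space:]]*=/{
        s/^[^=]*=[[:space:]]*//
        s/[[:space:]]*$//
        p
    }' "${1:-/etc/mecabrc}" | head -n 1
}

# Report whether the configured dictionary directory exists on disk.
check_dicdir() {
    rc_file="${1:-/etc/mecabrc}"
    dic_dir="$(mecabrc_dicdir "$rc_file")"
    if [ -z "$dic_dir" ]; then
        echo "no dicdir setting found in $rc_file"
    elif [ -d "$dic_dir" ]; then
        echo "ok: $dic_dir"
    else
        echo "missing: $dic_dir"
    fi
}

# Usage:
#   check_dicdir               # inspects /etc/mecabrc
#   check_dicdir ./mecabrc     # inspects another file
```

If this reports "missing", the dicdir line probably still refers to a dictionary from an earlier manual install. Running dpkg -L mecab-ipadic-utf8 lists the files the package installed, which shows where the packaged dictionary lives. You can also try a directory before editing the file by passing it explicitly, for example mecab -d SOMEPATH_TO_IPADIC.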