
How to install the Python package pyrouge on Microsoft Windows?


I want to use the python package pyrouge on Microsoft Windows. The package doesn't give any instructions on how to install it on Microsoft Windows. How can I do so?


Solution

  • The following instructions were tested on Windows 7 SP1 x64 Ultimate with Python 3.5 x64 (Anaconda).

    1. In cmd.exe, run

      pip install pyrouge

    2. Download ROUGE-1.5.5. You may download it from https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5

    3. pyrouge comes with a Python script named pyrouge_set_rouge_path (it has no file extension), which you need to run to point pyrouge to the directory where ROUGE-1.5.5 is located. The script is typically in the Python Scripts directory.

    Run the following command from cmd.exe, appropriately replacing the directories for pyrouge_set_rouge_path and ROUGE-1.5.5:

    python C:\Anaconda\envs\py35\Scripts\pyrouge_set_rouge_path  C:\pyrouge-master\tools\ROUGE-1.5.5
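If you are not sure where the Scripts directory is, a small sketch like the following may help locate the script. It assumes pyrouge was installed with the same interpreter that runs the sketch and not with pip's --user option (in which case the script may live elsewhere); find_script is a hypothetical helper name, not part of pyrouge.

```python
import sysconfig
from pathlib import Path


def find_script(name="pyrouge_set_rouge_path", scripts_dir=None):
    """Return the path of `name` inside the Scripts directory, or None if absent."""
    # sysconfig reports where this interpreter installs console scripts
    # ("Scripts" on Windows, "bin" on most other systems).
    base = Path(scripts_dir or sysconfig.get_path("scripts"))
    candidate = base / name
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    print(find_script() or "pyrouge_set_rouge_path not found in the Scripts directory")
```

The printed path is what you would pass to python in the command above.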
    
    4. pyrouge should now be able to initialize a Rouge155 object. The following Python script should run without error:

      from pyrouge import Rouge155
      r = Rouge155()

    5. If you don't have perl.exe, you need to install it, because pyrouge is just a wrapper around the original ROUGE script, which is written in Perl. You can install Strawberry Perl from http://strawberryperl.com

    Make sure the perl.exe binary is in your Path system environment variable, e.g. by running where perl in cmd.exe (the cmd.exe counterpart of which perl).

    If it is not found, add the directory that contains perl.exe (C:\Strawberry\perl\bin for a default Strawberry Perl install) to your Path system environment variable.

    Lastly, to avoid this kind of error:

    [screenshot of the error message]

    One way is to copy C:\Strawberry\c\bin\*.dll into C:\Strawberry\perl\bin\.
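The Perl check and the DLL copy can also be scripted. This is only a sketch: perl_on_path and copy_dlls are hypothetical helper names, and the Strawberry Perl directories in the usage comment are the default install locations mentioned in this step, which may differ on your machine.

```python
import shutil
from pathlib import Path


def perl_on_path():
    """Return the full path of the perl executable found on PATH, or None."""
    return shutil.which("perl")


def copy_dlls(src_dir, dst_dir):
    """Copy every *.dll from src_dir into dst_dir and return the copied file names."""
    dst = Path(dst_dir)
    copied = []
    for dll in sorted(Path(src_dir).glob("*.dll")):
        shutil.copy2(dll, dst / dll.name)  # copy2 also preserves timestamps
        copied.append(dll.name)
    return copied


# Example usage (paths assumed from a default Strawberry Perl install):
# print(perl_on_path())
# copy_dlls(r"C:\Strawberry\c\bin", r"C:\Strawberry\perl\bin")
```

If perl_on_path() returns None, Perl is either not installed or not on Path yet.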

    6. To prevent the following error message when running pyrouge:

      Cannot open exception db file for reading: C:\Anaconda\pyrouge-master\tools\ROUGE-1.5.5\data/WordNet-2.0.exc.db

    You should remove data\WordNet-2.0.exc.db from your ROUGE directory (shown here as RELEASE-1.5.5; use ROUGE-1.5.5 if that is what your directory is called), then rebuild it from cmd.exe:

    cd RELEASE-1.5.5\data\
    perl WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
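The same two actions (delete the old database, then rebuild it) could be wrapped in Python as follows. This is a sketch I have not run against a real ROUGE install; both function names are hypothetical, and the arguments are copied from the command above, including the script's spelling.

```python
import subprocess
from pathlib import Path


def build_exception_db_command(perl="perl"):
    """Return the argument list for the rebuild command shown above."""
    return [
        perl,
        "WordNet-2.0-Exceptions/buildExeptionDB.pl",  # spelling as in the command above
        "./WordNet-2.0-Exceptions",
        "./smart_common_words.txt",
        "./WordNet-2.0.exc.db",
    ]


def rebuild_exception_db(data_dir):
    """Delete the stale database, then regenerate it from inside data_dir."""
    data_dir = Path(data_dir)
    stale = data_dir / "WordNet-2.0.exc.db"
    if stale.exists():
        stale.unlink()
    # check=True raises CalledProcessError if perl exits with a non-zero status.
    subprocess.run(build_exception_db_command(), cwd=str(data_dir), check=True)


# Example usage (directory is an assumption, adjust to your install):
# rebuild_exception_db(r"C:\pyrouge-master\tools\ROUGE-1.5.5\data")
```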
    
    7. Open C:\Anaconda\envs\py35\Lib\site-packages\pyrouge\Rouge155.py (or wherever you installed pyrouge), go to the function def evaluate(self, system_id=1, rouge_args=None) (it was at line 319 when I wrote this answer), and add command.insert(0, 'perl ') right before self.log.info("Running ROUGE with command {}".format(" ".join(command))). If you don't, you'll get OSError: [WinError 193] %1 is not a valid Win32 application, which is the same error message you would get if you skipped some of the previous steps.
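To see what that one-line patch does: Windows cannot execute a .pl file directly, so the Perl interpreter has to be the first element of the command list. The sketch below shows the effect on a shortened command. with_perl is a hypothetical helper, not pyrouge code, and it uses 'perl' without the trailing space, a variant I have not tested inside pyrouge.

```python
def with_perl(command, interpreter="perl"):
    """Return a copy of `command` that starts with the Perl interpreter."""
    patched = list(command)  # leave the caller's list untouched
    if not patched or patched[0] != interpreter:
        patched.insert(0, interpreter)
    return patched


# Shortened version of the command pyrouge builds (paths taken from the log in this answer).
command = [
    r"C:\Anaconda\pyrouge-master\tools\ROUGE-1.5.5\ROUGE-1.5.5.pl",
    "-e",
    r"C:\Anaconda\pyrouge-master\tools\ROUGE-1.5.5\data",
]
print(with_perl(command)[0])  # perl
```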

    8. At that point pyrouge should work fine. Don't try to run python -m pyrouge.test, as it is buggy. Instead, you can test it with the following layout:

      some_folder:
      │   rouge.py
      │
      ├───model_summaries
      │       text.A.001.txt
      │
      └───system_summaries
              text.001.txt

    rouge.py contains:

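    from pyrouge import Rouge155
    r = Rouge155()
    
    # Directories holding the summaries to score and the reference summaries
    r.system_dir = 'system_summaries'
    r.model_dir = 'model_summaries'
    # Raw string so that \d is not treated as a string escape
    r.system_filename_pattern = r'text.(\d+).txt'
    # #ID# is replaced by the document ID captured from the system filename
    r.model_filename_pattern = 'text.[A-Z].#ID#.txt'
    
    output = r.convert_and_evaluate()
    print(output)
    # Parse the textual report into a dictionary of scores
    output_dict = r.output_to_dict(output)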
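The two filename patterns work as a pair: the group in the system pattern captures the document ID, and that ID replaces #ID# in the model pattern. The sketch below illustrates the idea with plain re. It is not pyrouge's actual implementation, and match_models is a hypothetical helper.

```python
import re

system_filename_pattern = r"text.(\d+).txt"
model_filename_pattern = "text.[A-Z].#ID#.txt"


def match_models(system_file, model_files):
    """Return the model files that share the document ID of system_file."""
    found = re.match(system_filename_pattern, system_file)
    if found is None:
        return []
    doc_id = found.group(1)
    # Substitute the captured ID into the model pattern, then use it as a regex.
    model_regex = re.compile(model_filename_pattern.replace("#ID#", doc_id))
    return [name for name in model_files if model_regex.match(name)]


print(match_models("text.001.txt",
                   ["text.A.001.txt", "text.B.001.txt", "text.A.002.txt"]))
# ['text.A.001.txt', 'text.B.001.txt']
```

This is why several reference summaries (A, B, ...) can be paired with a single system summary.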
    

    text.A.001.txt contains:

    preprocess my summaries, then run ROUGE
    

    text.001.txt contains:

    I only want to preprocess my summaries and then run ROUGE on my own
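If you would rather create the two folders and text files from a script, something like this should do. It is a sketch: make_fixture is a hypothetical helper, and it does not write rouge.py itself.

```python
from pathlib import Path


def make_fixture(root="some_folder"):
    """Create the model and system summary files used in this test."""
    root = Path(root)
    files = {
        root / "model_summaries" / "text.A.001.txt":
            "preprocess my summaries, then run ROUGE\n",
        root / "system_summaries" / "text.001.txt":
            "I only want to preprocess my summaries and then run ROUGE on my own\n",
    }
    for path, text in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    return list(files)


# Example usage: make_fixture("some_folder"), then put rouge.py next to the two folders.
```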
    

    Output when running rouge.py:

    2017-10-31 21:55:37,239 [MainThread  ] [INFO ]  Writing summaries.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Processing summaries. Saving system files to C:\Users\Francky\AppData\Local\Temp\tmpmh72hoxa\system and model files to C:\Users\Francky\AppData\Local\Temp\tmpmh72hoxa\model.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Processing files in system_summaries.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Processing text.001.txt.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Saved processed files to C:\Users\Francky\AppData\Local\Temp\tmpmh72hoxa\system.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Processing files in model_summaries.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Processing text.A.001.txt.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Saved processed files to C:\Users\Francky\AppData\Local\Temp\tmpmh72hoxa\model.
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Written ROUGE configuration to C:\Users\Francky\AppData\Local\Temp\tmpgx71qygq\rouge_conf.xml
    2017-10-31 21:55:37,249 [MainThread  ] [INFO ]  Running ROUGE with command perl  C:\Anaconda\pyrouge-master\tools\ROUGE-1.5.5\ROUGE-1.5.5.pl -e C:\Anaconda\pyrouge-master\tools\ROUGE-1.5.5\data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a -m C:\Users\Francky\AppData\Local\Temp\tmpgx71qygq\rouge_conf.xml
    command: ['C:\\Anaconda\\pyrouge-master\\tools\\ROUGE-1.5.5\\ROUGE-1.5.5.pl', '-e', 'C:\\Anaconda\\pyrouge-master\\tools\\ROUGE-1.5.5\\data', '-c', '95', '-2', '-1', '-U', '-r', '1000', '-n', '4', '-w', '1.2', '-a', '-m', 'C:\\Users\\Francky\\AppData\\Local\\Temp\\tmpgx71qygq\\rouge_conf.xml']
    ---------------------------------------------
    1 ROUGE-1 Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
    1 ROUGE-1 Average_P: 0.42857 (95%-conf.int. 0.42857 - 0.42857)
    1 ROUGE-1 Average_F: 0.60000 (95%-conf.int. 0.60000 - 0.60000)
    ---------------------------------------------
    1 ROUGE-2 Average_R: 0.80000 (95%-conf.int. 0.80000 - 0.80000)
    1 ROUGE-2 Average_P: 0.30769 (95%-conf.int. 0.30769 - 0.30769)
    1 ROUGE-2 Average_F: 0.44444 (95%-conf.int. 0.44444 - 0.44444)
    ---------------------------------------------
    1 ROUGE-3 Average_R: 0.50000 (95%-conf.int. 0.50000 - 0.50000)
    1 ROUGE-3 Average_P: 0.16667 (95%-conf.int. 0.16667 - 0.16667)
    1 ROUGE-3 Average_F: 0.25000 (95%-conf.int. 0.25000 - 0.25000)
    ---------------------------------------------
    1 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
    1 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
    1 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
    ---------------------------------------------
    1 ROUGE-L Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
    1 ROUGE-L Average_P: 0.42857 (95%-conf.int. 0.42857 - 0.42857)
    1 ROUGE-L Average_F: 0.60000 (95%-conf.int. 0.60000 - 0.60000)
    ---------------------------------------------
    1 ROUGE-W-1.2 Average_R: 0.69883 (95%-conf.int. 0.69883 - 0.69883)
    1 ROUGE-W-1.2 Average_P: 0.42857 (95%-conf.int. 0.42857 - 0.42857)
    1 ROUGE-W-1.2 Average_F: 0.53131 (95%-conf.int. 0.53131 - 0.53131)
    ---------------------------------------------
    1 ROUGE-S* Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
    1 ROUGE-S* Average_P: 0.16484 (95%-conf.int. 0.16484 - 0.16484)
    1 ROUGE-S* Average_F: 0.28303 (95%-conf.int. 0.28303 - 0.28303)
    ---------------------------------------------
    1 ROUGE-SU* Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
    1 ROUGE-SU* Average_P: 0.19231 (95%-conf.int. 0.19231 - 0.19231)
    1 ROUGE-SU* Average_F: 0.32258 (95%-conf.int. 0.32258 - 0.32258)
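As a sanity check, the ROUGE-1 and ROUGE-2 numbers above can be reproduced by hand from n-gram overlap counts. The sketch below is a simplified re-implementation for this one example, not ROUGE-1.5.5 itself: it lowercases, splits on non-alphanumeric characters, and skips stemming, which happens not to matter for these two sentences.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of lowercase alphanumeric tokens in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(model, system, n):
    """Return (recall, precision, f_score) for n-gram overlap."""
    ref, hyp = ngrams(model, n), ngrams(system, n)
    overlap = sum((ref & hyp).values())  # clipped counts of shared n-grams
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(hyp.values()), 1)
    f_score = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f_score


model = "preprocess my summaries, then run ROUGE"
system = "I only want to preprocess my summaries and then run ROUGE on my own"
for n in (1, 2):
    r, p, f = rouge_n(model, system, n)
    print(f"ROUGE-{n} R={r:.5f} P={p:.5f} F={f:.5f}")
# ROUGE-1 R=1.00000 P=0.42857 F=0.60000
# ROUGE-2 R=0.80000 P=0.30769 F=0.44444
```

For ROUGE-1, all 6 model words appear among the 14 system words, hence recall 6/6 and precision 6/14.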
    

    If you skip step 3, running from pyrouge import Rouge155; r = Rouge155() fails with the following error message:

    Traceback (most recent call last):
      File "C:\Users\Franck\Documents\rouge.py", line 8, in <module>
        r = Rouge155()
      File "C:\Anaconda\envs\py35\lib\site-packages\pyrouge\Rouge155.py", line 88, in __init__
        self.__set_rouge_dir(rouge_dir)
      File "C:\Anaconda\envs\py35\lib\site-packages\pyrouge\Rouge155.py", line 402, in __set_rouge_dir
        self._home_dir = self.__get_rouge_home_dir_from_settings()
      File "C:\Anaconda\envs\py35\lib\site-packages\pyrouge\Rouge155.py", line 416, in __get_rouge_home_dir_from_settings
        with open(self._settings_file) as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Franck\\AppData\\Roaming\\pyrouge\\settings.ini'
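To check up front whether step 3 has been done, you can test for the settings file. This sketch assumes, based only on the path in the traceback above, that pyrouge keeps it at %APPDATA%\pyrouge\settings.ini on Windows; settings_file and is_configured are hypothetical helper names.

```python
import os
from pathlib import Path


def settings_file(appdata=None):
    """Return the settings.ini path suggested by the traceback above."""
    # Assumption: the file lives under %APPDATA%\pyrouge, as in the traceback.
    base = Path(appdata or os.environ.get("APPDATA", ""))
    return base / "pyrouge" / "settings.ini"


def is_configured(appdata=None):
    """Return True if the settings file exists."""
    return settings_file(appdata).is_file()


if __name__ == "__main__":
    print(settings_file(), "exists" if is_configured() else "is missing")
```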
    

    Note: Google Research's Python library for computing ROUGE is much easier to use: https://github.com/google-research/google-research/tree/master/rouge

    To install:

    pip install rouge-score
    

    Example from their readme:

    from rouge_score import rouge_scorer
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
    scores = scorer.score('The quick brown fox jumps over the lazy dog',
                          'The quick brown dog jumps on the log.')
    

    It worked smoothly on my Windows 10 machine with Python 3.9.16 (without Perl installed).

    Longer example:

    import pprint
    from rouge_score import rouge_scorer
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2','rouge3','rougeL'], use_stemmer=True)
    
    string1 = 'The quick brown fox jumps this is a test on the dog'
    string2 = 'The quick brown fox jumps over the lazy dog'
    scores = scorer.score(string1, string2)
    pprint.pprint(scores)
    
    # If reversing the strings:
    scores = scorer.score(string2, string1)
    pprint.pprint(scores)
    
    # To select just one score:
    print(scores['rouge2'][2])
    

    outputs:

    {'rouge1': Score(precision=0.7777777777777778, recall=0.5833333333333334, fmeasure=0.6666666666666666),
     'rouge2': Score(precision=0.5, recall=0.36363636363636365, fmeasure=0.4210526315789474),
     'rouge3': Score(precision=0.42857142857142855, recall=0.3, fmeasure=0.3529411764705882),
     'rougeL': Score(precision=0.7777777777777778, recall=0.5833333333333334, fmeasure=0.6666666666666666)}
    
    {'rouge1': Score(precision=0.5833333333333334, recall=0.7777777777777778, fmeasure=0.6666666666666666),
     'rouge2': Score(precision=0.36363636363636365, recall=0.5, fmeasure=0.4210526315789474),
     'rouge3': Score(precision=0.3, recall=0.42857142857142855, fmeasure=0.3529411764705882),
     'rougeL': Score(precision=0.5833333333333334, recall=0.7777777777777778, fmeasure=0.6666666666666666)}
    
    0.4210526315789474
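Since each Score prints with field names, the values can also be read by name instead of by index, which is easier to follow than [2]. The sketch below uses a namedtuple stand-in with the same field names as the printed output so that it runs without rouge-score installed; the numbers are copied from the second result above.

```python
from collections import namedtuple

# Stand-in with the same field names as the Score objects printed above.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])

scores = {
    "rouge2": Score(precision=0.36363636363636365, recall=0.5,
                    fmeasure=0.4210526315789474),
}

rouge2 = scores["rouge2"]
print(rouge2.fmeasure == rouge2[2])  # True: index 2 is the F-measure

# Tuple unpacking works as well.
precision, recall, fmeasure = rouge2
print(f"P={precision:.3f} R={recall:.3f} F={fmeasure:.3f}")
# P=0.364 R=0.500 F=0.421
```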