Search code examples
perlaspell

Perl interface with Aspell


I am trying to identify misspelled words with Aspell via Perl. I am working on a Linux server without administrator privileges which means I have access to Perl and Aspell but not, for example, Text::Aspell which is a Perl interface for Aspell.

I want to do the very simple task of passing a list of words to Aspell and having it return the words that are misspelled. If the words I want to check are "dad word lkjlkjlkj" I can do this through the command line with the following commands:

aspell list
dad word lkjlkjlkj

Aspell requires CTRL + D at the end to submit the word list. It would then return "lkjlkjlkj", as this isn't in the dictionary.

In order to do the exact same thing, but submitted via Perl (because I need to do this for thousands of documents) I have tried:

my $list = q(dad word lkjlkjlkj):
my @arguments = ("aspell list", $list, "^D");
my $aspell_out=`@arguments`;
print "Aspell output = $aspell_out\n";

The expected output is "Aspell output = lkjlkjlkj" because this is the output that Aspell gives when you submit these commands via the command line. However, the actual output is just "Aspell output = ". That is, Perl does not capture any output from Aspell. No errors are thrown.

I am not an expert programmer, but I thought this would be a fairly simple task. I've tried various iterations of this code and nothing works. I did some digging and I'm concerned that perhaps because Aspell is interactive, I need to use something like Expect, but I cannot figure out how to use it. Nor am I sure that it is actually the solution to my problem. I also think ^D should be an appropriate replacement for CTRL+D at the end of the commands, but all I know is it doesn't throw an error. I also tried \cd instead. Whatever it is, there is obviously an issue in either submitting the command or capturing the output.


Solution

  • The complication with using aspell out of a program is that it is an interactive and command-line driver tool, as you suspect. However, there is a simple way to do what you need.

    In order to use aspell's command list one needs to pass it words via STDIN, as its man page says. While I find the GNU Aspell manual a little difficult to get going with, passing input to a program via its STDIN is easy enough and we can rewrite the invocation as

    echo dad word lkj | aspell list
    

    We get lkj printed back, as due. Now this can run out of a program just as it stands

    my $word_list = q(word lkj good asdf);
    
    my $cmd = qq(echo $word_list | aspell list);
    
    my @aspell_out = qx($cmd);
    
    print for @aspell_out;
    

    This prints lines lkj and asdf.

    I assemble the command in a string (as opposed to an array) for specific reasons, explained below. The qx is the operator form of backticks, which I prefer for its far superior readability.

    Note that qx can return all output in a string, if in scalar context (assigned to a scalar for example), or in a list when in list context. Here I assign to an array so you get each word as an element (alas, each also comes with a newline, so may want to do chomp @aspell_out;).


    Comment on a list vs string form of a command

    I think that it's safe to recommend to use a list-form for a command, in general. So we'd say

    my @cmd = ('ls', '-l', $dir);  # to be run as an external command
    

    instead of

    my $cmd = "ls -l $dir";        # to be run as an external command
    

    The list form generally makes it easier to manage the command, and it avoids the shell altogether.

    However, this case is a little different

    • The qx operator doesn't really behave differently -- the array gets concatenated into a string, and that runs. The very fact that we can pass it an array is incidental, and not even documented

    • We need to pipe input to aspell's STDIN, and shell does that for us simply. We can use a shell with command's LIST form as well, but then we'd need to invoke it explicitly. We can also go for aspell's STDIN by means other than the shell but that's more complex

    • With a command in a list the command name must be the first word, so that "aspell list" from the question is wrong and it should fail (there is no command named that) ... except that in this case it wouldn't (if the rest were correct), since for qx the array gets collapsed into a string

    Finally, apsell nicely exposes its API in a C library and that's been utilized for the module you mention. I'd suggest to install it as a user (no privileges needed) and use that.