prolog swi-prolog

How would I filter declared facts according to whether one variable is a member of a list in Prolog?


I'm relatively new to Prolog programming, and I'm trying to create a program that reads a text file, finds the words in it and the number of times each word occurs, finds the words that match a pre-defined list of words, and determines the file's status depending on how many times those pre-defined words are used.

I've already managed to get the code reading a text file, getting the individual words, and determining how many times each word occurs within it, so that's not an issue. However, I am having problems filtering the words to only display words that occur within the pre-set list.

For context, I am declaring words as facts within the Prolog database with the format:

word(W, X)

where W is the word itself and X is the number of times it occurs within the file.

As an example, if I read the text file with the following text in it:

It's a lovely day outside; it's so hot and sunny, and it's overall lovely!

and then run the following query:

forall(word(W, X), writeln(W+X)).

The query works successfully, outputting:

It's+1
a+1
day+1
outside+1
so+1
hot+1
sunny+1
and+2
it's+2
overall+1
lovely+2

It's not currently looking the nicest, and it's case-sensitive as well, but my point is that it does read the file successfully and work out the number of uses of a word successfully.

However, I am experiencing problems if I try and filter the words to only display the ones where W matches a word in a pre-defined list of words, such as:

PredefinedWords = [hot, sunny, cold, rainy]

I did some internet searching, and based on that, my initial idea was to use the findall and member predicates within the following query:

findall(word(W, X), member(W, PredefinedWords), MatchingWords).

My hope was that when applied to the example above, it would filter through the pre-declared words found in the file and return a result looking something like this:

MatchingWords = [word(hot, 1), word(sunny, 1)]

However, when I actually ran the query, the result was something like this:

MatchingWords = [word(hot, _), word(sunny, _), word(cold, _), word(rainy, _)]

I found that it was simply regurgitating every member of the pre-defined list in the format word(W, X), where W was the list item and X was an empty variable. It was not using the words found in the file at all.

When I put a similar query into a predicate within the file being consulted, I realised that Prolog was treating the word(W, X) in that query as something completely different from the word(W, X) facts used by the earlier queries, even though another query in the same predicate did use those facts. Unlike forall, the findall query was not using the previously declared words at all. I was a bit confused by this, but I decided to try something different.

As such, I decided to instead try mapping the words and their frequencies into a nested list, as I had done to sort the words by frequency, using the following query:

findall([W, X], word(W, X), WordsList)

This code worked as intended and mapped the words and their corresponding frequencies into a nested list named WordsList.

With the words mapped into a list, I decided to retry the findall query with the following format:

findall(WordsList, member(0, PredefinedWords), MatchingWords).

My hope was that this query would return all members of WordsList where the first list index (the word) was found within the list of pre-defined words and write them into a new list, MatchingWords.

However, it merely outputted MatchingWords as an empty list:

MatchingWords = []

I then wondered if it was getting confused because I wasn't declaring a specific list index format, so I then tried this:

findall(WordsList[W, X], member(W, PredefinedWords), MatchingWords).

However, this threw a syntax error of "Operator expected", with the [ after WordsList being the offending character.

Finally, I did some more searching and tried the include predicate with a lambda function in the following query:

include(([W, X]>>member(W, PredefinedWords)), WordsList, MatchingWords).

However, it just returned an empty list once again:

WordsList = MatchingWords, MatchingWords = []

I apologise, as I know that's quite a lot of text to digest, but I'm feeling quite stuck and I'm not really sure what to try next.

Any help or guidance would be greatly appreciated.


Solution

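  • The predicate findall(+Template, :Goal, -Bag) creates a list of the instantiations that Template gets successively on backtracking over Goal, and unifies that list with Bag. Template is not called as a predicate; it is just a term that determines how the collected data should be structured. That is why, in the query from the question, word(W, X) in the template position never consulted the word/2 facts: only member(W, PredefinedWords) was executed, so X stayed unbound.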
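
    To make the template/goal distinction concrete, here is a minimal sketch in which word/2 is called inside the goal rather than only named in the template. The facts are a subset of those in the question, and matching_words/2 is a hypothetical name chosen for illustration.

    ```prolog
    :- use_module(library(lists)).   % member/2

    % Sample facts (a subset of the words from the question).
    word(hot,    1).
    word(sunny,  1).
    word(lovely, 2).

    % matching_words(+Keywords, -Matching): hypothetical helper.
    % The first argument of findall/3 only shapes the results;
    % the second argument is the goal that is actually executed.
    matching_words(Keywords, Matching) :-
        findall(word(W, X),
                ( member(W, Keywords),   % W ranges over the keyword list
                  word(W, X) ),          % and must also be a stored fact
                Matching).
    ```

    With these facts, the query ?- matching_words([hot, sunny, cold, rainy], M). should give M = [word(hot, 1), word(sunny, 1)], because cold and rainy have no matching word/2 fact.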

    Assuming the predicates word/2 and keywords/1 are defined, you can use them to define the predicate keyword/2 as follows:

    keyword(Word, Counter) :-
        keywords(Keywords),
        member(Word, Keywords),
        word(Word, Counter).
    
    keywords([hot, sunny, cold, rainy]).
    
    word('It\'s', 1).
    word(a,       1).
    word(day,     1).
    word(outside, 1).
    word(so,      1).
    word(hot,     1).
    word(sunny,   1).
    word(and,     2).
    word('it\'s', 2).
    word(overall, 1).
    word(lovely,  2).
    

    Example:

    ?- keyword(W, C).
    W = hot,
    C = 1 ;
    W = sunny,
    C = 1 ;
    false.
    

    Then, using predicates keyword/2 and findall/3, you can ask:

    ?- findall(word(W,C), keyword(W,C), L).
    L = [word(hot, 1), word(sunny, 1)].
    
    ?- findall(W-C, keyword(W,C), L).
    L = [hot-1, sunny-1].
    
    ?- findall([W,C], keyword(W,C), L).
    L = [[hot, 1], [sunny, 1]].
    
    ?- findall(frequency(W,C), keyword(W,C), L).
    L = [frequency(hot, 1), frequency(sunny, 1)].
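
    If you prefer the include/3 approach from the question, the following sketch may help. It assumes SWI-Prolog with library(yall) available, and keyword_pairs/1 is a hypothetical name. Two details differ from the attempt in the question: the list of pairs is built before include/3 is called (with an unbound list, include/3 simply yields two empty lists), and the lambda takes a single parameter that is itself a [Word, Count] list, hence the double brackets.

    ```prolog
    :- use_module(library(apply)).  % include/3
    :- use_module(library(yall)).   % >>/2 lambda expressions (SWI-Prolog)

    keywords([hot, sunny, cold, rainy]).

    % Sample facts (a subset of the words from the question).
    word(hot,    1).
    word(sunny,  1).
    word(lovely, 2).

    % keyword_pairs(-Matching): hypothetical helper for this sketch.
    keyword_pairs(Matching) :-
        keywords(Keywords),
        findall([W, C], word(W, C), Pairs),   % Pairs is bound before filtering
        % One lambda parameter, the pair itself: [[Word, _]], not [Word, _].
        include([[Word, _]]>>memberchk(Word, Keywords), Pairs, Matching).
    ```

    With these facts, ?- keyword_pairs(L). should give L = [[hot, 1], [sunny, 1]].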
    

    Note: As already pointed out in a previous comment, it would be better to just count keywords as the file is read. Anyway, the point of this answer is to clarify how to use findall/3.
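
    As a rough sketch of that alternative, and assuming the words read from the file are already available as a list of atoms, the keyword counts can be computed directly without storing a word/2 fact for every word. The name count_keywords/3 is hypothetical, and the sample word list in the comment is made up for illustration.

    ```prolog
    :- use_module(library(lists)).      % member/2
    :- use_module(library(aggregate)).  % aggregate_all/3

    % count_keywords(+Keywords, +Words, -Counts): hypothetical helper.
    % Counts is a list of Keyword-N pairs, one for each keyword that
    % occurs at least once in Words.
    count_keywords(Keywords, Words, Counts) :-
        findall(K-N,
                ( member(K, Keywords),
                  aggregate_all(count, member(K, Words), N),  % occurrences of K
                  N > 0 ),
                Counts).

    % Example:
    % ?- count_keywords([hot, sunny, cold, rainy],
    %                   [it, is, hot, and, sunny, and, hot], Counts).
    % Counts = [hot-2, sunny-1].
    ```

    This version scans the word list once per keyword, which is fine for a short keyword list; for a long one, a single pass over the words may be preferable.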