
Swift: How to identify and delete prepositions in a string


I am trying to identify key words in a user's entry to search for, so I thought of filtering out some parts of speech in order to extract key words to query in my database. Currently I use the code below to remove the word "of" from a string:

 let rawString = "I’m jealous of my parents. I’ll never have a kid as cool as theirs, one who is smart, has devilishly good looks, and knows all sorts of funny phrases."

 var filtered = rawString.replacingOccurrences(of: "of", with: "")

What I want to do now is extend it to remove all prepositions from a string.

What I was thinking of doing is creating a huge list of known prepositions like

 let prepositions = ["in","through","after","under","beneath","before"......]

and then splitting the string by whitespace with

var WordList : [String] = filtered.components(separatedBy: " ")

and then looping through the word list to find prepositional matches and delete them. Creating the list will be ugly and might not be efficient for my code.
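
For reference, the list-based idea described above could be sketched roughly as follows. The preposition list here is deliberately short and illustrative, and `removingListedPrepositions` is a hypothetical helper name, not something from the original code. A `Set` keeps each lookup cheap, and each word is lowercased and stripped of surrounding punctuation before the check.

```swift
import Foundation

// Illustrative, deliberately incomplete list; a real one would be much longer.
let prepositions: Set<String> = ["in", "of", "with", "through", "after", "under", "beneath", "before"]

// Hypothetical helper: drops every space-separated word that appears in `prepositions`.
func removingListedPrepositions(from text: String) -> String {
    return text.components(separatedBy: " ")
        .filter { word in
            // Compare case-insensitively and ignore punctuation attached to the word.
            let bare = word.trimmingCharacters(in: .punctuationCharacters).lowercased()
            return !prepositions.contains(bare)
        }
        .joined(separator: " ")
}

let listFiltered = removingListedPrepositions(from: "The ripe taste of cheese improves with age.")
print(listFiltered) // The ripe taste cheese improves age.
```

A fixed list cannot tell when a word such as "before" is being used as a conjunction or adverb rather than a preposition, so it will sometimes remove words it should keep.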

What is the best way to detect and delete prepositions from a string?


Solution

  • Use NaturalLanguage:

    import NaturalLanguage
    
    let text = "The ripe taste of cheese improves with age."
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    
    var newSentence = [String]()
    
    tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
        guard let tag = tag, tag != .preposition else { return true }
        newSentence.append("\(text[tokenRange])")
        return true
    }
    
    print("Input: \(text)")
    print("Output: \(newSentence.joined(separator: " "))")
    

    This prints:

    Input: The ripe taste of cheese improves with age.
    Output: The ripe taste cheese improves age
    

    Notice that the two prepositions, "of" and "with", are removed. My approach also removes the punctuation; you can adjust this with the .omitPunctuation option.
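
Since the goal is keyword extraction, the same tagger loop can be wrapped in a function that drops several lexical classes at once. This is only a sketch: `removingWords(taggedAs:from:)` is a hypothetical name, and the choice to also drop determiners and conjunctions is an assumption about what counts as a non-keyword. Unlike the snippet above, it keeps tokens for which the tagger returns no tag.

```swift
import NaturalLanguage

// Hypothetical helper: returns the words of `text` whose lexical class is not in `excluded`.
func removingWords(taggedAs excluded: Set<NLTag>, from text: String) -> String {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

    var kept = [String]()
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, tokenRange in
        // Skip the token only when it has a tag that is in the excluded set.
        if let tag = tag, excluded.contains(tag) { return true }
        kept.append(String(text[tokenRange]))
        return true
    }
    return kept.joined(separator: " ")
}

// Assumption: prepositions, determiners and conjunctions are all treated as noise.
let keywords = removingWords(taggedAs: [.preposition, .determiner, .conjunction],
                             from: "The ripe taste of cheese improves with age.")
print(keywords)
```

The exact output depends on how the tagger classifies each word, so check it against sample input from your own users.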