
How can I extract numbers based on the context of the sentence in Python?


I tried using regular expressions, but they just match every number without taking the surrounding context into account.

Examples: "250 kg Oranges for Sale" and "I want to sell 100kg of Onions at 100 per kg"


Solution

  • You can do something like this. First, split the text into words and try to convert each word to a number. If the conversion succeeds, the word is a number. If you are sure that a quantity is always followed by the word "kg", you can then check whether the next word is "kg": if it is, the number is a quantity; otherwise, treat it as a price. Add each value to the corresponding list. Note that this only works when numbers are written as separate words (e.g. "100 kg", not "100kg"); otherwise the conversion fails. It also only handles whole numbers, because int() rejects values such as "2.5".

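    string = "250 kg Oranges for Sale. I want to sell 100 kg of Onions at 100 per kg."
    
    # Split the text into words (split() with no argument also collapses repeated spaces)
    words_list = string.split()
    print(words_list)
    
    # Find which words are numbers and classify them by the word that follows
    quantity_array = []
    price_array = []
    for i, word in enumerate(words_list):
        try:
            number = int(word)
        except ValueError:
            print("'%s' is not a number" % word)
            continue
        # Is it a price or a quantity? A number followed by "kg" is a quantity.
        # Check the bounds first: a number as the last word would otherwise raise IndexError.
        # Trailing punctuation is stripped so that "kg." still counts as "kg".
        next_word = words_list[i + 1].strip(".,;:!?") if i + 1 < len(words_list) else ""
        if next_word == "kg":
            quantity_array.append(number)
        else:
            price_array.append(number)
    
    # Get the results
    print(quantity_array)  # [250, 100]
    print(price_array)     # [100]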
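If the unit may be attached to the number ("100kg", as in the question) or the price is marked by "per kg", a regular expression that also matches the surrounding words can supply the context the plain regex was missing. This is a minimal sketch, assuming a quantity is always a number followed by "kg" (with or without a space) and a price is always a number followed by "per kg"; extract_quantities_and_prices, QUANTITY_RE and PRICE_RE are hypothetical names, not part of any library.

```python
import re

# Assumption: a quantity is a number directly followed by "kg" (space optional),
# e.g. "250 kg" or "100kg". \b keeps us from matching digits inside other words.
QUANTITY_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE)

# Assumption: a price is a number followed by "per kg", e.g. "100 per kg".
PRICE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*per\s+kg\b", re.IGNORECASE)


def extract_quantities_and_prices(text):
    """Hypothetical helper: return (quantities, prices) found in text as floats."""
    quantities = [float(value) for value in QUANTITY_RE.findall(text)]
    prices = [float(value) for value in PRICE_RE.findall(text)]
    return quantities, prices


if __name__ == "__main__":
    for sentence in ["250 kg Oranges for Sale",
                     "I want to sell 100kg of Onions at 100 per kg"]:
        print(sentence, "->", extract_quantities_and_prices(sentence))
```

Unlike the split-based loop, this handles "100kg" and decimals such as "2.5 kg". It still only understands the two patterns it was written for, so other phrasings (for example "100/kg" or "a hundred kilos") would need additional patterns or a proper NLP parser.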