python string string-matching

How can I make my Python string match non-greedily?


Given a text file that looks like so:

Samsung Galaxy S6 active SM-G890A 32GB Camo White (AT&T) *AS-IS* Cracked Screen
Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen
Samsung Galaxy S6 edge as is cracked screen

I've tried to think of a number of different ways to have the string Samsung Galaxy S6 not match Samsung Galaxy S6 edge, but can't seem to come up with a way that works. There's no point in the string where it's clear that the name of the phone has ended and the extraneous information begins, so splitting them up that way and comparing to a dictionary or something like that wouldn't work.

I tried to think of some way to write the following:

phones = ['Samsung Galaxy S6', 'Samsung Galaxy S6 Edge']
lines = open('phones.txt', 'r').readlines()
for line in lines:
    for phone in phones:
        if phone in line and no other phone in phones is in line:
            print('match found')

but I can't think of the right way to structure it - anyone have any ideas? I'm sure that I'm missing something simple here, but just can't figure out what.


Solution

  • Start by sorting your phones by length, longest first, so that the longer names are checked first:

    phones.sort(key=len, reverse=True)

    Then break as soon as you find a match:

    for phone in phones:
        if phone in line:
            print("FOUND:", repr(phone), "IN", repr(line))
            break  # we don't need to keep looking for other phones in this line
    

    maybe?

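    This way 'Samsung Galaxy S6 Edge' comes before 'Samsung Galaxy S6' in your checks, so you will match the longest name that appears in the line, without requiring more knowledge of your phone list the way the regex answer does. Note that `in` is case-sensitive, so 'Samsung Galaxy S6 Edge' will not match a line that spells it 'edge' unless you lower-case both sides first.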
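
Putting the two steps together, here is a minimal sketch of the longest-first approach as a single function. The name `find_phone` is hypothetical, and the case-insensitive comparison is an assumption added because the sample data spells 'edge' in lower case while the phone list has 'Edge'. The sample lines are inlined instead of being read from phones.txt so that the snippet runs on its own.

```python
def find_phone(line, phones):
    """Return the longest phone name contained in line, or None if none match."""
    line_lower = line.lower()
    # Longest names first, so 'Samsung Galaxy S6 Edge' is tried
    # before its prefix 'Samsung Galaxy S6'.
    for phone in sorted(phones, key=len, reverse=True):
        # Assumption: case should be ignored when comparing.
        if phone.lower() in line_lower:
            return phone  # stop at the first (longest) match
    return None


phones = ['Samsung Galaxy S6', 'Samsung Galaxy S6 Edge']
lines = [
    'Samsung Galaxy S6 active SM-G890A 32GB Camo White (AT&T) *AS-IS* Cracked Screen',
    'Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen',
    'Samsung Galaxy S6 edge as is cracked screen',
]

for line in lines:
    print(repr(find_phone(line, phones)), 'IN', repr(line))
```

With this phone list the 'S6 active' line is still reported as a plain 'Samsung Galaxy S6', because the approach can only tell apart names it has been given. Adding 'Samsung Galaxy S6 Active' to `phones` should be enough to separate it.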