I'm new to Python and found a couple of suggestions for finding the longest word in a string, but none that handle a string in which several words share the longest length.
After playing around, I settled on this:
inputsentence = raw_input("Write a sentence: ").split()
longestwords = []
for word in inputsentence:
    if len(word) == len(max(inputsentence, key=len)):
        longestwords.append(word)
That way I have a list of the longest words that I can do something with. Is there any better way of doing this?
NB: Assume inputsentence contains no integers or punctuation, just a series of words.
If you'll only be doing this with short amounts of text, there's no need to worry about runtime efficiency: programming efficiency (in coding, reviewing and debugging) is far more important. So the solution you have is fine, since it's clear and sufficiently efficient for even thousands of words. (However, you ought to calculate len(max(inputsentence, key=len)) just once, before the for loop, rather than on every iteration.)
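As a rough illustration of that change, here is a minimal sketch. It assumes Python 3 and uses a hard-coded sample sentence in place of raw_input; the function name longest_words is made up for this example and is not part of the original code.

```python
def longest_words(words):
    """Return every word whose length equals the maximum length in `words`."""
    if not words:
        # max() raises ValueError on an empty sequence, so handle that case first.
        return []
    # Compute the maximum length once, instead of once per loop iteration.
    maxlength = max(len(word) for word in words)
    return [word for word in words if len(word) == maxlength]


# Hypothetical sample input standing in for the user's sentence.
sentence = "the quick brown foxes jumped over lazy hounds".split()
print(longest_words(sentence))  # ['jumped', 'hounds']
```

The original loop rescans the whole list for every word, which is quadratic in the number of words; hoisting the maximum out of the loop makes it two linear passes.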
But suppose you do want to do this with a large corpus, which might be several gigabytes long. Here's how to do it in one pass, without ever storing every word in memory (note that inputcorpus might be an iterator or function that reads the corpus in stages): keep only the longest words seen so far. If you see a word that's longer than the current maximum, it's obviously the first one at this length, so you can start a fresh list.
maxlength = 0
maxwords = []  # replaced as soon as the first non-empty word is seen
for word in inputcorpus:
    if len(word) > maxlength:
        # First word of this new maximal length: start a fresh list.
        maxlength = len(word)
        maxwords = [word]
    elif len(word) == maxlength:
        maxwords.append(word)
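To show what "reads the corpus in stages" could look like, here is a hedged sketch assuming Python 3. The helper names iter_words and longest_words_one_pass are invented for illustration, and io.StringIO stands in for a real file object. It reads line by line, so it assumes the corpus contains line breaks at reasonable intervals.

```python
import io


def iter_words(stream):
    """Yield words one at a time, reading the stream line by line."""
    for line in stream:
        for word in line.split():
            yield word


def longest_words_one_pass(inputcorpus):
    """Single pass over any iterable of words, keeping only the longest ones."""
    maxlength = 0
    maxwords = []
    for word in inputcorpus:
        if len(word) > maxlength:
            maxlength = len(word)
            maxwords = [word]
        elif len(word) == maxlength:
            maxwords.append(word)
    return maxwords


# io.StringIO is a stand-in for a large file opened with open(...).
corpus = io.StringIO("a bb ccc\ndd eee f\nggg\n")
print(longest_words_one_pass(iter_words(corpus)))  # ['ccc', 'eee', 'ggg']
```

Because iter_words is a generator, only the current line and the current list of longest words are held in memory at any time.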
If a certain word of maximal length repeats, you'll end up with several copies. To avoid that, just use a set() instead of a list (and adjust the initialization and the append accordingly).
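One way the set-based variant might look, as a sketch assuming Python 3 (the function name longest_unique_words and the sample input are hypothetical):

```python
def longest_unique_words(inputcorpus):
    """Single pass over an iterable of words, keeping each longest word once."""
    maxlength = 0
    maxwords = set()
    for word in inputcorpus:
        if len(word) > maxlength:
            maxlength = len(word)
            maxwords = {word}      # start a fresh set at the new maximum
        elif len(word) == maxlength:
            maxwords.add(word)     # adding a duplicate has no effect
    return maxwords


words = "spam eggs ham spam eggs".split()
print(sorted(longest_unique_words(words)))  # ['eggs', 'spam']
```

Note that a set does not preserve the order in which the words were seen, which is why the example sorts the result before printing.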