python, text-processing

How to remove words that start with "[" from an array?


I have an array that contains many sentences. I split each sentence into words to make another array. I want to remove the words that start with "[" and end with "]" from that array.

Example:

from nltk import sent_tokenize
import numpy as np

sentences = sent_tokenize(text)  # text: the input string, defined elsewhere
print(sentences[0])
z = np.array(sentences)

sentence: [42] On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India.

words = z[0].split()
words= list(words)
print(words)

After splitting into words: ['[42]', 'On', '20', 'January', '1987,', 'he', 'also', 'turned', 'out', 'as', 'substitute', 'for', 'Imran', "Khan's", 'side', 'in', 'an', 'exhibition', 'game', 'at', 'Brabourne', 'Stadium', 'in', 'Bombay,', 'to', 'mark', 'the', 'golden', 'jubilee', 'of', 'Cricket', 'Club', 'of', 'India.']

Now I want to remove [42] from my array and then join the words back into a sentence. How can I do that? I tried the following, but it does not work: it seems to remove the whole array and prints None.

for i in words:
  if i[0]=="[":
    b=words.remove(i)
    print(b)
  else:
    print("")

Solution

  • You may consider using a list comprehension, as below:

    sentence = "[42] On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India."
    words = sentence.split()
    # Keep every word except those that both start with '[' and end with ']'
    words = [w for w in words if not (w.startswith('[') and w.endswith(']'))]
    filtered = ' '.join(words)
    print(filtered)
    # Output:
    # On 20 January 1987, he also turned out as substitute for Imran Khan's side in an exhibition game at Brabourne Stadium in Bombay, to mark the golden jubilee of Cricket Club of India.
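A note on why the loop in the question prints None: list.remove() changes the list in place and returns None, so b is always None. Removing items from a list while looping over it can also make the loop skip elements. Below is a minimal sketch that wraps the filtering in a function and adds a regex variant for markers glued to a word (for example "India.[43]"). The function names and the sample strings are hypothetical, chosen only for illustration.

```python
import re


def strip_bracket_tokens(sentence):
    """Drop whitespace-separated tokens that start with '[' and end with ']'."""
    words = sentence.split()
    kept = [w for w in words if not (w.startswith("[") and w.endswith("]"))]
    return " ".join(kept)


def strip_bracket_markers(sentence):
    """Regex variant: remove every [...] group, even when attached to a word."""
    without_markers = re.sub(r"\[[^\]]*\]", "", sentence)
    # Re-split and join to collapse any doubled spaces left behind.
    return " ".join(without_markers.split())


sample = "[42] On 20 January 1987, he also turned out as substitute."
print(strip_bracket_tokens(sample))
print(strip_bracket_markers("India.[43] [44] Next sentence."))

# list.remove() mutates the list and returns None, which is what the
# loop in the question ends up printing.
words = ["[42]", "On", "20"]
result = words.remove("[42]")
print(result)
print(words)
```

The token-based version only touches words that are entirely a bracketed marker, while the regex version is more aggressive and would also strip bracketed text that is part of the sentence, so which one fits depends on the input.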