(2, 43) 0.74670222994
(3, 15) 0.74132892839
(3, 31) 0.671141877647
(4, 19) 0.699490245832
(4, 47) 0.422715095257
(4, 48) 0.433278265941
(4, 0) 0.379862196713
(5, 19) 0.653731227092
(5, 72) 0.756726821729
Above is a tf-idf matrix that has been written to a file. I want to read only the tf-idf values, such as 0.74132892839, and append them to a list.
Is there a way to do f.read() and then strip the indices off?
A simple solution using the re.sub() function:
import re
# specify your actual file name
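with open('lines.txt', 'r') as fh:
    # remove every "(row, col)" index, then split what is left on whitespace
    # (split() with no argument also ignores a trailing newline or blank line)
    result = re.sub(r'\([^)]+\)\s*', '', fh.read()).split()
print(result)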
The output:
['0.74670222994', '0.74132892839', '0.671141877647', '0.699490245832', '0.422715095257', '0.433278265941', '0.379862196713', '0.653731227092', '0.756726821729']
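\([^)]+\)\s*
- matches one parenthesized index such as (2, 43) plus any whitespace after it: \( and \) are literal parentheses, [^)]+ is one or more characters other than ")", and \s* is the trailing whitespace. Replacing each match with an empty string leaves only the values.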
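Note that the list above holds strings. If numbers are needed, a line-by-line variant may be easier to work with. The following is only a sketch: the helper name parse_tfidf_values is made up for illustration, and it assumes every data line looks like "(row, col) value", as in the sample above.

```python
import re

# Assumed line shape: "(row, col) value", with optional surrounding whitespace.
LINE_RE = re.compile(r'^\s*\(\s*\d+\s*,\s*\d+\s*\)\s+(\S+)\s*$')

def parse_tfidf_values(lines):
    """Return the value from each "(row, col) value" line as a float.

    `lines` can be any iterable of strings, e.g. an open file object.
    Hypothetical helper, not part of any library.
    """
    values = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # blank or differently shaped lines are skipped
            values.append(float(match.group(1)))
    return values
```

It could be used as: with open('lines.txt') as fh: values = parse_tfidf_values(fh). Reading line by line avoids loading the whole file at once, and float() raises ValueError if a captured token is not a valid number, so malformed values do not pass silently.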