Input:
(Note: The input has already been preprocessed to this stage with some Python code, so that it is easier to handle with Python packages.)
Expected output:
I have tried dateutil. However, it can only extract one date, right? Even in that single-date case, extracting both the preposition and the date is still a problem.
I also looked at dateparser and datefinder. It seems they both use dateutil.
Dates can be YYYY-MM-DD, DDMMYYYY, etc., as long as they are all in the same format.
The output doesn't have to be identical to the one above, as long as it conveys the information accurately.
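To make the "same format" requirement concrete, here is a minimal sketch of one way to normalise already-extracted date strings to YYYY-MM-DD using only the standard library. The function name `normalize_date` and the list `CANDIDATE_FORMATS` are hypothetical, the list of formats is an assumption based on the examples in this post, and `%b` matching assumes the default English (C) locale.

```python
import re
from datetime import datetime

# Assumed set of input layouts (after spaces and commas are removed).
# Extend this list to cover other layouts that appear in the data.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",  # 2016-02-29
    "%d%b%Y",    # 29FEB2016
    "%d%b%y",    # 18FEB16
    "%b%d%Y",    # FEB022016
    "%d%m%Y",    # 29022016
]


def normalize_date(text):
    """Return the date in YYYY-MM-DD form, or None if no format matches."""
    # Drop whitespace and commas so "FEB 02, 2016" becomes "FEB022016".
    compact = re.sub(r"[\s,]+", "", text)
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(compact, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this layout did not fit, try the next one
    return None


if __name__ == "__main__":
    for sample in ["29FEB, 2016", "18FEB16", "FEB 02, 2016", "NOW"]:
        print(sample, "->", normalize_date(sample))
```

Trying the formats in a fixed order keeps the behaviour predictable, but truly ambiguous inputs (for example DDMMYYYY versus MMDDYYYY) would still need a rule chosen by the caller.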
Finally, thanks for your time and thoughts. I will also keep trying.
After a few days of research, I came up with the following approach, which solves the extraction problem.
Part of the code is shown below. (This is an excerpt that depends on definitions from the surrounding context.)
# new_s, i, label_class, prepositions, months and released_date
# are defined in the surrounding code (not shown).
new_w = new_s.split()
for j in range(len(new_w)):
    if new_w[j] in prepositions and (new_w[j+1].isdecimal() or new_w[j+1].lower() in months):
        # Process case like "Starting from Mar27, 2016 to Dec31, 2016"
        if j + 7 < len(new_w) and new_w[j+4] in prepositions:
            if new_w[j+5].isdecimal() or new_w[j+5].lower() in months:
                u = ' '.join(new_w[j:j+8])
                print(label_class[i] + ': ' + u)
                break
        # Process case like "Ticket must be issued on/before 29FEB, 2016"
        elif new_w[j-1] in prepositions:
            u = ' '.join(new_w[j-1:j+4])
            print(label_class[i] + ': ' + u)
            break
        # Process case like "Ticketing valid until 18FEB16"
        else:
            u = ' '.join(new_w[j:j+4])
            print(label_class[i] + ': ' + u)
            break
    # Process case like "TICKETING PERIOD: NOW - FEB 02, 2016"
    # Process case like "TRAVELING DATES: NOW - FEB 10,2016 FEB 22,2016 - MAY 12,2016"
    if new_w[j] == '-' and (new_w[j+1].lower() in months or new_w[j+2].lower() in months):
        if new_w[j-1].lower() == 'now':
            u = released_date + ' - ' + ' '.join(new_w[j+1:j+4])
            print(label_class[i] + ': ' + u)
        elif new_w[j-3].lower() in months or new_w[j-2].lower() in months:
            u = ' '.join(new_w[j-3:j+4])
            print(label_class[i] + ': ' + u)
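Since the excerpt above cannot run without its context, here is a self-contained sketch of the same token-window idea. It is not the original code: `PREPOSITIONS`, `MONTHS`, `tokenize`, the helper functions, and the `released_date` default are all assumptions made for illustration, and the tokenizer is only a guess at the preprocessing (it splits letters from digits and drops punctuation other than hyphens). Unlike the excerpt, it matches prepositions case-insensitively, checks indices before using them, and returns the phrases instead of printing them.

```python
import re

# Hypothetical vocabularies; the real `prepositions` and `months`
# live in code that is not shown in this post.
PREPOSITIONS = {"from", "to", "on", "before", "after", "until", "by"}
MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}


def tokenize(text):
    """Assumed preprocessing: alphabetic runs, digit runs and hyphens."""
    return re.findall(r"[A-Za-z]+|\d+|-", text)


def is_prep(tokens, k):
    return 0 <= k < len(tokens) and tokens[k].lower() in PREPOSITIONS


def is_month(tokens, k):
    return 0 <= k < len(tokens) and tokens[k].lower() in MONTHS


def is_date_start(tokens, k):
    """True if tokens[k] exists and is a number or a month abbreviation."""
    return 0 <= k < len(tokens) and (tokens[k].isdecimal() or is_month(tokens, k))


def extract_date_phrases(text, released_date="RELEASE_DATE"):
    """Collect preposition + date phrases using fixed-size token windows."""
    tokens = tokenize(text)
    phrases = []
    for j, tok in enumerate(tokens):
        if is_prep(tokens, j) and is_date_start(tokens, j + 1):
            if (j + 7 < len(tokens) and is_prep(tokens, j + 4)
                    and is_date_start(tokens, j + 5)):
                # "from Mar 27 2016 to Dec 31 2016"
                phrases.append(" ".join(tokens[j:j + 8]))
            elif is_prep(tokens, j - 1):
                # "on before 29 FEB 2016"
                phrases.append(" ".join(tokens[j - 1:j + 4]))
            else:
                # "until 18 FEB 16"
                phrases.append(" ".join(tokens[j:j + 4]))
            break
        if tok == "-" and (is_month(tokens, j + 1) or is_month(tokens, j + 2)):
            if j >= 1 and tokens[j - 1].lower() == "now":
                # "NOW - FEB 02 2016": replace NOW with the release date
                phrases.append(released_date + " - " + " ".join(tokens[j + 1:j + 4]))
            elif j >= 3 and (is_month(tokens, j - 3) or is_month(tokens, j - 2)):
                # "FEB 22 2016 - MAY 12 2016"
                phrases.append(" ".join(tokens[j - 3:j + 4]))
    return phrases


if __name__ == "__main__":
    print(extract_date_phrases("Ticketing valid until 18FEB16"))
    print(extract_date_phrases("TICKETING PERIOD: NOW - FEB 02, 2016", "2016-01-15"))
```

The fixed window sizes (4, 5 and 8 tokens) assume every date splits into exactly three tokens, so a date without a year would need its own branch.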