I am trying to extract film names from the IMDB top 250 list. I am working from the page source, which is full of HTML tags.
I have a regular expression, but when I run it with grep, after a while it crashes with "core dumped". The command is as follows:
grep -o -P ">[[A-Z]+\w* ([a-zA-Z]+\w* ?)*<" film.xml
What is causing this core dump?
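The most likely cause is the nested quantifier in ([a-zA-Z]+\w* ?)*. Because [a-zA-Z]+, \w* and the optional space all overlap, the same run of letters can be split up in a huge number of ways, and on a line that does not match, PCRE backtracks through all of them until it runs out of stack. I don't understand exactly what you are trying to match, but try a pattern in which each word can be matched in only one way: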
grep -o -P ">[A-Z]\w*( [a-zA-Z]\w*)* ?<" film.xml
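
To check the rewritten pattern without the full file, here is a minimal sketch run against a single made-up line. The sample markup and the extract_titles helper are assumptions for illustration, not the real contents of film.xml, and it assumes a GNU grep built with -P support. The sed step only strips the surrounding > and < that the pattern captures.

```shell
# Hypothetical helper: reads HTML on stdin, prints one candidate title per line.
extract_titles() {
    # Each word is matched by exactly one branch, so there is only one way
    # to split a title into words and no exponential backtracking.
    grep -o -P '>[A-Z]\w*( [a-zA-Z]\w*)* ?<' |
        # Drop the leading '>' and the trailing '<' (plus any space before it).
        sed 's/^>//; s/ *<$//'
}

# Made-up sample line; the real IMDB markup may look different.
sample='<td class="title"><a href="/title/tt0111161/">The Shawshank Redemption</a></td>'

printf '%s\n' "$sample" | extract_titles
```

On the real file the same function would be used as extract_titles < film.xml. Note that the pattern only accepts letters, digits, underscores and single spaces, so titles containing punctuation such as colons or apostrophes will not be matched.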