regex grep cygwin html-parsing

Regular Expression Gives Core Dumped


I am trying to extract film names from the page source of the IMDB Top 250 list, which is full of HTML tags. I wrote a regular expression, but when I run it with grep, it runs for a while and then crashes with a "core dumped" message. The command is:

    grep -o -P ">[[A-Z]+\w* ([a-zA-Z]+\w* ?)*<" film.xml

What causes the core dump?


Solution

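  • I don't understand exactly what you are trying to do, but the nested quantifiers in ([a-zA-Z]+\w* ?)* let the same text be split into words in a huge number of ways, so the PCRE engine behind grep -P probably backtracks until it runs out of stack and crashes. Try a pattern in which every repetition of the group must start with a space: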

    grep -o -P ">[A-Z]\w*( [a-zA-Z]\w*)* ?<" film.xml
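
To see what the rewritten pattern matches, here is a minimal sketch run against two made-up lines. The sample markup and the variable names (sample, matches, titles) are assumptions for illustration only; the real IMDB page source will look different.

```shell
# Hypothetical sample lines standing in for film.xml (not the real IMDB markup).
sample='<td><a href="#">The Shawshank Redemption</a></td>
<td><a href="#">The Godfather</a></td>'

# Every repetition of the group must consume a space plus at least one letter,
# and \w never matches a space, so a title can be split into words in only one way.
matches=$(printf '%s\n' "$sample" | grep -o -P '>[A-Z]\w*( [a-zA-Z]\w*)* ?<')

# Strip the surrounding '>' and '<' delimiters to leave just the titles.
titles=$(printf '%s\n' "$matches" | tr -d '<>')
printf '%s\n' "$titles"
# prints:
# The Shawshank Redemption
# The Godfather
```

Note that this pattern only matches titles that start with a capital letter and contain nothing but word characters and single spaces, so a title beginning with a digit or containing an apostrophe would be skipped.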