How to unit test regular expressions when samples are in a file?


I have a file (test.txt) that holds the sample text the regex should match.

Here is the sample text:

evince /media/ismail/SSDWorking/book-collection/_Books/kids/Coping Skills.pdf
/usr/lib/libreoffice/program/soffice.bin --impress file:///home/ismail/Desktop/LibreOffice%20Impress.odp --splash-pipe=5
/usr/bin/gjs /usr/bin/com.github.johnfactotum.Foliate /media/ismail/SSDWorking/book-collection/_Books/kids/Coping Skills.epub
mpv /media/ismail/8TBRaid0/_IslamNaseed/_JunaidJamshed/Mera Dil Badal De.mp3
evince /media/ismail/SSDWorking/book-collection/_Books/self-development-anxiety-self-talk/child/Freeing Your Child From Negative Thinking Powerful, Practical Strategies to Build a Lifetime of Resilience, Flexibility, and... (Tamar E. Chansky) My Library).pdf --splash-pipe=5
This is the line that does not match.
/usr/bin/gjs /usr/bin/com.github.johnfactotum.Foliate /media/ismail/SSDWorking/book-collection/_Books/self-development-anxiety-self-talk/child/Freeing Your Child from Anxiety Powerful, Practical Solutions to Overcome Your Childs Fears, Worries, and Phobias (Tamar Chansky Ph.D.) (My Library).epub

Here is the grep command that prints the lines in test.txt that the regex did not match:

grep -nvP '(file://)?(?<!\w)/(?!usr/).*?\.\w{3,4}+' test.txt

I want to create a unit test that fails unless every line in this file matches. I also want to know which lines did not match. The grep command above prints the line numbers (-n) of the non-matching lines, but I still have to inspect its output manually to see whether any line was missed.

The idea is that I will add new lines to the file as we progress and rerun the test.

I have checked "How do you unit test regular expressions?" but it does not cover this approach. How can I do that?


Solution

  • If I'm understanding your requirements properly and bash is available as a wrapper script, how about:

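    #!/bin/bash
    
    # Total number of lines in the sample file.
    lines=$(wc -l < test.txt)
    # Line numbers of the lines the regex did NOT match
    # (-v inverts the match, -n prefixes "N:", sed strips everything after N).
    failed=$(grep -nvP '(file://)?(?<!\w)/(?!usr/).*?\.\w{3,4}+' test.txt | sed 's/:.*//')
    # Count non-empty lines only: a here-string always appends a newline,
    # so "wc -l" would report 1 even when $failed is empty.
    failed_lines=$(grep -c . <<< "$failed")
    
    if (( lines == failed_lines )); then
        echo "all lines did not match"
    fi
    
    echo "failed lines are:" $(tr "\n" " " <<< "$failed")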
    
    • The variable lines counts the lines in test.txt.
    • The variable failed stores the line numbers which did not match.
    • The variable failed_lines counts the lines of failed.
    • If lines and failed_lines are equal, every line failed to match, and the script reports that.
    • Finally report the line numbers which did not match.
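
The script above always exits with status 0, so a test runner cannot tell a pass from a failure. A minimal sketch of a variant that fails as soon as any line does not match, and prints the offending lines together with their line numbers, might look like this. It assumes GNU grep with -P (PCRE) support; check_samples is a hypothetical helper name, not part of the original answer.

```shell
#!/bin/bash
# Sketch only: check_samples is a hypothetical helper, assuming GNU grep with -P.
# Usage: check_samples PATTERN FILE
# Returns 0 if every line matches, 1 if some lines do not, 2 if grep itself failed.

check_samples() {
    local pattern=$1 file=$2
    local failed status=0

    # -v selects the non-matching lines, -n prefixes each with "N:".
    # grep exits 0 if it selected something, 1 if nothing, 2 on error.
    failed=$(grep -nvP -- "$pattern" "$file") || status=$?

    case $status in
        0)  # grep found at least one non-matching line: the test fails.
            echo "FAIL: lines not matched in $file:"
            printf '%s\n' "$failed"
            return 1
            ;;
        1)  # Nothing selected by -v, so every line matched.
            echo "PASS: every line in $file matched"
            return 0
            ;;
        *)  # Missing file, bad pattern, or no PCRE support.
            echo "ERROR: grep could not process $file" >&2
            return 2
            ;;
    esac
}
```

Calling `check_samples '(file://)?(?<!\w)/(?!usr/).*?\.\w{3,4}+' test.txt` from a wrapper script or CI step would then fail the run whenever a newly added sample line stops matching. Treating grep's exit status 2 separately avoids reporting a pass when the file is missing or the pattern is invalid.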