
Correctly suppress comments


I want to strip comments that start with a hash (#) from a text file before I run a larger parser over it.

For this I use suppress, as mentioned here.

pythonStyleComment does not work because it does not take quoted strings into account and removes text inside them. A hash inside a quoted string is not a comment; it is part of the string and should be preserved.

Here is the pytest test I already wrote to check the expected behavior.

def test_filter_comment():
    teststrings = [
        '# this is comment', 'Option "sadsadlsad#this is not a comment"'
    ]
    expected = ['', 'Option "sadsadlsad#this is not a comment"']

    for i, teststring in enumerate(teststrings):
        result = filter_comments.transformString(teststring)
        assert result == expected[i]

My current implementation fails somewhere inside pyparsing. I am probably using it in a way that was not intended:

filter_comments = Regex(r"#.*")
filter_comments = filter_comments.suppress()
filter_comments = filter_comments.ignore(QuotedString)

fails with:

*****/lib/python3.7/site-packages/pyparsing.py:4480: in ignore
    super(ParseElementEnhance, self).ignore(other)
*****/lib/python3.7/site-packages/pyparsing.py:2489: in ignore
    self.ignoreExprs.append(Suppress(other.copy()))
E   TypeError: copy() missing 1 required positional argument: 'self'

Any help on how to ignore comments correctly would be appreciated.


Solution

  • Ah, I was so close. Of course, I have to instantiate the QuotedString class properly. The following works as expected:

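    from pyparsing import QuotedString, Regex

    # Match a hash and the rest of the line, and drop it from the output.
    filter_comments = Regex(r"#.*")
    filter_comments = filter_comments.suppress()
    # Skip quoted strings (instances, not the class) so a hash inside them is kept.
    qs = QuotedString('"') | QuotedString("'")
    filter_comments = filter_comments.ignore(qs)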
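
The TypeError in the question's traceback is what Python raises when an instance method is called on the class itself, which is what happens when `QuotedString` is passed without parentheses. A minimal sketch that reproduces this without pyparsing follows; `Element` and `ignore` are hypothetical stand-ins for illustration, not pyparsing's actual code.

```python
class Element:
    """Hypothetical stand-in for a parser element such as QuotedString."""

    def __init__(self, quote_char):
        self.quote_char = quote_char

    def copy(self):
        # Instance method: needs an instance to bind to `self`.
        return Element(self.quote_char)


def ignore(other):
    # Mirrors the failing line in the traceback, which calls other.copy().
    return other.copy()


message = None
try:
    ignore(Element)  # passing the class itself, as in ignore(QuotedString)
except TypeError as exc:
    message = str(exc)
    print(message)

# Passing an instance works, as in ignore(QuotedString('"')).
copied = ignore(Element('"'))
print(copied.quote_char)
```

The exact wording of the message varies slightly between Python versions, but it always reports the missing positional argument `self`.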
    

    Here are some more tests.

    def test_filter_comment():
        teststrings = [
            '# this is comment', 'Option "sadsadlsad#this is not a comment"',
            "Option 'sadsadlsad#this is not a comment'",
            "Option 'sadsadlsad'#this is a comment"
        ]
        expected = [
            '', 'Option "sadsadlsad#this is not a comment"',
            "Option 'sadsadlsad#this is not a comment'",
            "Option 'sadsadlsad'"
        ]
    
        for i, teststring in enumerate(teststrings):
            result = filter_comments.transformString(teststring)
            assert result == expected[i]
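
Since the goal in the question was to clean a whole text file before running a larger parser, the same filter can probably be applied to multi-line text in one call. The following is a sketch under that assumption; the helper names `build_comment_filter` and `strip_comments` and the sample text are made up for illustration.

```python
from pyparsing import QuotedString, Regex


def build_comment_filter():
    # Quoted strings are skipped, so a hash inside them is not a comment.
    quoted = QuotedString('"') | QuotedString("'")
    # Match a hash up to the end of the line and drop it from the output.
    return Regex(r"#.*").suppress().ignore(quoted)


def strip_comments(text):
    # transformString copies unmatched text through unchanged and
    # replaces each match with its (suppressed, hence empty) tokens.
    return build_comment_filter().transformString(text)


sample = (
    'Option "a#b"  # trailing comment\n'
    "# full-line comment\n"
    "Value 1\n"
)
cleaned = strip_comments(sample)
print(cleaned)
```

Because `#.*` stops at the end of the line, the line break after a removed comment stays in place, so a full-line comment leaves an empty line behind. If the larger parser is sensitive to blank lines, they would need to be handled separately.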