regex  python-2.7  robotframework  extended-ascii

Remove extended ASCII characters at the beginning of a line in Python


I need to remove the first 3 or 4 extended ASCII characters from debug lines in Python, but so far I haven't managed to. Here are two examples:

ª!è[002:58:535]REGMICRO:Load: 36.6

ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized.

I tried: ^.*[A-Za-z]+$

and

[\x80-\xFF]+HTTP_CLI:0 - Line written in.*

But everything is ignored and I get this error:

20160922 15:16:28.549 : FAIL : UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 1: ordinal not in range(128)
20160922 15:16:28.551 : INFO : ${resulters} = ('FAIL', u"UnicodeEncodeError: 'ascii' codec can't encode character u'\\x80' in position 1: ordinal not in range(128)")
20160922 15:16:28.553 : INFO : ('FAIL', u"UnicodeEncodeError: 'ascii' codec can't encode character u'\\x80' in position 1: ordinal not in range(128)")

Does anyone here work with RIDE and Python?

Thank you!


Solution

  • Here is how to remove the characters before the square bracket with Robot Framework (if I understood the question correctly; frankly, I'm not sure). The regex you tried is not correct. Say you want to get everything from the square bracket onward:

    ${line}=    Set Variable    ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized.
    ${regx}=    Set Variable    ^.*(\\[.*$)
    ${result}=  Get Regexp Matches      ${line}      ${regx}      1
    

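    The regex on the second line above reads: "from the start of the line, skip everything up to a square bracket; the sequence from that bracket to the end of the line is group 1". The keyword "Get Regexp Matches" then returns the matched group 1. Note that `.*` is greedy, so if a line contains more than one `[`, group 1 starts at the last one; use `^.*?(\\[.*$)` to stop at the first bracket instead.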

    In Python:

    import re

    line = "ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized."
    # Raw string, so the backslash reaches the regex engine unchanged.
    regx = r"^.*(\[.*$)"
    # Group 1 is "[001:40:971]HTTP_CLI:Http Client Mng not initialized."
    result = re.search(regx, line).group(1)
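
If the goal is to clean many log lines, some of which may have no bracket at all, a small helper can avoid the `AttributeError` that `re.search(...).group(1)` raises when nothing matches. The following is only a sketch under assumptions: `strip_prefix` is a hypothetical name, not part of Robot Framework or the `re` module, and it assumes the junk prefix never contains a `[` itself. The sample lines use `\x..` escapes so the file needs no source encoding declaration.

```python
import re

# Hypothetical helper: matches everything before the first '[' on the line.
# The lookahead (?=\[) makes the pattern fail when the line has no bracket.
_PREFIX = re.compile(r"^[^\[]*(?=\[)")


def strip_prefix(line):
    """Drop everything before the first '[', or return the line unchanged."""
    return _PREFIX.sub("", line, count=1)


lines = [
    u"\xaa!\xe8[002:58:535]REGMICRO:Load: 36.6",
    u"\xeb\xaa7\xe8[001:40:971]HTTP_CLI:Http Client Mng not initialized.",
    u"no timestamp here",  # no bracket: left untouched
]
cleaned = [strip_prefix(l) for l in lines]
```

Because the pattern excludes `[` from the skipped prefix, it stops at the first bracket rather than the last, which differs from the greedy `^.*` used above when a line contains several brackets.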