regex python-2.7 robotframework extended-ascii

Remove extended ASCII characters at the beginning of a line in Python

I need to remove the first 3 or 4 extended ASCII characters from each debug line in Python, but so far I haven't managed to. Here are two example lines:

ª!è[002:58:535]REGMICRO:Load: 36.6

ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized.

I tried: ^.*[A-Za-z]+$

and

[\x80-\xFF]+HTTP_CLI:0 - Line written in.*

But everything is ignored, and I get this error:

20160922 15:16:28.549 : FAIL : UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 1: ordinal not in range(128)
20160922 15:16:28.551 : INFO : ${resulters} = ('FAIL', u"UnicodeEncodeError: 'ascii' codec can't encode character u'\\x80' in position 1: ordinal not in range(128)")
20160922 15:16:28.553 : INFO : ('FAIL', u"UnicodeEncodeError: 'ascii' codec can't encode character u'\\x80' in position 1: ordinal not in range(128)")

Does anyone here work with RIDE and Python?

Thank you!

Solution

If I understood the question correctly (frankly, I'm not sure), you want to remove the characters before the square bracket with Robot Framework. The regex you tried is not correct for that. Say you want to get everything from the square bracket onward:

${line}=    Set Variable    ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized.
${regx}=    Set Variable    ^.*(\\[.*$)
${result}=  Get Regexp Matches      ${line}      ${regx}      1

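The regex on the second line reads: "from the start of the line, skip everything up to a square bracket, and capture the sequence from that bracket to the end of the line as group 1". Because ".*" is greedy, the bracket it stops at is the last one on the line; these log lines contain only one, so that is also the first. The keyword "Get Regexp Matches" then returns the contents of group 1 for each match, as a list.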

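In Python:

# -*- coding: utf-8 -*-
import re

line = u"ëª7è[001:40:971]HTTP_CLI:Http Client Mng not initialized."  # unicode literal, since this is Python 2.7
regx = r"^.*(\[.*$)"  # raw string; group 1 runs from the "[" to the end of the line
result = re.search(regx, line).group(1)
# the value of result is u"[001:40:971]HTTP_CLI:Http Client Mng not initialized."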
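
If a line could contain more than one square bracket, or none at all, a slightly more defensive variant may be useful. The following is only a sketch: `strip_prefix` and `FIRST_BRACKET` are hypothetical names, not part of Robot Framework or of the code above, and it assumes each line has already been decoded to unicode text. It uses a lazy `.*?` so the match stops at the first bracket, and it returns the line unchanged when there is no bracket instead of failing on `None`.

```python
import re

# Lazy ".*?" stops at the FIRST "[", unlike the greedy ".*" which runs to the last one.
FIRST_BRACKET = re.compile(r"^.*?(\[.*)$")


def strip_prefix(line):
    """Return the line from its first "[" onward, or unchanged if it has none."""
    match = FIRST_BRACKET.search(line)
    return match.group(1) if match else line


# The sample lines from the question, written with escapes so this file stays ASCII.
# The third line is made up to show the no-bracket case.
samples = [
    u"\xaa!\xe8[002:58:535]REGMICRO:Load: 36.6",
    u"\xeb\xaa7\xe8[001:40:971]HTTP_CLI:Http Client Mng not initialized.",
    u"no timestamp here",
]
cleaned = [strip_prefix(s) for s in samples]
```

Precompiling the pattern is optional; it just avoids repeating the regex string when many log lines are processed in a loop.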