Search code examples
algorithmlanguage-agnosticparsingbnf

Are unescaped user names incompatible with BNF?


I've got a (proprietary) output from a software that I need to parse. Sadly, there are unescaped user names and I'm scratching my hairs trying to know if I can, or not, describe the files I need to parse using a BNF (or EBNF or ABNF).

The problem, oversimplified (it's really just an example), may look like this:

(data) ::= <username>
<username> ::= (other type of data)

And in some case, instead of appearing at the left or at the right, the username can also appear in the middle of a line.

The problem is that the username is unescaped and there are not enough restrictions on user names (they're printable ASCII, max 20 chars and they can't contain line break). So "=" would be a perfectly valid username, for example. And so would "= 1 = john = 2" (because user, at sign-on, where allowed to choose any user name they wanted and these appear unescaped in the output I've got).

I'm asking because my parser chocked on some very creative usernames (once again, not in my control, they're "weird" and I need to deal with it) and I cannot find an easy way to deal with this. Also note that I do not know in advance the user names (for example I don't have access to a database that would contain all the user names that the users created).

So are unrestricted and unescaped user names incompatibles with BNF?

P.S: be cool with me if I made mistakes, it's my first post on stackoverflow :)


Solution

  • BNF doesn't "care" for user names per-se. It works on the token level. If you define a username token, you can build describe a grammar using BNF based on it.

    Your problem should be solved on the lexer level. The lexer should be smart enough to recognize user names, even when they're not escaped, and pass username tokens to the parser.

    In theory you could describe all kinds of user names with a grammar, but this heavily depends on the other things in your language. Is = a valid token on its own right? How can you tell a username having = in it apart if it is? I think you'll have to describe the rest of the rules and valid tokens in your language to get a fuller answer here.