
What's the point of storing the user agent?


So far, when logging user logins, I have always stored the complete user agent in addition to the already parsed information (browser, version, OS, etc.). The user agent is usually just a TEXT field in the table.

While implementing something similar, I asked myself: what is the point of doing that? The user agent can easily be manipulated, and the only relevant information (browser, version and operating system) is already parsed and stored separately anyway.

Is there any actual benefit to still storing it, apart from being able to trace back data that could be faked anyway? What other relevant information does the user agent contain that would justify the amount of storage it takes up (which, over the years, is quite large)?

And of course I realize that the user agent contains a lot more than just the browser specifications, but how many times have you really had to go back and analyze the raw user agent itself?

Just to clarify: I'm asking about reasons to store the raw user-agent string after the "relevant" information (browser, OS, etc.) has been parsed out of it. What is the point of keeping the user agent beyond that?
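For concreteness, here is a minimal Python sketch of the setup described above: each login record keeps the raw user-agent string alongside the parsed browser, version and OS. `LoginLogEntry`, `parse_user_agent`, `make_entry` and the regex patterns are all hypothetical and intentionally naive; they are not how any particular library parses user agents.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical log record: the raw header is kept verbatim next to the parsed fields.
@dataclass
class LoginLogEntry:
    user_id: int
    user_agent: str           # raw User-Agent header (e.g. a TEXT column)
    browser: Optional[str]    # parsed fields; None when the parser finds no match
    version: Optional[str]
    os: Optional[str]

# Deliberately naive patterns. Order matters: Edge UAs also contain "Chrome/",
# and Chrome/Edge UAs also contain "Safari/".
_BROWSERS = [
    ("Edge", re.compile(r"Edg/([\d.]+)")),
    ("Chrome", re.compile(r"Chrome/([\d.]+)")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
    ("Safari", re.compile(r"Version/([\d.]+).*Safari/")),
    ("IE", re.compile(r"MSIE ([\d.]+)")),
]

# iOS UAs contain "like Mac OS X" and Android UAs contain "Linux", so check those first.
_OSES = [
    ("Windows", re.compile(r"Windows NT")),
    ("iOS", re.compile(r"iPhone|iPad")),
    ("macOS", re.compile(r"Mac OS X")),
    ("Android", re.compile(r"Android")),
    ("Linux", re.compile(r"Linux")),
]

def parse_user_agent(ua: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (browser, version, os); any element may be None."""
    browser = version = None
    for name, pattern in _BROWSERS:
        match = pattern.search(ua)
        if match:
            browser, version = name, match.group(1)
            break
    os_name = next((name for name, pattern in _OSES if pattern.search(ua)), None)
    return browser, version, os_name

def make_entry(user_id: int, ua: str) -> LoginLogEntry:
    """Build a log record that stores both the raw string and the parsed fields."""
    browser, version, os_name = parse_user_agent(ua)
    return LoginLogEntry(user_id, ua, browser, version, os_name)
```

If these patterns turn out to be wrong or incomplete, the parsed columns of old rows stay wrong unless the raw `user_agent` column was kept, which is exactly the trade-off being asked about.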


Solution

  • The user agent string contains information about the environment including operating system and browser. It is something I frequently check. There are two main reasons to store it.

    • If you are following up on a bug report or error, this information is useful or even essential for determining what went wrong: imagine trying to track down an error that occurs only on IE8 without the user agent! It can also help you prioritize bug fixes. You will want to fix an issue that affects 93% of environments before one that affects 7%.

    • Secondly, it provides very useful statistics on the profile of your users. You might only want to support environments used by more than a certain percentage of your user base. For example, if you are designing a new version of your software and your user agent logs show that no one uses IE, you might not bother to optimize or design for IE.

    You seem to be concerned that the user agent string can be faked. While that is possible, unless someone has a specific reason to do it in your app, worrying about it seems rather paranoid. You do make a good point, though: it is worth remembering which information can be faked.

    UPDATE: I see your point. In fact, in the logging I implemented recently, I dropped the parsed fields because of the data overhead. There is little point in storing both the raw string and the parsed values; the only real reason would be to make querying the logs slightly easier, which is not a good enough reason for me. Personally, I store the whole raw user agent: it means no loss of data, it future-proofs the logs against new browsers, operating systems and user-agent formats, and it avoids the risk of parsing mistakes ending up in the stored data.

    From Wikipedia:

    For this reason, most Web browsers use a User-Agent value as follows: Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]

    If you have already stored all the fields you need from it, then by all means discard the rest. How much data to log, how long to keep logs and in what form to keep them is a fairly personal decision that will differ from company to company and project to project.
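The two uses described in this answer (prioritizing fixes by how many environments are affected, and deciding which environments to support) combine naturally with storing only the raw string: classify the stored strings at query time. The following is a minimal Python sketch under that assumption; the function names, regex patterns and share threshold are hypothetical, and real-world user-agent parsing is considerably messier, so in practice you would probably use a dedicated parsing library instead.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical, deliberately simple browser-family detection, applied at query
# time to the raw strings. Order matters: Edge UAs also contain "Chrome/",
# and Chrome/Edge UAs also contain "Safari/".
_FAMILIES = [
    ("Edge", re.compile(r"Edg/")),
    ("Chrome", re.compile(r"Chrome/")),
    ("Firefox", re.compile(r"Firefox/")),
    ("Safari", re.compile(r"Version/[\d.]+.*Safari/")),
    ("IE", re.compile(r"MSIE |Trident/")),
]

def browser_family(user_agent: str) -> str:
    for name, pattern in _FAMILIES:
        if pattern.search(user_agent):
            return name
    return "Other"

def browser_shares(raw_user_agents: Iterable[str]) -> Dict[str, float]:
    """Fraction of logged user agents that fall into each browser family."""
    counts = Counter(browser_family(ua) for ua in raw_user_agents)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}

def families_above(shares: Dict[str, float], min_share: float) -> List[str]:
    """Families whose share is at least min_share, most common first."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [name for name, share in ranked if share >= min_share]

# Usage with a tiny, made-up sample of raw strings from a log table.
logged = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
]
shares = browser_shares(logged)      # {'Chrome': 0.5, 'Firefox': 0.25, 'IE': 0.25}
print(families_above(shares, 0.3))   # ['Chrome']
```

Because only the raw strings are stored, changing `_FAMILIES` (for instance, adding a browser that did not exist when the logs were written) reclassifies every historical row on the next query, which is the future-proofing argument made in the update above.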