Client-side user agent detection is known to be bad and discouraged in favor of feature detection. However, is it also bad to react differently based on the incoming user agent field in a HTTP request?
An example would be sending smaller or larger images based on whether the incoming user agent is mobile or desktop.
I think it depends what your motivation is. For example, in the mobile web sector what you are attempting to do is provide the user with something that looks sensible on their platform. Why be concerned about what user-agent the user is reporting, when it is purely for their own benefit? If they go to the effort of tricking you with a different user-agent, then they are the only person that suffers. The main trouble of course is false positives; it's not entirely reliable.
I follow the argument that you should not rely on it as such, but mobile developers are under attack from generic broad statements like this. Yes there are good alternatives, but across every browser you can imagine, this information can actually be useful at some point as the certainty begins to degrade.
What you certainly don't ever do with any plain-text header is use it to facilitate access control.
User agent detection is considered bad when there are better alternatives, but there is certainly no harm in including it in a detection process which degrades gracefully in certainty.
The issue I have with the whole process is that we are caught up in providing the user something sensible, but never seem to think it's acceptable to ask when you are uncertain. If you are uncertain about the user-agent, why not ask once and store? You can use the user-agent as a guideline.
So to conclude my thoughts, essentially the user-agent header is unreliable, so it is bad to rely on it. This doesn't mean you can't extract a degree of valuable information from it where more reliable options leave you in an uncertain state. In general it's wrong to conclude that it is bad. It's simply what you do with this information that makes it bad or not.
After seeing your updates to the question, I have the following comments to contribute. Do I want to be sniffing image requests and providing the client with an image based on user agent?
If this is the only variable then maybe it could work, but it's rarely the case that the only thing you are varying is the images. I don't want to detect per request because I want to serve the client a coherent solution. This means I served them a page that causes them to request the correct resources. This page yields a single coherent solution for all of the integrated resources. All variations in this document work together for a particular view.
I respect that the chance of the user-agent string changing mid-view is so slim it doesn't seem worth worrying about. However adopting this principle also reduces the number of times you need to perform browser/platform detection, which can only be beneficial. This allows you to switch views on the client much more easily. If the client says actually you got the view wrong, I am a tablet not a phone, how do you go about correcting that? You serve the user a better page, otherwise you will need to be spoofing headers for your image requests... terrible idea. Don't use the user-agent string to serve generic resources like images.
Platform identification is a very active area of modern developments in the web. As computing becomes more ubiquitous and platforms vary much more widely, our need to understand the platforms we are serving increases. I think the general solution to this problem under the current conditions is going to fall on fingerprinting and statistical analysis.
Consider this application - akinator.com - Notice how the statistical analysis from a huge set of sparse data is annoyingly accurate. In a limited environment (the set of browser configurations), you can imagine that we could ask the client's browser some questions. We then perform a statistical analysis on the response in some n-dimensional feature space. Using the user-agent as a dimension of this space is going to be useful and self limiting, depending on the results that you find. If it's largely inaccurate then it will see a large spread, and the amount of worth you derive from it will be self limiting.
Of course your ability to derive any value from this statistical model requires you to be able to obtain some verified truths. This could be, for example, running a JavaScript test-suite to detect client side js capabilities, or indeed, in uncertainty, you can actually ask the user to tell you what their platform is.
------- For further reading I'd refer you to this article by Mozilla
https://developer.mozilla.org/en-US/docs/Web/HTTP/Browser_detection_using_the_user_agent
Today, looking for these strings are the only way to know that the device runs on a mobile device (resp. a tablet) before serving the HTML.