My business problem is to group emails by sender, where the meaning of "sender" is something we still need to decide. I've noticed that Gmail's "via" tag usually identifies the sender I mean: for example, it nicely groups all the various meetup.com emails together, even though Meetup sends many emails from people @personaldomain.com, @gmail.com, etc. I don't know which header field this corresponds to in MIME. In general I don't understand MIME headers and have found no resource that's intermediate between the RFC docs, which are easy to get lost in, and Wikipedia's overview.
So: 1) What does Gmail use for "via"? 2) Which header fields are useful for this business goal? I don't know the actual meaning of received by, sent by, from, etc.; they all sound the same to me.
http://support.google.com/mail/bin/answer.py?hl=en&answer=1311182 says
Gmail detected that the email was sent via another mail service. This means that the sender may be using a third-party email service to generate this message. For example, the message may have been sent through a social networking site which offers an email service or sent through a mailing list that you’re subscribed to.
Gmail displays this information because many of the services that send emails on behalf of others don’t verify that the name that the sender gives matches that email address. We want to protect you against misleading messages from people pretending to be someone you know.
They don't say exactly how they decide what to show, but they go on to say that if you use SPF to authorize the sending IPs and DKIM to sign the messages, they won't show the "via" label.
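As for which header fields are useful for grouping: From names the author, Sender (when present) names the agent that actually submitted the message on the author's behalf, Return-Path records the envelope sender that bounces go to, Received lines are trace entries added by each relay, and the d= tag of DKIM-Signature names the domain that signed the message. Here is a rough sketch of a grouping key built from those fields using Python's standard email package. It is a heuristic of my own, not Gmail's actual "via" logic (which isn't documented); the function names, the priority order, and the sample message are all assumptions for illustration, and the code does not verify the DKIM signature or reduce subdomains such as mail.meetup.com to meetup.com.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def domain_of(address_header):
    """Return the lowercased domain of the address in a header value, or None."""
    if not address_header:
        return None
    _, addr = parseaddr(address_header)
    if "@" not in addr:
        return None
    return addr.rsplit("@", 1)[1].lower()


def dkim_domain(msg):
    """Return the d= tag of the first DKIM-Signature header, or None.

    Assumption: the signature is taken at face value and NOT verified.
    """
    sig = msg.get("DKIM-Signature")
    if not sig:
        return None
    match = re.search(r"(?:^|;)\s*d\s*=\s*([^;\s]+)", sig)
    return match.group(1).lower() if match else None


def grouping_key(raw_message):
    """Pick a 'sending service' domain; the priority order is an assumption."""
    msg = message_from_string(raw_message)
    return (
        dkim_domain(msg)
        or domain_of(msg.get("Sender"))
        or domain_of(msg.get("Return-Path"))
        or domain_of(msg.get("From"))
    )


# Made-up message resembling a notification sent through a third-party service.
SAMPLE = (
    "Return-Path: <bounce-123@mail.meetup.com>\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=meetup.com; s=s1; bh=abc; b=xyz\n"
    "From: Jane Organizer <jane@personaldomain.com>\n"
    "Sender: info@meetup.com\n"
    "Subject: New event\n"
    "\n"
    "Body\n"
)

if __name__ == "__main__":
    print(grouping_key(SAMPLE))
```

For the sample above the key is meetup.com even though the From address is at personaldomain.com. A message with none of the other fields falls back to the From domain, so ordinary person-to-person mail still groups by author domain.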