From what I understand, you have to enter in all of your usernames and passwords into Mint, so I assume they are actually logging into your bank account and scraping the resulting screen to put this data into a form that Mint and others use.
How do they actually simulate the keypresses and mouse clicks? I assume banks don't like it when they do this - how do their scrapers avoid detection?
I'm pretty sure they don't simulate clicks, etc. In the end, any data that ends up on a user's page is transmitted in a response to a request. If you can figure out how to construct a valid request and then how to parse the response, you'll have the data you want.
As far as I could gather after using Yodlee for quite a while, they deal with sites in two major ways: the sites they have official agreements to work with and the sites they don't have official agreements with. For the first category of sites they, most often, have agreed upon APIs for getting the data. For the sites in the second category they reverse-engineer layer 7 communication protocols and data structures (a.k.a. screen/html scraping).