WordPress has a spam-filtering plugin called Akismet that seems to be able to classify any block of text as spam or not. The only caveat is that you have to go through their interface; their database and algorithm are not open source or otherwise readily available.
There are also commercial providers that offer a web-accessible API for classifying the emails, comments, or any other text submitted by users of your web application.
Is there any sort of open source or freely accessible database that can classify a block of text as spam/non-spam?
Edit: Here's a clearer explanation of what I want
Basically I was hoping that there was an extensive database out there with the probabilities of certain phrases being spam. Since (I'm assuming) spammers spam all email addresses equally, by pre-populating my Bayesian spam filter with this database, I could create an application that starts off by capturing most spam without any user training.
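To make the idea concrete, here is a minimal sketch of what "pre-populating a Bayesian filter" could look like. The `SEED_PROBS` table, its numbers, and the function names are all made up for illustration; in practice the table would come from whatever shared database exists, if one does.

```python
import math
import re

# Hypothetical seed table: token -> (P(token | spam), P(token | ham)).
# These numbers are invented for illustration, not taken from real data.
SEED_PROBS = {
    "viagra": (0.20, 0.001),
    "free": (0.30, 0.05),
    "click": (0.25, 0.04),
    "meeting": (0.01, 0.10),
    "thanks": (0.02, 0.15),
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spam_probability(text, probs=SEED_PROBS, prior_spam=0.5):
    """Naive Bayes estimate of P(spam | text) using only seeded tokens.

    Tokens missing from the table are ignored, so a text with no known
    tokens simply returns the prior.
    """
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for token in set(tokenize(text)):
        if token in probs:
            p_spam, p_ham = probs[token]
            log_spam += math.log(p_spam)
            log_ham += math.log(p_ham)
    # Turn the two log scores back into a probability in [0, 1].
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    print(spam_probability("Click here for FREE viagra"))
    print(spam_probability("Thanks for the meeting"))
```

With a table like this the filter has an opinion from the first message, and user training would then only adjust the seeded probabilities rather than build them from scratch.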
Update based on comment:
I don't think a simple database would do the trick. Most spam is algorithmically generated (e.g. comment spam usually incorporates content from the post). Akismet does a combination of things, probably including link analysis and use of known spam signatures, but they don't publish it.
I've read about some interesting AI projects that classify good content rather than bad. You might also look at Spam Karma, which analyzes blog comments based on a variety of spammy triggers (a response posted immediately after the page loads, etc.).
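As a rough illustration of trigger-based scoring, here is a small sketch. It is not Spam Karma's actual logic, which I haven't inspected; the `trigger_score` function, the thresholds, and the link-count trigger are assumptions chosen only to show the shape of the approach.

```python
import re


def trigger_score(comment_text, seconds_since_page_load,
                  min_seconds=5.0, max_links=2):
    """Count how many simple "spammy" triggers a comment sets off.

    Both thresholds are arbitrary placeholders for illustration.
    """
    score = 0
    # Trigger 1: the comment arrived suspiciously soon after the page loaded.
    if seconds_since_page_load < min_seconds:
        score += 1
    # Trigger 2: the comment contains more links than a typical reply.
    links = len(re.findall(r"https?://", comment_text, flags=re.IGNORECASE))
    if links > max_links:
        score += 1
    return score


if __name__ == "__main__":
    print(trigger_score("Nice post, thanks!", 42.0))
    print(trigger_score(
        "buy http://a.example http://b.example http://c.example", 1.0))
```

A higher score would mean more triggers fired; what to do at each score (hold for moderation, reject outright) is a policy decision left to the application.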
Original answer (DNS blacklists):