We have a system where customers, mainly European enter texts (in UTF-8) that has to be distributed to different systems, most of them accepting UTF-8, but now we must also distribute the texts to a US system which only accepts US-Ascii 7-bit
So now we'll need to translate all European characters to the nearest US-Ascii. Is there any Java libraries to help with this task?
Right now we've just started adding to a translation table, where Å (swedish AA)->A and so on and where we don't find any match for an entered character, we'll log it and replace with a question mark and try and fix that for the next release, but it seems very inefficient and somebody else must have done something similair before.
The uni2ascii program is written in C, but you could probably convert it to Java with little effort. It contains a large table of approximations (implicitly, in the switch-case statements).
Be aware that there are no universally accepted approximations: Germans want you to replace Ä by AE, Finns and Swedes prefer just A. Your example of Å isn't obvious either: Swedes would probably just drop the ring and use A, but Danes and Norwegians might like the historically more correct AA better.