I am using grep to parse a friend list obtained via the Facebook Open Graph API. I am mostly able to do what I want with the following command, issued in Bash:
grep -aiPo '"name":"(.*?)","id":"[[:digit:]]*"' friends?blahblah-access-token-stuff
which yields a list that looks like:
"name":"John Day","id":"--id omitted--"
"name":"Andria Cast\u00f1eda","id":"--id omitted--" // let me draw your attention here
"name":"Jane Doe","id":"--id omitted--"
Names were changed above to preserve privacy.
Notice the escape sequence in the middle entry, which corresponds to an n with a tilde (ñ). Is there an easy way to feed such characters into a Java program (my primary intention) so that Java understands that \u00f1 is Unicode-speak for the curly n?
I would prefer not to solve this problem by parsing the string in Java and manually unescaping the Unicode. I would very much prefer to instruct grep to handle this situation, or another GNU or open source tool that is widely available for Bash.
At that point, I would feed the entire input as a file to a Java program without having to worry about "OMG, is that a Unicode escape sequence!?" Java would naturally detect the Unicode characters and map them to their corresponding internal representation.
Thanks in advance!
Java understands Unicode. You provide Java Unicode escapes in the following manner:
String str = "\u00F6";
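So if a string literal such as "Andria Cast\u00f1eda" appears in your Java source,
the compiler translates the escape sequence and no additional handling is required. Note that this translation applies to source code only; the same six characters read from a file at runtime are not converted automatically.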
Here's also a very brief, but easy to understand introduction:
If you're still not convinced, try this class:
public class UnicodeExample {
    public static void main(String[] args) {
        // The compiler translates the escape in the first literal,
        // so both variables hold the same single character.
        String escaped = "\u00f1";
        String unescaped = "ñ";
        System.out.println(escaped);
        System.out.println(unescaped);
        if (escaped.equals(unescaped)) {
            System.out.println("The strings are the same!");
        } else {
            System.out.println("The strings are different!");
        }
    }
}
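One caveat worth checking: the class above compares two source literals, and those escapes are translated at compile time. Text that grep writes to a file reaches Java as a literal backslash, a u, and four hex digits. The following is only a minimal sketch of how that difference could be observed and handled; the class name UnescapeSketch and the method unescape are hypothetical, and it assumes every escape in the input has exactly four hex digits after the u.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each escape with the UTF-16 code unit it names.
    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against decoded '$' or backslash characters.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash yields the six literal characters a file would contain.
        String fromFile = "Andria Cast\\u00f1eda";
        System.out.println(fromFile.equals("Andria Cast\u00f1eda"));           // false
        System.out.println(unescape(fromFile).equals("Andria Cast\u00f1eda")); // true
    }
}
```

In the asker's setting, the argument to unescape would be each line read from the file that grep produced, rather than a hard-coded string.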