Strange character on Android NDEF record payload


I just started coding with Android NFC, and I've successfully read and written NDEF data on a MIFARE Classic tag. The problem is that when the app reads the payload from the NDEF record, it always contains the characters '*en' at the beginning of the text. I think it is the language code, but how do I get the actual text message without those characters?

This is a screenshot of what the app reads from the tag; the actual text is 'Hello World'.

Here is the code that reads the tag:

@Override
public void onNewIntent(Intent intent) {
    Log.i("Foreground dispatch", "Discovered tag with intent: " + intent);
   // mText.setText("Discovered tag NDEF " + ++mCount + " with intent: " + intent);

    if (NfcAdapter.ACTION_NDEF_DISCOVERED.equals(intent.getAction())) {
        Parcelable[] rawMsgs = intent.getParcelableArrayExtra(NfcAdapter.EXTRA_NDEF_MESSAGES);

        if (rawMsgs != null) {
            NdefMessage[] msgs = new NdefMessage[rawMsgs.length];

            for (int i = 0; i < rawMsgs.length; i++) {
                msgs[i] = (NdefMessage) rawMsgs[i];
            }

            NdefMessage msg = msgs[0];

            try {
                mText.setText(new String(msg.getRecords()[0].getPayload(), "UTF-8"));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Solution

  • What you're seeing is the raw payload of an NDEF text record converted to UTF-8.

    The NDEF text record is built like this:

    First byte: Control-Byte

    Bit 7: 0 = the text is encoded in UTF-8; 1 = the text is encoded in UTF-16

    Bit 6: RFU (MUST be set to zero)

    Bit 5..0: The length of the IANA language code.

    The control byte is followed by the language code, stored in US-ASCII ("en" in your case) as defined in RFC 3066. The length of the language code is given in the control byte.

    The language code is followed by the text itself, in the encoding specified by bit 7 of the control byte.

    The empty square character comes from converting the whole raw payload into UTF-8. I'm almost sure that the control byte in your case has the numeric value 2 (UTF-8, language code of length 2). Since there is no printable character for this value, it is rendered as a placeholder glyph, which is usually displayed as an empty square.
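
To get only the text, the control byte and the language code have to be skipped before decoding. The following is a minimal sketch of that layout in plain Java, so it can be run outside Android. The class name `NdefTextPayload` and the method `decodeText` are hypothetical names chosen for illustration and are not part of the Android API. It assumes the payload really belongs to a text record and that UTF-16 text without a byte-order mark is big-endian.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Android SDK.
final class NdefTextPayload {

    private NdefTextPayload() {
    }

    /** Returns the text of an NDEF text-record payload without control byte and language code. */
    static String decodeText(byte[] payload) {
        if (payload == null || payload.length == 0) {
            throw new IllegalArgumentException("Payload is empty");
        }

        // First byte: control byte.
        int control = payload[0] & 0xFF;

        // Bit 7 selects the text encoding: 0 = UTF-8, 1 = UTF-16.
        Charset charset = (control & 0x80) == 0
                ? StandardCharsets.UTF_8
                : StandardCharsets.UTF_16;

        // Bits 5..0 hold the length of the language code.
        int languageLength = control & 0x3F;

        // The text starts after the control byte and the language code.
        int textStart = 1 + languageLength;
        if (textStart > payload.length) {
            throw new IllegalArgumentException("Language code is longer than the payload");
        }

        return new String(payload, textStart, payload.length - textStart, charset);
    }

    public static void main(String[] args) {
        // Control byte 0x02 (UTF-8, language length 2), then "en", then the text.
        byte[] payload = {0x02, 'e', 'n', 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd'};
        System.out.println(decodeText(payload)); // prints "Hello World"
    }
}
```

In the question's `onNewIntent`, the call `new String(msg.getRecords()[0].getPayload(), "UTF-8")` could then be replaced by something like `NdefTextPayload.decodeText(msg.getRecords()[0].getPayload())`, assuming the first record is a text record.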