
Java Resolve Response URLConnection Header Value


I send an HTTP HEAD request with URLConnection, but the value of the Content-Disposition header comes back unreadable, like below:

Content-Disposition: attachment; filename="৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨ | Motivational Video in Bangla.mp4"

How can I convert this text: ৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨ into: ৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন?


Solution

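  • Your issue is that the header value has been decoded with the wrong charset. The garbled sequences (à§, à¦ and so on) are what UTF-8-encoded Bengali looks like when its bytes are read one byte per character by a single-byte Western encoding, something close to "Windows-1252".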

    Re-encoding the string as Windows-1252 and decoding the resulting bytes as UTF-8 recovers most of the text, but some composite characters still break:

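    import java.io.UnsupportedEncodingException;
    import java.nio.charset.StandardCharsets;

    public class Main {
        public static void main(String[] args) throws UnsupportedEncodingException {
            // Garbled header value as pasted from the question
            var source = "৩টি ধাপে সহজেই আতà§à¦¬à¦¬à¦¿à¦¶à§à¦¬à¦¾à¦¸à§€ হয়ে উঠà§à¦¨";
            // Turn each char back into a byte via Windows-1252, then decode those bytes as UTF-8
            var bytes = source.getBytes("Windows-1252");
            System.out.println("Expected: " + "৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন");
            System.out.println("Actual  : " + new String(bytes, StandardCharsets.UTF_8));
        }
    }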
    
    Expected: ৩টি ধাপে সহজেই আত্ববিশ্বাসী হয়ে উঠুন
    Actual  : ৩টি ধাপে সহজেই আত�ববিশ�বাসী হয়ে উ� �ন
    

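    The solution is to find a decoder that turns every garbled character back into exactly the byte it came from, so the bytes can then be decoded as UTF-8. Windows-1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D), and some Bengali signs need them in UTF-8: the virama ্ (U+09CD) is E0 A7 8D and the vowel sign ু (U+09C1) is E0 A7 81. These bytes typically appear as invisible control characters in the garbled text, so they are also easily lost when the text is copied into source code. ISO-8859-1 maps all 256 byte values one-to-one, which makes it the more likely candidate. Best of luck!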
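A minimal sketch of that approach, assuming the server sends the filename as raw UTF-8 bytes and that `HttpURLConnection` hands each header byte back as one char in the range U+0000 to U+00FF (ISO-8859-1 style), which is what the à§/à¦ pattern suggests. The class and method names (`HeaderRepair`, `misreadAsLatin1`, `latin1ToUtf8`, `contentDisposition`) are hypothetical. `main` simulates the garbling locally with the word আত্ববিশ্বাসী (written as `\u` escapes so the source file encoding does not matter) instead of contacting a server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HeaderRepair {
    // Simulates the misreading: the server's UTF-8 bytes become one char per byte.
    static String misreadAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes it: ISO-8859-1 maps chars U+0000..U+00FF back to the identical bytes,
    // which are then decoded as UTF-8.
    static String latin1ToUtf8(String headerValue) {
        byte[] raw = headerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Sends a HEAD request and returns the repaired Content-Disposition value, or null.
    static String contentDisposition(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            String value = conn.getHeaderField("Content-Disposition");
            return value == null ? null : latin1ToUtf8(value);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String sent = "attachment; filename=\""
                + "\u0986\u09A4\u09CD\u09AC\u09AC\u09BF\u09B6\u09CD\u09AC\u09BE\u09B8\u09C0.mp4\"";
        String received = misreadAsLatin1(sent);
        // The virama U+09CD is E0 A7 8D in UTF-8, so the garbled text contains U+008D,
        // one of the byte values Windows-1252 leaves undefined.
        System.out.println(received.indexOf('\u008D') >= 0); // true
        System.out.println(latin1ToUtf8(received).equals(sent)); // true
    }
}
```

With a real server you would call `contentDisposition(url)` instead of the simulation. If the value has already been pushed through Windows-1252 somewhere along the way, the undefined bytes may be gone and no charset can restore them. For reference, RFC 6266 expects non-ASCII filenames in the `filename*` parameter as percent-encoded UTF-8, so a server that puts raw UTF-8 in `filename` leaves each client to guess the encoding.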