Tags: java, encoding, character-encoding, pentaho, url-encoding

Decode a String that was encoded multiple times


I have written Java code to decode a URL-encoded string (using "UTF-8"). The string was encoded three times. I am using this code in an ETL step, so I could run that step three times in a row, but that would be somewhat inefficient. I searched the internet but didn't find anything promising. Is there a way in Java to decode a string that was encoded multiple times?

Here's my input string "uri":

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development+Activity/Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer

Here's my code, which decodes this string:

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.io.*;

String decodedValue;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
// First, get a row from the default input hop
//
Object[] r = getRow();
// If the row object is null, we are done processing.
//
if (r == null) {
    setOutputDone();
    return false;
}

// It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
// enough to handle any new fields you are creating in this step.
//
Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

String newFileName = get(Fields.In, "uri").getString(r);

try {
    decodedValue = URLDecoder.decode(newFileName, "UTF-8");
} catch (UnsupportedEncodingException e) {
    throw new AssertionError("UTF-8 is unknown");
}
// Set the value in the output field
//
get(Fields.Out, "decodedValue").setValue(outputRow, decodedValue);

// putRow will send the row on to the default output hop.
//
putRow(data.outputRowMeta, outputRow);

return true;
}

The output of this code is the following:

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development Activity/Defects Unresolved %2528by Non-Developer%2529.xanalyzer

When I run this code in the ETL three times, I get the output I want, which is this:

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development Activity/Defects Unresolved (by Non-Developer).xanalyzer

Solution

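  • URL encoding replaces %, ( and ) with %25, %28 and %29, respectively. Each further round of encoding turns every % into %25 again, so ( becomes %28, then %2528, then %252528. Decoding three times reverses this: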

    String s = "file:///C:/Users/nikhil.karkare/dev/pentaho/data/"
        + "ba-repo-content-original/public/Development+Activity/"
        + "Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer";
    
    // %252528 ... %252529
    s = URLDecoder.decode(s, "UTF-8");
    // %2528 ... %2529
    s = URLDecoder.decode(s, "UTF-8");
    // %28 ... %29
    s = URLDecoder.decode(s, "UTF-8");
    // ( ... )
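
The three calls can also be folded into a loop, so that a single ETL step does all the decoding. The following is only a sketch: the class name MultiDecode and the helpers decodeTimes and decodeUntilStable are hypothetical names chosen for illustration, not part of Pentaho or the JDK. The "until stable" variant assumes the fully decoded value contains no literal +, no valid %XX sequence, and no stray %. Otherwise it would decode one round too many, or URLDecoder would throw an IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class MultiDecode {

    // Hypothetical helper: apply URLDecoder.decode a fixed number of times.
    static String decodeTimes(String s, int times) {
        String result = s;
        try {
            for (int i = 0; i < times; i++) {
                result = URLDecoder.decode(result, "UTF-8");
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new AssertionError("UTF-8 is unknown", e);
        }
        return result;
    }

    // Hypothetical helper: decode until the string stops changing,
    // giving up after maxRounds to avoid looping on unexpected input.
    static String decodeUntilStable(String s, int maxRounds) {
        String current = s;
        for (int i = 0; i < maxRounds; i++) {
            String next = decodeTimes(current, 1);
            if (next.equals(current)) {
                break; // nothing left to decode
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        String s = "file:///C:/Users/nikhil.karkare/dev/pentaho/data/"
            + "ba-repo-content-original/public/Development+Activity/"
            + "Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer";

        System.out.println(decodeTimes(s, 3));
        System.out.println(decodeUntilStable(s, 10));
    }
}
```

In the step from the question, the single URLDecoder.decode call inside processRow could then be replaced by one call such as decodeTimes(newFileName, 3). If the number of encoding rounds is known, the fixed-count version is the safer choice, because it never decodes more often than the data was encoded.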