Remove unwanted character from WMIC output using Java replace()


I am currently working on a system configuration checker. For this purpose, I need to retrieve the Operating System of the tested machine and test it against a .csv file.

Unfortunately, while testing, one machine gave me quite a headache: when I retrieve the string from the WMI command, a ÿ character appears where a space should be. As a result, my string comparison fails when it should succeed. Here is a small code block to illustrate the process:

    // Requires java.io.BufferedReader and java.io.InputStreamReader
    // The command to execute
    String masterCommand = "wmic os get ";
    String command = "Caption";

    // The process that executes the command
    ProcessBuilder pb = new ProcessBuilder("cmd.exe", "/c", masterCommand + command);
    Process p = pb.start();
    BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));

    // The command result stored in a string
    String line;
    String result = "";
    while ((line = br.readLine()) != null) {
        result += line;
    }
    // Wait for the process only after its output has been read
    p.waitFor();

    // The string cleaned of the column header and of leading/trailing spaces
    result = result.replace(command, "").trim();

The expected result would be Microsoft Windows 10 Enterprise, but it ends up being Microsoft Windowsÿ10 Enterprise.

I thought that using Java's replace() method would solve the problem, but it does nothing. Here is the call I am currently using:

    result = result.replace("(?i)windows.", "Windows ");
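A likely reason the call above has no effect: String.replace() treats its first argument as literal text, so it searches for the exact characters "(?i)windows." and finds none, whereas String.replaceAll() compiles the argument as a regular expression. A minimal sketch of the difference follows; the class and method names are hypothetical and exist only for illustration.

```java
public class ReplaceDemo {
    // Literal replacement: the first argument is plain text, not a pattern.
    static String literal(String s) {
        return s.replace("(?i)windows.", "Windows ");
    }

    // Regex replacement: the first argument is compiled as a pattern.
    static String regex(String s) {
        return s.replaceAll("(?i)windows.", "Windows ");
    }

    public static void main(String[] args) {
        // \u00FF is the ÿ character from the question
        String raw = "Microsoft Windows\u00FF10 Enterprise";
        System.out.println(literal(raw).equals(raw)); // true: nothing was replaced
        System.out.println(regex(raw));               // Microsoft Windows 10 Enterprise
    }
}
```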

I should add that the command (wmic os get Caption) prints the correct result in the cmd console and also seems to write it correctly to a .txt file.
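The console showing the text correctly hints at an encoding mismatch rather than bad data. One possible explanation, offered as an assumption and not a confirmed diagnosis: wmic writes its output in the console's OEM code page, where byte 0xFF is a no-break space, while Java decodes the stream with a different default charset, where the same byte is ÿ. The sketch below only demonstrates that byte-level difference. Code page 850 is assumed here; the actual code page depends on the machine (the chcp command reports it), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    // Decode a single byte with the named charset.
    static String decode(int b, String charsetName) {
        return new String(new byte[] { (byte) b }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xFF is a no-break space (U+00A0) in OEM code page 850 ...
        System.out.println((int) decode(0xFF, "IBM850").charAt(0));       // 160
        // ... but the letter y with diaeresis (U+00FF) in windows-1252.
        System.out.println((int) decode(0xFF, "windows-1252").charAt(0)); // 255
    }
}
```

If this is indeed the cause, passing the console's charset as the second argument to InputStreamReader would yield a no-break space instead of ÿ. That character is still not a regular space, so it would need to be replaced before comparing strings.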


TL;DR

I run a wmic command from Java using ProcessBuilder and get an unwanted character (ÿ) that replace() does not remove.

What could I do to get the correct result (without writing to a file and then reading it back)?


Please point out anything that needs clarification or correction.

Thanks in advance for your answers.


Solution

  • I found a solution that is a bit clunky but works for me.

    Because the unwanted character lies outside the ASCII range, I simply clean the string by keeping only printable ASCII characters.

    result = result.replaceAll("[^ -~]", " ").trim().replaceAll(" +", " ");
    result = result.replaceAll("(?i)windows[^ ]", "Windows ");

    The first line takes the result string and replaces with a space every character whose value lies outside the range from the space character to ~ (printable ASCII). Substituting a space rather than an empty string keeps "Windows" and the version apart. It then trims the ends and collapses runs of two or more spaces into a single one.

    The last line takes care of a stray printable ASCII character coming between "Windows" and its version (e.g. 7, XP, Vista). It has to use replaceAll(), because replace() treats its first argument as literal text rather than as a regular expression.
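The clean-up described above can be wrapped in a small helper, sketched below under a few assumptions. The class name, method name, and sample input are hypothetical. The sketch substitutes a space for each character outside the printable ASCII range so that the words on either side stay separated.

```java
public class WmicCleaner {
    // Hypothetical helper: drops the column header, turns every character
    // outside printable ASCII (space to ~) into a space, collapses runs of
    // spaces, and trims the ends.
    static String clean(String raw, String header) {
        return raw.replace(header, "")       // literal removal of the header
                  .replaceAll("[^ -~]", " ") // regex: anything not printable ASCII
                  .replaceAll(" +", " ")     // regex: collapse repeated spaces
                  .trim();
    }

    public static void main(String[] args) {
        // Sample input shaped like the output in the question (\u00FF is ÿ)
        String raw = "Caption                          Microsoft Windows\u00FF10 Enterprise   ";
        System.out.println(clean(raw, "Caption")); // Microsoft Windows 10 Enterprise
    }
}
```

Because the character class also excludes control characters, stray carriage returns in the command output are turned into spaces and collapsed as well.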