Add an attribute to an HTML tag


I want to convert an HTML input string of the form:

String tag = "<input type=\"submit\" class=\"cssSubmit\"/>";

to

"<input type=\"submit\" class=\"cssSubmit disable\" disabled=\"disabled\"/>"

Is there any possible Java or Groovy way to do this?

For example:

String convert(String input) {
 //input: <input type=\"submit\" class=\"cssSubmit\"/>
 //process the input string
 //processedString: <input type=\"submit\" class=\"cssSubmit disable\" disabled=\"disabled\"/>
 return processedString;
}

Solution

  • This is the most generic way I can think of:

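    import java.io.IOException;
    import java.io.StringReader;
    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Attr;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NamedNodeMap;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    // Parses a single tag as XML, removes and sets attributes, then serializes it back.
    public static String editTagXML(String tag,
            Map<String, String> newAttributes,
            Collection<String> removeAttributes)
            throws SAXException, IOException,
            ParserConfigurationException, TransformerException {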
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(tag)));
        Element root = doc.getDocumentElement();
        NamedNodeMap attrs = root.getAttributes();
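        for (String removeAttr : removeAttributes) {
            // Element.removeAttribute is a no-op for a missing attribute, whereas
            // NamedNodeMap.removeNamedItem throws a DOMException (NOT_FOUND_ERR).
            root.removeAttribute(removeAttr);
        }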
        for (Map.Entry<String, String> addAttr : newAttributes.entrySet()) {
            final Attr attr = doc.createAttribute(addAttr.getKey());
            attr.setValue(addAttr.getValue());
            attrs.setNamedItem(attr);
        }
        StringWriter result = new StringWriter();
        final Transformer transformer = TransformerFactory.newInstance()
                .newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(result));
        return result.toString();
    }
    
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        String tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
        String edited = editTagXML(tag, new HashMap<String, String>() {{
            put("class", "cssSubmit disable");
            put("disabled", "disabled");
        }}, new ArrayList<>());
        long time = System.nanoTime() - start;
        System.out.println(edited);
        System.out.println("Time: " + time + " ns");
        start = System.nanoTime();
        tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
        editTagXML(tag, new HashMap<String, String>() {{
            put("class", "cssSubmit disable");
            put("disabled", "disabled");
        }}, new ArrayList<>());
        time = System.nanoTime() - start;
        System.out.println("Time2: " + time + " ns");
    }
    

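    It is ugly, huge, complicated, throws a lot of checked exceptions, and changes the attribute order (the output below lists the attributes alphabetically), which may or may not be important. It also accepts only well-formed XML, so an unclosed HTML tag such as <input type="submit"> fails to parse. It is probably not how it should be done. It is also pretty slow.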

    Here is the output:

    <input class="cssSubmit disable" disabled="disabled" type="submit"/>
    Time: 86213231 ns
    Time2: 2379674 ns
    

    The first run is probably so slow because it takes a while to load the necessary classes (the XML parser and the transformer). The second run is surprisingly fast, but my PC is pretty powerful too. If you put some constraints on your input (for example, attribute values are always quoted with " and never contain "), there is probably a much better way to do it, like using regular expressions or maybe even simple iteration.

    For example, if your input always looks like that, this could work just as well:

        start = System.nanoTime();
        edited = tag.replaceFirst("\"cssSubmit\"", "\"cssSubmit disable\" disabled=\"disabled\"");
        time = System.nanoTime() - start;
        System.out.println(edited);
        System.out.println("Time3: " + time + " ns");
    

    Output:

    <input type="submit" class="cssSubmit disable" disabled="disabled"/>
    Time3: 1422672 ns
    

    Hmm. The funny thing is, it's not that much faster than the second XML run.
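
    One possible reason for the small gap: String.replaceFirst compiles its first argument as a regular expression on every call, and a single cold measurement mostly captures one-time costs such as class loading. Below is a minimal sketch of a slightly fairer measurement that warms up first and then averages over many calls. The class TimingSketch and the helper averageNanos are hypothetical names introduced here, and the iteration counts are arbitrary. This is still not a rigorous benchmark; a harness such as JMH would be the usual choice for that.

    ```java
    import java.util.function.Supplier;

    public class TimingSketch {
        // Written to on every call so the JIT is less likely to discard the work.
        private static volatile int sink;

        // Average time per call in nanoseconds, measured after a warm-up phase.
        public static long averageNanos(Supplier<String> task, int warmup, int runs) {
            for (int i = 0; i < warmup; i++) {
                sink = task.get().length();
            }
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                sink = task.get().length();
            }
            return (System.nanoTime() - start) / runs;
        }

        public static void main(String[] args) {
            String tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
            // Same replacement as in the snippet above, repeated many times.
            long avg = averageNanos(
                    () -> tag.replaceFirst("\"cssSubmit\"",
                            "\"cssSubmit disable\" disabled=\"disabled\""),
                    10_000, 100_000);
            System.out.println("Average: " + avg + " ns per call");
        }
    }
    ```

    The same helper can wrap a call to editTagXML (with its checked exceptions caught inside the lambda) to compare both approaches under the same conditions.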

    OK, but what if we want a more generic solution, but still simple enough? We could use regular expressions:

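    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Matches a double-quoted, non-empty class attribute; group 1 is its value.
    private static final Pattern classAttributePattern
            = Pattern.compile("\\bclass=\"([^\"]+)\"");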
    public static String disableTag(String tag) {
        Matcher matcher = classAttributePattern.matcher(tag);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Doesn't match: " + tag);
        }
        int start = matcher.start();
        int end = matcher.end();
        String classValue = matcher.group(1);
        if (classValue.endsWith(" disable")) {
            return tag; // already disabled
        } else {
            // assume that if the class doesn't end with " disable",
            // then the disabled attribute is not present as well
            return tag.substring(0, start)
                    + "class=\"" + classValue
                    + " disable\" disabled=\"disabled\""
                    + tag.substring(end);
        }
    }
    

    Note that using regular expressions on XML/(X)HTML is usually extremely error-prone. Here is a non-exhaustive list of example inputs that break the code above:

    • <input type="submit" class="cssSubmit disable " disabled="disabled"/> - the trailing space before the closing quote defeats the endsWith check, so " disable" and the disabled attribute are added a second time;
    • <input type="submit" class='cssSubmit disable' disabled="disabled"/> - the pattern expects double quotes, so it does not match and the method throws;
    • <input type="submit" class = "cssSubmit" disabled="disabled"/> - the pattern does not allow spaces around =, so it does not match and the method throws;
    • <input title='this is an input with class="cssSubmit" that could be changed to class="cssSubmit disable"' type="submit" class="cssSubmit" disabled="disabled"/> - the pattern matches the attribute-like text inside the title value first, so the title is edited instead of the real class attribute.

    Each of these cases can be fixed by modifying the pattern in some way (although I'm not sure about the last one), but then you can find yet another case where it breaks. So this technique is best used for input that was generated by a program rather than written by a human, and even then you should be careful about where the input for that program came from (it could easily contain attribute values like the one in the last example).
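
    To illustrate how the first three cases above could be handled, here is a minimal sketch of a more tolerant variant. The class name TolerantDisable and the constant CLASS_ATTR are hypothetical names introduced for this example. It accepts single or double quotes and spaces around =, and it compares whole class names instead of using endsWith. It keeps the original assumption that a tag without the disable class has no disabled attribute either, and it is still fooled by the last case (attribute-like text inside another attribute's value).

    ```java
    import java.util.Arrays;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TolerantDisable {
        // "class", optional spaces around '=', then a value in matching quotes.
        // Group 1 is the quote character, group 2 the raw attribute value.
        private static final Pattern CLASS_ATTR =
                Pattern.compile("\\bclass\\s*=\\s*([\"'])(.*?)\\1");

        public static String disableTag(String tag) {
            Matcher m = CLASS_ATTR.matcher(tag);
            if (!m.find()) {
                throw new IllegalArgumentException("Doesn't match: " + tag);
            }
            String quote = m.group(1);
            String value = m.group(2).trim();
            // Compare whole class names, so stray spaces no longer matter.
            if (Arrays.asList(value.split("\\s+")).contains("disable")) {
                return tag; // already disabled
            }
            String newValue = value.isEmpty() ? "disable" : value + " disable";
            // Assumption carried over from the original: no "disable" class
            // means the disabled attribute is absent as well.
            return tag.substring(0, m.start())
                    + "class=" + quote + newValue + quote
                    + " disabled=\"disabled\""
                    + tag.substring(m.end());
        }

        public static void main(String[] args) {
            System.out.println(disableTag("<input type=\"submit\" class=\"cssSubmit\"/>"));
            System.out.println(disableTag("<input type=\"submit\" class='cssSubmit'/>"));
            System.out.println(disableTag("<input type=\"submit\" class = \"cssSubmit\"/>"));
        }
    }
    ```

    The rewritten attribute is normalized to class=... without spaces around =, while the original quote character is preserved.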