Tags: python, regex, javadoc, regex-group, regex-greedy

RegEx for adding a char in specific places


I am trying to analyze Javadoc comments in Python, and for that I need full stops to split on. How can I add full stops in the right places in a Javadoc comment?

I want something like this: Input:

/**
     * The addVertex method checks to see if the vertex isn't null, and then if
     * the graph does not contain the vertex, the vertex is then added and true
     * is returned
     *
     * @param vertex
     *
     * @throws NullPointerException.
     *
     * @return b
     */

Output:

/**
     * The addVertex method checks to see if the vertex isn't null, and then if
     * the graph does not contain the vertex, the vertex is then added and true
     * is returned.*here*
     *
     * @param vertex.*here*
     *
     * @throws NullPointerException.*here*
     *
     * @return b.*here*
     */

Note: If a full stop, semicolon, or comma already exists at that position, no replacement is needed, since my program splits on these three punctuation marks.

In addition, Javadoc descriptions can contain inline tags such as {@link...}; no punctuation marks are required around these.

A full stop is required only before @param, @throws, and @return (and at the end of the @return line as well).

SOLUTION

import re

test_str = ("/**\n"
    "     * The addVertex method checks to see if the vertex isn't null, and then if\n"
    "     * the graph does not contain the vertex, the vertex is then added and true\n"
    "     * is returned\n"
    "     *\n"
    "     * @param vertex\n"
    "     *\n"
    "     * @throws NullPointerException.\n"
    "     *\n"
    "     * @return b\n"
    "     */")

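# Insert a full stop directly before each block tag.
# '@throw' also matches the start of '@throws'.
result = re.sub(r'(@param|@throw|@return)', r'.\1', test_str)
print(result)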

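This adds a full stop directly before each tag, so the text preceding every tag is terminated. No full stop is added after the last tag, but that is not a problem for splitting. Note that the full stop is inserted even where the preceding text already ends with one.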


Solution

  • For the lines that are missing a ., you can write an expression similar to:

    (\* @param|@return)(.*)
    

    and replace each match with $1$2. (both captured groups followed by a literal full stop).


    RegEx

    You can test and modify your expressions on regex101.com.

    RegEx Circuit

    You can also visualize your expressions in jex.im.

    JavaScript Demo

    const regex = /(\* @param|@return)(.*)/gm;
    const str = `/**
         * The addVertex method checks to see if the vertex isn't null, and then if
         * the graph does not contain the vertex, the vertex is then added and true
         * is returned
         *
         * @param vertex
         *
         * @throws NullPointerException.
         *
         * @return b
         */`;
    const subst = `$1$2.`;
    
    // The substituted value will be contained in the result variable
    const result = str.replace(regex, subst);
    
    console.log('Substitution result: ', result);

    Python Code:

    # coding=utf8
    # the above tag defines encoding for this document and is for Python 2.x compatibility
    
    import re
    
    regex = r"(\* @param|@return)(.*)"
    
    test_str = ("/**\n"
        "     * The addVertex method checks to see if the vertex isn't null, and then if\n"
        "     * the graph does not contain the vertex, the vertex is then added and true\n"
        "     * is returned\n"
        "     *\n"
        "     * @param vertex\n"
        "     *\n"
        "     * @throws NullPointerException.\n"
        "     *\n"
        "     * @return b\n"
        "     */")
    
    subst = "\\1\\2."
    
    # The count argument limits the number of replacements (0 replaces all matches)
    result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
    
    if result:
        print(result)
    
    # Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
    

    Output

    /**
         * The addVertex method checks to see if the vertex isn't null, and then if
         * the graph does not contain the vertex, the vertex is then added and true
         * is returned
         *
         * @param vertex.
         *
         * @throws NullPointerException.
         *
         * @return b.
         */
    

    Expression for description

    If you would like to add a . after the description, this expression might work:

    ([\s\*]+@param)
    

    Python Code

    # coding=utf8
    # the above tag defines encoding for this document and is for Python 2.x compatibility
    
    import re
    
    regex = r"([\s\*]+@param)"
    
    test_str = ("/**\n"
        "     * The addVertex method checks to see if the vertex isn't null, and then if\n"
        "     * the graph does not contain the vertex, the vertex is then added and true\n"
        "     * is returned\n"
        "     *\n"
        "     * @param vertex\n"
        "     *\n"
        "     * @throws NullPointerException.\n"
        "     *\n"
        "     * @return b\n"
        "     */")
    
    subst = ".\\1"
    
    # The count argument limits the number of replacements (0 replaces all matches)
    result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
    
    if result:
        print(result)
    
    # Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
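
The expressions above add a full stop unconditionally and do not cover @throws. As a hedged sketch of how the two ideas could be combined with the question's requirement to skip places that already end in a full stop, semicolon, or comma, a negative lookbehind can guard the insertion. The names `BOUNDARY` and `add_full_stops` and the shortened sample are hypothetical, and the pattern assumes block tags are limited to @param, @throws, and @return.

```python
import re

# A gap (whitespace and asterisks) that follows real text and leads either
# to a block tag or to the closing '*/'. The lookbehind rejects positions
# where the previous character is already '.', ';' or ',', or is itself
# part of the gap or the opening '/**'.
BOUNDARY = re.compile(
    r"(?<![.;,\s*/])"
    r"([\s*]+@(?:param|throws|return)\b"  # gap before a block tag
    r"|[\s*]*\*/)"                        # or gap before the closing */
)


def add_full_stops(comment):
    """Add '.' at the end of the description and of each block tag entry."""
    return BOUNDARY.sub(r".\1", comment)


# Hypothetical, shortened version of the comment from the question.
sample = ("/**\n"
          " * Adds the vertex if absent\n"
          " *\n"
          " * @param vertex\n"
          " *\n"
          " * @throws NullPointerException.\n"
          " *\n"
          " * @return b\n"
          " */")

print(add_full_stops(sample))
```

On this sample the description, the @param entry, and the @return entry each gain one full stop, while the @throws entry, which already ends with one, is left unchanged. Inline tags such as {@link ...} are not in the alternation, so no punctuation is added around them. Running the function a second time on its own output changes nothing.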