
How do I find and print duplicate words in a document with the Linux find command?


I want to find out whether any variable names are duplicated, so I need to find duplicate words. Is it also possible to print the lines that contain the duplicated words?
I found the following command on the internet, but it does not work.

 # grep test.php | awk ‘{print $3}’ | sort | uniq -dc

Example:

$color2=$_POST['color2'] ?? '';
$color1=$_POST['color1'] ?? '';
$color3=$_POST['color3'] ?? '';
$color5=$_POST['color5'] ?? '';

$color6=$_POST['color6'] ?? '';
$color3=$_POST['color3'] ?? '';
$color8=$_POST['color8'] ?? '';
$color9=$_POST['color9'] ?? '';

$color13=$_POST['color13'] ?? '';
$color10=$_POST['color10'] ?? '';
$color11=$_POST['color11'] ?? '';
$color12=$_POST['color12'] ?? '';

$color13=$_POST['color13'] ?? '';

Solution

  • Create a script named findDup.sh with the following code:

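    #!/bin/sh
    # Report every $colorN assignment that appears more than once in test.php.
    for n in $(seq 1 13)
    do
            # Count the lines that assign to $colorN.
            no_of_lines=$(grep -c "color$n=" test.php)
            if [ "$no_of_lines" -gt 1 ]
            then
                    # Print the duplicated lines with their line numbers.
                    grep -n "color$n=" test.php
                    echo "--------"
            fi
    done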
    

    When you run it in the directory containing your test.php, it prints each duplicated line together with its line number.

    Example (the line numbers correspond to the sample with its blank lines removed):

    $ ./findDup.sh
    3:$color3=$_POST['color3'] ?? '';
    6:$color3=$_POST['color3'] ?? '';
    --------
    9:$color13=$_POST['color13'] ?? '';
    13:$color13=$_POST['color13'] ?? '';
    --------
    

    You can change the upper limit from 13 to any value you need in the script above.
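
If the variable names are not known in advance, the loop over fixed numbers can be replaced by a pipeline that collects the names first. The following is only a sketch and not part of the original answer. It assumes every assignment starts its line and is written as `$name=` with no space before the `=`, as in the sample. The function name `find_dup_vars` is made up for illustration.

```shell
#!/bin/sh
# find_dup_vars FILE (hypothetical helper): print, with line numbers, every
# line of FILE that assigns to a variable which is assigned more than once.
find_dup_vars() {
    file=$1
    # sed extracts the variable name from lines that begin with "$name=".
    # sort | uniq -d keeps one copy of each name that occurs more than once.
    sed -n 's/^[[:space:]]*\(\$[A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$file" |
        sort | uniq -d |
        while IFS= read -r name
        do
            # -F matches the name literally, so "$" is not a regex anchor.
            # Including the "=" keeps $color1 from matching $color13.
            grep -nF -- "$name=" "$file"
            echo "--------"
        done
}

# When run as a script, the first argument is the file to check.
if [ "$#" -gt 0 ]
then
    find_dup_vars "$1"
fi
```

Saved under a name of your choice and run with test.php as its argument, this should report the same duplicated lines as findDup.sh, though the groups appear in sorted name order rather than numeric order.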