
How to find unused words from a dictionary word list using Notepad++


I want to use Notepad++ to find words that are not used in any of my files. Suppose I have a dictionary and a book: I want to find the words from the dictionary that do not appear in the book. How can I do this? Thanks.


Solution

  • As Toto suggested, Notepad++ is not the right tool for this job. That said, it is not impossible in Notepad++. Here is how to do it with Shakespeare's Sonnet 24:

    Mine eye hath play'd the painter and hath stell'd
    Thy beauty's form in table of my heart;
    My body is the frame wherein 'tis held,
    And perspective it is the painter's art.
    For through the painter must you see his skill,
    To find where your true image pictured lies;
    Which in my bosom's shop is hanging still,
    That hath his windows glazed with thine eyes.
    Now see what good turns eyes for eyes have done:
    Mine eyes have drawn thy shape, and thine for me
    Are windows to my breast, where-through the sun
    Delights to peep, to gaze therein on thee;
    Yet eyes this cunning want to grace their art;
    They draw but what they see, know not the heart.
    
    • Format your book so that it has one word per line. Go to Search->Replace, type \b([A-Za-z']+)\b into the Find what: field and \1\n into the Replace with: field, make sure the Regular expression radio button is selected, and press Replace All. This appends a line break to every word but leaves the spaces and punctuation between words in place, so you get
    Mine
     eye
     hath
     play'd
     the
    ...
     they
     see
    , know
     not
     the
     heart
    .
    
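    • Remove the leftover spaces and punctuation by putting [ .,;:] into Find what, leaving Replace with empty, and pressing Replace All. If your text contains other punctuation (the sonnet has a hyphen in where-through), add those characters inside the brackets too, putting a hyphen last so that it is not read as a range. You now have: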
    Mine
    eye
    hath
    play'd
    the
    ...
    grace
    their
    art
    
    They
    draw
    but
    what
    they
    see
    know
    not
    the
    heart
    
    • Now paste your dictionary (assumed to be one word per line) above the text. As an example, I will use a dictionary containing the words painter, aeroplane, camel, shape and done. Mark the end of the dictionary with a line that cannot occur in either list, so that you can find it later. You should now have
    painter
    aeroplane
    camel
    shape
    done
    ----ENDOFDICTIONARY---
    
    
    Mine
    eye
    hath
    play'd
    ...
    the
    heart
    
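    • Make everything lowercase by pressing Ctrl-A to select everything and then Ctrl-U, so that, for example, Mine in the book matches mine in the dictionary. Note that this lowercases the end marker as well.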
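    • Open the Replace dialog, put ^(.*?)$\s+?^(?=.*^\1$) (cf. this answer) into Find what and leave Replace with empty. Make sure the . matches newline checkbox (next to the Regular expression radio button) is checked; it lets the .* inside the lookahead scan past line breaks into the rest of the document. The expression deletes every line that occurs again further down, so when you press Replace All, every dictionary word that appears in the book is removed. Repeated words inside the book are thinned out the same way (only their last occurrence survives), which does not affect the result: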
    aeroplane
    camel
    ----endofdictionary---
    eye
    play'd
    stell'd
    beauty's
    ...
    

    The words above the end-of-dictionary marker are the dictionary words that appear nowhere in the book.
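To see what each Replace All does, the sequence can be replayed in Python. This is a minimal sketch under assumptions: Python's re module stands in for the Boost regex engine that Notepad++ uses, with re.MULTILINE playing the role of Notepad++'s line anchors and re.DOTALL that of the . matches newline checkbox. The function name notepad_pipeline and the three-line excerpt of the sonnet are illustrative only.

```python
import re

MARKER = "----ENDOFDICTIONARY---"

def notepad_pipeline(dictionary: str, book: str) -> list[str]:
    """Replay the Notepad++ steps and return the dictionary words absent from the book."""
    # Step 1: \1\n puts a line break after every word; spaces and punctuation stay.
    text = re.sub(r"\b([A-Za-z']+)\b", r"\1\n", book)
    # Step 2: delete the leftover spaces and punctuation.
    text = re.sub(r"[ .,;:]", "", text)
    # Step 3: dictionary on top, then the end marker, then the book.
    text = dictionary.strip() + "\n" + MARKER + "\n" + text
    # Step 4: the equivalent of Ctrl-A, Ctrl-U (this lowercases the marker too).
    text = text.lower()
    # Step 5: delete every line that occurs again further down.
    # MULTILINE makes ^/$ match at line boundaries; DOTALL lets . match newlines.
    text = re.sub(r"^(.*?)$\s+?^(?=.*^\1$)", "", text, flags=re.MULTILINE | re.DOTALL)
    # Whatever is still above the (now lowercase) marker is the answer.
    return text.split(MARKER.lower(), 1)[0].split()


DICTIONARY = "painter\naeroplane\ncamel\nshape\ndone\n"
BOOK = (
    "Mine eye hath play'd the painter and hath stell'd\n"
    "Now see what good turns eyes for eyes have done:\n"
    "Mine eyes have drawn thy shape, and thine for me\n"
)
print(notepad_pipeline(DICTIONARY, BOOK))  # ['aeroplane', 'camel']
```

Because the lookahead in step 5 rescans everything below each line, the work grows at least quadratically with the length of the text, which is one reason an editor is a poor fit for a full-length book.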
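Outside an editor, the same question reduces to a set difference: collect the book's words into a set and keep the dictionary words that are not in it. Below is a minimal Python sketch, assuming a one-word-per-line dictionary and reusing the word pattern from the first Replace step. The name unused_words is hypothetical, and in practice the two strings would be read from your dictionary and book files.

```python
import re

# Same word definition as the first Replace step: runs of letters and apostrophes.
WORD = re.compile(r"\b[A-Za-z']+\b")

def unused_words(dictionary_text: str, book_text: str) -> list[str]:
    """Return the dictionary words (one per line) that never occur in the book."""
    # Lowercase both sides so "Mine" in the book matches "mine" in the dictionary.
    book_words = {w.lower() for w in WORD.findall(book_text)}
    unused = []
    for line in dictionary_text.splitlines():
        word = line.strip().lower()
        if word and word not in book_words:
            unused.append(word)  # keep the dictionary's order
    return unused


# In practice, read both texts from files, e.g. with open(path, encoding="utf-8").
DICTIONARY = "painter\naeroplane\ncamel\nshape\ndone\n"
BOOK = (
    "Mine eye hath play'd the painter and hath stell'd\n"
    "Now see what good turns eyes for eyes have done:\n"
    "Mine eyes have drawn thy shape, and thine for me\n"
)
print(unused_words(DICTIONARY, BOOK))  # ['aeroplane', 'camel']
```

A membership test on a Python set takes constant time on average, so the run time grows roughly linearly with the sizes of the two texts.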