Remove Duplicate string(s) from data set

I have an input file like the one below:

A->B->C->E
A->B->C->D
B->C->D
C->D
D->E ........ ........

My requirement is to write only unique strings to the output file. If a record's string is repeated as a substring of any other record, it should not be written to the output file.

The output file should look like this:

A->B->C->E
A->B->C->D
D->E

Skip the 3rd and 4th records, as these strings are already present in the 2nd record.

How can I achieve this with COBOL or a utility program?

Solution

Sort the input into descending order by the reversed value of the string, then by its length.
Compare the trailing strings of adjacent records in the sorted file, dropping the shorter record when it matches the end of the longer one.
Sort the remaining records back into their original sequence.
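The three steps above can be cross-checked outside COBOL. The following is a minimal Python sketch, not the program itself: the function name drop_trailing_duplicates and the in-memory list are assumptions made for illustration, and it treats a duplicate as a record that is a trailing substring (or an exact copy) of another record. It does not reproduce the COBOL program's special handling of one-character records.

```python
def drop_trailing_duplicates(records):
    """Keep only records that are not a trailing substring of another record."""
    # Step 1: pair each record's reversed text with its input position and
    # sort descending, so a record that is the tail of a longer record
    # comes after that longer record.
    keyed = [(rec[::-1], seq) for seq, rec in enumerate(records)]
    keyed.sort(key=lambda item: item[0], reverse=True)

    # Step 2: drop a record when its reversed text is a prefix of the
    # reversed text of the last record kept.
    kept = []
    last_kept = None
    for reversed_text, seq in keyed:
        if last_kept is not None and last_kept.startswith(reversed_text):
            continue
        last_kept = reversed_text
        kept.append((seq, reversed_text[::-1]))

    # Step 3: restore the original input sequence.
    kept.sort()
    return [text for _, text in kept]


sample = ["A->B->C->E", "A->B->C->D", "B->C->D", "C->D", "D->E"]
print(drop_trailing_duplicates(sample))
# ['A->B->C->E', 'A->B->C->D', 'D->E']
```

Comparing against the last record kept, rather than the previous record read, is what lets a run of shorter records (here B->C->D and C->D) all be dropped against the same longer record.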

For the examples given:

Input:

A->B->C->E
A->B->C->D
B->C->D
C->D
D->E

Released to sort (Phase-1 output: sequence number, length, reversed string):

00001 10 E>-C>-B>-A
00002 10 D>-C>-B>-A
00003 07 D>-C>-B
00004 04 D>-C
00005 04 E>-D

Returned from sort (Phase-2 input):

00005 04 E>-D
00001 10 E>-C>-B>-A
00002 10 D>-C>-B>-A
00003 07 D>-C>-B
00004 04 D>-C

Records 3 and 4 match the trailing characters of record 2 (the leading characters once the strings are reversed) and are dropped.

Phase-2 output (edited):

00005 D->E
00001 A->B->C->E
00002 A->B->C->D

Output (Resequenced):

A->B->C->E
A->B->C->D
D->E
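One way to confirm the expected output on small inputs is a brute-force comparison of every record against every other record. This is a different technique from the sort-based method, shown only as a hedged Python sketch for checking results; the name brute_force_unique is an assumption, and the quadratic cost makes it unsuitable for large files.

```python
def brute_force_unique(records):
    """Keep a record unless it is a trailing substring of another record.

    Of several identical records, only the first is kept.
    """
    kept = []
    for i, rec in enumerate(records):
        covered = False
        for j, other in enumerate(records):
            if i == j:
                continue
            # A longer record ending in rec covers it; an identical record
            # covers it only if that record appeared earlier in the input.
            if other.endswith(rec) and (len(other) > len(rec) or j < i):
                covered = True
                break
        if not covered:
            kept.append(rec)
    return kept


sample = ["A->B->C->E", "A->B->C->D", "B->C->D", "C->D", "D->E"]
print(brute_force_unique(sample))
# ['A->B->C->E', 'A->B->C->D', 'D->E']
```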

In the following code, the display statements are there only to record activity that is normally hidden.

Code:

   environment division.
   input-output section.
   file-control.
       select word-out assign "E:w2out.txt"
           organization line sequential.
       select word-list assign "E:w2in.txt"
           organization line sequential.
       select ph-2-wrk assign "ph-2.txt"
           organization sequential.
       select sort-work-1 assign "sortwork.dat".
       select sort-work-2 assign "sortwork.dat".
   data division.
   file section.
   fd word-out.
   01 word-out-rec pic x(40).
   fd word-list.
   01 word-rec pic x(40).
   fd ph-2-wrk.
   01 ph-2-rec.
       02 ph-2-seq pic 9(5).
       02 ph-2-word pic x(40).
   sd sort-work-1.
   01 sort-1-rec.
       02 sort-1-seq pic 9(5).
       02 sort-1-len pic 9(2).
       02 sort-1-word pic x(40).
   sd sort-work-2.
   01 sort-2-rec.
       02 sort-2-seq pic 9(5).
       02 sort-2-word pic x(40).
   working-storage section.
   01 word-len pic 99 value 0.
   01 seq-num pic 9(5) value 0.
   01 comp-word pic x(40) value high-values.
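   procedure division.
       *> First sort: order by reversed text so that a record which is
       *> the tail of a longer record sorts after that longer record.
       sort sort-work-1
               descending sort-1-word sort-1-len
           input procedure phase-1
           output procedure phase-2
       *> Second sort: put the surviving records back into input order.
       sort sort-work-2
               ascending sort-2-seq
           using ph-2-wrk
           output procedure write-output-list
       stop run
       .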
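   phase-1.
       *> Input procedure: tag each word with its input sequence number
       *> and length, reverse it, and release it to the first sort.
       display "Released to sort:"
       open input word-list
       perform until exit
           read word-list
           at end exit perform
           end-read
           perform get-word-len
           *> A blank line has length zero, which is not a valid
           *> reference-modification length, so skip it.
           if word-len = 0
               exit perform cycle
           end-if
           add 1 to seq-num
           move seq-num to sort-1-seq
           move word-len to sort-1-len
           move function reverse (word-rec (1:word-len))
               to sort-1-word
           display sort-1-seq space sort-1-len space sort-1-word
           release sort-1-rec
       end-perform
       close word-list
       .
   get-word-len.
       *> The word length is the number of characters before the
       *> first space.
       move 0 to word-len
       inspect word-rec tallying word-len
           for characters before space
       .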
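   phase-2.
       *> Output procedure: keep a record only if it is not the leading
       *> part of the last reversed word kept (comp-word).
       display "Returned from sort:"
       open output ph-2-wrk
       perform until exit
           return sort-work-1
           at end exit perform
           end-return
           display sort-1-seq space sort-1-len space sort-1-word
           *> comp-word starts as high-values, so the first record is
           *> always kept; one-character words are also always kept.
           if sort-1-word (1:sort-1-len)
                   not = comp-word (1:sort-1-len)
             or sort-1-len = 1
               move sort-1-word to comp-word
               *> Reverse back to the original text for the work file.
               move function reverse (sort-1-word (1:sort-1-len))
                   to ph-2-word
               move sort-1-seq to ph-2-seq
               write ph-2-rec
           end-if
       end-perform
       close ph-2-wrk
       .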
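   write-output-list.
       *> Output procedure of the second sort: write the kept words,
       *> now back in input order, without the sequence number.
       open output word-out
       perform until exit
           return sort-work-2
           at end exit perform
           end-return
           write word-out-rec from sort-2-word
       end-perform
       close word-out
       .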

This program was developed and tested, without the display statements, on a word list containing 69,904 words; hence the five-digit sequence number and the 40-character word field. Reversing the text to expose the trailing substrings, and reversing it again for output, appears to be the speed bottleneck.