I have an input file like below:
My requirement is to write only unique strings to the output file. If a record's string is repeated as a substring of any other record, it should not be written to the output file.
The output file should look like below:
Skip the 3rd and 4th records, as these strings are already present in the 2nd record.
How can I achieve this through COBOL or a utility program?
For the examples given:
Input:
A->B->C->E
A->B->C->D
B->C->D
C->D
D->E
Released to sort (Phase-1 output):
00001 10 E>-C>-B>-A
00002 10 D>-C>-B>-A
00003 07 D>-C>-B
00004 04 D>-C
00005 04 E>-D
Returned from sort (Phase-2 input):
00005 04 E>-D
00001 10 E>-C>-B>-A
00002 10 D>-C>-B>-A
00003 07 D>-C>-B
00004 04 D>-C
Records 3 and 4 match the trailing characters of record 2 and will be dropped.
Phase-2 output (edited):
00005 D->E
00001 A->B->C->E
00002 A->B->C->D
Output (Resequenced):
A->B->C->E
A->B->C->D
D->E
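The drop test applied to records 3 and 4 can be tried in isolation. What follows is only a minimal sketch, assuming a compiler that accepts lowercase fixed-format source; the program name prefix-demo and the items this-word and this-len are made up for illustration, while comp-word plays the same role as in the full program. It compares a reversed word against the leading characters of the last reversed word that was kept.

```cobol
       identification division.
       program-id. prefix-demo.
       data division.
       working-storage section.
      *> last word kept, already reversed (record 2 in the example)
       01  comp-word  pic x(40) value "D>-C>-B>-A".
      *> candidate word, already reversed (record 3 in the example)
       01  this-word  pic x(40) value "D>-C>-B".
       01  this-len   pic 99    value 7.
       procedure division.
      *> compare only the first this-len characters of both items
           if this-word (1:this-len) = comp-word (1:this-len)
               display "drop: " this-word (1:this-len)
           else
               display "keep: " this-word (1:this-len)
           end-if
           stop run
           .
```

With these values the first seven characters agree, so the candidate is reported as dropped. Changing this-word to "E>-D" and this-len to 4 should take the keep branch instead.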
In the following code, the display statements were used only to make a record of the activity normally hidden.
Code:
       environment division.
       input-output section.
       file-control.
           select word-out assign "E:w2out.txt"
               organization line sequential.
           select word-list assign "E:w2in.txt"
               organization line sequential.
           select ph-2-wrk assign "ph-2.txt"
               organization sequential.
           select sort-work-1 assign "sortwork.dat".
           select sort-work-2 assign "sortwork.dat".
       data division.
       file section.
       fd  word-out.
       01  word-out-rec pic x(40).
       fd  word-list.
       01  word-rec pic x(40).
       fd  ph-2-wrk.
       01  ph-2-rec.
           02  ph-2-seq pic 9(5).
           02  ph-2-word pic x(40).
       sd  sort-work-1.
       01  sort-1-rec.
           02  sort-1-seq pic 9(5).
           02  sort-1-len pic 9(2).
           02  sort-1-word pic x(40).
       sd  sort-work-2.
       01  sort-2-rec.
           02  sort-2-seq pic 9(5).
           02  sort-2-word pic x(40).
       working-storage section.
       01  word-len pic 99 value 0.
       01  seq-num pic 9(5) value 0.
       01  comp-word pic x(40) value high-values.
       procedure division.
      *> first sort: reversed words, descending, to drop matches
           sort sort-work-1
               descending sort-1-word sort-1-len
               input procedure phase-1
               output procedure phase-2
      *> second sort: restore the original input order
           sort sort-work-2
               ascending sort-2-seq
               using ph-2-wrk
               output procedure write-output-list
           stop run
           .
       phase-1.
      *> number each word, note its length, release it reversed
           display "Released to sort:"
           open input word-list
           perform until exit
               read word-list
                   at end exit perform
               end-read
               perform get-word-len
               add 1 to seq-num
               move seq-num to sort-1-seq
               move word-len to sort-1-len
               move function reverse (word-rec (1:word-len))
                   to sort-1-word
               display sort-1-seq space sort-1-len space sort-1-word
               release sort-1-rec
           end-perform
           close word-list
           .
       get-word-len.
      *> count the characters that precede the first space
           move 0 to word-len
           inspect word-rec tallying word-len
               for characters before space
           .
       phase-2.
           display "Returned from sort:"
           open output ph-2-wrk
           perform until exit
               return sort-work-1
                   at end exit perform
               end-return
               display sort-1-seq space sort-1-len space sort-1-word
      *> keep the word when it differs from the leading characters
      *> of the last word kept, or when it is one character long
               if sort-1-word (1:sort-1-len)
                       not = comp-word (1:sort-1-len)
                   or sort-1-len = 1
                   move sort-1-word to comp-word
                   move function reverse (sort-1-word (1:sort-1-len))
                       to ph-2-word
                   move sort-1-seq to ph-2-seq
                   write ph-2-rec
               end-if
           end-perform
           close ph-2-wrk
           .
       write-output-list.
      *> write the surviving words without the sequence number
           open output word-out
           perform until exit
               return sort-work-2
                   at end exit perform
               end-return
               write word-out-rec from sort-2-word
           end-perform
           close word-out
           .
This program was developed and tested, without the display statements, on a word list containing 69,904 words; hence the five-digit sequence number and the 40-character words. Reversing the text to capture the trailing substrings, and reversing it again for output, appears to be the speed bottleneck.
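On the reversal step itself: the program reverses only the first word-len characters of each record rather than the whole field. The sketch below is an illustration of why, not part of the tested program. The program name reverse-demo, the item rev-word and the shortened 12-character fields are assumptions made for the example; word-rec and word-len mirror the names used above.

```cobol
       identification division.
       program-id. reverse-demo.
       data division.
       working-storage section.
      *> shortened to 12 characters to keep the display readable
       01  word-rec  pic x(12) value "B->C->D".
       01  word-len  pic 99    value 0.
       01  rev-word  pic x(12).
       procedure division.
      *> length of the text that precedes the first space
           inspect word-rec tallying word-len
               for characters before space
      *> reverse only the text: result starts in position 1
           move function reverse (word-rec (1:word-len))
               to rev-word
           display "[" rev-word "]"
      *> reverse the whole field: trailing spaces move to the front
           move function reverse (word-rec) to rev-word
           display "[" rev-word "]"
           stop run
           .
```

The first display should show the reversed text left-aligned, which is the form the sort key and the leading-character comparison rely on. The second shows it pushed to the right behind the former trailing spaces, which would defeat a comparison that starts at position 1.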