Search code examples
cobol

Remove specific values out of string in cobol


Would like to ask how to remove specific values in different lengths out of a string.

I have this: '{4:72:SELLS¬#:73:ABC¬#:PPF:TESTPPF¬#:74:BLA¬#:PPF:ABC¬#:74:BLA¬#-}'

( want to remove all ¬#:PPF: Tags with its content. In my example this should be removed:

  • ¬#:PPF:TESTPPF
  • ¬#:PPF:ABC )

And would like to have this: '{4:72:SELLS¬#:73:ABC¬#:74:BLA¬#:74:BLA¬#-}'

I have this code:

01 TINP.
    05 TINPFIELD                         PIC  X(2000) VALUE
       '{4:72:SELLS¬#:73:ABC¬#:PPF:TESTPPF¬#:74:BLA¬#:PPF:TESTPPF¬#:74:BLA¬#-}'.

01 WA-OUTPUT    PIC X(2000) value spaces.
01 WA-TEMP      PIC X(2500).
01 WP-MSG       PIC 9(8) BINARY value zero.
01 WN-ROWCNT    PIC S9(8) BINARY.
01 WN-ROWCNT2   PIC S9(8) BINARY.
01 WP-BEG       PIC 9(8) BINARY.
01 WP-END       PIC 9(8) BINARY.
01 WN-OUT-LEN   PIC 9(8) BINARY value zero.

0000-TESTPROCESSING SECTION.

    display TINPFIELD.

    INSPECT TINPFIELD
            TALLYING WN-Rowcnt FOR ALL "¬#".

    MOVE 1 TO WP-MSG

    PERFORM UNTIL WN-ROWCNT2 >= WN-ROWCNT

      MOVE WP-MSG TO WP-BEG
      display 'WP-BEG' WP-BEG

      UNSTRING TINPFIELD
        DELIMITED BY "¬#"
        INTO WA-TEMP
        POINTER WP-MSG
      END-UNSTRING

      MOVE WP-MSG TO WP-END
      display 'WP-END' WP-END

      if WA-OUTPUT = space
        subtract 1 from wp-end
        STRING TINPFIELD(WP-BEG:WP-END)
               delimited by SIZE
               INTO WA-OUTPUT
        END-STRING
        move wp-end to WN-OUT-LEN
       else
         STRING WA-OUTPUT(1:WN-OUT-LEN)
                delimited by SIZE
                TINPFIELD(WP-BEG:WP-END)
                delimited by SIZE
                '¬#'
                delimited by SIZE
                INTO WA-OUTPUT
         END-STRING
       end-if

       move WP-END TO WN-OUT-LEN
       display 'WN-OUT-LEN' WN-OUT-LEN

       ADD 1 TO WN-ROWCNT2

     END-Perform
     .
     EXIT.

//edit: The Input data is always the same. In the output (after my code runs) I have sometimes a "Tag" twice, sometimes only once and so on. Its not consistent at all. I suppose this is a code issue on my side.


Solution

  • The code was changed to reflect a change in the question.

    This code splits the input into delimited segments copying each segment directly to the output using reference modification. If there is no text to be removed (no delimiter) it will copy all the input directly to the output.

    There are performance implications with this method. Specifically, the move of each segment will cause space-filling in the output. The larger the number of segments the worse the performance.

    I changed WA-OUTPUT to X(2000), since the output can never be larger than the input.

       01 TINP.
           05 TINPFIELD                         PIC  X(2000) VALUE
               '{4:72:SELLS¬#:73:ABC¬#:PPF:TESTPPF¬#:74:BLA' &
               '¬#:PPF:TESTPPF¬#:74:BLA¬#-}'.
    
       01 WA-OUTPUT    PIC X(2000) value spaces.
       01 segment-length binary pic 9(4).
       01 additional-characters binary pic 9(4).
       01 input-pointer binary pic 9(4).
       01 output-pointer binary pic 9(4).
       01 input-length binary pic 9(4).
       01 output-length binary pic 9(4).
       procedure division.
       begin.
           move 1 to input-pointer output-pointer
                     input-length output-length
           inspect TINPFIELD tallying
               input-length for characters before "}".
           display input-length
           display TINPFIELD
           perform until input-pointer > function length(TINPFIELD)
               unstring TINPFIELD delimited "¬#:PPF:"
                   into WA-OUTPUT (output-pointer:)
                   count in segment-length
                   with pointer input-pointer
               add segment-length to output-pointer
               if input-pointer <= function length(TINPFIELD)
                   move 0 to additional-characters
                   inspect TINPFIELD (input-pointer:) tallying
                       additional-characters for characters before "¬"
                   add additional-characters to input-pointer
               end-if
           end-perform
           inspect WA-OUTPUT tallying
               output-length for characters before "}".
           display wa-output
           display output-length
           goback
           .
    

    Output:

    0070
    {4:72:SELLS¬#:73:ABC¬#:PPF:TESTPPF¬#:74:BLA¬#:PPF:TESTPPF¬#:74:BLA¬#-}
    {4:72:SELLS¬#:73:ABC¬#:74:BLA¬#:74:BLA¬#-}
    0042
    

    This is a modification of the above that, while slightly more complicated, eliminates the performance penalty for space-filling when there is a larger number of segments. It is unusual because, while it uses the UNSTRING statement, it does not actually "unstring" anything.

       01 TINP.
           05 TINPFIELD                         PIC  X(2000) VALUE
               '{4:72:SELLS¬#:73:ABC¬#:PPF:TESTPPF¬#:74:BLA' &
               '¬#:PPF:TESTPPF¬#:74:BLA¬#-}'.
    
       01 WA-OUTPUT    PIC X(2000) value spaces.
       01 segment-length binary pic 9(4).
       01 segment-holder pic x.
       01 additional-characters binary pic 9(4).
       01 input-pointer binary pic 9(4).
       01 backup-pointer binary pic 9(4).
       01 output-pointer binary pic 9(4).
       01 input-length binary pic 9(4).
       01 output-length binary pic 9(4).
       procedure division.
       begin.
           move 1 to input-pointer output-pointer
                     input-length output-length
           inspect TINPFIELD tallying
               input-length for characters before "}".
           display input-length
           display TINPFIELD
           perform until input-pointer > function length(TINPFIELD)
               move input-pointer to backup-pointer
               unstring TINPFIELD delimited "¬#:PPF:"
                   into segment-holder
                   count in segment-length
                   with pointer input-pointer
               move TINPFIELD (backup-pointer:segment-length)
                   to WA-OUTPUT (output-pointer:segment-length)
               add segment-length to output-pointer
               if input-pointer <= function length(TINPFIELD)
                   move 0 to additional-characters
                   inspect TINPFIELD (input-pointer:) tallying
                       additional-characters for characters before "¬"
                   add additional-characters to input-pointer
               end-if
           end-perform
           if output-pointer < function length (WA-OUTPUT)
               move space to WA-OUTPUT (output-pointer:)
           inspect WA-OUTPUT tallying
               output-length for characters before "}".
           display wa-output
           display output-length
           goback
           .
    

    The output is the same.