Search code examples
unixawksedtextedit

I need to delete string from position X to position Y on each line in a text file


I have a huge flat file 100K records each spanning 3000 columns. I need to removed a segment of the data fay starting position 300 to position 500 before archiving. This is sensitive part of data that needs to be wiped before I can archive. I am looking for a awk or sed or any similar command that can do the trick for me.

Sample file

003133780 MORNING GLORY DR                                        SOUTHAMPTON         PA18966780 MORNING GLORY DR    
0054381303 MADISON ST                                             RADFORD             VA241411303 MADISON ST         
00586728 CONESTOGA COURT                                          CHADDS FORD         PA1931728 CONESTOGA COURT      
1852921800 SAMER RD                                               MILAN               MI481601800 SAMER RD           
192717175 EVERGREEN CIRCLE                                        HENDERSONVILLE      TN37075175 EVERGREEN CIRCLE    
213673217 EAST BRANCH                                             LONGVIEW            TX75604217 EAST BRANCH         
2490423205 NOTTAGE LANE                                           FALLS CHURCH        VA220423205 NOTTAGE LANE       
249357344 BALOGH PLACE                                            LONGWOOD            FL32750344 BALOGH PLACE        
2502811224 WILFORD HOLLOW ROAD                                    VINTON              VA241791224 WILFORD HOLLOW ROAD
277634210 AMANDA CT                                               WHITEHOUSE          TX7579119726 COPPER OAKS DRIVE 
282482507 B ST.                                                   CHESAPEAKE          VA23324507 B ST.               

Expected output

003133780 MORNING GLORY DR                                        SOUTHAMPTON         PA780 MORNING GLORY DR    
0054381303 MADISON ST                                             RADFORD             VA1303 MADISON ST         
00586728 CONESTOGA COURT                                          CHADDS FORD         PA28 CONESTOGA COURT      
1852921800 SAMER RD                                               MILAN               MI1800 SAMER RD           
192717175 EVERGREEN CIRCLE                                        HENDERSONVILLE      TN175 EVERGREEN CIRCLE    
213673217 EAST BRANCH                                             LONGVIEW            TX217 EAST BRANCH         
2490423205 NOTTAGE LANE                                           FALLS CHURCH        VA3205 NOTTAGE LANE       
249357344 BALOGH PLACE                                            LONGWOOD            FL344 BALOGH PLACE        
2502811224 WILFORD HOLLOW ROAD                                    VINTON              VA1224 WILFORD HOLLOW ROAD
277634210 AMANDA CT                                               WHITEHOUSE          TX19726 COPPER OAKS DRIVE 
282482507 B ST.                                                   CHESAPEAKE          VA507 B ST.               

Here I removed the char between position 89 and 95. One small change, I also need to write the changed content to the same file.

Below is the script I have so far. I am looping through all files, dividing them into files of max rows 20000 and then removing the characters from position X and Y before archiving.

for currentfilename in ls -1 *.[tT][xX][tT] do echo $currentfilename tempfilename=${currentfilename%%.*} awk -v A="$tempfilename" '{filename = A "Part" int((NR-1)/20000) ".txt"; print >> filename}' $currentfilename awk '{print substr($0,1,522) substr($0,953) >> filename}' $currentfilename mv $currentfilename $APP_ROOT/Archive done


Solution

  • Assuming "position" means "character":

    awk '{print substr($0,1,299) substr($0,501)}' file
    

    If it doesn't then edit your question to add some REPRESENTATIVE sample input and expected output (e.g. 5 lines of 6 columns each, not thousands of lines of thousands of columns).