Tags: bash, vim, dd

Reorder lines near the beginning of a huge text file (>20G)


I am a vim user and know some basic awk and bash commands. I have a text (VCF) file larger than 20 GB. I want to move line #69 to just below line #66:

$ less huge.vcf
...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=X,length=171031299>
    68 ##contig=<ID=Y,length=91744698>
    69 ##contig=<ID=MT,length=16299>
...

The result I want is:

...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=MT,length=16299>
    68 ##contig=<ID=X,length=171031299>
    69 ##contig=<ID=Y,length=91744698>
...

I tried to open and edit it in vim (with the LargeFile plugin installed), but it still does not work very well.


Solution

  • The easy approach is to copy the section you want to edit out of your file, modify that extracted copy, then write it back over the start of the original.

    # extract the first hundred lines
    head -n 100 huge.txt >start.txt
    
    # modify that extracted subset
    vim start.txt
    
    # copy that section back into the beginning of larger file
    dd if=start.txt of=huge.txt conv=notrunc
    

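    Note that this only works if your edits don't change the size of the section being modified: start.txt must have exactly the same size in bytes after editing as it had before. Reordering whole lines, as in the question, preserves the byte count as long as the editor does not alter line endings or the final newline.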
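
The same idea can be sketched non-interactively, with the size check enforced before anything is written. This is only a sketch under assumptions: `move_line_up` is a hypothetical helper name, not a standard tool, and it assumes the line to move sits within the first N lines (default 100) and below the target line. It replaces the manual vim step with awk and then uses the same `dd conv=notrunc` write-back as above.

```shell
# move_line_up FILE FROM AFTER [N]
# Hypothetical helper: moves line FROM so it sits directly below line AFTER,
# touching only the first N lines (default 100). Requires AFTER < FROM <= N.
move_line_up() {
    file=$1 from=$2 after=$3 n=${4:-100}
    tmp=$(mktemp) || return 1

    # Reorder within the first N lines: hold back lines AFTER+1 .. FROM-1,
    # then emit line FROM followed by the held lines.
    head -n "$n" "$file" |
        awk -v from="$from" -v after="$after" '
            NR > after && NR < from { held = held $0 "\n"; next }
            NR == from              { print; printf "%s", held; next }
            { print }
        ' > "$tmp"

    # Refuse to write unless the byte count is unchanged.
    old=$(( $(head -n "$n" "$file" | wc -c) ))
    new=$(( $(wc -c < "$tmp") ))
    if [ "$old" -ne "$new" ]; then
        echo "size mismatch: $old vs $new bytes, not writing" >&2
        rm -f "$tmp"
        return 1
    fi

    # Overwrite the start of the file without truncating the rest.
    dd if="$tmp" of="$file" conv=notrunc 2>/dev/null
    rm -f "$tmp"
}
```

For the case in the question, the call would be `move_line_up huge.txt 69 66`. Because the write happens in place, it is worth running the helper on a small copy of the header first.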