
Count bytes in a field


I have a file that looks like this:

ASDFGHJ|ASDFEW|ASFEWFEWAFEWASDFWE FEWFDWAEWA FEWDWDFEW|EWFEW|ASKOKJE
IOJIKNH|ASFDFEFW|ASKDFJEO JEWIOFJS IEWOFJEO SJFIEWOF WE|WEFEW|ASFEWAS

I'm having trouble with this file because it's written in Cyrillic, and the database complains about the number of bytes (as opposed to the number of characters). I want to check, for example, whether the first field is larger than 10 bytes, whether the second field is larger than 30 bytes, and so on.
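Here is a small illustration of the gap between bytes and characters (not part of the original question). It assumes the file is UTF-8 encoded. In that case each Cyrillic letter takes two bytes, so a six-letter word is 12 bytes long. The word `Привет` is only sample text.

```shell
#!/bin/sh
# Sample text: 6 Cyrillic letters, each 2 bytes long in UTF-8
word='Привет'
printf '%s' "$word" | wc -c   # byte count: 12
printf '%s' "$word" | wc -m   # character count (6 only if the current locale is UTF-8)
```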

I've been trying a lot of different things: awk, wc... I know I can count bytes with wc -c, but how can I retrieve only the lines that have a field larger than X bytes?

Any idea?


Solution

  • If you are open to using Perl, this could help. I have added comments to make it easier to follow:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use bytes;
    
    ## Change 'file' to the path where your file is located
    open my $data, '<', 'file' or die "Cannot open 'file': $!";
    
    ## Define an array with the maximum allowed size (in bytes) for each field
    my @size = ( 10, 30 );    ## ... add a limit for each remaining field
    
    LINE: while ( my $line = <$data> ) {    ## Read one line at a time
        chomp $line;                        ## Remove the newline from each line read
    
        ## Split the line on | and store each field in an array
        my @fields = split /\|/, $line;
    
        for my $i ( 0 .. $#fields ) {       ## Iterate over the fields
            next unless defined $size[$i];  ## No limit defined for this field
    
            ## If a field is larger than its allowed size, print the line
            ## and move on to the next one
            if ( bytes::length( $fields[$i] ) > $size[$i] ) {
                print "$line\n";
                next LINE;
            }
        }
    }
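If you would rather stay with awk, as the question mentions, here is a minimal sketch of a similar check. It is not part of the original answer. It assumes the file is UTF-8 and relies on `LC_ALL=C`, under which awk's `length()` counts bytes instead of characters. The function name `check_field_bytes` is hypothetical. A line is printed as soon as any field exceeds its limit, and fields without a limit are ignored.

```shell
#!/bin/sh
# check_field_bytes (hypothetical helper name): print every line of the
# given file(s), or of stdin, in which some |-separated field is longer,
# in bytes, than its limit. Limits are passed as one space-separated string.
# LC_ALL=C makes awk treat each byte as one character, so length() counts bytes.
check_field_bytes() {
    _limits=$1
    shift
    LC_ALL=C awk -F'|' -v limits="$_limits" '
        BEGIN { n = split(limits, lim, " ") }   # lim[1], lim[2], ... hold the limits
        {
            # Only fields that have a limit are checked
            for (i = 1; i <= n && i <= NF; i++)
                if (length($i) > lim[i] + 0) { print; next }   # one oversized field is enough
        }' "$@"
}

# Usage: at most 10 bytes for field 1 and 30 bytes for field 2
# check_field_bytes '10 30' file
```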