Padding a file containing Russian Cyrillic characters is not working: one Russian character is counted as 2 bytes

I am trying to create a file with fixed column lengths in Unix. The file contains Russian Cyrillic characters, and those characters are interpreted differently from the normal 1-byte characters.

I am using the script below to modify the file (the column delimiter is @-@ and the row delimiter is \r\n):

input_file=$1
output_file=$2

awk -F '@-@' '{printf("%-200s%-200s%-200s%-200s%-200s%-200s%-200s%-200s\r\n", $1, $2, $3, $4, $5, $6, $7, $8)}' "$input_file" > "$output_file"

For columns with normal characters, the output file correctly contains 200-character columns, but for a column with 30 Cyrillic characters, the output column contains only 170 characters. As a result, the lines in the file don't all have the same length: each Cyrillic character occupies 2 bytes, and the padding counts bytes rather than characters.

Example: НИКОЛАЕВНА has 10 characters, but the script calculates it as having 20 because it occupies 20 bytes.

An example line from the input file:

НИКОЛАЕВНА@-@russ@-@12345@-@asklle@-@НИКОЛАЕВНА@-@454@-@111@-@asdfg

Can you please suggest a way to create the padding so that all the rows have the same number of characters?

Thank you!

Solution

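I don't believe every awk implementation can do this, since some count bytes rather than characters when padding, but gawk should pad by characters by default as long as your locale is a UTF-8 one rather than "C". For example, running gawk with LC_ALL=en_US.UTF-8 should give the expected behavior.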
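
Here is a minimal sketch of how that could look, assuming gawk is installed and a UTF-8 locale is available (check with `locale -a`). The function name pad_columns and the PAD_LOCALE variable are names I made up for illustration, and the RS setting is an extra assumption on my part: it strips a trailing \r from the input so it doesn't end up inside the last column.

```shell
# Sketch: pad each @-@-delimited field to 200 characters (not bytes).
# Assumptions: gawk is installed; PAD_LOCALE (hypothetical variable) names
# a UTF-8 locale present on the system, defaulting to en_US.UTF-8.
pad_columns() {
    # $1 = input file, $2 = output file
    # RS='\r?\n' is a gawk regex record separator: it accepts both LF and
    # CRLF line endings, so a stray \r never becomes part of field 8.
    LC_ALL="${PAD_LOCALE:-en_US.UTF-8}" gawk -F '@-@' -v RS='\r?\n' '{
        printf("%-200s%-200s%-200s%-200s%-200s%-200s%-200s%-200s\r\n",
               $1, $2, $3, $4, $5, $6, $7, $8)
    }' "$1" > "$2"
}
```

You would call it as `pad_columns input.txt output.txt`. If the rows still come out with different lengths, the locale you asked for is probably not installed, in which case gawk falls back to counting bytes.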