I had a directory with a lot of PGN chess files, from which I wanted to remove the move times (written as [%emt {a_number}]). I wrote this script:
regex = /\[.emt[^\]]+\]/
directory = "path/to/files"
extension = ".pgn"
Dir.chdir(directory)
Dir.foreach(directory) do |file_name|
  file_object = File.open(file_name, "r+")
  contents = file_object.read
  new_contents = contents.gsub(regex, "")
  File.truncate(directory + "/" + file_name, 0)
  file_object.puts(new_contents)
  file_object.close
end
This removed all the move times, but curiously it also prepended a large number of null characters to each file (I suspect the number equals the number of bytes in the original file). So I replaced the line new_contents = contents.gsub(regex, "") with contents.delete("\0"), but this only made it worse, prepending even more null characters to the files. How can I remove them?
It should work OK if you replace:
File.truncate(directory + "/" + file_name, 0)
with:
file_object.rewind
or
file_object.seek(0)
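File.truncate should not be applied to open files (as here): it shrinks the file on disk, but file_object's position stays where the read left it, at the old end of the file, so the next write lands there and the gap before it is filled with null bytes. file_object.truncate does not move the position either, so it should not be followed by any file operation other than file_object.close.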
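Putting the fix together, here is a minimal sketch of the per-file step, assuming the same regex as in the question. The helper name strip_move_times and the temporary file are made up for illustration. One addition beyond the rewind: the cleaned text is shorter than the original, so rewinding alone would leave the tail of the old contents in place, and the sketch therefore cuts the file at the current position after writing.

```ruby
require "tmpdir"

# Same pattern as in the question: "[", any character, "emt", up to the next "]".
EMT_REGEX = /\[.emt[^\]]+\]/

# Hypothetical helper (not from the original script): strips the move times
# from a single file in place.
def strip_move_times(path)
  File.open(path, "r+") do |file|
    cleaned = file.read.gsub(EMT_REGEX, "")
    file.rewind               # the read left the position at end of file
    file.write(cleaned)       # overwrite starting at byte 0
    file.truncate(file.pos)   # drop what is left of the longer original
  end
end

# Demonstration on a throwaway file in a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "game.pgn")
  File.write(path, "1. e4 [%emt 0:00:05] e5 [%emt 0:00:03]\n")
  strip_move_times(path)
  puts File.read(path).inspect
end
```

Here the truncate call is safe because it comes after the write and uses the position the write ended at, and nothing but the implicit close follows it.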
If you already have a file with nulls that you want to remove, read the file into a string str, close the file, execute str.delete!("\000") and then write str back to the file.
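As a rough sketch of that cleanup, assuming the file fits comfortably in memory: the helper name remove_nulls is hypothetical, and File.binread / File.binwrite are used so that the file is opened and closed for you on each call.

```ruby
require "tmpdir"

# Hypothetical helper: removes every null byte from the file at `path`.
def remove_nulls(path)
  str = File.binread(path)    # read the whole file; it is closed on return
  str.delete!("\000")         # modifies str in place (returns nil if no nulls)
  File.binwrite(path, str)    # rewrite the file with the cleaned string
end

# Demonstration on a throwaway file with three leading nulls.
Dir.mktmpdir do |dir|
  path = File.join(dir, "damaged.pgn")
  File.binwrite(path, "\0\0\0" + "1. e4 e5\n")
  remove_nulls(path)
  puts File.binread(path).inspect
end
```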