After half an hour searching for an answer to this, I can't think of a way to do it without opening each text file individually in gedit, selecting all and then lowercasing it. I would like to be able to run a script, either from the command line or, preferably, as a nautilus-script, so that I can select the files in the GUI, right-click, choose Scripts > lowercase, and have it done.
I know that tr can do it, but I can't figure out how to turn the following call into one that works on many files:
tr '[:upper:]' '[:lower:]' < input.txt > output.txt
Normally I would replace input.txt with *.txt and output.txt with *.txt, but that doesn't work. Any ideas?
Extra: once that is solved, how can I adapt it for nautilus-scripts? :]
Thanks!
Edit: This turned out to be an encoding issue: the OP's input files are UTF-16.
After a discussion in the comments, the OP copied the file's contents, as displayed by less, into a pastebin: http://pastebin.com/uHmYmhpT
It looked like this:
<FF><FE>1^@^M^@
^@0^@0^@:^@0^@0^@:^@0^@9^@,^@4^@4^@2^@ ^@-^@-^@>^@ ^@0^@0^@:^@0^@0^@:^@1^@1^@,^@4^@4^@4^@^M^@
^@j& ^@W^@O^@K^@E^@ ^@U^@P^@^M^@
^@T^@H^@I^@S^@ ^@M^@O^@R^@N^@I^@N^@G^@ ^@j&^M^@
^@^M^@
^@2^@^M^@
... and so on.
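Each ^@ is a NUL byte: every character is stored in two bytes, and for ASCII characters the second byte is zero. This is clearly not an ASCII (or UTF-8) text file, so most standard tools (sed, grep, awk, etc.) will not handle it properly.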
The <FF><FE> at the start is a byte order mark (BOM) indicating that this file is UTF-16 text in little-endian byte order. There is a standard tool for converting between UTF-16 and UTF-8, and UTF-8 is byte-for-byte identical to ASCII for ASCII characters, so once we convert the file to UTF-8, sed/grep/awk/etc. will be able to edit it.
The tool we need is iconv. Unfortunately, iconv has no in-place editing option, so we'll have to write a loop that converts each file into a temporary file and then moves that over the original:
find . -type f -name '*.srt' -print0 | while IFS= read -r -d '' filename; do
    # Only convert files that `file` identifies as UTF-16 text
    if file "$filename" | grep -q 'UTF-16 Unicode'; then
        iconv -f UTF-16 -t UTF-8 -o "$filename".utf8 "$filename" && mv -- "$filename".utf8 "$filename"
    fi
done
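To see what the loop does to a single file, here is a small bash sketch that builds a tiny UTF-16LE file with an FF FE BOM by hand, prints its first two bytes with od, and converts it with iconv. The file names sample-utf16.txt and sample-utf8.txt are made up for the example; iconv's UTF-16 decoder uses the BOM to pick the byte order and drops it from the output.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing else is touched.
cd "$(mktemp -d)" || exit 1

# \377\376 is the FF FE byte order mark; each ASCII character is followed by a NUL byte.
printf '\377\376H\000I\000\n\000' > sample-utf16.txt

# Print the first two bytes in hex; a UTF-16LE file with a BOM starts with "ff fe".
od -An -tx1 -N2 sample-utf16.txt

# Convert to UTF-8; the BOM is consumed, leaving plain "HI".
iconv -f UTF-16 -t UTF-8 sample-utf16.txt > sample-utf8.txt
cat sample-utf8.txt
```

Running the same od command on one of your own .srt files is a quick way to check whether it starts with ff fe before converting it.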
Then you can run the find/sed command to lowercase them. Most programs won't care that your files are now UTF-8 rather than UTF-16, but if you have issues you can write a similar loop that uses iconv to convert them back to UTF-16 after you've lowercased them.
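A sketch of that reverse loop, assuming every *.srt under the given directory is currently UTF-8 and that you want little-endian UTF-16 with an FF FE BOM, like the original files. iconv's UTF-16LE target does not write a BOM, so the sketch prints the two BOM bytes itself. to_utf16le is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.srt under a directory from UTF-8
# to UTF-16LE with a leading FF FE byte order mark.
# Assumption: all matching files are currently UTF-8.
to_utf16le() {
    local dir=$1
    find "$dir" -type f -name '*.srt' -print0 |
    while IFS= read -r -d '' filename; do
        # Write the BOM first, then the UTF-16LE body, into a temporary file,
        # and only replace the original if iconv succeeded.
        { printf '\377\376'; iconv -f UTF-8 -t UTF-16LE "$filename"; } > "$filename.utf16" &&
            mv -- "$filename.utf16" "$filename"
    done
}

# Usage: to_utf16le .
```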
If you just want to lowercase all files matching '*.txt' (\L, and -i without a backup suffix, are GNU sed features):
sed -i 's/.*/\L&/' *.txt
But note that this will fail with an "Argument list too long" error if there are a lot of .txt files.
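To answer the original tr question directly: a redirection needs a single file, so `< *.txt > *.txt` cannot work, but looping over the files and going through a temporary file does. This is a minimal sketch with a hypothetical function name. GNU tr works byte by byte, so only ASCII letters are reliably lowercased. Because the glob is expanded by the shell and handed to a shell function rather than to an external program, this form should also avoid the argument-length limit mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase each file given as an argument using tr.
lowercase_with_tr() {
    local f
    for f in "$@"; do
        # tr reads and writes streams, so write to a temp file, then replace the original.
        tr '[:upper:]' '[:lower:]' < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
    done
}

# Usage: lowercase_with_tr *.txt
```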
If you want to lowercase all files recursively, I'd use Diego's approach, but there are a couple of errors to fix:
find . -type f -exec sed -i 's/.*/\L&/' {} +
should do the trick.
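Since sed -i overwrites files in place, a more cautious variant may help when running it over a whole tree. This sketch (the function name is hypothetical) first lists the files that contain uppercase letters, then runs the same GNU sed expression with -i.bak, which keeps each original as FILE.bak. Existing .bak files are skipped so they are not themselves rewritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview, then lowercase every file under a directory
# with GNU sed, keeping a .bak copy of each original.
lowercase_tree() {
    local dir=$1
    # Preview: print only the files that contain at least one uppercase character.
    find "$dir" -type f -exec grep -l '[[:upper:]]' {} +
    # Lowercase in place; -i.bak saves the untouched original next to each file.
    find "$dir" -type f ! -name '*.bak' -exec sed -i.bak 's/.*/\L&/' {} +
}

# Usage: lowercase_tree .
```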
If you don't want it to be recursive, you only want it to affect '.txt' files, and you've got too many files for sed ... *.txt to work, then use:
find . -maxdepth 1 -type f -name '*.txt' -exec sed -i 's/.*/\L&/' {} +
(-maxdepth 1 stops the recursion.)
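For the Nautilus part of the question, the same sed expression can be applied to just the files you select. This is a sketch, not a tested Nautilus setup: it assumes Nautilus passes the selected files as arguments, and that the script is saved as ~/.local/share/nautilus/scripts/lowercase and made executable (older releases used ~/.gnome2/nautilus-scripts/ instead). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a Nautilus script: lowercase the contents of every selected file.
# Assumption: Nautilus passes the selected files as positional arguments.
lowercase_selected() {
    local f
    for f in "$@"; do
        # Skip directories and anything else that is not a regular file.
        [ -f "$f" ] || continue
        sed -i 's/.*/\L&/' "$f"
    done
}

lowercase_selected "$@"
```

UTF-16 files like the OP's still need the iconv conversion above before this will lowercase them correctly.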
Older versions of find don't support the -exec ... + syntax, so if you run into trouble with it, replace the + with \;. The + is preferable because it makes find pass many files to each sed invocation rather than running sed once per file, which is more efficient, especially when there are many files.
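A quick way to see the difference is to count invocations. In this sketch (hypothetical function name), echo stands in for sed and prints one line per invocation, so the per-file form prints one line per file while the batched form prints a single line for a small directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how many times find runs the command with each -exec form.
count_invocations() {
    local dir=$1
    # -exec ... \; runs echo once per file: one output line per file.
    echo "per-file: $(find "$dir" -type f -exec echo {} \; | wc -l)"
    # -exec ... + packs as many files as fit into each echo call.
    echo "batched: $(find "$dir" -type f -exec echo {} + | wc -l)"
}

# Usage: count_invocations .
```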