I'm trying to run a command-line program on a directory full of files. The files are named with numbers in ascending order:
1815837.xml
1815838.xml
1815839.xml
1815840.xml
Would it be possible to write some kind of script to take all the files in the directory and one by one feed them through the following command (the Stanford NER):
java -mx600m -cp /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/classifiers/english.all.3class.distsim.crf.ser.gz -textFile 1815838.xml -outputFormat inlineXML >> 1815838_output.xml
The command prints its result to the console, so I'm redirecting it to a specially named file, i.e. >> 1815838_output.xml
It's important that I maintain that naming convention.
Is it feasible to run that command on every file in a directory and save the output accordingly, with a short Java program or a bash script? What would it look like?
This question is tangentially related to a previous inquiry.
My hazy notion is something like this:
*X* = '1815838'
while(still files in directory)
{
java -mx600m -cp stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier english.all.3class.distsim.crf.ser.gz -textFile *X*.xml -outputFormat inlineXML >> *X* + '_output.xml'
X--
}
In my mind that works, but I don't know whether it's a real construct or whether it would work in practice. I googled and didn't find anything like it, but maybe I didn't know exactly what to ask. Is this reasonable? Can someone show me the way?
UPDATE
-rwxr-xr-x 1 matthias matthias 3.8K Apr 10 20:35 1815851.xml*
-rw-r--r-- 1 matthias matthias 4.6K Apr 12 16:25 1815851_output.xml
-rw-r--r-- 1 matthias matthias 5.3K Apr 12 16:25 1815851_output_output.xml
-rwxr-xr-x 1 matthias matthias 3.3K Apr 10 20:35 1815852.xml*
-rw-r--r-- 1 matthias matthias 4.5K Apr 12 16:25 1815852_output.xml
-rw-r--r-- 1 matthias matthias 5.6K Apr 12 16:25 1815852_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.5K Apr 10 20:35 1815853.xml*
-rw-r--r-- 1 matthias matthias 2.9K Apr 12 16:25 1815853_output.xml
-rw-r--r-- 1 matthias matthias 3.3K Apr 12 16:25 1815853_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.4K Apr 10 20:35 1815854.xml*
-rw-r--r-- 1 matthias matthias 2.7K Apr 12 16:25 1815854_output.xml
-rw-r--r-- 1 matthias matthias 2.9K Apr 12 16:25 1815854_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.8K Apr 10 20:35 1815855.xml*
-rw-r--r-- 1 matthias matthias 3.6K Apr 12 16:25 1815855_output.xml
-rw-r--r-- 1 matthias matthias 4.4K Apr 12 16:26 1815855_output_output.xml
I also tried it without the loop but, curiously, nothing was written to the output file:
g="$(1816001.xml $f .xml)_output.xml"
java -mx600m -cp /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/classifiers/english.all.3class.distsim.crf.ser.gz -textFile $f -outputFormat inlineXML > $g
That's easily done. Assuming your current directory is where the files are:
for f in *.xml ; do
echo "$f" | grep -q '_output\.xml$' && continue # skip output files
g="$(basename "$f" .xml)_output.xml"
command a_lot_of_arguments "$f" more_arguments >> "$g"
done
Though I wonder whether you want >> or > for redirection. The former will append to the output file if it already exists, for example from a previous run of the same script. The latter will overwrite it.
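For completeness, here is a rough sketch of the same loop with the Stanford NER command from the question filled in. The names NER_HOME, tag_file and tag_directory are my own, made up for illustration; they are not part of Stanford NER. The paths and flags are copied from the question. I've assumed you want > (overwrite) so that a rerun replaces old results, and the case guard is what keeps already-tagged files from being fed back in and producing *_output_output.xml.

```shell
#!/usr/bin/env bash
# Sketch only: NER_HOME, tag_file and tag_directory are illustrative names.
# The install path below is the one from the question; override NER_HOME as needed.
NER_HOME="${NER_HOME:-/home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30}"

# Run the classifier on one file; the tagged text goes to stdout.
tag_file() {
    java -mx600m \
        -cp "$NER_HOME/stanford-ner-3.5.1.jar" \
        edu.stanford.nlp.ie.crf.CRFClassifier \
        -loadClassifier "$NER_HOME/classifiers/english.all.3class.distsim.crf.ser.gz" \
        -textFile "$1" \
        -outputFormat inlineXML
}

# Tag every NNN.xml in the given directory (default: the current one),
# writing NNN_output.xml next to each input file.
tag_directory() {
    local dir="${1:-.}" f g
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue                       # the glob matched nothing
        case "$f" in *_output.xml) continue ;; esac   # skip results of earlier runs
        g="${f%.xml}_output.xml"                      # 1815838.xml -> 1815838_output.xml
        tag_file "$f" > "$g"                          # > overwrites; use >> to append
    done
}
```

After sourcing the file, calling tag_directory with no argument should process the current directory, and tag_directory /some/dir a different one. Quoting "$f" and "$g" matters only if a file name ever contains spaces, but it costs nothing.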