I have 20,000 news documents that I want to run topic modeling on.
I want to see the topic dynamics and evolution across the documents. I tried the following batch script with MALLET's topic modeling, but it does not work:
#!/bin/bash
for filename in /Users/JasonDou/code/internet_finance/bydocafterseg2; do
./bin/mallet import-dir --input /Users/JasonDou/code/internet_finance/bydocafterseg2/159047443.txt --output bydoc-input.mallet --keep-sequence --remove-stopwords
done
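You are missing an asterisk. Without a glob, the loop runs once with the directory path itself, and the command inside the loop never uses $filename (it always imports the same hard-coded file):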
#!/bin/bash
for filename in "/Users/JasonDou/code/internet_finance/bydocafterseg2/"*; do
[ -e "$filename" ] || continue
./bin/mallet import-dir --input "$filename" \
--output bydoc-input.mallet --keep-sequence --remove-stopwords
done
The above will iterate over each file in bydocafterseg2. You can restrict it to .txt files with: "bydocafterseg2/"*".txt"
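
Note that every iteration of the loop above passes the same --output path (bydoc-input.mallet), so if mallet overwrites that file, only the last document's import would remain. The following is a minimal dry-run sketch of the .txt glob, not a tested MALLET workflow: it only prints the command for each matching file. The temporary directory, the sample file names, and the per-file output name ($base.mallet) are assumptions made up for illustration; the mallet flags are copied from the question.

```shell
#!/bin/bash
# Dry run: print the command that would be executed for each .txt file.
# The directory and its files are created here purely for illustration.
input_dir=$(mktemp -d)
touch "$input_dir/a.txt" "$input_dir/b.txt" "$input_dir/notes.md"

count=0
for filename in "$input_dir/"*".txt"; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$filename" ] || continue
    # Hypothetical per-file output name so iterations do not share one file.
    base=$(basename "$filename" .txt)
    echo ./bin/mallet import-dir --input "$filename" \
        --output "$base.mallet" --keep-sequence --remove-stopwords
    count=$((count + 1))
done

echo "matched $count .txt files"
rm -rf "$input_dir"
```

Removing the echo would run the command for real. Only a.txt and b.txt are matched here; notes.md is skipped by the glob.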