A basic question. I have a bunch of transcripts (.docx files) that I want to read into a corpus. Reading a single file with readtext() works without problems:
dat <- readtext("~/ownCloud/NLP/interview_1.docx")
As soon as I use a "*.docx" wildcard in the readtext() call, it throws an error:
dat <- readtext("~/ownCloud/NLP/*.docx")
Error: '/var/folders/bl/61g7ngh55vs79cfhfhnstd4c0000gn/T//RtmpWD6KSx/readtext-aa71916b691c0cf3cabc73a2e04a45f7/word/document.xml' does not exist.
In addition: Warning message:
In utils::unzip(file, exdir = path) : error 1 in extracting from zip file
Why the reference to a zip file? I have only .docx files in the directory.
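I was able to reproduce the problem. The cause is hidden/temp .docx files in
that folder: they match "*.docx" but presumably are not valid documents. A
.docx file is a zip archive, which is why the error mentions unzip, and the
hidden files fail at that step. If you delete them and rerun the code, it works.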
To see the hidden files, open the folder you are reading the .docx files
from and turn on hidden-file display for your OS. On my Mac I used
CMD + SHIFT + .
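If you prefer to check from R instead of the file browser, something like the sketch below may help. The helper name hidden_docx is made up for illustration, and it assumes the offending files have names starting with "." or "~$" (the prefix Word uses for its lock files). Your hidden files may be named differently.

```r
# Hypothetical helper: list .docx files in a folder whose names start
# with "." or "~$", i.e. likely hidden or temporary files.
hidden_docx <- function(dir) {
  # all.files = TRUE makes list.files() include dot-files as well
  files <- list.files(dir, pattern = "\\.docx$", all.files = TRUE)
  files[grepl("^(\\.|~\\$)", files)]
}

# Folder taken from the question; prints character(0) if nothing is found
print(hidden_docx("~/ownCloud/NLP"))
```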
Once you delete them, run the code again and it should work:
library(readtext)
dat <- readtext("~/ownCloud/NLP/*.docx")
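If the temp files keep coming back (for example while a document is open in Word), an alternative to deleting them is to build the file list yourself and pass it to readtext(), which also accepts a character vector of paths. This is only a sketch: list_real_docx is a hypothetical helper, and it assumes the problem files start with "." or "~$".

```r
# Hypothetical helper: full paths of the .docx files in a folder,
# skipping names that start with "." or "~$" (hidden/temp files).
list_real_docx <- function(dir) {
  files <- list.files(dir, pattern = "\\.docx$", all.files = TRUE,
                      full.names = TRUE)
  files[!grepl("^(\\.|~\\$)", basename(files))]
}

docx_files <- list_real_docx("~/ownCloud/NLP")

# Only call readtext() if it is installed and there is something to read
if (length(docx_files) > 0 && requireNamespace("readtext", quietly = TRUE)) {
  dat <- readtext::readtext(docx_files)
}
```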