tm::tm_map encounters an error


I have a VCorpus "oanc" and I want to convert all the words to lower case, so I use the following call:

oanc1 <- tm_map(oanc, content_transformer(tolower))

But I got a warning:

Warning message:
In mclapply(content(x), FUN, ...) :
  scheduled cores 2 encountered errors in user code, all values of the jobs will be affected

The VCorpus "oanc" is of size 586MB while "oanc1" is only 4MB. In addition, all the contents, except the first text, are broken, and when I run

writeLines(as.character(oanc1[[2]]))

I got

Error in FUN(content(x), ...) : 
invalid input 'O<8c><be>BĭĪ<e2>=<f3><81>̡@>9<c2>Au<b7>l<99><c5>u <c4>%<a0>[,<9c><93><b8><90>w<b7><97><f7>58<e3><d7>><91><bf>"~WD<cf>2<c3><84>1GQ<dd><ed>ـ\<e2><fb><f3><d3>X]<fe>5t!<9f><89>ٍdH<e3><d6>Zu<bc><e8><b6>_RS<f0><f7><81><eb>E<f0><bd>Ԗ2o<b4>G<a7><b9><d2><fc><8a><f2><89>3<a8>ؗ<d6><c0>.w,<l<b7>}<f8>J<8f><f1><f1>����{p<94><a3>x<9e><89><da>e'<8c><ca>}y<d1><ca>V<f7>v<c3>>S^`<9e><86><f1><b1>E<b8>)<cd>ꅹ<e5><ab><<80><eb><8e>z<d0>}<a3>C<86>(%r<86><f4><e3>i*<da>i V{<94>'<f6>i<f6><a7>{dh<d0>jG۾wO<dd>?<<f7>i<c5>c<84>G<dc>3<bb>-E<e9>L<b1><b6>XG<f5>F<81><97><b1><e5><de>ln<b1><d6><f5><f6><90> DŽ<b2>/j<fc><d9>{£<83><f1><c5>;n7<bb>ɰEG<a9><b0><87>!<b5>5]9<b9><e6><fe>_Q<aa>U<a8><c0><cf>,<d9><dc>wܒ<ba>ɑ<f1>Q<c9>:r<e4><b4><ea>w<be>PCb' in 'utf8towcs'

Can anyone help me? My operating system is Ubuntu 14.04 LTS, and I am using R version 3.2.0.


Solution

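  • First, make sure the text is encoded in UTF-8. The "invalid input ... in 'utf8towcs'" error means tolower received bytes that are not valid UTF-8. If you can open the file in a text editor, you should be able to change the encoding when you save it. If that doesn't fix the problem, try adding the argument "mc.cores = 1" to the tm_map call. It is passed through to the mclapply call shown in the warning and turns off the parallel processing.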
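
If re-saving the files is not practical, the re-encoding can also be done in R before lowercasing. The following is a minimal sketch, not a definitive fix: to_lower_utf8 is a hypothetical helper name, and from = "latin1" is only an assumption about the source encoding, so replace it with whatever encoding your files actually use.

```r
# Hypothetical helper (not part of tm): re-encode text to UTF-8, then lowercase.
# from = "latin1" is an assumption; use the encoding your files really have.
to_lower_utf8 <- function(x, from = "latin1") {
  # sub = "byte" replaces any unconvertible byte with a <xx> escape
  # instead of turning the whole string into NA
  x <- iconv(x, from = from, to = "UTF-8", sub = "byte")
  tolower(x)
}

# A string containing a raw Latin-1 byte (0xE9), which is not valid UTF-8 by itself
raw_text <- "Caf\xe9 OANC Sample"
clean_text <- to_lower_utf8(raw_text)

# If tm is installed and the corpus from the question exists,
# the same helper can be wrapped and passed to tm_map.
if (requireNamespace("tm", quietly = TRUE) && exists("oanc")) {
  oanc1 <- tm::tm_map(oanc, tm::content_transformer(to_lower_utf8))
}
```

For the second suggestion, the call would be tm_map(oanc, content_transformer(tolower), mc.cores = 1). Note that this relies on tm_map forwarding extra arguments to mclapply, as the warning in the question shows for the tm version in use there. As far as I know, later tm releases hand extra arguments to the transformation function instead, in which case this argument would be rejected.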