I am attempting to create a program to scrape XML files. I'm experimenting with Go because of its goroutines. I have several thousand files, so some type of concurrent processing is almost a necessity.
I got a program to run successfully and convert XML to CSV (as a test, not quite the end result) on a test set of files, but when run with the full set of files, it gives this:
runtime: program exceeds 10000-thread limit
I've been looking for similar problems, and there are a couple, but I haven't found one that was similar enough to solve this.
Finally, here's some of the code I'm running:
// main func (start threads)
for i := range filelist {
	channels = append(channels, make(chan Test))
	go Parse(files[i], channels[len(channels)-1])
}
// Parse func (individual threads)
func Parse(fileName string, c chan Test) {
	defer close(c)
	doc := etree.NewDocument()
	if err := doc.ReadFromFile(fileName); err != nil {
		return
	}
	root := doc.SelectElement("trc:TestResultsCollection")
	for _, test := range root.FindElements("//trc:TestResults/tr:ResultSet/tr:TestGroup/tr:Test") {
		var outcome Test
		outcome.StepType = test.FindElement("./tr:Extension/ts:TSStepProperties/ts:StepType").Text()
		outcome.Result = test.FindElement("./tr:Outcome").Attr[0].Value
		for _, attr := range test.Attr {
			if attr.Key == "name" {
				outcome.Name = attr.Value
			}
		}
		for _, attr := range test.FindElement("./tr:TestResult/tr:TestData/c:Datum").Attr {
			if attr.Key == "value" {
				outcome.Value = attr.Value
			}
		}
		c <- outcome
	}
}
// main (process results when threads return)
for c := 0; c < len(channels); c++ {
	for i := range channels[c] {
		// csv processing with i
	}
}
I'm sure there's some ugly code in there. I've only recently picked up Go after working in other languages, so I apologize in advance.
Any ideas?
I apologize for not including the correct error. As the comments pointed out, I was doing something dumb and creating a goroutine for every file. Thanks to JimB for correcting me, and to torek for providing a solution and this link: https://gobyexample.com/worker-pools
// buffered so that queuing jobs and results never blocks
jobs := make(chan string, numJobs)
results := make(chan []Test, numJobs)
for w := 0; w < numWorkers; w++ {
	wg.Add(1) // register the worker before starting it
	go Worker(w, jobs, results)
}
// give workers jobs
for _, i := range files {
	if filepath.Ext(i) == ".xml" {
		jobs <- ("Path to files" + i)
	}
}
close(jobs) // lets each worker's range loop over jobs end
wg.Wait()
// result processing <- results
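For anyone landing here later, this is a minimal, self-contained sketch of the same worker-pool pattern. It is not my exact program: the fields of Test, the parseFunc type, runPool, and the fakeParse stand-in are assumptions made for illustration, and the real etree parsing would go where fakeParse is. The point is that only numWorkers goroutines ever exist, no matter how many files are queued.

```go
package main

import (
	"fmt"
	"sync"
)

// Test mirrors the result record from the question; the fields are assumed.
type Test struct {
	Name, StepType, Result, Value string
}

// parseFunc is a hypothetical stand-in for the etree-based Parse logic.
type parseFunc func(path string) []Test

// worker drains jobs until the channel is closed, sending one slice per file.
func worker(jobs <-chan string, results chan<- []Test, parse parseFunc, wg *sync.WaitGroup) {
	defer wg.Done()
	for path := range jobs {
		results <- parse(path)
	}
}

// runPool parses every path using at most numWorkers goroutines.
func runPool(paths []string, numWorkers int, parse parseFunc) [][]Test {
	jobs := make(chan string)
	results := make(chan []Test)

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1) // register before the goroutine starts
		go worker(jobs, results, parse, &wg)
	}

	// Feed jobs from a separate goroutine so results can be consumed meanwhile.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	var all [][]Test
	for r := range results {
		all = append(all, r)
	}
	return all
}

func main() {
	// fakeParse is a placeholder for real XML parsing.
	fakeParse := func(path string) []Test {
		return []Test{{Name: path, Result: "Passed"}}
	}
	out := runPool([]string{"a.xml", "b.xml", "c.xml"}, 2, fakeParse)
	fmt.Println(len(out)) // number of files processed
}
```

Because the results channel is closed only after wg.Wait returns, the consuming loop ends on its own, and neither channel needs a buffer sized to the number of files. Results arrive in completion order, not file order.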