java csv apache-camel splitter

Combine Apache Camel CSV with the splitter pattern


I have a large CSV file that I want to process with a rate limit. The Splitter pattern provides exactly what I'm looking for, except I can't quite figure out how to combine it with the CSV component.

According to the Splitter documentation, you can handle CSV files like this:

from("file:inbox")
  .split().tokenize("\n", 1000).streaming()
     .to("activemq:queue:order");

But ideally I'd like to use the Apache Camel CSV component to handle the (un)marshalling and do something more like:

from("file:inbox")
    .unmarshal().csv().split()
    .streaming().parallelProcessing()
    .throttle(requestsPerSecond)
    .bean(new ValidateProcess(), "validate")
    .marshal().csv().to("file:outbox");

I know the code above is completely wrong but hopefully it conveys what I'm trying to achieve. Would this be at all feasible?


Solution

  • So for some reason I couldn't figure this out earlier. I think the problem was that my classpath wasn't picking up the dependency on org.apache.camel:camel-csv. Once I sorted that out, everything was fine.

    Here's what I ended up with:

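    final CsvDataFormat csv = new CsvDataFormat(";");  // input fields are separated by ';'
    csv.setLazyLoad(true);       // read rows on the fly instead of loading the whole file
    csv.setSkipFirstLine(true);  // skip the header row

    from(in).unmarshal(csv)
        .split(body()).streaming().parallelProcessing()  // one exchange per CSV row
            .bean(validator, "validateNumber")
            // keep only rows whose Valid header is true
            .filter(header(ValidateProcess.Valid).isEqualTo(true))
            // limit the rate of calls to the rate-limited API
            .throttle(tps).bean(validator, "validate")
            .marshal().csv()
            .to(out).log("done.").end();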
    

    Basically, I wanted to stream-process a CSV file containing numbers against an API that is rate limited to 50 TPS and write the results to a CSV file.
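
For anyone who wants to see the moving parts without Camel, here is a rough plain-Java sketch of the same idea: read rows lazily, skip the header, drop invalid rows, and pace what is left. It is not how Camel implements any of this. The class and method names (ThrottledCsvSketch, process, minIntervalNanos) are made up for illustration, the naive split does not handle quoted fields, and the even spacing between rows is a simplification of what a throttler does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical plain-Java analogue of the route above; not Camel code. */
class ThrottledCsvSketch {

    /** Minimum gap between two rows for a given rate, e.g. 50 per second gives 20 ms. */
    static long minIntervalNanos(int perSecond) {
        if (perSecond <= 0) {
            throw new IllegalArgumentException("perSecond must be positive");
        }
        return 1_000_000_000L / perSecond;
    }

    /** Reads rows one at a time, skips the header, filters, and paces the valid rows. */
    static List<String[]> process(Reader in, String delimiter, int perSecond,
                                  Predicate<String[]> valid) {
        long interval = minIntervalNanos(perSecond);
        List<String[]> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            reader.readLine();                 // skip the header row
            long nextSlot = System.nanoTime(); // earliest time the next row may pass
            String line;
            while ((line = reader.readLine()) != null) {  // one row in memory at a time
                if (line.isEmpty()) {
                    continue;
                }
                // Naive split: quoted fields containing the delimiter are not handled.
                String[] fields = line.split(Pattern.quote(delimiter), -1);
                if (!valid.test(fields)) {
                    continue;                  // invalid rows do not consume a slot
                }
                long wait = nextSlot - System.nanoTime();
                if (wait > 0) {
                    TimeUnit.NANOSECONDS.sleep(wait);
                }
                nextSlot = Math.max(nextSlot, System.nanoTime()) + interval;
                out.add(fields);               // stand-in for the rate-limited API call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while pacing rows", e);
        }
        return out;
    }
}
```

Calling ThrottledCsvSketch.process(new FileReader("numbers.csv"), ";", 50, row -> !row[0].isEmpty()) would let at most one valid row through every 20 ms; the file name and the predicate are placeholders. Filtering before pacing mirrors the order in the route, so rejected rows don't eat into the 50 TPS budget.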