java, machine-learning, pipeline, text-mining

Framework for Implementing a Text Mining Pipeline in Java


I need to develop a text document processing pipeline with several processing stages. I'm looking for a good Java-based framework that handles the pipeline with multithreaded processing, so that I can focus on the business logic of each processing stage.
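One way to see what such a framework has to provide is a plain-JDK sketch of a staged pipeline. Each processing stage is a `Function<String, String>` that holds only business logic and runs on its own thread, and documents move to the next stage through a bounded `BlockingQueue`. The class name `StagedPipeline`, the queue capacity, and the example stages are illustrative assumptions, not part of any framework mentioned in this question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical illustration: one thread per stage, stages linked by bounded queues.
public class StagedPipeline {

    /**
     * Runs each stage on its own thread. Optional.empty() marks end of stream.
     * Error handling is omitted: a stage that throws stops its thread and stalls the pipeline.
     */
    public static List<String> run(List<String> docs, List<Function<String, String>> stages) {
        List<BlockingQueue<Optional<String>>> queues = new ArrayList<>();
        for (int i = 0; i <= stages.size(); i++) {
            queues.add(new LinkedBlockingQueue<>(16)); // bounded: a slow stage throttles upstream stages
        }
        List<Thread> threads = new ArrayList<>();

        // Producer: feeds documents into the first queue, then the end marker.
        BlockingQueue<Optional<String>> first = queues.get(0);
        threads.add(new Thread(() -> {
            try {
                for (String doc : docs) {
                    first.put(Optional.of(doc));
                }
                first.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        // One worker per stage: take from queue i, apply the stage, put into queue i + 1.
        for (int i = 0; i < stages.size(); i++) {
            BlockingQueue<Optional<String>> in = queues.get(i);
            BlockingQueue<Optional<String>> out = queues.get(i + 1);
            Function<String, String> stage = stages.get(i);
            threads.add(new Thread(() -> {
                try {
                    while (true) {
                        Optional<String> item = in.take();
                        if (item.isEmpty()) {
                            out.put(item); // forward the end marker and stop
                            return;
                        }
                        out.put(Optional.of(stage.apply(item.get())));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        threads.forEach(Thread::start);

        // The calling thread drains the last queue; FIFO queues keep the input order.
        List<String> results = new ArrayList<>();
        try {
            BlockingQueue<Optional<String>> last = queues.get(stages.size());
            for (Optional<String> item = last.take(); item.isPresent(); item = last.take()) {
                results.add(item.get());
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Function<String, String>> stages = List.of(
                String::trim,                          // stage 1: strip surrounding whitespace
                s -> s.toLowerCase(Locale.ROOT),       // stage 2: case folding
                s -> s.replaceAll("[^a-z0-9 ]", ""));  // stage 3: drop punctuation
        System.out.println(run(List.of("  Hello, World!  ", "Text Mining."), stages));
        // prints [hello world, text mining]
    }
}
```

Because each stage has exactly one thread and the queues are FIFO, output order matches input order. The parts a framework would add on top of this, such as error handling, retries, and monitoring, are deliberately left out.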

After searching for a few hours, I found the following frameworks, which meet some of my requirements. However, each of them has its own drawbacks:

Apache Commons Pipeline - It seems this is no longer an active project, and it doesn't have good documentation

TinkerPop Pipes - Doesn't support multithreaded execution

Spring Batch - Doesn't support several processing stages

Can anyone suggest another good lightweight framework for this purpose?


Solution

  • You can take a look at Easy Batch. It lets you develop pipelines in Java easily, and it also supports parallel processing. It was designed as a lightweight, easy-to-use alternative that addresses the drawbacks of the frameworks you mentioned.

    Hope it helps.
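The snippet below is not Easy Batch's API. It is a plain-JDK sketch of the other common way to parallelise such a pipeline, assuming the documents are independent: the stages are composed with `Function.andThen` into a single chain, and a fixed thread pool applies that chain to many documents at once. `ParallelChain` and `process` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration: per-document parallelism over a composed chain of stages.
public class ParallelChain {

    /** Composes the stages into one function and applies it to each document on a thread pool. */
    public static List<String> process(List<String> docs, List<Function<String, String>> stages, int threads) {
        // Fold the stages left to right into a single document-level function.
        Function<String, String> chain = Function.identity();
        for (Function<String, String> stage : stages) {
            chain = chain.andThen(stage);
        }
        final Function<String, String> fullChain = chain;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> fullChain.apply(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for each task; results stay in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a stage failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Function<String, String>> stages = List.of(
                String::trim,                      // stage 1: strip surrounding whitespace
                s -> s.toLowerCase(Locale.ROOT));  // stage 2: case folding
        System.out.println(process(List.of("  Alpha ", "BETA"), stages, 4));
        // prints [alpha, beta]
    }
}
```

Collecting the `Future`s in submission order keeps results aligned with the input even though documents may finish in any order. Unlike a thread per stage, this model scales with the number of documents rather than the number of stages.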