Search code examples
javadesign-patternsooparchitecture

Architecture/Design of a pipeline-based system. How to improve this code?


I have a pipeline-based application that analyzes text in different languages (say, English and Chinese). My goal is to have a system that can work in both languages, in a transparent way. NOTE: This question is long because it has many simple code snippets.

The pipeline is composed of three components (let's call them A, B, and C), and I've created them in the following way so that the components are not tightly coupled:

public class Pipeline {
    private A componentA;
    private B componentB;
    private C componentC;

    // I really just need the language attribute of Locale,
    // but I use it because it's useful to load language specific ResourceBundles.
    public Pipeline(Locale locale) {
        componentA = new A();
        componentB = new B();
        componentC = new C();
    }

    public Output runPipeline(Input) {
        Language lang = LanguageIdentifier.identify(Input);
        //
        ResultOfA resultA = componentA.doSomething(Input);
        ResultOfB resultB = componentB.doSomethingElse(resultA); // uses result of A
        return componentC.doFinal(resultA, resultB); // uses result of A and B
    }
}

Now, every component of the pipeline has something inside which is language specific. For example, in order to analyze Chinese text, I need one lib, and for analyzing English text, I need another different lib.

Moreover, some tasks can be done in one language and cannot be done in the other. One solution to this problem is to make every pipeline component abstract (to implement some common methods), and then have a concrete language-specific implementation. Exemplifying with component A, I'd have the following:

public abstract class A {
    private CommonClass x;  // common to all languages
    private AnotherCommonClass y; // common to all languages

    abstract SomeTemporaryResult getTemp(input); // language specific
    abstract AnotherTemporaryResult getAnotherTemp(input); // language specific

    public ResultOfA doSomething(input) {
          // template method
          SomeTemporaryResult t = getTemp(input); // language specific
          AnotherTemporaryResult tt = getAnotherTemp(input); // language specific
          return ResultOfA(t, tt, x.get(), y.get());
    }
}

public class EnglishA extends A {
    private EnglishSpecificClass something;
    // implementation of the abstract methods ... 
}

In addition, since each pipeline component is very heavy and I need to reuse them, I thought of creating a factory that caches up the component for further use, using a map that uses the language as the key, like so (the other components would work in the same manner):

public Enum AFactory {
    SINGLETON;
    
    private Map<String, A> cache; // this map will only have one or two keys, is there anything more efficient that I can use, instead of HashMap?
    
    public A getA(Locale locale) {
        // lookup by locale.language, and insert if it doesn't exist, et cetera
        return cache.get(locale.getLanguage());
    }
}

So, my question is: What do you think of this design? How can it be improved? I need the "transparency" because the language can be changed dynamically, based on the text that it's being analyzed. As you can see from the runPipeline method, I first identify the language of the Input, and then, based on this, I need to change the pipeline components to the identified language. So, instead of invoking the components directly, maybe I should get them from the factory, like so:

public Output runPipeline(Input) {
    Language lang = LanguageIdentifier.identify(Input);
    ResultOfA resultA = AFactory.getA(lang).doSomething(Input);
    ResultOfB resultB = BFactory.getB(lang).doSomethingElse(resultA);
    return CFactory.getC(lang).doFinal(resultA, resultB);
}

Thank you for reading this far. I very much appreciate every suggestion that you can make on this question.


Solution

  • The factory idea is good, as is the idea, if feasible, to encapsulate the A, B, & C components into single classes for each language. One thing that I would urge you to consider is to use Interface inheritance instead of Class inheritance. You could then incorporate an engine that would do the runPipeline process for you. This is similar to the Builder/Director pattern. The steps in this process would be as follows:

    1. get input
    2. use factory method to get correct interface (english/chinese)
    3. pass interface into your engine
    4. runPipeline and get result

    On the extends vs implements topic, Allen Holub goes a bit over the top to explain the preference for Interfaces.


    Follow up to you comments:

    My interpretation of the application of the Builder pattern here would be that you have a Factory that would return a PipelineBuilder. The PipelineBuilder in my design is one that encompases A, B, & C, but you could have separate builders for each if you like. This builder then is given to your PipelineEngine which uses the Builder to generate your results.

    As this makes use of a Factory to provide the Builders, your idea above for a Factory remains in tact, replete with its caching mechanism.

    With regard to your choice of abstract extension, you do have the choice of giving your PipelineEngine ownership of the heavy objects. However, if you do go the abstract way, note that the shared fields that you have declared are private and therefore would not be available to your subclasses.