I have a pipeline-based application that analyzes text in different languages (say, English and Chinese). My goal is to have a system that can work in both languages, in a transparent way. NOTE: This question is long because it has many simple code snippets.
The pipeline is composed of three components (let's call them A, B, and C), and I've created them in the following way so that the components are not tightly coupled:
public class Pipeline {
private A componentA;
private B componentB;
private C componentC;
// I really just need the language attribute of Locale,
// but I use it because it's useful to load language specific ResourceBundles.
public Pipeline(Locale locale) {
componentA = new A();
componentB = new B();
componentC = new C();
}
public Output runPipeline(Input) {
Language lang = LanguageIdentifier.identify(Input);
//
ResultOfA resultA = componentA.doSomething(Input);
ResultOfB resultB = componentB.doSomethingElse(resultA); // uses result of A
return componentC.doFinal(resultA, resultB); // uses result of A and B
}
}
Now, every component of the pipeline has something inside which is language specific. For example, in order to analyze Chinese text, I need one lib, and for analyzing English text, I need another different lib.
Moreover, some tasks can be done in one language and cannot be done in the other. One solution to this problem is to make every pipeline component abstract (to implement some common methods), and then have a concrete language-specific implementation. Exemplifying with component A, I'd have the following:
public abstract class A {
private CommonClass x; // common to all languages
private AnotherCommonClass y; // common to all languages
abstract SomeTemporaryResult getTemp(input); // language specific
abstract AnotherTemporaryResult getAnotherTemp(input); // language specific
public ResultOfA doSomething(input) {
// template method
SomeTemporaryResult t = getTemp(input); // language specific
AnotherTemporaryResult tt = getAnotherTemp(input); // language specific
return ResultOfA(t, tt, x.get(), y.get());
}
}
public class EnglishA extends A {
private EnglishSpecificClass something;
// implementation of the abstract methods ...
}
In addition, since each pipeline component is very heavy and I need to reuse them, I thought of creating a factory that caches up the component for further use, using a map that uses the language as the key, like so (the other components would work in the same manner):
public Enum AFactory {
SINGLETON;
private Map<String, A> cache; // this map will only have one or two keys, is there anything more efficient that I can use, instead of HashMap?
public A getA(Locale locale) {
// lookup by locale.language, and insert if it doesn't exist, et cetera
return cache.get(locale.getLanguage());
}
}
So, my question is: What do you think of this design? How can it be improved? I need the "transparency" because the language can be changed dynamically, based on the text that it's being analyzed. As you can see from the runPipeline
method, I first identify the language of the Input, and then, based on this, I need to change the pipeline components to the identified language. So, instead of invoking the components directly, maybe I should get them from the factory, like so:
public Output runPipeline(Input) {
Language lang = LanguageIdentifier.identify(Input);
ResultOfA resultA = AFactory.getA(lang).doSomething(Input);
ResultOfB resultB = BFactory.getB(lang).doSomethingElse(resultA);
return CFactory.getC(lang).doFinal(resultA, resultB);
}
Thank you for reading this far. I very much appreciate every suggestion that you can make on this question.
The factory idea is good, as is the idea, if feasible, to encapsulate the A, B, & C components into single classes for each language. One thing that I would urge you to consider is to use Interface
inheritance instead of Class
inheritance. You could then incorporate an engine that would do the runPipeline
process for you. This is similar to the Builder/Director pattern. The steps in this process would be as follows:
On the extends
vs implements
topic, Allen Holub goes a bit over the top to explain the preference for Interfaces
.
Follow up to you comments:
My interpretation of the application of the Builder pattern here would be that you have a Factory
that would return a PipelineBuilder
. The PipelineBuilder
in my design is one that encompases A, B, & C, but you could have separate builders for each if you like. This builder then is given to your PipelineEngine
which uses the Builder
to generate your results.
As this makes use of a Factory to provide the Builders, your idea above for a Factory remains in tact, replete with its caching mechanism.
With regard to your choice of abstract
extension, you do have the choice of giving your PipelineEngine
ownership of the heavy objects. However, if you do go the abstract
way, note that the shared fields that you have declared are private
and therefore would not be available to your subclasses.