
How to conjugate English words in Java?


Say I have the base form of a word and a tag from the Penn Treebank tag set. How can I get the conjugated form? For example, given "do" and "VBN", how can I get "done"?

I think this task is already implemented in some NLP library, so I'd rather not reinvent the wheel. Does something like that exist?


Solution

  • If you have a class:

    public class Treebank {
        public String conjugate(String base, String formTag) { ... }
    
        ...
    }
    

    Then:

    String conjugated = treebank.conjugate(base, formTag);
    

    If you don't have a Treebank class, it might look a bit like this:

    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;
    
    public class Treebank {
        private Map<String, Map<String, String>> m_map = new HashMap<String, Map<String, String>>();
    
        public Treebank() {
            populate();
        }
    
        public String conjugate(String base, String formTag) {
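            // Look up the base form first, then the tag; null if either is unknown.
            Map<String, String> forms = m_map.get(base);
            return forms == null ? null : forms.get(formTag);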
        }
    
        private void populate() {
            InputStream istream = openDataFile();
    
            try {
                for (Record record = readRecord(istream); record != null; record = readRecord(istream)) {
    
                    // Add the entry
                    Map<String, String> entry = m_map.get(record.base);
    
                    if (entry == null)
                        entry = new HashMap<String, String>();
    
                    entry.put(record.formTag, record.conjugatedForm);
                    m_map.put(record.base, entry);
                }
            }
            finally {
                closeDataFile(istream);
            }
        }
    
        // Data management - to be implemented.
        private InputStream openDataFile()                     { ... }
        private Record      readRecord(InputStream istream)    { ... }
        private void        closeDataFile(InputStream istream) { ... }
    
        private static class Record {
            String base;
            String formTag;
            String conjugatedForm;
        }
    }
    

    A better solution might involve a database instead of a data file. I would also refactor the data access code into a Data Access Object.
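
To make the Data Access Object idea concrete, here is a minimal runnable sketch. The names `ConjugationDao`, `InMemoryConjugationDao`, and `ConjugationDemo` are hypothetical, as is the tab-separated record format (base, tag, conjugated form); the three sample records are illustrative only and do not come from a real lexicon. A database-backed DAO could implement the same interface without changing `Treebank`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical DAO interface: hides where the conjugation data lives.
interface ConjugationDao {
    // Returns the conjugated form, or null if the (base, tag) pair is unknown.
    String find(String base, String formTag);
}

// In-memory DAO loaded from tab-separated lines: base <TAB> tag <TAB> form.
// The file format is an assumption made for this sketch.
class InMemoryConjugationDao implements ConjugationDao {
    private final Map<String, Map<String, String>> forms = new HashMap<>();

    InMemoryConjugationDao(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t");
                if (fields.length != 3) {
                    continue; // skip blank or malformed lines
                }
                forms.computeIfAbsent(fields[0], k -> new HashMap<>())
                     .put(fields[1], fields[2]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String find(String base, String formTag) {
        Map<String, String> byTag = forms.get(base);
        return byTag == null ? null : byTag.get(formTag);
    }
}

// Treebank now only delegates; the data source can be swapped freely.
class Treebank {
    private final ConjugationDao dao;

    Treebank(ConjugationDao dao) {
        this.dao = dao;
    }

    public String conjugate(String base, String formTag) {
        return dao.find(base, formTag);
    }
}

class ConjugationDemo {
    // Illustrative sample records only.
    static final String SAMPLE = "do\tVBN\tdone\ndo\tVBD\tdid\ndo\tVBZ\tdoes\n";

    public static void main(String[] args) {
        Treebank treebank =
            new Treebank(new InMemoryConjugationDao(new StringReader(SAMPLE)));
        System.out.println(treebank.conjugate("do", "VBN")); // prints "done"
    }
}
```

A plain lookup table like this only covers the words it has been given, so unknown pairs return `null` and the caller has to decide how to handle them.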