Tags: java, apache-poi, docx4j

Read each field from a Word (.docx) document and save it to a database in Java


Is it possible to read data from a .docx file field by field, so that each field can be saved in a database? It has to be done in Java. For example, we have a Word form document such as a CV, and we need to read each field (e.g. Name, Surname, Age, Position, Date) and store it in the database as a separate field, not in one big text column. There are two Java libraries for working with .docx files, Apache POI and docx4j, but the approach they gave me saves the data as one big piece in a single text field in the database. Instead, each field should be stored as a separate element.

So far I have only managed to save the data as one big piece; as a result, the whole document ends up in a single text field.

I haven't found any approach to do this. Could you suggest something, please?


Solution

  • You need to parse the Microsoft Word document in the format you provided and extract the value from each line.

    First, here is the format of the test file I used. I placed it in my local directory, and it follows the same format as your example image:

    Employee

    Name: Bob

    Surname: Smith

    Age: 28

    Position: Developer

    Date: 6/26/18

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;

    public class Test {

        public static void main(String[] args) throws IOException {
            // exampleFile.docx is the layout file from the question, filled with test data
            List<String> values = parseWordDocument("exampleFile.docx");

            for (String s : values)
                System.out.println(s);
        }

        public static List<String> parseWordDocument(String documentPath) throws IOException {
            List<String> parsedValues = new ArrayList<>();

            // try-with-resources closes the stream and the document even if parsing fails
            try (InputStream fInput = new FileInputStream(documentPath);
                 XWPFDocument document = new XWPFDocument(fInput)) {

                // getParagraphs() returns every body paragraph in document order
                for (XWPFParagraph para : document.getParagraphs()) {
                    String text = para.getText();

                    // skip the "Employee" title, blank paragraphs and any line without a colon;
                    // the original split(":")[1] would throw on such lines
                    int colon = text.indexOf(':');
                    if (colon < 0)
                        continue;

                    // everything after the first colon is the value (so "10:30" stays whole);
                    // trim() removes the space after the colon
                    parsedValues.add(text.substring(colon + 1).trim());
                }
            }
            return parsedValues;
        }
    }
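The List<String> returned by parseWordDocument() drops the labels, so you have to rely on the order of the lines to know which value is the name and which is the age. If you would rather map each value to its database column by name, you can keep the label as well. The sketch below is plain Java and works on the paragraph texts (the strings para.getText() returns), so it can be tried without a .docx file; FieldParser and parseFields are hypothetical names, and it assumes every field line has the "Label: value" shape from the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldParser {

    // Turns "Label: value" lines into an ordered label -> value map.
    // Lines without a colon (titles, blank paragraphs) are ignored.
    public static Map<String, String> parseFields(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps document order
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String label = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim(); // split on the first colon only
            fields.put(label, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Paragraph texts as they would come from XWPFParagraph.getText() for the sample file
        List<String> lines = Arrays.asList(
                "Employee", "Name: Bob", "Surname: Smith", "Age: 28",
                "Position: Developer", "Date: 6/26/18", "");
        Map<String, String> fields = parseFields(lines);
        // prints {Name=Bob, Surname=Smith, Age=28, Position=Developer, Date=6/26/18}
        System.out.println(fields);
    }
}
```

With the map, fields.get("Name") or fields.get("Age") picks out a value regardless of the order of the lines in the form.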
    

    With this, the output I get from the List created by parseWordDocument() is:

    Bob

    Smith

    28

    Developer

    6/26/18
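Every value comes back as a String. If the target table stores Age as a number and Date as a date, it is safer to convert them before saving. A minimal sketch, assuming the dates are always written month/day/two-digit-year like 6/26/18 (FieldTypes, parseAge and parseDate are hypothetical names):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FieldTypes {

    // Assumed date format, e.g. "6/26/18": month/day/two-digit year
    private static final DateTimeFormatter SHORT_US_DATE = DateTimeFormatter.ofPattern("M/d/yy");

    public static int parseAge(String value) {
        return Integer.parseInt(value.trim());
    }

    public static LocalDate parseDate(String value) {
        // "yy" maps two-digit years into 2000-2099, so "18" becomes 2018
        return LocalDate.parse(value.trim(), SHORT_US_DATE);
    }

    public static void main(String[] args) {
        System.out.println(parseAge("28"));       // prints 28
        System.out.println(parseDate("6/26/18")); // prints 2018-06-26
    }
}
```

Both methods throw (NumberFormatException or DateTimeParseException) on malformed input, which lets you reject a badly filled form instead of storing garbage.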

    So now you can simply take the returned list, loop over it (instead of printing the values), and build the appropriate SQLite query.
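One way to write each value into its own column is a JDBC PreparedStatement with one placeholder per field, so the values are bound as parameters rather than concatenated into the SQL. A minimal sketch, assuming a hypothetical table employee with hypothetical columns name, surname, age, position and form_date, and values in the same order as the list returned by parseWordDocument():

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

public class EmployeeDao {

    // Hypothetical column names, in the same order as the lines in the form.
    // They are hard-coded (never read from the document) because column names
    // cannot be bound as parameters and would otherwise allow SQL injection.
    static final List<String> COLUMNS = List.of("name", "surname", "age", "position", "form_date");

    static String buildInsertSql(String table, List<String> columns) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (String c : columns) {
            cols.add(c);
            marks.add("?"); // one placeholder per column
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + marks;
    }

    // Inserts one row; values must be in the same order as COLUMNS.
    static void insertEmployee(Connection conn, List<String> values) throws SQLException {
        if (values.size() != COLUMNS.size()) {
            throw new IllegalArgumentException(
                    "expected " + COLUMNS.size() + " values, got " + values.size());
        }
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql("employee", COLUMNS))) {
            for (int i = 0; i < values.size(); i++) {
                ps.setString(i + 1, values.get(i)); // JDBC parameter indexes are 1-based
            }
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        // prints INSERT INTO employee (name, surname, age, position, form_date) VALUES (?, ?, ?, ?, ?)
        System.out.println(buildInsertSql("employee", COLUMNS));
    }
}
```

To use it with SQLite you would open the connection with something like DriverManager.getConnection("jdbc:sqlite:employees.db"), which needs a SQLite JDBC driver on the classpath; the file name is only an example. Binding every value with setString relies on the database converting text to the column type (SQLite does this through type affinity); for stricter databases bind converted values with setInt or setDate(java.sql.Date.valueOf(localDate)) instead.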