Reading an XML file in Apache Beam using XmlIO


Problem statement: I am trying to read and print the contents of an XML file in Beam using the direct runner. Here is the code snippet:

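    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.xml.XmlIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;

    public class BookStore {

        public static void main(String[] args) {

            BookOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(BookOptions.class);

            Pipeline pipeline = Pipeline.create(options);

            PCollection<Book> output = pipeline.apply(XmlIO.<Book>read().from("sample.xml")
                    .withRootElement("book")
                    .withRecordElement("name")
                    .withRecordClass(Book.class));

            // Print the name of every Book record that was read.
            output.apply(ParDo.of(new DoFn<Book, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    System.out.println("xml  data " + c.element().getName());
                }
            }));

            pipeline.run().waitUntilFinish();
        }
    }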

My POJO class:


    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.bind.annotation.XmlType;

    @XmlRootElement(name = "book")
    @XmlType(propOrder = {"name"})
    public class Book {

        private String name;

        @XmlElement(name = "name")
        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "ClassPojo [name= " + name + "]";
        }
    }

My sample.xml file:

<?xml version="1.0" encoding="UTF-8"?> 
<book>
   <name>Harrypotter</name>
</book>

When I execute the above code using the direct runner, the output for "name" is null.

Can somebody guide me on this?

Is there an example I can refer to?


Solution

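  • Your XML file doesn't match the XmlIO options defined in your pipeline. withRootElement must name the outer element that wraps all the records, and withRecordElement must name the element that maps to a single record (a Book). In your pipeline the record element is "name", so each <name> element is treated as a Book; it contains no nested <name>, so the name field is never populated and getName() returns null. You need a root element that contains your records (books). One solution could be something like this: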

    PCollection<Book> output = pipeline.apply(
            XmlIO.<Book>read().from("sample.xml")
                .withRootElement("books")
                .withRecordElement("book")
                .withRecordClass(Book.class));
    

    and the XML file should look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <books>
        <book>
            <name>Harrypotter</name>
        </book>
    </books>
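
To see why the original configuration yields null, here is a minimal sketch that uses only the JDK's DOM parser, not Beam or JAXB. It is an illustration of the root/record relationship, not XmlIO's actual implementation: the class RecordElementDemo and its helpers readNames and childText are hypothetical names introduced for this example. Every element matching the record element is treated as one "Book", and its name is taken from a direct <name> child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: mimics "one record per record element" with plain DOM.
class RecordElementDemo {

    // Hypothetical helper: treats every <recordElement> as one Book record and
    // returns the text of its direct <name> child, or null if there is none.
    static List<String> readNames(String xml, String recordElement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList records = doc.getElementsByTagName(recordElement);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                names.add(childText((Element) records.item(i), "name"));
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the text content of the first direct child element named `tag`.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                return n.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String original = "<book><name>Harrypotter</name></book>";
        String fixed = "<books><book><name>Harrypotter</name></book></books>";

        // Record element "name": the record itself is <name>, with no <name> child.
        System.out.println(readNames(original, "name")); // [null]

        // Record element "book": each <book> has a <name> child to read.
        System.out.println(readNames(fixed, "book"));    // [Harrypotter]
    }
}
```

The first call corresponds to the configuration in the question, where the record element is <name> itself and there is nothing inside it to map to the name field. The second corresponds to the corrected layout, where each <book> record carries its own <name> child.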