
Serializing with Thrift to a file: performance problem


I'm working on serialization and need to benchmark Apache Thrift, but my serialization times are very long: on average more than 100x those of Protobuf. What am I doing wrong?

Thrift file:

namespace java com.myproject.mybenchmark.thrift

struct StudentThrift {
  1: required double sid
  2: required double grade
  3: required i32 age
  4: required i32 year
}

struct StudentThriftList {
  1: list<StudentThrift> students
}

Serialization method:


    private static void serializeToFile(ArrayList<Student> studentList, String outputFile) {
        StudentThriftList thriftStudentList = new StudentThriftList();
        for(Student stu:studentList) {
            StudentThrift thriftStudent = new StudentThrift();
            thriftStudent.setSid(stu.getSid());
            thriftStudent.setGrade(stu.getGrade());
            thriftStudent.setAge(stu.getAge());
            thriftStudent.setYear(stu.getYear());
            thriftStudentList.addToStudents(thriftStudent);
        }

        // Serializing to disk.
        try (FileOutputStream fos = new FileOutputStream(outputFile);
             TTransport transport = new TIOStreamTransport(fos)) {
            TCompactProtocol protocol = new TCompactProtocol(transport);
            thriftStudentList.write(protocol);
        } catch (Exception e) {
            e.printStackTrace();
        }
        
    }

Deserialization method:

    public static StudentThriftList deserializeFromFile(String dataFile)  {
        StudentThriftList thriftList = new StudentThriftList();
        try (FileInputStream fis = new FileInputStream(dataFile);
             TTransport transport = new TIOStreamTransport(fis)) {
            TCompactProtocol protocol = new TCompactProtocol(transport);
            thriftList.read(protocol);
        } catch (Exception e) {
            e.printStackTrace();
        } 
        return thriftList;
    }

Main class:

    public static void main(String[] args) {

        ArrayList<Student> studentList = new ArrayList<Student>();

        long serStartTime = 0L;
        long serEndTime = 0L;
        int serCount = 0;
        long serTotalTime = 0L;

        long desStartTime = 0L;
        long desEndTime = 0L;
        int desCount = 0;
        long desTotalTime = 0L;
        
        
        
        // Create a list with 10000 objects:
        for (int i = 1; i <= 10000; i++) {
            Student student = new Student();
            student.setSid(1000 + i * 1.1);  
            student.setGrade((float) (i * 0.5));
            student.setAge(10 + i);
            student.setYear(2000 + i);
            studentList.add(student);
        }

        // benchmark serialization-deserialization 1000 times:
        for(int i=0; i<1000; i++) {
            // serialize...
            String outputFile = "output/thriftStudents_"+System.currentTimeMillis();
            serStartTime = System.currentTimeMillis();
            serializeToFile(studentList, outputFile);
            serEndTime = System.currentTimeMillis();
            serCount++;
            serTotalTime += (serEndTime-serStartTime);
            
            // deserialize...
            desStartTime = System.currentTimeMillis();
            StudentThriftList stuThriftList = deserializeFromFile(outputFile);
            desEndTime = System.currentTimeMillis();
            desCount++;
            desTotalTime += (desEndTime-desStartTime);
        }
        
        // print report
        System.out.println("--------------REPORT---------------");
        System.out.println("Serialization count: " + serCount);
        System.out.println("Serialization avg time (ms): " + serTotalTime/(long)serCount);
        System.out.println("Deserialization count: " + desCount);
        System.out.println("Deserialization avg time (ms): " + desTotalTime/(long)desCount);
    }

And the final report:

--------------REPORT---------------
Serialization count: 1000
Serialization avg time (ms): 408
Deserialization count: 1000
Deserialization avg time (ms): 448

Solution

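  • Try TBinaryProtocol instead of TCompactProtocol. TCompactProtocol does not compress the data in the usual sense; it encodes integers as variable-length values, which costs some extra CPU work per field, whereas TBinaryProtocol writes fixed-width values. It may therefore be somewhat faster, at the cost of larger files: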

    TBinaryProtocol protocol = new TBinaryProtocol(transport);
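
To make the size/CPU trade-off concrete, here is a minimal sketch of the ZigZag plus varint scheme that the compact protocol uses for i32 fields. It is an illustration only, not Thrift's own code; the class and method names (VarintSizeDemo, zigzag, varintSize, compactI32Size) are made up for this example. Double fields take 8 bytes under both protocols, so only the integer fields differ.

```java
// Illustration only: estimates how many bytes an i32 value needs when
// encoded as a ZigZag varint, compared with a fixed 4-byte encoding.
final class VarintSizeDemo {

    // ZigZag maps signed ints to unsigned ones so that values with a
    // small magnitude (positive or negative) stay small.
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // A varint stores 7 payload bits per byte; count the bytes needed.
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Bytes used for one i32 value under the variable-length scheme.
    static int compactI32Size(int n) {
        return varintSize(zigzag(n));
    }

    public static void main(String[] args) {
        // Sample ages and years in the range the benchmark generates.
        int[] samples = {11, 2001, 10010, 12000};
        for (int v : samples) {
            System.out.println(v + ": varint " + compactI32Size(v)
                    + " byte(s), fixed-width 4 bytes");
        }
    }
}
```

The loop and shifts are the extra per-field work; whether that is measurable next to the file I/O is something only the benchmark can tell.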
    

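    Adding a BufferedOutputStream is likely to help far more. Without it, each small write the protocol makes results in a separate write to the FileOutputStream; with it, they are collected into a few large writes (the same applies to reading, with a BufferedInputStream):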

    try (FileOutputStream fos = new FileOutputStream(outputFile);
         BufferedOutputStream bos = new BufferedOutputStream(fos);
         TTransport transport = new TIOStreamTransport(bos)) {
        TBinaryProtocol protocol = new TBinaryProtocol(transport);
        thriftStudentList.write(protocol);
    }
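
The effect of buffering can be seen without Thrift at all. The sketch below is a stand-in, not a benchmark: it replaces the file with a hypothetical CountingOutputStream and writes four small chunks per record (8 + 8 + 4 + 4 bytes, mirroring the two doubles and two i32 fields), then reports how many writes actually reached the underlying stream. All names in it are invented for the example, and the real protocols issue additional small writes for field headers.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BufferingDemo {

    // Stand-in for a file stream: counts how often the sink is written to.
    static final class CountingOutputStream extends OutputStream {
        long writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    // Writes `records` records of four small fields and returns the
    // number of writes that reached the underlying sink.
    static long countUnderlyingWrites(boolean buffered, int records) {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        byte[] eight = new byte[8];
        byte[] four = new byte[4];
        try {
            for (int i = 0; i < records; i++) {
                out.write(eight); // sid
                out.write(eight); // grade
                out.write(four);  // age
                out.write(four);  // year
            }
            out.close(); // flushes whatever is still buffered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered writes: " + countUnderlyingWrites(false, 10000));
        System.out.println("buffered writes:   " + countUnderlyingWrites(true, 10000));
    }
}
```

With a real FileOutputStream, every one of those unbuffered writes is a call into the operating system, which is where most of the time is likely going.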
    

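    Finally, you could drop the StudentThriftList wrapper and write the StudentThrift structs one after another. This saves only the list header, so expect a small gain at best, and the reader then has to know how many records to read (or read until the end of the file):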

    struct StudentThrift {
      1: required double sid
      2: required double grade
      3: required i32 age
      4: required i32 year
    }
    
    // Writes the structs back to back; no list header is written,
    // so the reader must know how many records to expect.
    private static void serializeToFile(List<StudentThrift> students, String outputFile)
            throws IOException, TException {
        try (FileOutputStream fos = new FileOutputStream(outputFile);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             TTransport transport = new TIOStreamTransport(bos)) {
            TBinaryProtocol protocol = new TBinaryProtocol(transport);
            for (StudentThrift student : students) {
                student.write(protocol);
            }
        }
    }
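
It may also be worth checking the measurement itself. The loop in the question times each run with System.currentTimeMillis(), includes the Student-to-StudentThrift conversion inside serializeToFile, and counts the first, not yet JIT-compiled iterations. A minimal timing helper, assuming nothing beyond the JDK, could look like the sketch below; BenchTimer and averageMillis are hypothetical names, and a dedicated harness such as JMH would be the more rigorous choice.

```java
final class BenchTimer {

    // Runs `task` `warmup` times untimed, then `iterations` times timed,
    // and returns the average duration of a timed run in milliseconds.
    static double averageMillis(Runnable task, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // gives the JIT a chance to compile the hot path
        }
        long totalNanos = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Placeholder workload; swap in the serialization call to measure.
        double avg = averageMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        }, 100, 1000);
        System.out.println("avg time (ms): " + avg);
    }
}
```

Building the StudentThriftList once, outside the timed task, keeps the conversion cost out of the serialization figure.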