I'm benchmarking serialization libraries and need to measure Apache Thrift, but my serialization times are very long: on average they are more than 100x slower than Protobuf. What am I doing wrong?
Thrift file:
namespace java com.myproject.mybenchmark.thrift
struct StudentThrift {
1: required double sid
2: required double grade
3: required i32 age
4: required i32 year
}
struct StudentThriftList {
1: list<StudentThrift> students
}
Serialization method:
private static void serializeToFile(ArrayList<Student> studentList, String outputFile) {
StudentThriftList thriftStudentList = new StudentThriftList();
for(Student stu:studentList) {
StudentThrift thriftStudent = new StudentThrift();
thriftStudent.setSid(stu.getSid());
thriftStudent.setGrade(stu.getGrade());
thriftStudent.setAge(stu.getAge());
thriftStudent.setYear(stu.getYear());
thriftStudentList.addToStudents(thriftStudent);
}
// Serializing to disk.
try (FileOutputStream fos = new FileOutputStream(outputFile);
TTransport transport = new TIOStreamTransport(fos)) {
TCompactProtocol protocol = new TCompactProtocol(transport);
thriftStudentList.write(protocol);
} catch (Exception e) {
e.printStackTrace();
}
}
Deserialization method:
public static StudentThriftList deserializeFromFile(String dataFile) {
StudentThriftList thriftList = new StudentThriftList();
try (FileInputStream fis = new FileInputStream(dataFile);
TTransport transport = new TIOStreamTransport(fis)) {
TCompactProtocol protocol = new TCompactProtocol(transport);
thriftList.read(protocol);
} catch (Exception e) {
e.printStackTrace();
}
return thriftList;
}
Main class:
public static void main(String[] args) {
ArrayList<Student> studentList = new ArrayList<Student>();
long serStartTime = 0L;
long serEndTime = 0L;
int serCount = 0;
long serTotalTime = 0L;
long desStartTime = 0L;
long desEndTime = 0L;
int desCount = 0;
long desTotalTime = 0L;
// Create a list with 10000 objects:
for (int i = 1; i <= 10000; i++) {
Student student = new Student();
student.setSid(1000 + i * 1.1);
student.setGrade((float) (i * 0.5));
student.setAge(10 + i);
student.setYear(2000 + i);
studentList.add(student);
}
// benchmark serialization/deserialization 1000 times:
for(int i=0; i<1000; i++) {
// serialize...
String outputFile = "output/thriftStudents_"+System.currentTimeMillis();
serStartTime = System.currentTimeMillis();
serializeToFile(studentList, outputFile);
serEndTime = System.currentTimeMillis();
serCount++;
serTotalTime += (serEndTime-serStartTime);
// deserialize...
desStartTime = System.currentTimeMillis();
StudentThriftList stuTriftList = deserializeFromFile(outputFile);
desEndTime = System.currentTimeMillis();
desCount++;
desTotalTime += (desEndTime-desStartTime);
}
// print report
System.out.println("--------------REPORT---------------");
System.out.println("Serialization count: " + serCount);
System.out.println("Serialization avg time (ms): " + serTotalTime/(long)serCount);
System.out.println("Deserialization count: " + desCount);
System.out.println("Deserialization avg time (ms): " + desTotalTime/(long)desCount);
}
And final report:
--------------REPORT---------------
Serialization count: 1000
Serialization avg time (ms): 408
Deserialization count: 1000
Deserialization avg time (ms): 448
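What if we try using TBinaryProtocol instead of TCompactProtocol? TCompactProtocol does not compress data in the usual sense, but it does pack integers into a variable-length form, which costs some extra CPU work. TBinaryProtocol writes fixed-width values, so it may be somewhat faster at the price of a larger file: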
TBinaryProtocol protocol = new TBinaryProtocol(transport);
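To get a feel for what the compact encoding does to an i32, here is a small JDK-only sketch of zigzag varint encoding, the scheme TCompactProtocol uses for integers. The class and method names are made up for illustration and this is not Thrift's own code; it only shows how many bytes a value like age or year would take compared with the 4 fixed bytes of TBinaryProtocol.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hypothetical names, not part of the Thrift library.
class VarintSizeDemo {
    // ZigZag maps signed ints to unsigned ones so small magnitudes stay small.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Base-128 varint: 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Number of bytes an i32 occupies as a zigzag varint.
    static int compactSize(int n) {
        return encodeVarint(zigZag(n)).length;
    }

    public static void main(String[] args) {
        int[] samples = {20, 2000, 12000, -1};
        for (int n : samples) {
            System.out.println(n + ": " + compactSize(n)
                    + " byte(s) as zigzag varint vs 4 bytes fixed-width");
        }
    }
}
```

The doubles (sid, grade) take 8 bytes in either protocol, so for this struct the difference between the two protocols is probably modest.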
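We can also wrap the stream in a BufferedOutputStream to reduce the number of disk access operations and speed up writing. This is likely the bigger win, because TIOStreamTransport passes every small write straight to the underlying stream. The same idea applies when reading: wrap the FileInputStream in a BufferedInputStream.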
try (FileOutputStream fos = new FileOutputStream(outputFile);
BufferedOutputStream bos = new BufferedOutputStream(fos);
TTransport transport = new TIOStreamTransport(bos)) {
TBinaryProtocol protocol = new TBinaryProtocol(transport);
thriftStudentList.write(protocol);
// Closing the transport closes the buffered stream, which flushes it.
} catch (Exception e) {
e.printStackTrace();
}
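To see why buffering matters, here is a rough JDK-only sketch that counts how often the underlying stream is actually hit. It assumes the serializer emits many small writes (a few bytes per field), which is how field-by-field encoders typically behave. CountingSink is a hypothetical stand-in for the file, not a Thrift class.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Illustrative only: hypothetical names, not part of the Thrift library.
class BufferingDemo {
    // Stand-in for a file: counts calls that would each reach the OS.
    static class CountingSink extends OutputStream {
        int calls = 0;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    // Issues `chunks` small writes and reports how many reached the sink.
    static int underlyingCalls(boolean buffered, int chunks, int chunkSize) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        byte[] chunk = new byte[chunkSize];
        try {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk, 0, chunkSize);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        // 10000 students x 4 fields, assumed to be roughly 8 bytes per write.
        System.out.println("unbuffered calls: " + underlyingCalls(false, 40_000, 8));
        System.out.println("buffered calls:   " + underlyingCalls(true, 40_000, 8));
    }
}
```

With a real FileOutputStream each of those unbuffered calls is a separate system call, so the gap in call counts should translate into a large gap in wall-clock time.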
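And maybe we should remove the extra nesting with StudentThriftList and serialize the students directly, one after another. This only saves the list wrapper, so the gain is probably small, and the reader then has to know how many records to expect (or read until the end of the file):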
struct StudentThrift {
1: required double sid
2: required double grade
3: required i32 age
4: required i32 year
}
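// Declares the checked exceptions, since this version has no catch block.
private static void serializeToFile(List<StudentThrift> students, String outputFile) throws IOException, TException {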
try (FileOutputStream fos = new FileOutputStream(outputFile);
BufferedOutputStream bos = new BufferedOutputStream(fos);
TTransport transport = new TIOStreamTransport(bos)) {
TBinaryProtocol protocol = new TBinaryProtocol(transport);
for (StudentThrift student : students) {
student.write(protocol);
}
}
}
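One thing to keep in mind with back-to-back records is that the file no longer says how many there are. A common workaround is to write a count first. The sketch below shows that idea with plain JDK streams instead of Thrift (DataOutputStream stands in for the protocol, and FramedRecordsDemo is a made-up name), with buffering on both the write and the read side.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustrative only: JDK streams as a stand-in for the Thrift protocol.
class FramedRecordsDemo {
    // Writes a record count, then the records one after another.
    static void writeAll(Path file, double[] sids) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(sids.length);
            for (double sid : sids) {
                out.writeDouble(sid);
            }
        }
    }

    // Reads the count first, so the loop knows when to stop.
    static double[] readAll(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            double[] sids = new double[in.readInt()];
            for (int i = 0; i < sids.length; i++) {
                sids[i] = in.readDouble();
            }
            return sids;
        }
    }

    // Writes to a temporary file, reads it back, and cleans up.
    static double[] roundTrip(double[] sids) {
        try {
            Path file = Files.createTempFile("students", ".bin");
            try {
                writeAll(file, sids);
                return readAll(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] original = {1001.1, 1002.2, 1003.3};
        System.out.println(Arrays.equals(original, roundTrip(original)));
    }
}
```

With Thrift the same pattern would mean writing the number of students before the loop and reading it back before calling read on each StudentThrift. Whether that is worth it over keeping StudentThriftList is a judgment call.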