Search code examples
javams-accessjdbcucanaccessjackcess

UCanAccess seems unable to read OLE Object column using getBytes()


I have a quite large .mdb access database I want to convert to SQLite3 to use it under Linux.

I am unable to transfer any of the BLOBs (mostly containing images) I have. Here is a sample test program:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class prova {
        public static void main(String[] args) {
                String url = "jdbc:ucanaccess://data/BookDB-201810.mdb";
                try {
                        Connection c = DriverManager.getConnection(url);
                        PreparedStatement ps;
                        ResultSet rs;
                        String q = "SELECT * FROM PersonImage";
                        ps = c.prepareStatement(q);
                        rs = ps.executeQuery();
                        while (rs.next()) {
                                byte[] i = rs.getBytes("Image");
                                String fn = String.format("data/img/i%05d.%d.jpg", rs.getInt("PersonId"), rs.getInt("Index"));
                                try (FileOutputStream fos = new FileOutputStream(fn)) {
                                        fos.write(i);
                                } catch (FileNotFoundException e) {
                                        e.printStackTrace();
                                } catch (IOException e) {
                                        e.printStackTrace();
                                }
                        }
                } catch (SQLException e) {
                        e.printStackTrace();
                }
        }
}

Program runs without errors, but the produced files are "strange" (surely not images):

$ ls -l i00072.1.jpg 
-rw-r--r-- 1 mcon mcon 369 Nov 23 11:38 i00072.1.jpg
$ file i00072.1.jpg 
i00072.1.jpg: Java serialization data, version 5

Looking int them I find this:

....sr..net.ucanaccess.jdbc.BlobKey...........L.
columnNamet..Ljava/lang/String;L..keyt..Ljava/util/HashMap;L.   tableNameq.~..xpt..Imagesr..java.util.HashMap......`....F.
loadFactorI.    [email protected].⠤...8...I..valuexr..java.lang.Number...........xp...
Ht..Indexsr..java.lang.ShorthM7.4`.R...S..valuexq.~.  ..xt..PersonImage

What am I doing wrong?

Update: Since my aim is to convert a book database (maintained on the net as .mdb using the obsolete program BookCAT for historical reasons) I found AccessConverter which seems to fit the bill; unfortunately there are essentially two problems with that:

  • In my database some columns are named "Index", which is a reserved word for SQLite (and others, but apparently not for MS-Access); this is trivially fixed inserting a filter "Index" -> "Idx".
  • There is a translation switch (type) { case xxx: ... to convert the various MS-Access types to more conventional SQLite3 types; breakpointing on the default: label I discovered (as @ErikvonAsmuth correctly guessed) there are unhandled OLE types. I have no idea about how to handle those.

In the program (BookCAT) those fields contain two kind of data:

  • images
  • formatted text

The latter is less important because there's always a duplicate "plaintext" version (it would be nice to retrieve also the formatted version, but...).

I would really like to be able to extract the images, though.

In image data there is a companion "ImageType" column invariably set to "2" which (if I'm not mistaken) means they should be .jpeg images.

What can I do to retrieve OLE data in a usable format?

Note: AccessConverter does not use ucanaccess, it uses directly the underlying com.healthmarketscience.jackcess lib instead.

Note2: it seems BookCAT is built using Delphi, if that is relevant.


Solution

  • It turns out that, in my specific case, all "OLE" fields are actually BLOBs, unrecognizable as OLE2 objects.

    In this condition jackcess returns a Enum OleBlob.ContentType.UNKNOWN Type and, in this case, it will refuse to access BLOB content (OleBlob.content.getBytes() returns null).

    In order to access stored data is necessary to use directly Column.getBytes(name) (completely bypassing OLE subsystem).

    Why, in this condition, ucanaccess returns an invalid value, instead of failure, is beyond me (probably I should file a bug report; comments welcome).

    Image data is plain jpeg-formatted file, while the "formatted text" seems some custom serialization of Delphi TRichText widget I don't know how to parse, but that is another issue.

    Using jackcess Column.getBytes(name) I was able to retrieve the data I needed.