java android encryption epub skyepub

Reading a decrypted file into a ZipInputStream truncates first file sometimes


I'm working on an e-reader app (using skyepub) that downloads encrypted books to the file system and saves each book's decryption key in the database. When the user opens a book, the app loads it into memory and decrypts it.

The problem is that some books have their first chapter truncated (EPUB books are actually zip files, with each chapter stored as a separate file). This results in this dreaded error:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

What I tried

I've verified that the encrypted book is downloaded properly, because if I copy the file over to my desktop (from my rooted Android device) and run this command on it:

openssl aes-192-cbc -d -K *** -iv *** -in test.epub.encrypted -out test.epub

it works just fine. However, when I try to do essentially the same thing with the following Android code:

public ContentData getContentData(String baseDirectory, String contentPath) {
    if( contentPath.startsWith("/fonts/")) {
        ... // handle font stuff
    }

    int secondSlash = contentPath.indexOf('/', 1);
    if( secondSlash == -1) return null;

    String bookEditionID = contentPath.substring(1,secondSlash);
    String zipEntryName = contentPath.substring(secondSlash+1);

    final ContentData data = new ContentData();

    try {
        InputStream stream = dbUtil.getBookStream(bookEditionID);
        if( stream == null) return null;

        final ZipInputStream zip = new ZipInputStream(stream);

        ZipEntry entry;
        do {
            entry = zip.getNextEntry();
            Log.e("Abjjad","looping through entry: "+entry);
            if( entry == null) {
                zip.close();
                return null;
            }
        } while( !entry.getName().equals(zipEntryName));

        Log.e("debug","going through data with entry: " +entry+", contentLength: "+entry.getSize());

The method dbUtil.getBookStream looks like this:

public InputStream getBookStream( String bookEditionId) {
    BookInfo book = getBookInfo(bookEditionId);

    InputStream origStream = null;
    try {

        // Open the downloaded ePub
        origStream = openFileInput(bookEditionId + ".epub");

        // De-obfuscate the key
        SecretKeySpec sks = getObfuscationKeySpec(bookEditionId);
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding", "BC");
        c.init(Cipher.DECRYPT_MODE, sks);
        byte[] decodedBytes = c.doFinal(Base64.decode(book.decryptionKey, Base64.DEFAULT));
        String keyPair = new String(decodedBytes);

        // Split the key and parse into binary
        int separator = keyPair.indexOf(':');
        byte[] key = DatatypeConverter.parseHexBinary(keyPair.substring(0, separator));
        byte[] iv = DatatypeConverter.parseHexBinary(keyPair.substring(separator + 1));

        c = Cipher.getInstance("AES/CBC/PKCS7Padding","BC");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key,"AES"), new IvParameterSpec(iv));
        return new CipherInputStream(origStream, c);
    } catch( Exception e) {
        try {
            if (origStream != null) origStream.close();
        } catch( Exception x) {}
        return null;
    }
}

With this code, the log statement in the first code block shows entry.getSize() returning -1.

Bonus (works on iOS!)

We wrote the same code for iOS, and it works perfectly (on the same book):

+ (NSData *)encryptKey:(NSString *)key ivParam:(NSString *)iv bookId:(NSString *)bookId
{
    NSString *keyPair = [NSString stringWithFormat:@"%@:%@", key, iv];
    NSString *secret = [self getObfuscationSecretWithValue:bookId];

    NSData *data = [keyPair dataUsingEncoding:NSASCIIStringEncoding];

    char keyPtr[kCCKeySizeAES128];
    bzero(keyPtr, sizeof(keyPtr));
    [[NSData dataWithHexString:secret] getBytes:keyPtr length:sizeof(keyPtr)];

    NSUInteger dataLength = [data length];
    size_t bufferSize = dataLength + kCCBlockSizeAES128;
    void *buffer = malloc(bufferSize);

    size_t numBytesEncrypted;
    CCCryptorStatus status = CCCrypt(kCCEncrypt, kCCAlgorithmAES, kCCOptionPKCS7Padding | kCCOptionECBMode, keyPtr, kCCKeySizeAES128,
                                     NULL,
                                     [data bytes], [data length],
                                     buffer, bufferSize, &numBytesEncrypted);
    if (status == kCCSuccess) {
        return [NSData dataWithBytes:buffer length:numBytesEncrypted];
    }
    else {
        free(buffer);
        return nil;
    }
}

Update

I noticed that this truncation happens only after reading the TOC (toc.ncx, which appears to be the last entry read in the listing below). From the logs:

:::::::::::::::::::::::::::::::
getInputStream: /24748681/OEBPS/toc.ncx
:::::::::::::::::::::::::::::::
looping through entry: mimetype
looping through entry: OEBPS/hayat-ghayr.html
looping through entry: OEBPS/content.opf
looping through entry: OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
looping through entry: OEBPS/images/978-614-425-313-7-hayat_fmt.png
looping through entry: OEBPS/template.css
looping through entry: OEBPS/hayat-ghayr-2.html
looping through entry: OEBPS/hayat-ghayr-1.html
looping through entry: OEBPS/hayat-ghayr-3.html
looping through entry: OEBPS/hayat-ghayr-4.html
looping through entry: OEBPS/hayat-ghayr-5.html
looping through entry: OEBPS/hayat-ghayr-6.html
looping through entry: OEBPS/hayat-ghayr-7.html
looping through entry: OEBPS/hayat-ghayr-8.html
looping through entry: OEBPS/hayat-ghayr-9.html
looping through entry: OEBPS/hayat-ghayr-10.html
looping through entry: OEBPS/hayat-ghayr-11.html
looping through entry: OEBPS/hayat-ghayr-12.html
looping through entry: OEBPS/hayat-ghayr-13.html
looping through entry: OEBPS/hayat-ghayr-14.html
looping through entry: OEBPS/hayat-ghayr-15.html
looping through entry: OEBPS/hayat-ghayr-16.html
looping through entry: OEBPS/hayat-ghayr-17.html
looping through entry: OEBPS/hayat-ghayr-18.html
looping through entry: OEBPS/hayat-ghayr-19.html
looping through entry: OEBPS/hayat-ghayr-20.html
looping through entry: OEBPS/hayat-ghayr-21.html
looping through entry: OEBPS/hayat-ghayr-22.html
looping through entry: META-INF/container.xml
looping through entry: OEBPS/images/277.png
looping through entry: OEBPS/toc.ncx
going through data with entry: OEBPS/toc.ncx, contentLength: 5549
returning data
:::::::::::::::::::::::::::::::
getInputStream: /24748681/OEBPS/hayat-ghayr.html
:::::::::::::::::::::::::::::::
looping through entry: mimetype
looping through entry: OEBPS/hayat-ghayr.html
going through data with entry: OEBPS/hayat-ghayr.html, contentLength: -1
returning data



Solution

  • According to the docs, ZipEntry.getSize() returns -1 if the size is not known. This happens with some zip files, and in those cases you need to read the entire entry to determine its uncompressed size.

    Analysis

    Red herring

    First of all, the whole encryption/decryption thing was a red herring. Simply copying the same unencrypted epub/zip file to the device and reading it with the same code produced the same error page, so this is a problem with the zip file itself rather than with its decryption.

    Zip documentation

    As mentioned in the Javadoc, ZipEntry.getSize() returns -1 if the size is unknown, which is exactly what is going on here. To confirm, we took the same zip file, unzipped it on the command line, then rezipped it with the maximum compression level like so:

    zip -9 -r filename.epub *
    

    We then fed the rezipped file to the existing code and it worked perfectly!
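
My best guess at why rezipping helps: a zip entry written in streaming mode stores its sizes in a data descriptor after the compressed data instead of in the local header, and ZipInputStream only has the local header to go on when getNextEntry() returns. The sketch below reproduces that on a desktop JDK with an in-memory archive. The class name, method names, and entry names are made up for illustration, and I have not checked that every Android runtime behaves identically.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipSizeDemo {
    static final byte[] PAYLOAD = "<html>chapter</html>".getBytes(StandardCharsets.UTF_8);

    /** Builds a two-entry archive in memory (hypothetical entry names). */
    static byte[] buildZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // DEFLATED entry with no size set: the sizes are written after the data.
            zip.putNextEntry(new ZipEntry("streamed.html"));
            zip.write(PAYLOAD);
            zip.closeEntry();

            // STORED entry: size and CRC must be supplied up front,
            // so they end up in the local header.
            ZipEntry stored = new ZipEntry("stored.html");
            stored.setMethod(ZipEntry.STORED);
            stored.setSize(PAYLOAD.length);
            stored.setCompressedSize(PAYLOAD.length);
            CRC32 crc = new CRC32();
            crc.update(PAYLOAD);
            stored.setCrc(crc.getValue());
            zip.putNextEntry(stored);
            zip.write(PAYLOAD);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Records what getSize() reports right after getNextEntry(), before any data is read. */
    static Map<String, Long> sizesSeenWhileStreaming(byte[] archive) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                sizes.put(entry.getName(), entry.getSize());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizesSeenWhileStreaming(buildZip()));
    }
}
```

If the streamed entry reports -1 while the stored one reports its real length, that matches what we saw with the original epub versus the rezipped one.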

    Solution

    So this is the final code that worked:

        try {
            InputStream stream = abjjadDb.getBookStream(bookEditionID);
            if( stream == null) return null;
    
            final ZipInputStream zip = new ZipInputStream(stream);
    
            ZipEntry entry;
            do {
                entry = zip.getNextEntry();
                if( entry == null) {
                    zip.close();
                    return null;
                }
            } while( !entry.getName().equals(zipEntryName));
    
            data.contentLength = entry.getSize();
            data.lastModified = entry.getTime();
            data.contentPath = contentPath;
    
            InputStream s = zip;
            if( data.contentLength == -1) {
                Log.e("demo",new Object(){}.getClass().getEnclosingMethod().getName()+":: entry \""+entry+"\" has contentLength -1, so we will work around");
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                int nRead;
                // use buf to store data from the zip file entry in fixed size
                byte[] buf = new byte[4096];
                while ((nRead = zip.read(buf)) != -1) {
                    // dump that data into buffer, which is a growing buffer
                    buffer.write(buf, 0, nRead);
                }
                buffer.flush();
    
                byte[] finalBuffer = buffer.toByteArray();
                Log.e("demo",new Object(){}.getClass().getEnclosingMethod().getName()+":: entry \""+entry+"\" final data length: "+finalBuffer.length);
                data.contentLength = finalBuffer.length;
                s = new ByteArrayInputStream(finalBuffer);
            }
            final InputStream finalStream = s;
    

    The logs then show:

    getContentData:: entry "OEBPS/hayat-ghayr.html" has contentLength -1, so we will work around
    getContentData:: entry "OEBPS/hayat-ghayr.html" final data length: 2378
    getContentData:: entry "OEBPS/hayat-ghayr.html" has contentLength -1, so we will work around
    getContentData:: entry "OEBPS/hayat-ghayr.html" final data length: 2378
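
For anyone who wants the workaround in a reusable form, here is a minimal sketch that always buffers the matching entry instead of trusting getSize(). The class and method names are hypothetical (they are not part of skyepub or of our app), and it assumes the entry is small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipEntryReader {
    /** Returns the full contents of the named entry, or null if it is absent. Closes the source. */
    static byte[] readEntry(InputStream source, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue; // not the entry we want, keep scanning
                }
                // Read until end of entry; the resulting length is the real content length.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return buffer.toByteArray();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small in-memory archive so the sketch can be run on its own. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : new String[] {"mimetype", "OEBPS/chapter.html"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = readEntry(new ByteArrayInputStream(sampleArchive()), "OEBPS/chapter.html");
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block closes the zip stream on every path, including the early return, which the original loop only did when the entry was not found.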
    

    Interestingly, that size exactly matches the actual content length of hayat-ghayr.html that we get by running this on the command line:

    $ unzip -l b17c024e-89f1-42f7-a546-91d46610cedb.epub 
    Archive:  b17c024e-89f1-42f7-a546-91d46610cedb.epub
      Length     Date   Time    Name
     --------    ----   ----    ----
           20  01-27-12 11:17   mimetype
         2378  04-20-12 10:12   OEBPS/hayat-ghayr.html
         6436  02-06-12 11:06   OEBPS/content.opf
       112579  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
       182575  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat_fmt.png
         7757  01-27-12 11:21   OEBPS/template.css
         5643  01-27-12 11:18   OEBPS/hayat-ghayr-2.html
        20144  01-27-12 11:17   OEBPS/hayat-ghayr-1.html
        65543  01-27-12 11:17   OEBPS/hayat-ghayr-3.html
        59434  01-27-12 11:17   OEBPS/hayat-ghayr-4.html
        66768  01-27-12 11:17   OEBPS/hayat-ghayr-5.html
        49117  01-27-12 11:17   OEBPS/hayat-ghayr-6.html
        65346  01-27-12 11:17   OEBPS/hayat-ghayr-7.html
        74196  01-27-12 11:17   OEBPS/hayat-ghayr-8.html
        73998  01-27-12 11:17   OEBPS/hayat-ghayr-9.html
        61031  01-27-12 11:17   OEBPS/hayat-ghayr-10.html
        68297  01-27-12 11:17   OEBPS/hayat-ghayr-11.html
        72084  01-27-12 11:17   OEBPS/hayat-ghayr-12.html
         2386  01-27-12 11:17   OEBPS/hayat-ghayr-13.html
        61132  01-27-12 11:17   OEBPS/hayat-ghayr-14.html
        46320  01-27-12 11:17   OEBPS/hayat-ghayr-15.html
        32673  01-27-12 11:17   OEBPS/hayat-ghayr-16.html
        88584  01-27-12 11:17   OEBPS/hayat-ghayr-17.html
        56474  01-27-12 11:17   OEBPS/hayat-ghayr-18.html
        52840  01-27-12 11:17   OEBPS/hayat-ghayr-19.html
        80022  01-27-12 11:17   OEBPS/hayat-ghayr-20.html
        50781  01-27-12 11:17   OEBPS/hayat-ghayr-21.html
         2765  01-27-12 11:17   OEBPS/hayat-ghayr-22.html
          265  01-27-12 11:17   META-INF/container.xml
        54942  01-27-12 11:17   OEBPS/images/277.png
         5549  01-27-12 11:17   OEBPS/toc.ncx
         1072  03-23-12 13:28   iTunesMetadata.plist
     --------                   -------
      1529151                   32 files
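
As far as I know, unzip -l takes its sizes from the central directory at the end of the archive, which is why it can report 2378 even though the streaming reader cannot. java.util.zip.ZipFile also reads the central directory, so it is an option when the archive sits on disk as a plain (unencrypted) file. The sketch below illustrates this with a temporary file; the class name, method name, and entry name are made up, and it does not apply directly to our encrypted stream.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class CentralDirectoryDemo {
    /**
     * Writes one streamed (DEFLATED, size not set) entry to a temp file,
     * then asks ZipFile for its size.
     */
    static long sizeFromCentralDirectory(byte[] payload) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("central-directory-demo", ".zip");
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(tmp))) {
                zip.putNextEntry(new ZipEntry("streamed.html"));
                zip.write(payload);
                zip.closeEntry();
            }
            // ZipFile looks entries up in the central directory, where the size is recorded.
            try (ZipFile file = new ZipFile(tmp.toFile())) {
                return file.getEntry("streamed.html").getSize();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeFromCentralDirectory(new byte[2378]));
    }
}
```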