objective-c, c, pdf-generation, cgpdfdocument, cgpdf

Duplicate CGPDFStrings from a CGPDFArray generated from a TJ callback on a PDF stream


OK, so I'm parsing a PDF content stream and discovered that the TJ operator callback receives an array of strings, so I pop it off the scanner and iterate through it to get the string values, like so:

static void Op_TJ(CGPDFScannerRef s, void *info)
{
    CGPDFArrayRef array;
    bool success = CGPDFScannerPopArray(s, &array);
    if(success) {
        NSMutableString *actualString = [[NSMutableString alloc] init];
        NSLog(@"array count:%zu",CGPDFArrayGetCount(array));
        for(size_t i = 0; i < CGPDFArrayGetCount(array); i++) {
            CGPDFStringRef string;
            CGPDFArrayGetString(array, i, &string);
            NSString *stringData = (NSString *)CGPDFStringCopyTextString(string);
            [actualString appendString:stringData];
            NSLog(@"string Data:%@",stringData);
        }
        NSLog(@"actual string:%@",actualString);
    }
}

Only problem is, this is my output:

2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] began text object
2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] array count:7
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ls
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] actual string:InInititiaials
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] ended text object

I've resorted to skipping every odd index, but this is extremely sloppy and seems inefficient, so I'm wondering whether anyone has a solution or any idea what the problem might be. I've tried multiple PDF files with the same results.

My simple quick fix was to change the for loop from this:

for(size_t i = 0; i < CGPDFArrayGetCount(array); i++)

to this:

for(size_t i = 0; i < CGPDFArrayGetCount(array); i+=2)
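Stepping by 2 happens to work for the array in the log, but it relies on strings and numbers strictly alternating, starting with a string. The PDF specification only says that each element of a TJ array is either a string or a number, so an operand such as `[(A) -50 (B) (C)]` breaks that assumption. The minimal sketch below, in plain C that also compiles as Objective-C, uses a hypothetical simplified model in which a NULL entry stands for a number; `concat_even_indices` and `concat_strings` are illustrative names, not CoreGraphics functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a TJ operand: each entry is a string,
   or NULL standing in for a number (a glyph-positioning adjustment). */

/* The question's workaround: read only even indices, which assumes strings
   and numbers strictly alternate and that the array starts with a string. */
void concat_even_indices(const char *const *elems, size_t count,
                         char *buf, size_t bufsize) {
    assert(bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i += 2) {
        if (elems[i] != NULL)  /* keeps the model defined if a number lands here */
            strncat(buf, elems[i], bufsize - strlen(buf) - 1);
    }
}

/* Type-based loop: visit every entry and keep only the strings. */
void concat_strings(const char *const *elems, size_t count,
                    char *buf, size_t bufsize) {
    assert(bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (elems[i] != NULL)
            strncat(buf, elems[i], bufsize - strlen(buf) - 1);
    }
}
```

With the alternating layout from the log both loops give `Initials`, but on `{"A", NULL, "B", "C"}` (modeling `[(A) -50 (B) (C)]`) `concat_even_indices` drops the `C` and returns `AB`, while `concat_strings` returns `ABC`.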

Solution

  • CGPDFArrayGetString is declared to return a bool that is true if there is a PDF string at the specified index, and false otherwise.

    You're not checking the return value!

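    My guess is that every other element is not a PDF string (so the function returns false for it). That would fit a TJ array, whose elements can be either strings or numbers that adjust glyph positioning.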

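    In those cases the function doesn't write to string, so the variable still holds the value from the previous iteration. (Strictly speaking, string is declared inside the loop and never initialized, so reading it is undefined behavior; in practice it happens to keep the old value.)

    Just a guess.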
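The behavior described above can be reproduced without CoreGraphics. The sketch below is plain C (so it also compiles as Objective-C) and uses a hypothetical model of the TJ operand: `tj_element`, `tj_get_string`, `concat_unchecked`, `concat_checked` and `tj_from_log` are made-up names, and `tj_get_string` is only assumed to follow the same contract as CGPDFArrayGetString (return true and write the output only when the element is a string). The layout of `tj_from_log` is inferred from the log in the question; the numeric adjustments are invented. In the real callback the corresponding change is to test the result of `CGPDFArrayGetString` and `continue` when it is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a TJ operand element (not the CoreGraphics API):
   either a string or a number that adjusts glyph positioning. */
typedef enum { TJ_STRING, TJ_NUMBER } tj_kind;

typedef struct {
    tj_kind kind;
    const char *text;  /* used when kind == TJ_STRING */
    double adjust;     /* used when kind == TJ_NUMBER */
} tj_element;

/* Assumed to follow the same contract as CGPDFArrayGetString: returns true
   and writes *out only if the element at index is a string; otherwise it
   returns false and leaves *out untouched. */
bool tj_get_string(const tj_element *arr, size_t count, size_t index,
                   const char **out) {
    assert(out != NULL);
    if (index >= count || arr[index].kind != TJ_STRING)
        return false;
    *out = arr[index].text;
    return true;
}

/* Same shape as the question's loop: the return value is ignored, so each
   number element re-appends the string fetched on the previous iteration.
   (s is declared outside the loop so the model itself is well defined.) */
void concat_unchecked(const tj_element *arr, size_t count,
                      char *buf, size_t bufsize) {
    assert(bufsize > 0);
    const char *s = "";
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        tj_get_string(arr, count, i, &s);
        strncat(buf, s, bufsize - strlen(buf) - 1);
    }
}

/* Fixed loop: append only when the getter reports a string. */
void concat_checked(const tj_element *arr, size_t count,
                    char *buf, size_t bufsize) {
    assert(bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *s;
        if (!tj_get_string(arr, count, i, &s))
            continue;  /* a number: skipped here */
        strncat(buf, s, bufsize - strlen(buf) - 1);
    }
}

/* Layout inferred from the question's log (strings at even indices, numbers
   at odd ones); the numeric values are invented for illustration. */
const tj_element tj_from_log[] = {
    {TJ_STRING, "In", 0}, {TJ_NUMBER, NULL, -15},
    {TJ_STRING, "it", 0}, {TJ_NUMBER, NULL, -10},
    {TJ_STRING, "ia", 0}, {TJ_NUMBER, NULL, -12},
    {TJ_STRING, "ls", 0},
};
#define TJ_FROM_LOG_COUNT (sizeof tj_from_log / sizeof tj_from_log[0])
```

Feeding `tj_from_log` to `concat_unchecked` reproduces the question's `InInititiaials`, while `concat_checked` yields `Initials`. Skipping the numbers discards their positioning information; an extractor that cares about word spacing might inspect them instead of ignoring them.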