I'm having a problem where, reading a PDF with iText, I get:
"'StandardEncoding' is not a supported encoding name."
The PDF is fine - it's just text. Using iText, I can load it into a PdfPage and read it page by page, but when I call PdfTextExtractor.GetTextFromPage to read a page into a string, I get this error. There doesn't seem to be a way to specify the encoding in this function, and the error message also points me to the documentation for the Encoding.RegisterProvider method.
Looking at Encoding.RegisterProvider, the only option I see is to pass it CodePagesEncodingProvider.Instance. Doing this does not add the StandardEncoding from ISO 32000-1. Even if I knew what went into creating an EncodingProvider, it won't let me create a new instance of it.
Another Stack Overflow answer says to create a new encoding with the name StandardEncoding. I can't figure out any way to do this. If I create an Encoding, I get read-only WebName, BodyName, HeaderName, and EncodingName properties, not anything I can write to.
Every character in the document has the same character code in StandardEncoding and WinAnsi, so as a last resort I could in theory pretend it's Windows-1252. But even this doesn't work: if I try to convert it as 1252, I get the same error about StandardEncoding not being valid. Using GetContentBytes instead of GetTextFromPage doesn't get me anything useful either, I assume because it returns the entire page content, not just its text.
I thought this would be a lot simpler since I'm just dealing with a very simple PDF, but this is surprisingly difficult to even search for - I've been reading the same pages over and over. Has anyone had this issue before and figured it out?
You can use Encoding.RegisterProvider to register a custom provider:
Encoding.RegisterProvider(StandardEncodingProvider.Instance);
Here's an example implementation. I'm not sure which encoding should be used, so I use ISO-8859-1. For all other encodings, it returns null so that the lookup falls through to the default provider.
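using System;
using System.Text;

// Maps the PDF encoding name "StandardEncoding" to a .NET Encoding.
public sealed class StandardEncodingProvider : EncodingProvider
{
    public static readonly StandardEncodingProvider Instance = new StandardEncodingProvider();

    // ISO-8859-1 is a stand-in; it is not identical to PDF StandardEncoding.
    public static readonly Encoding StandardEncoding = Encoding.GetEncoding("ISO-8859-1");

    public override Encoding GetEncoding(string name)
    {
        if (StringComparer.OrdinalIgnoreCase.Equals(name, "StandardEncoding"))
        {
            return StandardEncoding;
        }

        // Returning null lets the other registered providers handle the name.
        return null;
    }

    public override Encoding GetEncoding(int codepage)
    {
        // No code page is claimed by this provider.
        return null;
    }
}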
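To check that the registration took effect before involving iText at all, something like the following should work. Treat it as a sketch: the Program class, the Fallback field name, and the sample bytes are made up for illustration, and the provider is repeated in compact form only so the snippet compiles on its own.

```csharp
using System;
using System.Text;

// Compact copy of the provider above, repeated so this snippet compiles alone.
public sealed class StandardEncodingProvider : EncodingProvider
{
    public static readonly StandardEncodingProvider Instance = new StandardEncodingProvider();

    // Hypothetical field name; ISO-8859-1 is the same stand-in used above.
    private static readonly Encoding Fallback = Encoding.GetEncoding("ISO-8859-1");

    public override Encoding GetEncoding(string name) =>
        StringComparer.OrdinalIgnoreCase.Equals(name, "StandardEncoding") ? Fallback : null;

    public override Encoding GetEncoding(int codepage) => null;
}

public static class Program
{
    public static void Main()
    {
        // Register once at startup, before any text extraction runs.
        Encoding.RegisterProvider(StandardEncodingProvider.Instance);

        // Without the registration, this lookup throws the
        // "'StandardEncoding' is not a supported encoding name" error.
        Encoding enc = Encoding.GetEncoding("StandardEncoding");

        // Illustrative ASCII-range bytes spelling "Hi".
        byte[] sample = { 0x48, 0x69 };
        Console.WriteLine(enc.GetString(sample));
    }
}
```

If the lookup succeeds here, the same RegisterProvider call placed ahead of PdfTextExtractor.GetTextFromPage should clear the error, assuming iText resolves the name through Encoding.GetEncoding, which is what the error message suggests.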