I am trying to upload a PDF for processing to Google's Document AI service, using the Google.Cloud.DocumentAI.V1 library for C#. I looked at the GitHub repository and the docs, but found little information. The PDF is on the local drive. I converted the PDF to a byte array, converted that to a ByteString, and set the request MIME type to "application/pdf", but the call returned this error:
Status(StatusCode="InvalidArgument", Detail="Unsupported input file format.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1627582435.256000000","description":"Error received from peer ipv4:142.250.72.170:443","file":"......\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Unsupported input file format.","grpc_status":3}")
Code:
try
{
    //Generate a document
    string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
    var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
    ByteString content = ByteString.CopyFrom(bytes);
    // Create client
    DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
    // Initialize request argument(s)
    ProcessRequest request = new ProcessRequest
    {
        ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
        SkipHumanReview = false,
        InlineDocument = new Document(),
        RawDocument = new RawDocument(),
    };
    request.RawDocument.MimeType = "application/pdf";
    request.RawDocument.Content = content;
    // Make the request
    ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);
    Document docResponse = response.Document;
    Console.WriteLine(docResponse.Text);
}
catch(Exception ex)
{
    Console.WriteLine(ex.Message);
}
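This is the problem (or at least one problem): you aren't actually loading the file. Encoding.UTF8.GetBytes(pdfFilePath) encodes the characters of the path string itself, so the service receives the path text rather than the contents of the PDF: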
string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
You instead want:
string pdfFilePath = "path-as-before";
var bytes = File.ReadAllBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
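To make the difference concrete, here is a minimal sketch that uses only the .NET base class library, with no Document AI call. It writes a small stand-in file to a temporary path and compares the two byte arrays. The class PdfBytesDemo and the helper LooksLikePdf are hypothetical names introduced for illustration, and the check assumes the file starts with the usual "%PDF-" signature, which well-formed PDFs normally do.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class PdfBytesDemo
{
    // Hypothetical helper: true when the data begins with the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        return data.Length >= signature.Length
            && data.Take(signature.Length).SequenceEqual(signature);
    }

    public static void Main()
    {
        // Stand-in file: a few bytes that start like a PDF, written to a temp path.
        string pdfFilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".pdf");
        File.WriteAllBytes(pdfFilePath, Encoding.ASCII.GetBytes("%PDF-1.7\n% stand-in content\n"));

        try
        {
            // Bytes of the path string: this is what the original code sent.
            byte[] pathBytes = Encoding.UTF8.GetBytes(pdfFilePath);

            // Bytes of the file on disk: this is what the service needs.
            byte[] fileBytes = File.ReadAllBytes(pdfFilePath);

            Console.WriteLine(LooksLikePdf(pathBytes)); // False
            Console.WriteLine(LooksLikePdf(fileBytes)); // True
        }
        finally
        {
            File.Delete(pdfFilePath);
        }
    }
}
```

A check like this before building the request can catch the mistake locally, instead of waiting for the service to reply with "Unsupported input file format."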
I'd also note, however, that InlineDocument and RawDocument are alternatives to each other: they belong to the same oneof in the request message, so setting either of them clears the other. Your request creation would be better written as:
ProcessRequest request = new ProcessRequest
{
    ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
    SkipHumanReview = false,
    RawDocument = new RawDocument
    {
        MimeType = "application/pdf",
        Content = content
    }
};