I'm getting an intermittent error when querying certain 'tables' in our Synapse Serverless database. The error is always the same, but the table/CSV file referenced can change.
OLE DB or ODBC error: [DataSource.Error] Microsoft SQL: Error encountered while parsing data: 'Unexpected end-of-input within record at [byte: 179132185]. '. Underlying data description: file 'Tables/ProfessionalServices/ProjectManagementAndAccounting/Transaction/ProjInvoiceEmpl/PROJINVOICEEMPL_00003.csv'. Statement ID: {1F343EAF-0F1D-4693-A79B-84EFAEFA3175} | Query hash: 0x5DB5B39FB3B72B73 | Distributed request ID: {2D4C2C9F-8912-4548-8D34-7C62732DEF6D}. Total size of data scanned is 3984 megabytes, total size of data moved is 90 megabytes, total size of data written is 0 megabytes..
Our full setup is:
Dynamics F&O ERP ---> F&O's built-in [Export to Data Lake] functionality ---> Synapse Data Lake ---> Synapse Serverless DB (which is constructed of views that hit the above Data Lake files).
I'm 90% certain the issue is the underlying file - in the above error, [Tables/ProfessionalServices/ProjectManagementAndAccounting/Transaction/ProjInvoiceEmpl/PROJINVOICEEMPL_00003.csv] is erroring because it's being updated by F&O at the same moment I'm querying it.
Reasons I think this: the failing file changes from query to query, the failure is intermittent rather than consistent, and the Microsoft documentation quoted below describes exactly this read/write contention.
Edit: Found some documentation (quoted below), but it appears to say 'pound sand'. I'm thinking there must be something one can do, though; else what's the point of near-real-time data export from F&O if it fails 15% of the time it's called?
I'm wondering if there's a solution on the Dynamics F&O side, or the Synapse Data Lake side, to avoid this.
What I've tried / considered
We do have this enabled in our Serverless Views: ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
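For reference, the views look roughly like this - a minimal sketch, where the storage account/container and the column list are placeholders, not our real schema:

    CREATE OR ALTER VIEW dbo.ProjInvoiceEmpl
    AS
    SELECT r.*
    FROM OPENROWSET(
        -- placeholder storage account/container; the real path mirrors the
        -- Tables/.../ProjInvoiceEmpl folder from the error message
        BULK 'https://<account>.dfs.core.windows.net/<container>/Tables/ProfessionalServices/ProjectManagementAndAccounting/Transaction/ProjInvoiceEmpl/*.csv',
        FORMAT = 'CSV',
        -- tolerate files that are mid-write, per the Synapse docs
        ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
    )
    WITH (
        RECID  BIGINT,        -- placeholder columns
        PROJID NVARCHAR(50)
    ) AS r;

Even with ALLOW_INCONSISTENT_READS set on every view, the error still fires.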
I think using CETAS (CREATE EXTERNAL TABLE AS SELECT) would avoid this issue, since it materializes a static copy of the data instead of reading the live CSVs. But it would negate the advantages as well: a CETAS table can't be refreshed in place, only dropped and recreated, so we'd end up dropping and recreating the CETAS every night (see the sketch below).
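The nightly rebuild would look something like this - a sketch where the external data source, file format, and location names are all hypothetical. Note that DROP EXTERNAL TABLE removes only the metadata, so the job would also have to delete the old files, because CETAS fails if the target folder isn't empty:

    -- Nightly snapshot rebuild (hypothetical names throughout).
    IF OBJECT_ID('dbo.ProjInvoiceEmpl_Snapshot') IS NOT NULL
        DROP EXTERNAL TABLE dbo.ProjInvoiceEmpl_Snapshot;
    -- ...the orchestration pipeline would also delete the old files
    -- under 'snapshots/ProjInvoiceEmpl/' at this point...

    CREATE EXTERNAL TABLE dbo.ProjInvoiceEmpl_Snapshot
    WITH (
        LOCATION    = 'snapshots/ProjInvoiceEmpl/',
        DATA_SOURCE = MyLakeDataSource,    -- hypothetical external data source
        FILE_FORMAT = MyParquetFileFormat  -- hypothetical parquet file format
    )
    AS
    SELECT * FROM dbo.ProjInvoiceEmpl;     -- the live view over the CSVs

That turns the near-real-time feed into a once-a-day snapshot, which is exactly the advantage we'd be giving up.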
Found Microsoft Documentation
Q: When I read data files by using Synapse serverless, I notice an intermittent error: "Unexpected end-of-input within record at...."
A: When table records are frequently updated or deleted in finance and operations apps, you might notice a read/write contention on CSV files, and this error might occur. This issue occurs when the Synapse serverless SQL service queries a CSV file as it's being updated. In many cases, you might be able to fix this issue by retrying the query.
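The only concrete suggestion there is retrying. For what it's worth, a minimal T-SQL retry sketch would look like the following. This assumes the client can run multi-statement batches - ours can't, since the [DataSource.Error] in the message above shows the query is coming through Power Query, where any retry would have to live in the refresh/orchestration layer - and I haven't verified that serverless lets TRY/CATCH trap this particular parse error:

    -- Retry a query up to 3 times on failure (sketch only; not verified
    -- against the serverless SQL pool's T-SQL surface area).
    DECLARE @attempt INT = 1;
    WHILE @attempt <= 3
    BEGIN
        BEGIN TRY
            SELECT COUNT(*) FROM dbo.ProjInvoiceEmpl;  -- the contended view
            BREAK;  -- success: stop retrying
        END TRY
        BEGIN CATCH
            IF @attempt = 3 THROW;  -- out of attempts: surface the error
            SET @attempt += 1;
        END CATCH
    END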
So this appears to be a "working as expected" situation; it's just that the functionality is poor.
I now wonder how other companies work around this, since, to my thinking, most of the point of Export to Data Lake is that it can be near real-time. But if queries on near-real-time data frequently fail, that defeats the purpose.
That is a different question though; I'll post it separately.