When running my extract, I got this error:
Found invalid character-encoding for UTF-8 encoding in input. The input file may contain corrupted data, or the specified input encoding in the extractor does not match the actual file encoding. See the DETAILS section for a hexadecimal dump of the file segment containing the invalid character-encoding.
I am not able to read UTF-8 character data with the U-SQL script below.
@cgadmdomain =
    EXTRACT
        row_id string,
        orgarea_name string,
        last_changed_time string,
        start_date string,
        stop_date string,
        domain_name string,
        gui_description string,
        media string,
        direction string,
        distribution string,
        threshold1 string,
        threshold2 string
    FROM @cgadmdomainInPath
    USING Extractors.Text(delimiter: ';');
The file has the value "Test Kö CB" in the media column. If I remove this particular record, my script runs fine. Please let me know if I need to add anything to the extractor parameters.
Are you sure that the file is encoded in UTF-8 and not some other encoding? What is the byte sequence that you see if you open the file with a byte level editor?
Depending on that, you may have to set the extractor's encoding to the appropriate Windows-125x encoding or Unicode.
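If you do not have a byte-level editor at hand, a few lines of Python run locally (outside U-SQL) can show the same thing. This is only a rough sketch: it encodes the sample value from the question in both UTF-8 and Windows-1252 and reports where UTF-8 decoding would fail. The helper names hex_dump and first_invalid_utf8_offset are made up for this illustration; to check your real file you would pass its raw bytes, e.g. open(path, "rb").read(), instead of the sample string.

```python
from typing import Optional

# Sample value from the question ("\u00f6" is the letter o with umlaut).
text = "Test K\u00f6 CB"

utf8_bytes = text.encode("utf-8")     # how the value looks in a UTF-8 file
cp1252_bytes = text.encode("cp1252")  # how it looks in a Windows-1252 file


def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in data)


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start


print(hex_dump(utf8_bytes))    # 54 65 73 74 20 4B C3 B6 20 43 42
print(hex_dump(cp1252_bytes))  # 54 65 73 74 20 4B F6 20 43 42
print(first_invalid_utf8_offset(utf8_bytes))    # None
print(first_invalid_utf8_offset(cp1252_bytes))  # 6
```

If the dump of your file shows a single F6 byte where the "ö" is, rather than the pair C3 B6, the file is most likely Windows-1252 (or a similar single-byte code page) and not UTF-8, which would explain the extractor error.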
If your data is, for example, encoded in Windows-1252, you can extract it with the following statement (note that we currently support only the Windows-125x encodings in addition to the Unicode encodings):
@data =
EXTRACT ...
FROM ...
USING Extractors.Csv(encoding:System.Text.Encoding.GetEncoding("Windows-1252"));