I am new to u-sql of azure datalake analytics. I want to do what I think is a very simple operations but ran into trouble. Basically: I want to create a query which ignore empty string. using it in select works, but not in WHERE statement.
Below the statement I am making and the cryptic error I get
JOB
@xsel_res_1 =
EXTRACT
x_paper_id long,
x_Rank uint,
x_doi string,
x_doc_type string,
x_paper_title string,
x_original_title string,
x_book_title string,
x_paper_year int,
x_paper_date DateTime?,
x_publisher string,
x_journal_id long?,
x_conference_series_id long?,
x_conference_instance_id long?,
x_volume string,
x_issue string,
x_first_page string,
x_last_page string,
x_reference_count long,
x_citation_count long?,
x_estimated_citation int?
FROM @"adl://xmag.azuredatalakestore.net/graph/2018-02-02/Papers.txt"
USING Extractors.Tsv()
;
@xsel_res_2 =
SELECT
x_paper_id AS x_paper_id,
x_doi.ToLower() AS x_doi,
x_doi.Length AS x_doi_length
FROM @xsel_res_1
WHERE NOT string.IsNullOrEmpty(x_doi)
;
@xsel_res_3 =
SELECT
*
FROM @xsel_res_2
SAMPLE ANY (5)
;
OUTPUT @xsel_res_3
TO @"/graph/2018-02-02/x_output/x_papers_x6.tsv"
USING Outputters.Tsv();
THE ERROR
Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][1] with error: Vertex user code error.
VertexFailedFast: Vertex failed with a fail-fast error
E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row after processing 10 record(s) in the vertex' input split. Column index: 5, column name: 'x_original_title'.
E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD: Invalid character following the ending quote character in a quoted field.
Row selected
Component
RUNTIME
Message
Invalid character following the ending quote character in a quoted field.
Resolution
Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes.
Description
Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes.
Details
Row Delimiter: 0x0
Column Delimiter: 0x9
HEX: 61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A 20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72
UPDATE BY the way, the operations work on other datasets, so the problem is not the syntax as far as I can tell
//Define schema of file, must map all columns
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@searchlog_1 =
SELECT * FROM @searchlog
WHERE NOT string.IsNullOrEmpty(ClickedUrls );
OUTPUT @searchlog_1
TO @"/Samples/Output/SearchLog_output_x1.tsv"
USING Outputters.Tsv();
This is an unfortunate error display for this case.
Assuming text is utf-8, you can use a site like www.hexutf8.com to convert the hex to:
avni termin u povaljskoj listini natpisu g 1185 "Po koncu" (Stari hr
It looks like the input row contains at least one "
character that is not properly escaped. It should look like this:
avni termin u povaljskoj listini natpisu g 1185 ""Po koncu"" (Stari hr