I have found a strange behaviour of SQL Server 2016 when handling broken pipes inside R scripts. See the T-SQL code below:
DECLARE
@r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)';
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
@language = N'R',
@script = @r,
@output_data_1_name = N'df'
WITH RESULT SETS ((
BadEncodingColumn varchar(2),
GoodEncodingColumn varchar(5)
));
The PRINT command returns this in the Messages tab:
df <- data.frame(
a = "¦",
b = "a,b,c"
)
However, the final Results tab looks like this:
BadEncodingColumn GoodEncodingColumn
¦ a,b,c
This behaviour seems to emerge at the EXECUTE sp_execute_external_script phase of the script, and I have seen this character (Â) when dealing with other encoding issues with Excel, R and other versions of SQL Server.
Any solutions to this behaviour? And bonus points, what is 'special' about the  character?
Edit: I have tried tinkering with data types inside SQL Server and R to no avail.
The issue appears to be with encoding of non-ASCII characters in the R script (broken pipe is outside the 128 ASCII characters). You can override the encoding using the ‘Encoding’ function explicitly to Unicode(UTF-8) to workaround the issue. For instance your script can be updated as follows
DECLARE
@r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)
Encoding(levels(df$a)) <- "UTF-8" ###### Encoding override'
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
@language = N'R',
@script = @r,
@output_data_1_name = N'df'
WITH RESULT SETS ((
BadEncodingColumn varchar(2),
GoodEncodingColumn varchar(5)
));
Produces the following results
BadEncodingColumn GoodEncodingColumn
¦ a,b,c