I am fetching tweets via the Twitter API into a pandas DataFrame and writing the data to a Teradata database. Unlike the other tweets, one cell holds a tweet that contains bold text. When I try to insert it into the database, I get the following error:
OperationalError: [Version 17.0.0.4] [Session 3046127] [Teradata SQL Driver] [Error 528] A failure occurred while executing rows 1 through 292 of a batch request.
at gosqldriver/teradatasql.(*teradataConnection).makeDriverErrorCode TeradataConnection.go:1120
at gosqldriver/teradatasql.newTeradataRows TeradataRows.go:396
at gosqldriver/teradatasql.(*teradataStatement).QueryContext TeradataStatement.go:122
at gosqldriver/teradatasql.(*teradataConnection).QueryContext TeradataConnection.go:2083
at database/sql.ctxDriverQuery ctxutil.go:48
at database/sql.(*DB).queryDC.func1 sql.go:1579
at database/sql.withLock sql.go:3204
at database/sql.(*DB).queryDC sql.go:1574
at database/sql.(*Conn).QueryContext sql.go:1823
at main.goCreateRows goside.go:654
at main._cgoexpwrap_cfa80c8a3acb_goCreateRows _cgo_gotypes.go:363
at runtime.cgocallbackg1 cgocall.go:332
at runtime.cgocallbackg cgocall.go:207
at runtime.cgocallback_gofunc asm_amd64.s:793
at runtime.goexit asm_amd64.s:1373
Caused by [Version 17.0.0.4] [Session 3046127] [Teradata Database] [Error 6705] An illegally formed character string was encountered during translation.
at gosqldriver/teradatasql.(*teradataConnection).formatDatabaseError TeradataConnection.go:1138
at gosqldriver/teradatasql.(*teradataConnection).makeChainedDatabaseError TeradataConnection.go:1154
The data type of the tweets column in the database is "varchar(1000) CHARACTER SET UNICODE NOT CASESPECIFIC".
Here is the sample data:
The tweet containing bold text is what causes the insert to fail. How do I work around this?
To store or retrieve arbitrary Unicode code points, enable the Unicode Pass-Through feature in both the loading session and the querying session:
SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
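Since this is a session-level setting, it has to be issued on the same connection that later performs the insert. Below is a minimal sketch of doing that from Python, assuming a DB-API style connection such as the one the teradatasql driver returns. The helper name enable_unicode_pass_through and the host and credentials in the usage comment are placeholders, not part of the original post.

```python
# Statement taken from the answer above; the trailing semicolon is not needed
# when the statement is sent through a driver.
PASS_THROUGH_SQL = "SET SESSION CHARACTER SET UNICODE PASS THROUGH ON"


def enable_unicode_pass_through(conn):
    """Turn on Unicode Pass-Through for the session behind `conn`.

    `conn` is assumed to be an open DB-API connection (hypothetical helper,
    written for illustration only).
    """
    cur = conn.cursor()
    try:
        cur.execute(PASS_THROUGH_SQL)
    finally:
        # Always release the cursor, even if the statement fails.
        cur.close()


# Hypothetical usage with the teradatasql driver (placeholder host/credentials):
#   import teradatasql
#   with teradatasql.connect(host="myhost", user="me", password="secret") as conn:
#       enable_unicode_pass_through(conn)
#       ... run the batch insert on this same connection ...
```

If the DataFrame is written through a connection pool or another abstraction layer, it is worth checking that the insert really runs in the session where the setting was applied.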
For the specific example given, you might find it useful to "normalize" the Unicode text, e.g. with Python unicodedata.normalize before loading or Teradata TRANSLATE(...) after loading, if you want the corresponding ASCII letter characters. That would not help with other Unicode characters, such as emoji, that may also occur in the input.
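As a rough illustration of the normalization route, here is a small sketch using only the Python standard library. It assumes the "bold" text is made of Unicode mathematical bold letters (a common way to fake bold in tweets) and that NFKC is the desired normalization form. The function names to_plain_text and has_non_bmp are made up for this example.

```python
import unicodedata


def to_plain_text(text):
    """Apply NFKC compatibility normalization.

    Styled letters such as MATHEMATICAL BOLD CAPITAL H are folded to their
    plain counterparts; characters without such a mapping (e.g. emoji) are
    left unchanged.
    """
    return unicodedata.normalize("NFKC", text)


def has_non_bmp(text):
    """Return True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


# "Hello" written with mathematical bold letters (U+1D407, U+1D41E, ...).
bold_hello = "\U0001D407\U0001D41E\U0001D425\U0001D425\U0001D428"

print(to_plain_text(bold_hello))               # Hello
print(has_non_bmp(to_plain_text(bold_hello)))  # False
print(has_non_bmp(to_plain_text("\U0001F600")))  # True: the emoji survives NFKC
```

With pandas, the same function could be applied before the insert with something like df["tweet"] = df["tweet"].map(to_plain_text), where "tweet" is a placeholder column name. Rows for which has_non_bmp still returns True after normalization would still need the pass-through setting.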