
Google Cloud Data Fusion is appending a column to original data


When I load encrypted data from a GCS source to a GCS sink, one additional column gets added.

Original data

Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee location
1,Vinay,Argekar,01/01/2017,India
2,Thirukkumaran,Haridass,02/02/2017,USA
3,David,Wu,03/04/2000,Canada
4,Vinod,Kumar,04/02/2002,India
5,Joshua,Abraham,04/15/2010,France
6,Allaudin,Dastigar,09/24/2012,UK
7,Senthil,Kumar,08/15/2009,Germany
8,Sudha,Narayanan,12/14/2016,India
9,Ravi,Prasad,11/11/2011,Costa Rica

Data written to the file after running the pipeline

0,Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee location
91,1,Vinay,Argekar,01/01/2017,India
124,2,Thirukkumaran,Haridass,02/02/2017,US
164,3,David,Wu,03/04/2000,Canada
193,4,Vinod,Kumar,04/02/2002,India
224,5,Joshua,Abraham,04/15/2010,France
259,6,Allaudin,Dastigar,09/24/2012,UK
293,7,Senthil,Kumar,08/15/2009,Germany
328,8,Sudha,Narayanan,12/14/2016,India
363,9,Ravi,Prasad,11/11/2011,Costa Rica

The first column (0, 91, 124, ...) was not present in the original file.


Solution

  • When you configured the GCS source, did you set the Format to CSV, or was it left as Text? When the Format is Text, the output schema contains an offset field, and that offset is the first column you see in the output data. When you set the Format to CSV, you have to specify the output schema of the file yourself.
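
The values in the extra column (0, 91, 124, ...) look like the byte position at which each line starts in the source file, which fits the offset explanation above. The sketch below is plain Python, not a Data Fusion API: it recomputes those positions from the original data in the question, assuming UTF-8 text with single-byte LF line endings. The helper names `line_offsets` and `strip_offset` are hypothetical and exist only for this illustration.

```python
from typing import List

# Original file contents as posted in the question (LF line endings assumed).
ORIGINAL = "\n".join([
    "Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee location",
    "1,Vinay,Argekar,01/01/2017,India",
    "2,Thirukkumaran,Haridass,02/02/2017,USA",
    "3,David,Wu,03/04/2000,Canada",
    "4,Vinod,Kumar,04/02/2002,India",
    "5,Joshua,Abraham,04/15/2010,France",
    "6,Allaudin,Dastigar,09/24/2012,UK",
    "7,Senthil,Kumar,08/15/2009,Germany",
    "8,Sudha,Narayanan,12/14/2016,India",
    "9,Ravi,Prasad,11/11/2011,Costa Rica",
]) + "\n"


def line_offsets(text: str) -> List[int]:
    """Return the byte offset at which each line of the text starts."""
    offsets = []
    position = 0
    for line in text.splitlines(keepends=True):
        offsets.append(position)
        # Advance by the encoded length, including the line terminator.
        position += len(line.encode("utf-8"))
    return offsets


def strip_offset(record: str) -> str:
    """Drop the leading '<offset>,' field from one output record."""
    _, _, body = record.partition(",")
    return body


print(line_offsets(ORIGINAL))
# [0, 91, 124, 164, 193, 224, 259, 293, 328, 363]
print(strip_offset("91,1,Vinay,Argekar,01/01/2017,India"))
# 1,Vinay,Argekar,01/01/2017,India
```

The computed positions match the first column of the pipeline output line for line. `strip_offset` only shows how files that were already written could be cleaned up after the fact; changing the source Format as described above avoids the extra column in the first place.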