Search code examples
sqlgithubcopydatabricksdatabricks-sql

Databricks %sql - Copy into by selecting columns to upload


I am trying to copy the table DimAccount (saved as .csv) from Adventureworks into my table tmpAccount in databricks. Instead of uploading all 9 columns I would like to copy only 4 columns. Unfortunaty my commands are wrong.

Can anyone help me here?

%sql

create schema AdventureWorks;

then

%sql

create table AdventureWorks.tmpAccount (
  AccountKey int,                             -- 1
  AccountCodeAlternateKey int,                -- 3
  AccountDescription string,                  -- 5
  AccountType string                          -- 6
);

and finally

%sql 
copy into AdventureWorks.tmpAccount
from
  (select p.$1, p.$3, p.$5, p.$6
  from "https://raw.githubusercontent.com/anotherrepo/myrepo/main/data/Account.csv" p
  )
fileformat = CSV
format_options ('header' = 'true')

Here is the error:

PARSE_SYNTAX_ERROR


Solution

  • You need to use real column names, not the $<number> syntax, especially if you have header = true - then column names will come from the file itself. Also, you need to have file on the cloud service (S3/ADLS/...), not as HTTP URL. See doc for details.