Search code examples
sqldatabasepostgresqlcsv

How To Import a Big .csv(~2GB) Into a PostgreSQL Table


I am new into databases and PostgreSQL and today I wanted to import this dataset: https://www.kaggle.com/datasets/wilmerarltstrmberg/recipe-dataset-over-2m/data into a PostgreSQL so I can make a recipe finder app.

I started by creating a new database, then creating a table and then creating the exact columns from the dataset:

create table recipes_table (
title text,
    ingredients text,
    directions text,
    link text,
    source text,
    NER text,
    site text
);

Then I proceeded to import all the data from my .csv (by rightclicking on the table and selecting import). I left everything on default and started importing but after 10 seconds I always get errors.

Unfortunately I lost track of every modification I did to my code and table (there were a lot), but I'll leave the results I had from the processes tab:

--command " "\\copy public.recipes_table (title, ingredients, directions, link, source, \"NER\", site) FROM 'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' DELIMITER ',' CSV HEADER QUOTE '\"' ESCAPE '''';""

ERROR: unquoted newline found in data
HINT: Use quoted CSV field to represent newline.
CONTEXT: COPY recipes_table, line 1274879` - first iteration; I opened my terminal in VS Code and did this to solve it: sed -i 's/\r//g' recipes_data.csv

And then, with this command:

--command " "\\copy public.recipes_table (title, ingredients, directions, link, source, \"NER\", site) FROM 'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' DELIMITER ',' CSV HEADER QUOTE '\"' ESCAPE '''';""

I got this error

ERROR: extra data after last expected column
CONTEXT: COPY recipes_table, line 1274879: "Spaghetti Sauce,"[""1-2 lb. hamburger"", ""1-2 lb. Italian Sausage"", ""2 clove garlic"", ""salt"", ..."`

Or this similar one:

ERROR: extra data after last expected column
CONTEXT: COPY recipes_table, line 2: "No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. evaporated milk"", ""1/2 tsp. van..."`

Now I'm as confused as I can get. How can I import this table in my pgAdmin app so I can visualize it and use it in my project?

And even more, I don't need every column from this .csv so how can I get for example only the title, ingredients, directions and NER?


Solution

  •     create table recipes_table (
    title text,
        ingredients text,
        directions text,
        link text,
        source text,
        NER text,
        site text
    );
    

    Then:

    copy recipes_table(title, ingredients, directions, link, source, NER, site)
        from  'C:/Users/silvi/Desktop/WEBPRO~1/archive/RECIPE~1.CSV' CSV HEADER;
    

    PS: I downloaded and tested the same:

    2,231,142 rows affected in 26 s 752 ms