Search code examples
python-3.xpostgresqlpsycopg2pgadminpgadmin-4

Import a random csv as a table on the fly - Postgresql and Python


I am using a pgadmin client. I have multiple csv files.

I would like to import each csv file as a table.

When I tried the below

a) Click create table

b) Enter the name of table and save it.

c) I see the table name

d) Click on "Import csv"

e) selected columns as "header"

f) Clicked "Import"

But I got an error message as below

ERROR:  extra data after last expected column
CONTEXT:  COPY Test_table, line 2: "32,F,52,Single,WHITE,23/7/2180 12:35,25/7/2180..."

I also tried the python psycopg2 version as shown below

import psycopg2
conn = psycopg2.connect("host='xxx.xx.xx.x' port='5432' dbname='postgres' user='abc' password='xxx'")
cur = conn.cursor()
f = open(r'test.csv', 'r')
cur.copy_from(f,public.test, sep=',') #while I see 'test' table under my schema, how can I give here the schema name etc. I don't know wht it says table not defined
f.close()

UndefinedTable: relation "public.test" does not exist

May I check whether it is possible to import some random csv as table using pgadmin import?


Solution

  • Pandas will do this easily. Create a table with a structure as some csv.

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

    The csv is first read by read_csv to a Dataframe

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

    Regards Niels

    As I understand the requirement, a new table is wanted for every csv. The code below illustrates that. It can be customized and datatypes can be elaborated, see the documentation for Pandas.DataFrame.to_sql. I think, actually, that the heavy lifting is done by SQLAlchemy

    import io
    import os
    
    import pandas as pd
    import psycopg2
    
    buf_t1 = io.StringIO()
    buf_t1.write("a,b,c,d\n")
    buf_t1.write("1,2,3,4\n")
    buf_t1.seek(0)
    df_t1 = pd.read_csv(buf_t1)
    df_t1.to_sql(name="t1", con="postgresql+psycopg2://host/db", index=False, if_exists='replace')
    #
    buf_t2 = io.StringIO()
    buf_t2.write("x,y,z,t\n")
    buf_t2.write("1,2,3,'Hello World'\n")
    buf_t2.seek(0)
    df_t2 = pd.read_csv(buf_t2)
    df_t2.to_sql(name="t2", con="postgresql+psycopg2://host/db", index=False, if_exists='replace')
    

    This will result in two new tables, t1 and t2. Defined as like this:

    create table t1
    (
        a bigint,
        b bigint,
        c bigint,
        d bigint
    );
    create table t2
    (
        x bigint,
        y bigint,
        z bigint,
        t text
    );