I have a file universities.txt which looks like this:
Alabama Air University Alabama A&M University Alabama State University Concordia College-Selma Faulkner University Huntingdon College Jacksonville State University Judson College Miles College Oakwood College Samford University Southeastern Bible College Southern Christian University Spring Hill College Stillman College Talladega College University of North Alabama University of South Alabama University of West Alabama Alaska Alaska Bible College Alaska Pacific University Sheldon Jackson College University of Alaska - Anchorage University of Alaska - Fairbanks University of Alaska - Southeast Arizona American Indian College of the Assemblies of God Arizona State University Arizona State University East Arizona State University West DeVry University-Phoenix Embry-Riddle Aeronautical University Grand Canyon University Northcentral University Northern Arizona University
.. and so on, where in this case Alabama, Alaska and Arizona are locations and everything else are universities. What I want to do is load the location into a table called Location
and the Universities into a table called University
, where the Id
of the Location
table is a FK to the University
table, like this:
CREATE TABLE Location (
Id SERIAL PRIMARY KEY,
Name TEXT
);
CREATE TABLE University (
Id SERIAL PRIMARY KEY,
Location INTEGER REFERENCES Location (Id) NOT NULL,
Name TEXT
);
So what I want to do in Postgres is something like this:
for (int i=0 until i = universities.size() i++){
//each entry in the universities vector is a tuple with the first entry being the country/state
//and the second entry being a vector of the universities as String's
Vector tuple = (Vector)universities.get(i);
//insert into location table
String state = (String)tuple.get(0);
Vector u = (Vector)tuple.get(1);
for(int j=0; until j =u.size(); j++){
//insert into university table with i as FK to location table
Anyone knows how to do this?
Here is a pure SQL solution.
Use COPY
to import your file into a temporary table and one DML statement with data modifying CTEs (requires PostgreSQL 9.1 or later) to do the rest. Should be fast for both steps:
Temporary table with a single text column ( dropped automatically at end of session):
CREATE TEMP TABLE tmp (txt text);
Import data from file:
COPY tmp FROM '/path/to/file.txt'
If you are doing this from a remote client, use the meta command \copy
of psql instead.
My solution depends on the data format displayed in the question. I.e.: there is an empty row before and after a city. I assume actual empty strings in the import file. Make sure to have a leading row with an empty string before the first city to avoid a special case.
Rows will be inserted in order. I use that for the following window functions without ordering.
WITH x AS (
SELECT txt
,row_number() OVER () AS rn
,lead(txt) OVER () = '' AND
lag(txt) OVER () = '' AS city
FROM tmp -- don't remove empty rows just yet
), y AS (
SELECT txt, city
,sum(city::int) OVER w AS id
FROM x
WHERE txt <> '' -- remove empty rows now
WINDOW w AS (ORDER BY rn)
), l AS (
INSERT INTO location (id, name)
SELECT id, txt
FROM y
WHERE city
), u AS (
INSERT INTO university u (location, name)
SELECT id, txt
FROM y
WHERE NOT city
)
SELECT setval('location_id_seq', max(id))
FROM y;
Voilá.
CTE x
marks cities based on an empty string value in the rows before and after them.
CTE y
adds a running sum of cities (id
), thereby forming a perfectly valid id
for each city and its unis.
CTEs l
and u
do the inserting, which is now easy.
The final SELECT
sets the next value for the sequence attached to location.id
. We have not been using it, so we must set it to the current maximum value or we run into duplicate key errors with future INSERTs to location.