Search code examples
ruby-on-railspostgresqlcsvcloud9-ide

how to save CSV file for COPY to postgresql table


I am trying to copy a CSV file to populate one of the tables and have been struggling. I would appreciate any hints or help understanding how this should work.

Edit: I am working in Cloud9 IDE.

I have tried putting the file in various locations, including this Shared folder which should be accessible:

thenutritionalgorithm_development=# COPY foods FROM '/Users/Shared/rake.csv';
ERROR:  could not open file "/Users/Shared/rake.csv" for reading: No such file or directory

When I try \COPY, the error message is different:

thenutritionalgorithm_development=# \COPY foods FROM '/Users/Shared/rake.csv';
/Users/Shared/rake.csv: No such file or directory

The tables in this database are all owned by ‘ubuntu,’ which is my current user.

              List of relations
 Schema |       Name        | Type  | Owner  
--------+-------------------+-------+--------
 public | foods             | table | ubuntu
 public | lists             | table | ubuntu
 public | quantities        | table | ubuntu
 public | schema_migrations | table | ubuntu

My database.yml file indicates that ‘ubuntu’ should be the user with access to these databases.

development:
  adapter: postgresql
  encoding: UTF8
  pool: 5
  username: ubuntu
  password: password
  database: thenutritionalgorithm_development

test:
  adapter: postgresql
  encoding: UTF8
  pool: 5
  username: ubuntu
  password: password
  database: thenutritionalgorithm_test

production:
  adapter: postgresql
  encoding: UTF8
  pool: 5
  username: ubuntu
  password: password
  database: thenutritionalgorithm_production

I should note that I am fairly confused/fuzzy on the concept of users in this context. Do the user and password in the .yml have to be anything in particular?

I've granted all privileges to the user ubuntu:

                                    Access privileges
 Schema |       Name        |   Type   |   Access privileges   | Column access privileges 
--------+-------------------+----------+-----------------------+--------------------------
 public | foods             | table    | ubuntu=arwdDxt/ubuntu | 
 public | foods_id_seq      | sequence |                       | 
 public | lists             | table    |                       | 
 public | lists_id_seq      | sequence |                       | 
 public | quantities        | table    |                       | 
 public | quantities_id_seq | sequence |                       | 
 public | schema_migrations | table    |                       | 

I thought that maybe the solution would be to somehow save the file in the database cluster, but I'm not sure how to do that in practice (if that even makes sense as something that could be done). I created the databases using a simple bundle exec rake db:create after writing the .yml file.

In general, I am confused about how and where to save the .CSV file so that it can COPY interact with my postgresql database.

Thank you for any suggestions or tips.


Solution

  • After much trial and error, I have devised an approach that consistently works. It may not be the most efficient or elegant and I appreciate suggested improvements. This is for Cloud9, though I think most if not all can be applied generally. I'm assuming you're starting from scratch or able to delete/rebuild your existing PostgreSQL databases.

    1. Start the PostgreSQL server.

      $ sudo service postgresql start
      
    2. If you have existing PostgreSQL databases that do not contain important data, delete them one at a time.

      $ sudo sudo -u postgres psql
      # DROP DATABASE "database_one";
      # DROP DATABASE "database_two";
      
    3. Assuming you have three databases (development, test, and production), write your database.yml file like this. The reason it works this way is related to how PostgreSQL comes preinstalled on every Cloud9 workspace.

      development:
        adapter: postgresql
        encoding: UTF8
        pool: 5
        username: <%= ENV['USERNAME'] %>
        password: <%= ENV['PASSWORD'] %>
        host: <%= ENV['IP'] %>
        database: yourapp_development
      
      test:
        host: <%= ENV['IP'] %>
        adapter: postgresql
        encoding: UTF8
        pool: 5
        username: <%= ENV['USERNAME'] %>
        password: <%= ENV['PASSWORD'] %>
        database: yourapp_test
      
      production:
        host: <%= ENV['IP'] %>
        adapter: postgresql
        encoding: UTF8
        pool: 5
        username: <%= ENV['USERNAME'] %>
        password: <%= ENV['PASSWORD'] %>
        database: yourapp_production`
      
    4. Create your development and test databases.

      $ bundle exec rake db:create
      $ bundle exec rake db:migrate
      

      Creation of the production database requires separate commands.

      $ RAILS_ENV=production rake db:create
      $ RAILS_ENV=production rake db:migrate
      
    5. Enter the PostgreSQL server. For each table, change the owner of the table to ubuntu.* Grant all privileges on each table to ubuntu.

      $ sudo sudo -u postgres psql
      # \c yourapp_development
      # ALTER TABLE table_dining OWNER TO ubuntu;
      # ALTER TABLE table_kitchen OWNER TO ubuntu;
      # ALTER TABLE lists OWNER TO ubuntu;
      # ALTER TABLE schema_migrations OWNER TO ubuntu;
      # GRANT ALL PRIVILEGES ON table_dining TO ubuntu;
      # GRANT ALL PRIVILEGES ON table_kitchen TO ubuntu;
      # \c postgres
      

      Repeat this for each database. To view/verify the owner of a database table, use the following commands.

      # \c yourapp_development
      # \dt
      
    6. Only for the tables you will be populating using the CSV file: Delete the created_at and updated_at columns that get automatically created.

      ALTER TABLE table_dining DROP COLUMN created_at RESTRICT, DROP COLUMN updated_at RESTRICT;
      
    7. Build a CSV file where Row 1 contains the names of the attributes. Name the first column id and manually add id numbers. Be sure that the CSV columns are (left-to-right) in the same order as how the attributes are added to the database table in your migrations. If you want to change the order of the columns, edit or recreate your migration files and clear and recreate your database (repeat steps 2 and 4).

    8. Save your CSV file to the /db folder by dragging and dropping or File/Upload local files from the Cloud9 menu. I think any folder would work, but I refer to the /db folder in these instructions. Since I often repeat this process of uploading/replacing data, I find it useful to always name the file "rake.csv" and replace the file it each time so that I don't end up "storing" or relying on data in /db. This is meant to be just a temporary location.**

    9. Move the newly created rake.csv file to your /tmp folder with the following command.

      $ scp db/rake.csv //tmp/rake.csv
      
    10. Enter the PostgreSQL server and the database where the CSV data will be added. Delete all contents of your database if you mean to recreate it completely.

      $ sudo sudo -u postgres psql
      # \c yourapp_development
      # DELETE FROM table_dining;
      

      If you want to check whether there is any data in the table, use the following command.

      # \d+ table_dining;
      

      Or perhaps it would be enough to see how many rows of data are in the table.

      # SELECT COUNT(*) FROM table_dining;
      
    11. Finally! Add the CSV data to your table.

      # copy table_dining from '/tmp/rake.csv' with CSV HEADER;
      

      Once you've done this with all of your databases, you can view the data in PostgreSQL or on the Cloud9 terminal.

    12. Once you've populated the production database and wish to push it to heroku (assuming you are starting "from scratch" and do not need to preserve heroku data), use the following commands.

      $ heroku pg:reset DATABASE
      $ heroku pg:push yourapp_production DATABASE_URL
      $ git add -A
      $ git c
      $ git commit -m "message"
      $ git push
      $ heroku pg:push yourapp_production DATABASE_URL
      $ git push heroku
      $ heroku restart
      

    I hope this is helpful! Please let me know any edits/methods you recommend.

    *I suspect this step could be avoided if the database.yml file were written differently, but I am too far along/afraid to make this change at this time. I will edit if someone can corroborate this or if I end up trying it.

    **I'm almost certain this step could be replaced by a more efficient approach; suggestions welcome.