
Difference between database connector/reader nodes in KNIME


While building a basic workflow with KNIME and PostgreSQL, I ran into trouble choosing the right node for fetching data from the database.

In the node repository we can find at least:

  1. PostgreSQL Connector
  2. Database Reader
  3. Database Connector

Actually, we can do the same thing using 2) alone, or by connecting either 1) or 3) to the input of node 2).

I assumed there were some hidden advantages, such as better performance with complex queries or better overall stability, but on the other hand we are using exactly the same database driver in every case.


Solution

  • There is a big difference between the Connector nodes and the Reader node. The Database Reader reads the data into KNIME, so the data ends up on the machine running the workflow. This can be a bad idea for big tables.

    The Connector nodes do not. The data stays where it is (usually on a remote machine in your cluster). You can then attach Database nodes to the Connector nodes. All data manipulation then happens inside the database, and no data is loaded onto your machine (unless you use the output port preview).

    As for the difference between the other two: the PostgreSQL Connector is just a special case of the Database Connector with a preset configuration. You can build the same configuration with the Database Connector, which also lets you set more detailed options for non-standard databases.
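
To make the reader/connector distinction concrete, here is a rough sketch in plain Python rather than KNIME. It is only an analogy under stated assumptions: an in-memory SQLite database stands in for the remote PostgreSQL server, and the `orders` table and all function names are made up for illustration. Nothing here is KNIME's API. The first function behaves like the Database Reader (pull every row, then work locally); the second behaves like Database nodes chained behind a Connector (the database does the work and only the result crosses the wire).

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database (stand-in for a remote server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
    )
    conn.commit()
    return conn


def totals_reader_style(conn):
    """Reader-style: fetch every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    # Second value: how many rows had to be transferred to the client.
    return totals, len(rows)


def totals_in_database(conn):
    """Connector-style: the database aggregates; only the result is fetched."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    print(totals_reader_style(conn))
    print(totals_in_database(conn))
```

Both functions return the same totals, but the reader-style version transfers one row per order while the in-database version transfers one row per customer. With three rows that does not matter; with a big table it is the whole point of keeping the work behind a Connector.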