
Is it possible to write raw SQL (PostgreSQL) to RDS using the AWS Glue Spark shell?


I have a Glue connection for an RDS PostgreSQL database, pre-built via CloudFormation, which works fine in a Glue Scala Spark shell via the getJDBCSink API for writing a DataFrame to that DB.

But I also need to run plain SQL against the same database, such as `create index ...` or `create table ...`. How can I issue that sort of statement from the same Glue Spark shell?


Solution

  • In Python, you can provide the pg8000 dependency to your Glue Spark jobs and then run SQL commands by establishing a connection to the RDS instance with pg8000.

    In Scala, you can establish a JDBC connection directly, with no external library required as far as the driver is concerned: the PostgreSQL driver is already available in AWS Glue.

    You can create the connection like this:

    import java.sql.{Connection, DriverManager, ResultSet}
    
    object pgconn extends App {
      println("Postgres connector")
    
      // Load the PostgreSQL JDBC driver (bundled with AWS Glue)
      classOf[org.postgresql.Driver]
      val con_str = "jdbc:postgresql://localhost:5432/DB_NAME?user=DB_USER"
      val conn = DriverManager.getConnection(con_str)
      try {
        val stm = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    
        val rs = stm.executeQuery("SELECT * FROM Users")
    
        while (rs.next) {
          println(rs.getString("quote"))
        }
      } finally {
        conn.close()
      }
    }
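    The snippet above only runs a SELECT, while the question asks about DDL such as `create table` or `create index`. The same JDBC connection handles those via `Statement.executeUpdate`. A minimal sketch, where the endpoint, database name, credentials, and table are hypothetical placeholders you would replace with your own:

    ```scala
    import java.sql.DriverManager

    object DdlRunner {
      // Build a PostgreSQL JDBC URL from its parts (pure helper)
      def jdbcUrl(host: String, port: Int, db: String): String =
        s"jdbc:postgresql://$host:$port/$db"

      def main(args: Array[String]): Unit = {
        // Placeholder RDS endpoint and credentials -- substitute your own
        val conn = DriverManager.getConnection(
          jdbcUrl("mydb.xxxx.eu-west-1.rds.amazonaws.com", 5432, "DB_NAME"),
          "DB_USER", "DB_PASSWORD")
        try {
          val stm = conn.createStatement()
          // executeUpdate works for DDL; it returns 0 for statements
          // that produce no row count
          stm.executeUpdate(
            "CREATE TABLE IF NOT EXISTS quotes (id SERIAL PRIMARY KEY, quote TEXT)")
          stm.executeUpdate(
            "CREATE INDEX IF NOT EXISTS quotes_quote_idx ON quotes (quote)")
        } finally {
          conn.close()
        }
      }
    }
    ```

    Unlike `executeQuery`, which expects a ResultSet back, `executeUpdate` is the right call for statements that return no rows, which is exactly the case for DDL.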
    

    or follow this blog