Search code examples
jdbcpysparkapache-spark-sqlvertica

Multi url hosts in JDBC connection


We are using a JDBC URL like "jdbc:vertica://80.90..:***/". How can I set a second Vertica host for a separate cluster in this URL? Both clusters have the same table, username and password. The only difference is the host IP.

I have tried to set the URL as shown below but it doesn't work.

jdbc:vertica://00.00.00.2:1111,00.00.00.1:1111/vertica

    url = "jdbc:vertica://****:***/"
    url1 = "jdbc:vertica://***:****/"
    properties = {
        "user": "****",
        "password": "*****",
        "driver": "com.vertica.jdbc.Driver"
    }

    df =spark.read.format("JDBC").options(
        url =url and url1,
        query = "SELECT COUNT(*) from traffic.stats where date(time_stamp) between '2019-03-16  ' and '2019-03-17' ",
        **properties
    ).load().show()

Note: pyspark 2.4 , vertica jar 9.1.1


Solution

  • One way to do this is to specify a backup host.

    url = "jdbc:vertica://00.00.00.2:1111/vertica"
    
    properties = {
        "user": "****",
        "password": "*****",
        "driver": "com.Vertica.jdbc.Driver",
        "ConnectionLoadBalance": 1,
        "BackupServerNode": "00.00.00.1:1111"
    }
    

    This will try the host specified in the URL (00.00.00.2:1111). If that host is unavailable it will try the BackupServerNode. You can specify multiple backup server nodes separated by commas.

    The above solution will only work if the original host is unavailable.

    Another solution is, if you want a random host selected, you can do that logic within python itself.

    import random
    host_list = ["00.00.00.2:1111", "00.00.00.1:1111"]
    host = random.choice(hosts)  # python2 random syntax, lookup random if using a different version of python
    
    url = "jdbc:vertica://{0}/vertica".format(host)
    

    Note: The connection property BackupServerNode is named such because it is usually used to specify an alternate node within the same database cluster, but if—like yourself—you have two databases with the same username, password, etc., it will also work for connecting to a separate database cluster host.