amazon-web-services apache-spark emr

What is a good way to automatically change the hive-site.xml of AWS EMR at launch time?


To allow BI tools like MicroStrategy to access data on an AWS EMR cluster with Spark SQL, you have to add a property to hive-site.xml. We launch our EMR clusters automatically with CloudFormation templates, but have not found a proper way (other than scripting a step) to change the XML as part of this process. Do you have any suggestions?


Solution

  • You can use the EMR configuration API to change settings at launch, so no scripted step is needed. The classification that you need is "hive-site". Example:

    {
      "Classification": "hive-site",
      "Properties": {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "username",
        "javax.jdo.option.ConnectionPassword": "password"
      }
    }
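
    Since the question is about CloudFormation, note that the AWS::EMR::Cluster resource takes the same information under its Configurations property, but names the key "ConfigurationProperties" instead of "Properties". The following is a minimal sketch, not a definitive implementation, that converts the EMR-style object above into that shape so it can be pasted into a template. The helper name to_cloudformation is hypothetical, and the property values are the placeholders from the example above; substitute whichever hive-site property your BI tool needs.

```python
import json

# EMR API-style configuration list, copied from the example above.
# All values are placeholders, not working connection settings.
emr_configurations = [
    {
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true",
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": "username",
            "javax.jdo.option.ConnectionPassword": "password",
        },
    }
]


def to_cloudformation(configurations):
    """Rename "Properties" to "ConfigurationProperties", recursing into
    nested "Configurations" lists (hypothetical helper for illustration)."""
    converted = []
    for config in configurations:
        item = {
            "Classification": config["Classification"],
            "ConfigurationProperties": dict(config.get("Properties", {})),
        }
        nested = config.get("Configurations")
        if nested:
            item["Configurations"] = to_cloudformation(nested)
        converted.append(item)
    return converted


# Fragment intended for the Properties section of an AWS::EMR::Cluster resource.
cfn_fragment = {"Configurations": to_cloudformation(emr_configurations)}
print(json.dumps(cfn_fragment, indent=2))
```

    The printed JSON goes under the Properties of the AWS::EMR::Cluster resource in the template, next to Instances, ReleaseLabel, and so on. For real clusters it is probably better to pass the password in through a template parameter than to hard-code it.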