How to fix a Presto plugin that does not run on AWS EMR


About

I'm trying to use a Presto plugin, wyukawa/presto-fluentd. It works on localhost (Mac OS X), but it does not work on Amazon EMR.

Detail

On localhost

First, I tried it on localhost (Mac OS X), and it just works.

  • plugin dir

    reizist ...plugin/presto-fluentd $ pwd
    /usr/local/Cellar/presto/0.185/libexec/plugin/presto-fluentd
    reizist ...plugin/presto-fluentd $ ls -1
    fluency-1.3.0.jar
    guava-21.0.jar
    jackson-annotations-2.8.1.jar
    jackson-core-2.7.1.jar
    jackson-databind-2.7.1.jar
    jackson-dataformat-msgpack-0.8.12.jar
    jolokia-jvm-1.3.7-agent.jar
    log-0.148.jar
    msgpack-core-0.8.12.jar
    phi-accural-failure-detector-0.0.4.jar
    presto-fluentd-0.0.1.jar
    slf4j-api-1.7.22.jar
    
  • properties

    reizist ...libexec/etc $ pwd
    /usr/local/Cellar/presto/0.185/libexec/etc
    reizist ...libexec/etc $ ls -1
    catalog
    config.properties
    event-listener.properties
    jvm.config
    log.properties
    node.properties
    reizist ...libexec/etc $ cat event-listener.properties
    event-listener.name=presto-fluentd
    event-listener.fluentd-host=localhost
    event-listener.fluentd-port=24224
    event-listener.fluentd-tag=presto.query
    

(Screenshot: left: presto log, center: fluentd log, right: presto-cli)

On EMR

I then tried the same setup on an EC2 instance in the EMR cluster, but it did not work. The plugin appears to be loaded correctly and the event listener registered, which is what puzzles me.

  • plugin dir

    [hadoop@ip-172-31-29-54 plugin]$ pwd
    /usr/lib/presto/plugin
    [hadoop@ip-172-31-29-54 plugin]$ ls
    accumulo   cassandra     jmx        memory   mysql           redis                    tpch
    atop       example-http  kafka      ml       postgresql      resource-group-managers
    blackhole  hive-hadoop2  localfile  mongodb  presto-fluentd  teradata-functions
    [hadoop@ip-172-31-29-54 plugin]$ ls -1 presto-fluentd/
    fluency-1.3.0.jar
    guava-21.0.jar
    jackson-annotations-2.8.1.jar
    jackson-core-2.7.1.jar
    jackson-databind-2.7.1.jar
    jackson-dataformat-msgpack-0.8.12.jar
    log-0.148.jar
    msgpack-core-0.8.12.jar
    phi-accural-failure-detector-0.0.4.jar
    presto-fluentd-0.0.1.jar
    slf4j-api-1.7.22.jar
    
  • properties

    [hadoop@ip-172-31-29-54 presto]$ pwd
    /etc/presto
    [hadoop@ip-172-31-29-54 presto]$ tree .
    .
    ├── conf -> /etc/alternatives/presto-conf
    ├── conf.dist
    │   ├── catalog
    │   │   ├── hive.properties
    │   │   └── mysql.properties
    │   ├── config.properties
    │   ├── jvm.config
    │   ├── log.properties
    │   ├── node.properties
    │   └── presto-env.sh
    └── event-listener.properties
    
    3 directories, 8 files
    [hadoop@ip-172-31-29-54 presto]$ cat event-listener.properties
    event-listener.name=presto-fluentd
    event-listener.fluentd-host=localhost
    event-listener.fluentd-port=24224
    event-listener.fluentd-tag=presto.query
    

I also tested by inserting print-debugging code, but it looks like it is not loaded. How should I get this plugin to work on EMR?

Thanks.

Supplement

Here is the fluentd configuration.

<source>
  @type forward
</source>

<match *.**>
  @type stdout
</match>
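
To rule out the fluentd side, the forward/stdout pipeline above can be exercised without Presto. The following is only a sketch: `build_test_event` and the JSON fields are made up for illustration and are not the payload presto-fluentd sends. It assumes `fluent-cat` (shipped with fluentd) is on the PATH and that fluentd listens on the default 127.0.0.1:24224.

```shell
#!/usr/bin/env bash
# Sketch: send one hand-made event to fluentd, independently of Presto.
# build_test_event is a hypothetical helper; its fields are illustrative only.
build_test_event() {
  printf '{"source":"manual-test","message":"%s"}\n' "$1"
}

# fluent-cat reads JSON from stdin and forwards it with the given tag.
if command -v fluent-cat >/dev/null 2>&1; then
  build_test_event "hello from fluent-cat" | fluent-cat presto.query \
    || echo "send failed; is fluentd running?" >&2
else
  echo "fluent-cat not found; skipping send" >&2
fi
```

If the event shows up in the fluentd stdout log with the tag presto.query, the fluentd configuration is fine and the problem is on the Presto side.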

Solution

  • I resolved it on my own.

    It turned out that I had to put event-listener.properties in /mnt/var/lib/presto/data/etc, so I did this:

    s3uri="s3://my-s3-bucket"
    
    # make symbolic link
    sudo mkdir /usr/lib/presto/etc
    sudo ln -s /usr/lib/presto/etc /mnt/var/lib/presto/data
    
    # download presto plugins
    aws s3 sync $s3uri/jar/ /usr/lib/presto/plugin/
    aws s3 sync $s3uri/properties/ /usr/lib/presto/etc/
    
    # make sure all plugins are owned by presto user
    chown -R presto:presto /usr/lib/presto/plugin
    chown -R presto:presto /usr/lib/presto/etc
    
    
    # restart presto
    stop  presto-server
    start presto-server
    

    In the end, my directories look like this:

    [hadoop@ip-172-31-21-25 presto]$ pwd
    /usr/lib/presto
    [hadoop@ip-172-31-21-25 presto]$ ls -alh
    total 228K
    drwxr-xr-x  7 root   root   4.0K Oct 19 06:52 .
    dr-xr-xr-x 47 root   root   4.0K Oct 19 06:30 ..
    drwxr-xr-x  3 presto presto 4.0K Oct 19 06:30 bin
    drwxr-xr-x  3 presto presto 4.0K Oct 19 06:52 etc
    drwxr-xr-x  2 presto presto  12K Oct 19 06:30 lib
    -rw-r--r--  1 presto presto 188K Sep 22 22:54 NOTICE
    drwxr-xr-x 24 presto presto 4.0K Oct 19 06:45 plugin
    drwxr-xr-x  2 presto presto 4.0K Oct 19 06:30 presto-jdbc
    -rw-r--r--  1 presto presto  119 Sep 22 22:54 README.txt
    
    [hadoop@ip-172-31-21-25 etc]$ pwd
    /usr/lib/presto/etc
    [hadoop@ip-172-31-21-25 etc]$ ls
    event-listener.properties
    
    [hadoop@ip-172-31-21-25 plugin]$ pwd
    /usr/lib/presto/plugin
    [hadoop@ip-172-31-21-25 plugin]$ ls
    accumulo   example-http  localfile  mysql           redis                    tpcds
    atop       hive-hadoop2  memory     postgresql      resource-group-managers  tpch
    blackhole  jmx           ml         presto-fluentd  sqlserver
    cassandra  kafka         mongodb    presto-thrift   teradata-functions
    

    And it works correctly.
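
Before restarting Presto, it can help to confirm that the properties file is really visible through the data directory path. The following is a minimal sketch: `check_listener_config` is a hypothetical helper, not part of Presto or EMR, and it assumes the data directory from the answer above (/mnt/var/lib/presto/data) unless another path is passed.

```shell
#!/usr/bin/env bash
# Sketch: check that event-listener.properties is reachable under <data_dir>/etc.
# check_listener_config is a hypothetical helper, not a Presto or EMR command.
check_listener_config() {
  # Default is the data dir used in the answer; pass another path to override.
  local data_dir="${1:-/mnt/var/lib/presto/data}"
  local props="$data_dir/etc/event-listener.properties"

  # -r follows symlinks, so a broken link is reported here as well.
  if [ ! -r "$props" ]; then
    echo "missing or unreadable: $props"
    return 1
  fi
  # The file has to name the listener to load.
  if ! grep -q '^event-listener\.name=' "$props"; then
    echo "no event-listener.name in $props"
    return 1
  fi
  echo "ok: $props"
}
```

Running `check_listener_config` with no arguments on the EMR node checks the default path. A "missing or unreadable" result suggests the symbolic link or the S3 sync step did not do what was expected.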