amazon-s3 apache-zeppelin

How to use dependencies from S3 in Zeppelin?


Is there a way to add JARs stored in an S3 bucket as Zeppelin dependencies? I tried z.load("s3n://...") and z.addRepo("some_name").url("s3n://...") but neither seems to work.


Solution

  • You can download the JARs from S3 to the local filesystem and load them from there. This can be done inside the %dep interpreter (which has to run before the Spark interpreter starts) like this:

    %dep
    import com.amazonaws.services.s3.AmazonS3Client
    import java.io.File
    import java.nio.file.{Files, StandardCopyOption}
    
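    // Local path the JAR is copied to
    val dest = "/tmp/dependency.jar"
    // The no-arg constructor uses the default AWS credentials provider chain
    val s3 = new AmazonS3Client()
    val stream = s3.getObject("bucket-name", "path/to/dependency.jar").getObjectContent
    
    try {
      // Overwrite any copy left over from a previous run
      Files.copy(stream, new File(dest).toPath, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      stream.close()
    }
    
    // Register the downloaded JAR as an interpreter dependency
    z.load(dest)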
    

    Note: the JAR must be a fat JAR, i.e. it must bundle every custom dependency that is not provided by default (for example, when your project has multiple modules). In Maven this can be done with the maven-shade-plugin:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.2</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <artifactSet>
                        <includes>
                            <include>com.yourcompany:*</include>
                        </includes>
                    </artifactSet>
                </configuration>
            </execution>
        </executions>
    </plugin>
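A JAR that is missing one of your own modules typically only fails later, with a ClassNotFoundException in the notebook, so it can be worth checking the shaded JAR before uploading it to S3. A minimal sketch, assuming your classes live under a package path that mirrors the com.yourcompany groupId from the config above (FatJarCheck and classesUnder are hypothetical names): it lists the .class entries under a given prefix using java.util.jar.JarFile. The main method only builds a throwaway JAR to demonstrate the check.

```scala
import java.io.FileOutputStream
import java.nio.file.Files
import java.util.jar.{JarEntry, JarFile, JarOutputStream}
import scala.collection.JavaConverters._

// Hypothetical helper: FatJarCheck / classesUnder are illustrative names
object FatJarCheck {
  /** Lists the .class entries in `jarPath` whose path starts with `prefix`,
    * e.g. "com/yourcompany/" (assumes packages mirror the com.yourcompany groupId). */
  def classesUnder(jarPath: String, prefix: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .map(_.getName)
        .filter(name => name.startsWith(prefix) && name.endsWith(".class"))
        .toList // materialize before the JarFile is closed
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Build a throwaway JAR with one "own" class and one third-party class
    val tmp = Files.createTempFile("shaded-demo", ".jar")
    val out = new JarOutputStream(new FileOutputStream(tmp.toFile))
    try {
      for (name <- Seq("com/yourcompany/util/Helper.class", "org/other/Lib.class")) {
        out.putNextEntry(new JarEntry(name))
        out.closeEntry()
      }
    } finally out.close()

    val own = classesUnder(tmp.toString, "com/yourcompany/")
    if (own.isEmpty) println("No com.yourcompany classes found: check the shade <includes>")
    else own.foreach(println) // prints com/yourcompany/util/Helper.class
  }
}
```

Pointing classesUnder at the JAR under target/ after mvn package gives a quick sanity check that the artifactSet includes matched what you expected.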
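The %dep snippet above fetches a single JAR. If several JARs have to come from the same bucket, the copy step can be factored into a small helper. This is a sketch, not a Zeppelin or AWS SDK API: S3JarCache and saveToDir are hypothetical names. It copies any InputStream into a directory, names the file after the last segment of the S3 key, and closes the stream afterwards. The main method uses an in-memory stream in place of S3 purely for demonstration.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: S3JarCache / saveToDir are illustrative names,
// not part of Zeppelin or the AWS SDK
object S3JarCache {
  /** Copies `in` into `dir`, naming the file after the last segment of the S3 key,
    * closes the stream, and returns the local path. */
  def saveToDir(in: InputStream, dir: Path, key: String): Path = {
    val fileName = key.substring(key.lastIndexOf('/') + 1)
    require(fileName.endsWith(".jar"), s"expected a .jar key, got: $key")
    Files.createDirectories(dir)
    val target = dir.resolve(fileName)
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }

  def main(args: Array[String]): Unit = {
    // Demo with an in-memory stream standing in for the S3 object content
    val dir = Files.createTempDirectory("zeppelin-deps")
    val bytes = "fake jar contents".getBytes("UTF-8")
    val local = saveToDir(new ByteArrayInputStream(bytes), dir, "libs/my-lib-1.0.jar")
    println(s"${local.getFileName} (${Files.size(local)} bytes)") // prints: my-lib-1.0.jar (17 bytes)
  }
}
```

Inside a %dep paragraph you would pass s3.getObject("bucket-name", key).getObjectContent as the stream for each key and then call z.load(path.toString) on each returned path. The bucket name and keys are placeholders, as in the original snippet.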