I am currently working on a project that compiles using JDK1.7, creates and runs Hadoop jobs using Cascading 1.2 (soon to be upgraded to 2.1) and uses a Cloudera distribution of Hadoop (0.20.2-cdh3u3).
I'm looking at how to modify my Cascading/Hadoop jobs to read and write all data to/from a MySQL db. It's looks like SQOOP may be able to handle that.
However from what I've seen so far there is little information or documentation on how to do this in Java (I understand SQOOP is mainly supposed to be used for batch jobs called from within a shell) - the Java examples I have followed haven't worked for me. I have tried using SQOOP 1.4 and switching my project to use JDK1.6 as I believe this is required, (although it will break other parts of my project) but I still couldn't get it to work.
Does anyone know if what I'm trying to achieve is even possible? How are other people dealing with this problem? Will the release of SQOOP2 help at all?
The kind of errors I'm seeing when I try to run an org.apache.sqoop.tool.ExportTool to export a CSV to a table are:
Can't initialize javac processor due to (most likely) a class loader problem: java.lang.NoClassDefFoundError: com/sun/tools/javac/processing/JavacProcessingEnvironment
Note: \tmp\sqoop-my.name\compile\9031edc8e43167c10f9f895b64aa79d5\MyTableName.java uses or overrides a deprecated API.
Encountered IOException running export job: java.io.IOException: Could not load jar \tmp\sqoop-my.name\compile\9031edc8e43167c10f9f895b64aa79d5\MyTableName.jar into JVM. (Could not find class MyTableName.)
Thanks Charles and Vikas. This certainly put me on the right track. I ended up using https://github.com/cwensel/cascading.jdbc which uses Hadoop classes DBInputFormat/DBOutput
to make it easy to set up Cascading jobs that read and write to db.
To write I just changed the output flow of my tap to:
String url = "jdbc:mysql://localhost:3306/mydb?user=myusername&password=mypassword";
String driver = "com.mysql.jdbc.Driver";
String tableName = "mytable";
String[] columnNames = {'col1', 'col2', 'col3'}; //Columns I want to write to
TableDesc tableDesc = new TableDesc( tableName );
JDBCScheme dbScheme = new JDBCScheme( columnNames );
Tap dbOutputTap = new JDBCTap( url, driver, tableDesc, dbScheme );
And to read from the db I just made a tap that looked like this:
String url = "jdbc:mysql://localhost:3306/mydb?user=myusername&password=mypassword";
String driver = "com.mysql.jdbc.Driver";
String tableName = "mytable";
String[] columnNames = {'col1', 'col2', 'col3'}; //Columns I want to read from
TableDesc tableDesc = new TableDesc( tableName );
JDBCScheme dbScheme = new JDBCScheme( columnNames, "col1<40" );
Tap dbInputTap = new JDBCTap( url, driver, tableDesc, dbScheme );
I came across Cascading-DBMigrate as well but it seems this is only for reading from db's and not writing to them.