Multiple table input for mapreduce

I am thinking of running a MapReduce job using Accumulo tables as input.
Is there a way to use two different tables as input, the same way addInputPath allows multiple input files?
Or is it possible to take one input from a file and the other from a table with AccumuloInputFormat?

Solution

You probably want to take a look at AccumuloMultiTableInputFormat. The Accumulo manual demonstrates how to use it here.

Example Usage:

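// Use the multi-table input format rather than the single-table AccumuloInputFormat.
// (This is the mapreduce-API call; with the older mapred API it is job.setInputFormat(...).)
job.setInputFormatClass(AccumuloMultiTableInputFormat.class);

AccumuloMultiTableInputFormat.setConnectorInfo(job, user, new PasswordToken(pass));
// Mock instance, suitable for tests only; point a real job at your actual Accumulo instance.
AccumuloMultiTableInputFormat.setMockInstance(job, INSTANCE_NAME);

// One InputTableConfig per table; each can carry its own ranges, columns and iterators.
InputTableConfig tableConfig1 = new InputTableConfig();
InputTableConfig tableConfig2 = new InputTableConfig();

// Map each table name to its configuration.
Map<String, InputTableConfig> configMap = new HashMap<String, InputTableConfig>();
configMap.put(table1, tableConfig1);
configMap.put(table2, tableConfig2);

AccumuloMultiTableInputFormat.setInputTableConfigs(job, configMap);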

The unit test for AccumuloMultiTableInputFormat has some additional information.

Note that, unlike normal multiple inputs, you can't specify a different Mapper for each table. That isn't a big problem here, since the incoming Key/Value types are the same for every table, and inside your mapper you can work out which table a record came from (taken from the Accumulo manual):

// c is the context object passed to your mapper
RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
String tableName = split.getTableName();
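
Since a single Mapper sees records from every table, one way to keep the per-table logic separate is to dispatch on the table name. The following is only a rough sketch in plain Java with no Accumulo or Hadoop dependencies: the class name TableDispatch, the table names "table1" and "table2", and the handler bodies are all made up for illustration. In a real mapper the tableName argument would be the value returned by split.getTableName().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Sketch only: routes a record to per-table logic inside a single mapper.
// The table names and the handler logic are hypothetical.
class TableDispatch {
    // One handler per table name; each turns a key/value pair into an output string.
    private static final Map<String, BiFunction<String, String, String>> HANDLERS =
            new HashMap<>();

    static {
        HANDLERS.put("table1", (key, value) -> "t1:" + key + "=" + value);
        HANDLERS.put("table2", (key, value) -> "t2:" + value + "@" + key);
    }

    // In a real mapper, tableName would come from split.getTableName().
    static String route(String tableName, String key, String value) {
        BiFunction<String, String, String> handler = HANDLERS.get(tableName);
        if (handler == null) {
            // Fail loudly on a table that has no registered handler.
            throw new IllegalArgumentException("No handler for table: " + tableName);
        }
        return handler.apply(key, value);
    }

    public static void main(String[] args) {
        System.out.println(route("table1", "row1", "a"));
        System.out.println(route("table2", "row2", "b"));
    }
}
```

A plain if/else on tableName works just as well for two tables; the map of handlers mainly helps once the number of input tables grows.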