Search code examples
Hadoop UniqValueCount Map and Aggregate Reducer for Large Dataset (1 billion records)...

hadoop mapreduce hadoop-streaming elastic-map-reduce

Hive / Map-Reduce Job on a Hadoop cluster: How to (roughly) calculate the diskspace needed?...

hadoop mapreduce hive hdfs elastic-map-reduce

Hadoop Pig save each line of a file to S3...

hadoop amazon-s3 apache-pig elastic-map-reduce amazon-emr

Downloading files from FTP to local using Java makes the file unreadable - encoding issues...

java hadoop ftp elastic-map-reduce amazon-emr

Reading large files using mapreduce in hadoop...

java hadoop mapreduce elastic-map-reduce amazon-emr

How to specify mapred configurations & java options with custom jar in CLI using Amazon's EM...

java hadoop mapreduce elastic-map-reduce emr

Too many open files in EMR...

hadoop mapreduce elastic-map-reduce emr

Best way to have a fast access key-value storage for huge dataset (5 GB)...

java hadoop mapreduce elastic-map-reduce emr

How do you use Python UDFs with Pig in Elastic MapReduce?...

apache-pig elastic-map-reduce

Producing ngram frequencies for a large dataset...

postgresql hadoop mapreduce bigdata elastic-map-reduce

What ports does Apache Hadoop version 1.0.3 use for intracluster communication of the daemons...

hadoop mapreduce hbase rhel elastic-map-reduce

Loading data with Hive, S3, EMR, and Recover Partitions...

hadoop amazon-s3 amazon-web-services hive elastic-map-reduce

Sessionized web logs, get previous and next domain...

session hadoop amazon-web-services apache-pig elastic-map-reduce

How to decide on number of parallel mappers/reducers along with Heap memory?...

hadoop mapreduce elastic-map-reduce emr

Easiest way to get started with Hadoop...

hadoop elastic-map-reduce

Can I access zookeeper from AWS Elastic Mapreduce job...

hadoop amazon-web-services apache-zookeeper elastic-map-reduce emr

When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automa...

amazon-s3 amazon-web-services elastic-map-reduce lzo

Performance Impact on Elastic MapReduce for Scale Up vs Scale Out scenarios...

amazon-web-services mapreduce elastic-map-reduce

Problems using distcp and s3distcp with my EMR job that outputs to HDFS...

amazon-web-services elastic-map-reduce amazon-emr emr

How do I pass the Hadoop Streaming -file flag to Amazon ElasticMapreduce?...

elastic-map-reduce hadoop-streaming

Elastic MapReduce fails with: 1: Syntax error: "(" unexpected...

elastic-map-reduce

How can I share jar libraries with amazon elastic mapreduce?...

hadoop amazon-ec2 elastic-map-reduce

Setting hadoop parameters with boto?...

python boto elastic-map-reduce

Can you programmatically control Elastic Mapreduce jobs easily?...

ruby hadoop elastic-map-reduce amazon-emr

Join performance on AWS elastic map reduce running hive...

amazon-ec2 hive hdfs elastic-map-reduce

Interface as Mapper value output...

java interface hadoop mapreduce elastic-map-reduce

AWS Elastic Map Reduce: output to SimpleDB...

hadoop amazon-simpledb elastic-map-reduce

Amazon EMR: Configuring storage on data nodes...

hadoop amazon-ec2 amazon-web-services elastic-map-reduce emr

Force one reducer in AWS EMR...

amazon-web-services elastic-map-reduce

Hadoop seems to modify my key object during an iteration over values of a given reduce call ...

hadoop reduce elastic-map-reduce
