AWS recently added the ability to attach EBS volumes to specific cluster instance types, such as m4. Whilst it is possible to attach an EBS volume when creating a cluster directly in EMR, I cannot find a way to do so via AWS Data Pipeline. Am I missing something?
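For reference, this is roughly how the EBS volume is specified when creating a cluster directly through the EMR API. The sketch below only builds the keyword arguments for boto3's `emr.run_job_flow`. The cluster name, release label, instance type, volume size and the default IAM role names are assumptions to replace with your own values.

```python
import copy


def build_emr_cluster_request(name, release_label, instance_type="m4.large",
                              core_count=2, ebs_size_gb=100, volume_type="gp2"):
    """Build kwargs for boto3 emr.run_job_flow with one EBS volume per instance.

    All default values here are illustrative assumptions, not recommendations.
    """
    ebs_config = {
        "EbsBlockDeviceConfigs": [
            {
                "VolumeSpecification": {"VolumeType": volume_type,
                                        "SizeInGB": ebs_size_gb},
                "VolumesPerInstance": 1,
            }
        ],
        "EbsOptimized": True,
    }
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1,
                 "EbsConfiguration": copy.deepcopy(ebs_config)},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count,
                 "EbsConfiguration": copy.deepcopy(ebs_config)},
            ],
            # Keep the cluster running after steps finish so it stays available.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # EMR's default role names; assumed to exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**build_emr_cluster_request("ebs-cluster", "emr-4.7.0"))
```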
We ran into the same situation. Our AWS contacts confirmed that this is not currently supported. You can, however, spin up your own EMR cluster with an attached EBS volume and have Data Pipeline run its activities on that cluster via a worker group (the `workerGroup` field). This is less convenient, but it may be a workable solution.
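A minimal sketch of the pipeline side, assuming the cluster already exists: the activity sets `workerGroup` instead of `runsOn`, so Data Pipeline does not launch its own EmrCluster resource. The worker group name, the command and the object IDs are hypothetical; the role names are the usual Data Pipeline defaults and are assumed to exist. The objects are in the format boto3's `datapipeline.put_pipeline_definition` expects.

```python
def field(key, value):
    """A Data Pipeline field holding a plain string value."""
    return {"key": key, "stringValue": value}


def build_pipeline_objects(worker_group, command):
    """Pipeline objects whose activity is routed to an existing worker group.

    worker_group and command are hypothetical placeholders.
    """
    default = {
        "id": "Default",
        "name": "Default",
        "fields": [
            field("scheduleType", "ondemand"),
            # Default Data Pipeline role names; assumed to exist in the account.
            field("role", "DataPipelineDefaultRole"),
            field("resourceRole", "DataPipelineDefaultResourceRole"),
        ],
    }
    activity = {
        "id": "RunOnEbsCluster",
        "name": "RunOnEbsCluster",
        "fields": [
            field("type", "ShellCommandActivity"),
            field("command", command),
            # No runsOn: the task is picked up by whatever polls this group.
            field("workerGroup", worker_group),
        ],
    }
    return [default, activity]


# Usage (requires AWS credentials and an existing pipeline ID):
#   import boto3
#   boto3.client("datapipeline").put_pipeline_definition(
#       pipelineId="df-...",
#       pipelineObjects=build_pipeline_objects("ebs-emr-workers", "echo hello"))
```

For the task to actually run, Task Runner has to be installed and running on the cluster (typically the master node), started with `--workerGroup` set to the same name.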