I get an error when trying to automate AWS ML DataSource creation from S3. I am running this shell script:
#!/bin/bash
for k in 1 2 3 4 5
do
    aws machinelearning create-data-source-from-s3 --cli-input-json "file://data/cfg/dsrc_training_00${k}.json"
    aws machinelearning create-data-source-from-s3 --cli-input-json "file://data/cfg/dsrc_validate_00${k}.json"
done
Here is an example of one of the JSON files it references ({k} stands for the fold number, 1 to 5):
{
    "DataSourceId": "Iris_training_00{k}",
    "DataSourceName": "[DS Iris] training 00{k}",
    "DataSpec": {
        "DataLocationS3": "s3://ml-test-predicto-bucket/shuffled_{k}.csv",
        "DataSchemaLocationS3": "s3://ml-test-predicto-bucket/dsrc_iris.csv.schema",
        "DataRearrangement": {"splitting":{"percentBegin" : 0, "percentEnd" : 70}}
    },
    "ComputeStatistics": true
}
When I run the script from the command line, I get this error:
Parameter validation failed:
Invalid type for parameter DataSpec.DataRearrangement, value: {u'splitting': {u'percentEnd': u'100', u'percentBegin': u'70'}}, type: <type 'dict'>, valid types: <type 'basestring'>
Can someone please help? I have read the AWS ML API documentation and I think I am doing everything right, but I can't solve this error. Many thanks!
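The DataRearrangement element expects a string that contains JSON, not a JSON object. You are passing a nested object, which the CLI parses into a dictionary; that is why the validator reports type <type 'dict'> where it expects <type 'basestring'> (a string). Make the same change in both the training and the validation config files.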
Change:
"DataRearrangement": {"splitting":{"percentBegin" : 0, "percentEnd" : 70}}
to:
"DataRearrangement": "{\"splitting\":{\"percentBegin\":0,\"percentEnd\":70}}"
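Since the loop reads ten config files, it may be easier to generate them than to hand-escape each one. The script below is a minimal sketch, not a tested recipe: it reuses the bucket, paths and names from the question, and it assumes a 0 to 70 training split and a 70 to 100 validation split, with "validate" used in the validation DataSourceId and DataSourceName. The escaped quotes survive because, in an unquoted bash heredoc, a backslash before `"` is kept literally while `${...}` is still expanded.

```shell
#!/bin/bash
# Sketch: generate dsrc_training_00k.json and dsrc_validate_00k.json (k = 1..5)
# with DataRearrangement written as a JSON *string*, not a nested object.
# Assumptions: 0-70 / 70-100 split and the "validate" naming.
set -euo pipefail

cfg_dir="data/cfg"
bucket="s3://ml-test-predicto-bucket"
mkdir -p "$cfg_dir"

# write_cfg KIND K BEGIN END
write_cfg() {
  local kind=$1 k=$2 begin=$3 end=$4
  # Unquoted heredoc: ${...} expands, \" stays as \" in the output,
  # so the DataRearrangement value is an escaped JSON string.
  cat > "${cfg_dir}/dsrc_${kind}_00${k}.json" <<EOF
{
    "DataSourceId": "Iris_${kind}_00${k}",
    "DataSourceName": "[DS Iris] ${kind} 00${k}",
    "DataSpec": {
        "DataLocationS3": "${bucket}/shuffled_${k}.csv",
        "DataSchemaLocationS3": "${bucket}/dsrc_iris.csv.schema",
        "DataRearrangement": "{\"splitting\":{\"percentBegin\":${begin},\"percentEnd\":${end}}}"
    },
    "ComputeStatistics": true
}
EOF
}

for k in 1 2 3 4 5
do
  write_cfg training "$k" 0 70
  write_cfg validate "$k" 70 100
done
```

Run it once from the same directory as the original loop, then run the loop unchanged; each file should now pass the CLI's parameter validation for DataRearrangement.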