Avro schema JSON looks valid but returns "Input is not a valid Avro schema"


I am trying to upload data to the Users dataset in AWS Personalize. The schema below describes the structure of my CSV. I ran it through an online JSON validator and it is valid JSON, but Personalize rejects it with "Input is not a valid Avro schema".

{
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "user_id",
            "type": "string"
        },
        {
            "name": "address",
            "type": "record",
            "fields" : [
                {"name": "address1", "type": "string"},
                {"name": "address2", "type": "string"},
                {"name": "city", "type": "string"},
                {"name": "state", "type": "string"},
                {"name": "postalCode", "type": "int"},
                {"name": "coordinates", "type": "record",
                "fields" : [
                    {"name": "lat", "type": "float"},
                    {"name": "lng", "type": "float"}
                    ]}
                    ]
        },
        {
            "name": "firstName",
            "type": "string"
        },
        {
            "name": "followRequestId",
            "type": "array",
            "items": "string"
        },
        {
            "name": "followers",
            "type": "array",
            "items": "string"
        },
        {
            "name": "fullName",
            "type": "string"
        },
        {
            "name": "gender",
            "type": "string"
        },
        {
            "name": "interests",
            "type": "array",
            "items": "string"
        },
        {
            "name": "lastActive",
            "type": "long"
        },
        {
            "name": "lastName",
            "type": "string"
        },
        {
            "name": "network",
            "type": "list"
        },
        {
            "name": "paymentDetails",
            "type": "int"
        },
        {
            "name": "personalOccasions",
            "type": "array",
            "items": "string"
        },
        {
            "name": "productLikeDislike",
            "type": "array",
            "items": "string"
        },
        {
            "name": "registrationDate",
            "type": "long"
        },
        {
            "name": "rewardId",
            "type": "string"
        },
        {
            "name": "wishList",
            "type": "array",
            "items": "string"
        }
    ],
    "version": "1.0"
}

Solution

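  • The record fields are not declared correctly. A field's "type" cannot be the bare string "record" with "fields" sitting next to it; it has to be a complete record definition, an object with its own "type", "name", and "fields".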

    Example:

    ...
    {
    "name":"address1",
    "type":{
         "type":"record",
         "name": "address",
         "fields": [...]
      }
    }
    ...
    

    So when a field holds a record, the field's own name comes first, and the named record definition goes inside the field's "type". This is similar to how a class works: address1 is the variable and address is the class.
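
To make the rule concrete, here is a minimal Python sketch that walks a schema and lists every field whose "type" is a bare complex-type name instead of a full type object. The helper `find_unwrapped_complex_fields`, the trimmed `broken` and `fixed` schemas, and the record names `Address` and `Coordinates` are illustrative assumptions, not part of the original schema or of any Avro library. It is not a full Avro validator and does not look inside unions or array items. Per the Avro specification, arrays need the same wrapping (`{"type": "array", "items": "string"}` as the field's type), so the check flags those too.

```python
# Avro complex types that must be written as a full type object,
# not as a bare string in a field's "type".
COMPLEX_TYPES = {"record", "array", "map", "enum", "fixed"}


def find_unwrapped_complex_fields(schema, path=""):
    """Return dotted paths of fields whose "type" is a bare complex-type name.

    Hypothetical helper for illustration only; not a full Avro validator.
    """
    problems = []
    if isinstance(schema, dict):
        for field in schema.get("fields", []):
            field_path = f"{path}.{field['name']}" if path else field["name"]
            field_type = field.get("type")
            if isinstance(field_type, str) and field_type in COMPLEX_TYPES:
                problems.append(field_path)
            # Wrong form: nested "fields" sit directly on the field.
            problems += find_unwrapped_complex_fields(field, field_path)
            # Correct form: nested "fields" sit on the type object.
            problems += find_unwrapped_complex_fields(field_type, field_path)
    return problems


# Trimmed version of the schema from the question (wrong form).
broken = {
    "type": "record",
    "name": "Users",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "address", "type": "record", "fields": [
            {"name": "city", "type": "string"},
            {"name": "coordinates", "type": "record", "fields": [
                {"name": "lat", "type": "float"},
                {"name": "lng", "type": "float"},
            ]},
        ]},
    ],
}

# Same fields with each record wrapped in a named type object.
# "Address" and "Coordinates" are assumed names chosen for this example.
fixed = {
    "type": "record",
    "name": "Users",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "coordinates", "type": {
                    "type": "record",
                    "name": "Coordinates",
                    "fields": [
                        {"name": "lat", "type": "float"},
                        {"name": "lng", "type": "float"},
                    ],
                }},
            ],
        }},
    ],
}

print(find_unwrapped_complex_fields(broken))  # ['address', 'address.coordinates']
print(find_unwrapped_complex_fields(fixed))   # []
```

Running the same check on the full schema from the question should also report the fields declared with "type": "array", since "array" is in the set of names the helper looks for.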