Search code examples
curlpostmanschemaapache-pulsar

Curl command failing with uploading particular Avro Schema that uploads fine in Postman


In pulsar, I've been writing some simple BASH scripts to create and upload topics to my namespace using curl commands:

function create_partioned_topic {
    echo -e "\n+++ Creating partioned topic: $TOPIC +++"
    curl --location --request PUT "https://$PULSAR_HOST:$HOST_PULSAR_PORT/admin/v2/persistent/$TENANT/$NAMESPACE/$TOPIC/partitions" \
        --verbose \
        --header "Authorization: Bearer $AUTHORIZATION"  \
        --insecure \
        --header 'Content-Type: application/json' \
        --data-raw '3' 2>&1 | cat | grep "HTTP" # grep -v "Authorization"
}

function uploading_topic {
    echo -e "\n\n+++ Uploading $TOPIC +++"
    curl --location --request POST "https://$PULSAR_HOST:$HOST_PULSAR_PORT/admin/v2/schemas/$TENANT/$NAMESPACE/$TOPIC/schema" \
        --verbose \
        --header "Authorization: Bearer $AUTHORIZATION"  \
        --insecure \
        --header 'Content-Type: application/json' \
        --data-raw $SCHEMA 2>&1 | cat | grep "HTTP" # grep -v "Authorization"
}

These commands work fine except for the following I'm trying to upload:

TOPIC=nested_schema
NESTED_SCHEMA='{"schema":"{\"type\":\"record\",\"name\":\"DatasetEventSchema\",\"namespace\":\"channels.$TENANT.$NAMESPACE.$TOPIC.generated\",\"fields\":[{\"name\":\"DataRecord\",\"type\":{\"name\":\"DataRecordSchema\",\"type\":\"record\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"description\",\"type\":\"string\"},{\"name\":\"created_by\",\"type\":\"string\"},{\"name\":\"updated_by\",\"type\":\"string\"},{\"name\":\"created_at\",\"type\":\"string\"},{\"name\":\"updated_at\",\"type\":\"string\"},{\"name\":\"data_legitimacy\",\"type\":\"string\"},{\"name\":\"item_status\",\"type\":\"string\"},{\"name\":\"tenant_id\",\"type\":\"string\"},{\"name\":\"tags\",\"type\":\"string\"},{\"name\":\"dataset_series_id\",\"type\":\"string\"},{\"name\":\"location_id\",\"type\":\"string\"},{\"name\":\"provided_by\",\"type\":\"string\"},{\"name\":\"metadata\",\"type\":\"string\"}]}},{\"name\":\"MetadataRecord\",\"type\":{\"name\":\"MetadataRecordSchema\",\"type\":\"record\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"record_type\",\"type\":\"string\"},{\"name\":\"operation\",\"type\":\"string\"},{\"name\":\"partition_key_type\",\"type\":\"string\"},{\"name\":\"schema_name\",\"type\":\"string\"},{\"name\":\"table_name\",\"type\":\"string\"},{\"name\":\"transaction_id\",\"type\":\"string\"}]}}]}","type":"AVRO","properties":{}}'

When I'm uploading this Schema via a curl command, it seems to never complete. I get the notification that it's beginning to post, but it never finishes.

Other schemas I upload work just fine. Two examples are:

TOPIC=double_nested_schema
DOUBLE_NESTED_SCHEMA='{"schema":"{\"type\":\"record\",\"name\":\"DoubleNestedSchema\",\"namespace\":\"channels.$TENANT.$NAMESPACE.$TOPIC.generated\",\"fields\":[{\"name\":\"DataRecord\",\"type\":{\"name\":\"DataRecordSchema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"IdRecord\",\"type\":{\"name\":\"IdRecordSchema\",\"type\":\"record\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}}]}},{\"name\":\"MetadataRecord\",\"type\":{\"name\":\"MetadataRecordSchema\",\"type\":\"record\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"string\"}]}}]}","type":"AVRO","properties":{}}'

TOPIC=simplest_schema
SIMPLEST_SCHEMA='{"schema":"{\"type\":\"record\",\"name\":\"SimplestExample\",\"namespace\":\"channels.$TENANT.$NAMESPACE.$TOPIC.generated\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}","type":"AVRO","properties":{}}'

Things I've tried

  • As I mentioned above, I've tried the two different schemas SIMPLEST_SCHEMA and DOUBLE_NESTED_SCHEMA and they work fine
  • I've also tested running the equivalent of uploading my NESTED_SCHEMA in Postman and it works fine. Does this indicate there's something about curl that doesn't like my formatting?

Solution

  • I tried a number of solutions, but I kept finding that this worked fine in Postman but not as curl command.

    When I turned on the --verbose header I saw that I was getting hung up at about this point:

    > Expect: 100-continue
    > 
    * Done waiting for 100-continue
    

    Evidently this behavior is done when:

    • the request is a PUT or
    • the request is a POST and the data size is larger than 1024 bytes

    You can turn off this behavior by setting the Except header to an empty string.

    like so:

    curl -H 'Expect:'

    This version of the curl command ended up working for me:

        curl --location --request POST "https://$PULSAR_HOST:$HOST_PULSAR_PORT/admin/v2/schemas/$TENANT/$NAMESPACE/$TOPIC/schema" \
            --verbose \
            --header 'Expect:' \
            --header "Authorization: Bearer $AUTHORIZATION"  \
            --insecure \
            --header "Content-Type: $content_type" \
            --data-raw $SCHEMA 2>&1 | grep "HTTP" # grep -v "Authorization"
    

    See references: