Search code examples
pythonamazon-web-servicesboto3iot

AWS MQTT Python awsiotsdk and awscrt packages fail sporadically with repeating messages in trace logs


I'm following reasonably close to the pub/sub example in the awsiotsdk package. My basic unittests sometimes succeed and sometimes fail.

I enabled trace logging with awscrt.io.init_logging(6, 'stderr')

When I look through the trace logs they look relatively similar up to this line:

[TRACE] [2021-06-16T23:56:36Z] [00007f5fcfbbc640] [mqtt-topic-tree] - tree=0x5565ac073898: Publishing on topic test/unittest

After that point a run that failed starts producing the following output in a loop until my unittest times out after 30 seconds:

[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolving host #############-ats.iot.us-west-2.amazonaws.com
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 34.213.168.191
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 52.35.137.184
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 54.148.88.142
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 34.214.249.117
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 54.186.43.162
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 52.38.221.226
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 54.244.33.231
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 34.218.88.107
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::23a6:36ae
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::3424:419e
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::22d2:b01f
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::22d7:fa17
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::6415:4c4c
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::2351:30a6
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::3421:1d4e
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: resolved record: 2620:108:700f::342b:929
[DEBUG] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static, resolving host #############-ats.iot.us-west-2.amazonaws.com successful, returned 16 addresses
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 34.213.168.191 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 52.35.137.184 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 54.148.88.142 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 34.214.249.117 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 54.186.43.162 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 52.38.221.226 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 54.244.33.231 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 34.218.88.107 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::23a6:36ae for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::3424:419e for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::22d2:b01f for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::22d7:fa17 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::6415:4c4c for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::2351:30a6 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::3421:1d4e for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: updating expiry for 2620:108:700f::342b:929 for host #############-ats.iot.us-west-2.amazonaws.com to 1623887576445112777
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: remaining record count for host 7
[TRACE] [2021-06-16T23:52:26Z] [00007fd7ab15b640] [dns] - static: remaining record count for host 8

A run that succeeded continues on from the Publishing on topic test/unittest message with:

[TRACE] [2021-06-16T23:56:36Z] [00007f5fcfbbc640] [mqtt-client] - id=0x5565ac073640: publish received with msg id=1 dup=0 qos=1 retain=0 payload-size=16 topic=test/unittest
[TRACE] [2021-06-16T23:56:36Z] [00007f5fcfbbc640] [mqtt-client] - id=0x5565ac073640: received publish QOS is 1, sending puback
<<<output continues until normal termination>>>

Versions:

  boto3==1.17.96
  awscrt==0.11.22
  awsiotsdk==1.6.0

I created my mqtt connection using:

mqtt_connection_builder.websockets_with_default_aws_signing(
                    endpoint='#############-ats.iot.us-west-2.amazonaws.com',
                    client_bootstrap=client_bootstrap,
                    region=AWS_REGION,
                    credentials_provider=credentials_provider,
                    ca_filepath=f'{self.certs_temp_dir.name}/AmazonRootCA1.pem',
                    on_connection_interrupted=self._on_connection_interrupted,
                    on_connection_resumed=self._on_connection_resumed,
                    client_id=f'test-{uuid.uuid4()}',
                    clean_session=False,
                    keep_alive_secs=6
                )

Solution

  • I don't there is enough info here to identify the issue for certain. However, my concern is that it is unittest that is causing the unexpected behaviour. This is a fairly complex integration test, and you may not be releasing resources correctly or on time, which causes AWS to be unhappy.

    The most obvious culprit I'd be concerned about is having the same client_id for each test. I'd try randomising that for each test.