
Apache Geode debug Unknown pdx type=2140705


I start a GFSH client and connect to Geode. There is a lot of data in myRegion, and to look through it I run:

query --query="select * from /myRegion"

I am getting the response:

Result     : false
startCount : 0
endCount   : 20
Message    : Unknown pdx type=2140705

How does one troubleshoot / debug this problem?

UPDATE: The error in the Geode server log is:

[info 2018/07/04 10:53:07.275 BST IsGeode <Function Execution Processor1> tid=0x48] Exception occurred:
java.lang.IllegalStateException: Unknown pdx type=1318971
  at org.apache.geode.internal.InternalDataSerializer.readPdxSerializable(InternalDataSerializer.java:3042)
  at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2859)
  at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2961)
  at org.apache.geode.internal.util.BlobHelper.deserializeBlob(BlobHelper.java:90)
  at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:1911)
  at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:1904)
  at org.apache.geode.internal.cache.PreferBytesCachedDeserializable.getDeserializedValue(PreferBytesCachedDeserializable.java:73)
  at org.apache.geode.internal.cache.LocalRegion.getDeserialized(LocalRegion.java:1269)
  at org.apache.geode.internal.cache.LocalRegion$NonTXEntry.getValue(LocalRegion.java:8771)
  at org.apache.geode.internal.cache.EntriesSet$EntriesIterator.moveNext(EntriesSet.java:179)
  at org.apache.geode.internal.cache.EntriesSet$EntriesIterator.next(EntriesSet.java:134)
  at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:837)
  at org.apache.geode.cache.query.internal.CompiledSelect.doIterationEvaluate(CompiledSelect.java:699)
  at org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:423)
  at org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
  at org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:558)
  at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:385)
  at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:319)
  at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:247)
  at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:202)
  at org.apache.geode.management.internal.cli.functions.DataCommandFunction.execute(DataCommandFunction.java:147)
  at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:185)
  at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:374)
  at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:440)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:662)
  at org.apache.geode.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1108)
  at java.lang.Thread.run(Thread.java:748)

Solution

  • You can tell the immediate cause from the stack trace.

    A PDX-serialized stream contains a type ID, which is a reference into a repository of type metadata maintained by a GemFire cluster. In this case, the serialized data of the object contained a type ID that is not in the cluster's metadata repository.

    So the question becomes: "What serialized that object, and why did it use an invalid type ID?"

    The only way I've seen this happen before is when a cluster is fully restarted and the pdx metadata goes away, either because it was not persistent or because it was deleted (by clearing out the locator working directory for example).

    GemFire clients cache the mapping between a type and its type ID. This allows them to serialize objects quickly without continually looking up the type ID from the server. Client connections can persist across cluster restarts; when a client reconnects, it does not flush the cached information and continues to write objects using its cached type ID.

    So the combination of a cluster restart that loses the PDX metadata and a client that is not restarted (e.g. an app server) is the only way I have seen this happen before. Does this match your scenario?

    If so, one of the best ways to avoid this is to persist your PDX metadata and never delete it; see the configuration sketch below.
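
    For example, here is a minimal sketch of enabling persistent PDX metadata on a server started programmatically through the CacheFactory API (the class name and the optional read-serialized line are illustrative assumptions, not taken from the original setup):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class PdxPersistentServer {
      public static void main(String[] args) {
        // Persist the PDX type registry so type IDs survive a full cluster
        // restart. With no explicit PDX disk store configured, Geode keeps
        // the registry in the default disk store.
        Cache cache = new CacheFactory()
            .setPdxPersistent(true)
            .setPdxReadSerialized(true) // optional: server-side queries can work on PdxInstance values
            .create();

        // ... create regions, start a CacheServer, etc.
      }
    }

    The same setting can be applied declaratively, e.g. with a <pdx persistent="true" disk-store-name="..."/> element in cache.xml or with gfsh's "configure pdx" command (exact attribute and option names may vary by version). Once the registry is persisted, a full cluster restart restores the type metadata, so long-running clients no longer write type IDs that the servers do not recognize.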