
Aeron Archive mark recording as inactive


I want to replay an existing recording. I follow the steps outlined here: https://theaeronfiles.com/aeron-archive/archive-operations. It explicitly states that "the Recording must not be active, so it must have stopped and have a stopPosition". But what should I do if the archive was shut down ungracefully (e.g. via kill -9, or the app crashed with a segfault)? Is there any way to mark this recording as not active so that I can extend it?


Solution

  • I made a quick check: I modified this test so that it doesn't delete the archive folder before or after the run, and I put System.exit(42) at line 158, right before the recording is closed. This should simulate exactly the scenario described in the question.

    After the first run and kill, we can see files like these in the archive folder:

    >>> ls /tmp/archive/
    0-0.rec  archive.catalog  archive-mark.dat
    

    We can run CatalogTool against this folder with the describe-all command. On my end, I see something like:

    [RecordingDescriptor](sbeTemplateId=22|sbeSchemaId=101|sbeSchemaVersion=10|sbeBlockLength=80):controlSessionId=0|correlationId=0|recordingId=0|startTimestamp=1723808606710|stopTimestamp=-1|startPosition=0|stopPosition=-1|initialTermId=-1855426378|segmentFileLength=134217728|termBufferLength=65536|mtuLength=1408|sessionId=1093862659|streamId=33|strippedChannel='aeron:udp?endpoint=localhost:3333|alias=named-log'|originalChannel='aeron:udp?endpoint=localhost:3333|term-length=65536|alias=named-log'|sourceIdentity='aeron:ipc'|VALID
    

    Note: stopPosition=-1 (and stopTimestamp=-1) for recordingId=0 means that we were right: the Archive didn't have time to close our recording.

    Now, if we run exactly the same test again against the same folder, we see:

    >>> ls /tmp/archive/
    0-0.rec  1-0.rec  archive.catalog  archive-mark.dat
    

    Note the second .rec file. Let's describe the catalog with CatalogTool once again:

    [RecordingDescriptor](sbeTemplateId=22|sbeSchemaId=101|sbeSchemaVersion=10|sbeBlockLength=80):controlSessionId=0|correlationId=0|recordingId=0|startTimestamp=1723808606710|stopTimestamp=1723809009756|startPosition=0|stopPosition=640|initialTermId=-1855426378|segmentFileLength=134217728|termBufferLength=65536|mtuLength=1408|sessionId=1093862659|streamId=33|strippedChannel='aeron:udp?endpoint=localhost:3333|alias=named-log'|originalChannel='aeron:udp?endpoint=localhost:3333|term-length=65536|alias=named-log'|sourceIdentity='aeron:ipc'|VALID
    [RecordingDescriptor](sbeTemplateId=22|sbeSchemaId=101|sbeSchemaVersion=10|sbeBlockLength=80):controlSessionId=0|correlationId=0|recordingId=1|startTimestamp=1723809009876|stopTimestamp=-1|startPosition=0|stopPosition=-1|initialTermId=-2091439482|segmentFileLength=134217728|termBufferLength=65536|mtuLength=1408|sessionId=-1793647428|streamId=33|strippedChannel='aeron:udp?endpoint=localhost:3333|alias=named-log'|originalChannel='aeron:udp?endpoint=localhost:3333|term-length=65536|alias=named-log'|sourceIdentity='aeron:ipc'|VALID
    

    Indeed, we now have 2 recordings, but the most important part is that our first recording (recordingId=0), which used to be "live", now has stopTimestamp=1723809009756 and stopPosition=640, and only recordingId=1 has stopPosition=-1.

    To wrap it up: in order to "close" a recording, we just need to start the Archive and recording again. When Aeron Archive notices that the previous connection (image) is gone, it closes the old recording automatically. If there is another connection (an image on the stream), it creates a new recording for it.
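If you want to check for recordings left open by a crash without reading the CatalogTool output by eye, you can scan the describe-all lines for stopPosition=-1. Below is a minimal, stdlib-only Java sketch of that check. It does not talk to the Archive at all. The class name OpenRecordingCheck and its helpers are hypothetical, and the parsing assumes the descriptor text format shown above, where recordingId and stopPosition appear as |name=value| fields. That format may differ between Aeron versions. A recording reported as "still open" does not yet satisfy the "must have a stopPosition" precondition for extending it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenRecordingCheck {
    // CatalogTool prints -1 (NULL_POSITION) as the stopPosition of a recording that was never closed.
    static final long NULL_POSITION = -1L;

    // Both fields are preceded and followed by '|' in the descriptor text (assumed format).
    static final Pattern RECORDING_ID = Pattern.compile("\\|recordingId=(-?\\d+)\\|");
    static final Pattern STOP_POSITION = Pattern.compile("\\|stopPosition=(-?\\d+)\\|");

    // Extracts a numeric field from one describe-all line, failing loudly if it is missing.
    static long field(Pattern pattern, String descriptor) {
        Matcher m = pattern.matcher(descriptor);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + pattern.pattern());
        }
        return Long.parseLong(m.group(1));
    }

    // Returns the ids of recordings whose stopPosition is still -1, i.e. not closed.
    static List<Long> openRecordingIds(List<String> descriptors) {
        List<Long> open = new ArrayList<>();
        for (String d : descriptors) {
            if (field(STOP_POSITION, d) == NULL_POSITION) {
                open.add(field(RECORDING_ID, d));
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Trimmed copies of the describe-all lines printed after the second run.
        List<String> afterSecondRun = List.of(
            "[RecordingDescriptor](sbeTemplateId=22|sbeSchemaId=101|sbeSchemaVersion=10|sbeBlockLength=80):"
                + "controlSessionId=0|correlationId=0|recordingId=0|startTimestamp=1723808606710"
                + "|stopTimestamp=1723809009756|startPosition=0|stopPosition=640|initialTermId=-1855426378|VALID",
            "[RecordingDescriptor](sbeTemplateId=22|sbeSchemaId=101|sbeSchemaVersion=10|sbeBlockLength=80):"
                + "controlSessionId=0|correlationId=0|recordingId=1|startTimestamp=1723809009876"
                + "|stopTimestamp=-1|startPosition=0|stopPosition=-1|initialTermId=-2091439482|VALID");

        // Recording 0 was closed by the restart; recording 1 is open because the test was killed again.
        System.out.println("still open: " + openRecordingIds(afterSecondRun));
    }
}
```

Running main prints `still open: [1]`, matching the observation that only recordingId=1 still has stopPosition=-1.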