python regex autocad

Regex to search asset tags on AutoCAD drawings


I held off posting a question here because I wanted to try finding a solution myself first. Unfortunately, after entire nights of searching and reading regex articles and docs, I couldn't find an answer.

I have written a script that dumps the comments from PDFs that have been converted from AutoCAD. The problem is that each drawing has a different pattern for the asset tag information.

Our convention is 99XX9999 (two digits, two letters, four digits).

Some drawings preserve that pattern, others don't. We can find things like 99XX(space)9999 OR 99(space)XX(space)9999, etc.

I have resolved that part of the problem, but there's another variant that I can't wrap my head around, shown below:

'''

{'/Border': [0, 0, 0], '/Contents': '0811', '/F': 64, '/NM': 'b4499c47-d2d2-4c03-b13c-ec3b7b332ec3', '/P': IndirectObject(52, 0), '/Rect': [714, 304, 698, 314], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '29', '/F': 64, '/NM': 'b518c663-42eb-4861-a717-a00118d61fd2', '/P': IndirectObject(52, 0), '/Rect': [206, 369, 195, 378], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HV', '/F': 64, '/NM': '1db0a6b1-6aee-4a2d-bacf-2a996680cdcb', '/P': IndirectObject(52, 0), '/Rect': [212, 369, 201, 378], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '0832', '/F': 64, '/NM': '0250033f-5bc0-4d46-879a-a1ae0147352d', '/P': IndirectObject(52, 0), '/Rect': [212, 365, 195, 374], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '29', '/F': 64, '/NM': '7b372206-1b18-4c9f-813d-4d73ca52be40', '/P': IndirectObject(52, 0), '/Rect': [140, 392, 129, 401], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HV', '/F': 64, '/NM': 'bdc97ccd-ee7c-406a-a06c-1649a5f5f712', '/P': IndirectObject(52, 0), '/Rect': [146, 392, 135, 401], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '0824', '/F': 64, '/NM': '9f434537-57a3-4c40-bb8b-a0df6ea087aa', '/P': IndirectObject(52, 0), '/Rect': [146, 388, 129, 397], '/Subtype': '/Square', '/T': 'AutoCAD SHX 
Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '%%C25', '/F': 64, '/NM': 'a2ace5cb-21be-4df7-b541-ce87a8ee81bc', '/P': IndirectObject(52, 0), '/Rect': [145, 379, 132, 388], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'APOUTO RD', '/F': 64, '/NM': '948a363f-989d-4d00-afa1-a8bc45ad9729', '/P': IndirectObject(52, 0), '/Rect': [1162, 355, 1136, 364], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HORTICULTURAL WATER', '/F': 64, '/NM': '5a6cedb6-cdf0-4784-817f-c17f1de58f5a', '/P': IndirectObject(52, 0), '/Rect': [1171, 350, 1126, 358], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '%%C101', '/F': 64, '/NM': '2ee5947a-9062-415f-a242-f4b7b2a6441c', '/P': IndirectObject(52, 0), '/Rect': [767, 428, 758, 443], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}--------------------------------------------------------------------------------

'''

We can see '/Contents': '29'...then '/Contents': 'HV'...then '/Contents': '0832'...

I have tried multiple variations of this approach (?<='/Contents':\s')([0-9]{2})(?=') on regex101.com to no avail. I could only capture the first two digits.

By my logic, I need a way to chain multiple lookarounds, but I couldn't achieve that.

Eventually, the regex should match the above plus the ones I already have: ([A-Z]?[0-9]{2,3}\s?[A-Z]{2,3}\s?[0-9]{3,4}).
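
For reference, here is a minimal sketch of how the pattern quoted above behaves on the spaced variants. The `normalize` helper and the sample strings are made up for illustration and are not part of the original script.

```python
import re

# The pattern quoted above, compiled once.
TAG = re.compile(r"[A-Z]?[0-9]{2,3}\s?[A-Z]{2,3}\s?[0-9]{3,4}")

def normalize(text):
    """Return every tag found in text with internal whitespace removed."""
    return ["".join(m.group().split()) for m in TAG.finditer(text)]

# Hypothetical sample values covering the three spacing variants.
for sample in ["29HV0832", "29HV 0832", "29 HV 0832"]:
    print(repr(sample), "->", normalize(sample))
```

This only works when the whole tag sits inside a single '/Contents' value; it cannot see a tag that is split across several annotations, which is the case in the dump below.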

According to regex101.com, the line break occurs after the three parts of the pattern (see picture).

Just to be clear, I am talking about hundreds of drawings, each with hundreds of lines like that.

Please, any contribution is appreciated. If not regex itself, at least a direction I should take would be nice.

Regards,

[Image: regex101.com screenshot]


Solution

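  • Try this one. It splits the dump on each '/Contents' key and stitches consecutive two-digit, two-letter, and four-digit values together.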

    s = """{'/Border': [0, 0, 0], '/Contents': '0811', '/F': 64, '/NM': 'b4499c47-d2d2-4c03-b13c-ec3b7b332ec3', '/P': IndirectObject(52, 0), '/Rect': [714, 304, 698, 314], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '29', '/F': 64, '/NM': 'b518c663-42eb-4861-a717-a00118d61fd2', '/P': IndirectObject(52, 0), '/Rect': [206, 369, 195, 378], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HV', '/F': 64, '/NM': '1db0a6b1-6aee-4a2d-bacf-2a996680cdcb', '/P': IndirectObject(52, 0), '/Rect': [212, 369, 201, 378], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '0832', '/F': 64, '/NM': '0250033f-5bc0-4d46-879a-a1ae0147352d', '/P': IndirectObject(52, 0), '/Rect': [212, 365, 195, 374], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '29', '/F': 64, '/NM': '7b372206-1b18-4c9f-813d-4d73ca52be40', '/P': IndirectObject(52, 0), '/Rect': [140, 392, 129, 401], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HV', '/F': 64, '/NM': 'bdc97ccd-ee7c-406a-a06c-1649a5f5f712', '/P': IndirectObject(52, 0), '/Rect': [146, 392, 135, 401], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '0824', '/F': 64, '/NM': '9f434537-57a3-4c40-bb8b-a0df6ea087aa', '/P': IndirectObject(52, 0), '/Rect': [146, 388, 129, 397], '/Subtype': '/Square', '/T': 'AutoCAD SHX 
Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '%%C25', '/F': 64, '/NM': 'a2ace5cb-21be-4df7-b541-ce87a8ee81bc', '/P': IndirectObject(52, 0), '/Rect': [145, 379, 132, 388], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'APOUTO RD', '/F': 64, '/NM': '948a363f-989d-4d00-afa1-a8bc45ad9729', '/P': IndirectObject(52, 0), '/Rect': [1162, 355, 1136, 364], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': 'HORTICULTURAL WATER', '/F': 64, '/NM': '5a6cedb6-cdf0-4784-817f-c17f1de58f5a', '/P': IndirectObject(52, 0), '/Rect': [1171, 350, 1126, 358], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}-------------------------------------------------------------------------------- {'/Border': [0, 0, 0], '/Contents': '%%C101', '/F': 64, '/NM': '2ee5947a-9062-415f-a242-f4b7b2a6441c', '/P': IndirectObject(52, 0), '/Rect': [767, 428, 758, 443], '/Subtype': '/Square', '/T': 'AutoCAD SHX Text'}--------------------------------------------------------------------------------}"""
    
    
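    # A, B, C hold the three parts of a tag: two digits, two letters, four digits.
    A = B = C = ''
    # Every piece after the first starts with the text of one '/Contents' value.
    for i in s.split("/Contents': '"):
        # Keep the text before the first comma, minus the closing quote.
        e = i[0:i.index(",") - 1]
        if e.isdigit() and len(e) == 2:
            A = e
        if e.isalpha() and len(e) == 2:
            B = e
        if e.isdigit() and len(e) == 4:
            C = e
            # A four-digit value ends a tag; a lone one (e.g. '0811') prints by itself.
            print(A + B + C)
            A = B = C = ''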
    

    [Output]:

    0811
    29HV0832
    29HV0824
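
If a lone four-digit value such as 0811 should not be reported, one possible variation is to pull out every '/Contents' value with a regex and slide a window of three over them. This is only a sketch, not the method above: it assumes the three parts always arrive as consecutive annotations in the order digits, letters, digits, which holds in the sample but may not hold on every drawing. The `asset_tags` name and the abbreviated `sample` string (only two keys per annotation) are made up for illustration.

```python
import re

# Captures the text of each '/Contents' value in the dumped annotations.
CONTENTS = re.compile(r"'/Contents':\s*'([^']*)'")

def asset_tags(dump):
    """Join three consecutive values shaped like 99, XX, 9999 into one tag."""
    values = CONTENTS.findall(dump)
    tags = []
    # Slide a window of three over the values in document order.
    for a, b, c in zip(values, values[1:], values[2:]):
        if (re.fullmatch(r"[0-9]{2}", a)
                and re.fullmatch(r"[A-Z]{2}", b)
                and re.fullmatch(r"[0-9]{4}", c)):
            tags.append(a + b + c)
    return tags

# Abbreviated, hypothetical stand-in for the real dump (most keys omitted).
sample = " ".join(
    "{'/Contents': '%s', '/T': 'AutoCAD SHX Text'}----" % value
    for value in ["0811", "29", "HV", "0832", "29", "HV", "0824", "%%C25"]
)

print(asset_tags(sample))
```

On this sample the sketch prints ['29HV0832', '29HV0824'] and skips the standalone 0811. Tags that already sit in a single annotation would still need the one-value pattern from the question.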