Variable-length lookbehinds in regex don't work in Python


I want to extract the text between the ':' and '|' characters, but in the second and third fields there is a space after the ':'.

The input:

Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra) -  CARPAZO O PE#A LONJA [EXTRACT]
Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 | Garaje | MOJACAR (Almería) -  URB.VILLA MIRADOR DEL MAR - MOD. # [EXTRACT]
Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR-MODULO #-PLAZA 4 [EXTRACT]
Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR - MODULO [EXTRACT]

Desired output:

22726
233726
256726
39766
39767
39768
397A5
397B5
397C5
AA39803
P_39803
200_39803

My first pattern, (?<=:)(\w{5,12}), matches only the first column.

My second pattern, (?<=:\s)(\w{5,12}), matches the second and third columns.

So I believed my third pattern, (?<=:\s?)(\w{5,12}), would be the correct one, but that pattern doesn't work.
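
A minimal reproduction of the three attempts, assuming the standard library re module. The variable names and the shortened sample row are only illustrative. The first two patterns compile and match as described, while the third is rejected when it is compiled:

```python
import re

# One row of the input above, cut off after the fourth field for brevity.
row = "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo"

first = re.findall(r"(?<=:)(\w{5,12})", row)     # value must follow ':' directly
second = re.findall(r"(?<=:\s)(\w{5,12})", row)  # exactly one whitespace after ':'
print(first)   # ['22726']
print(second)  # ['233726', '256726']

# The optional \s? makes the lookbehind either 1 or 2 characters wide,
# and the re module refuses to compile a lookbehind of varying width.
try:
    re.compile(r"(?<=:\s?)(\w{5,12})")
    third_error = None
except re.error as exc:
    third_error = str(exc)
print(third_error)
```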


Solution

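  • Python's built-in re module requires a lookbehind to have a fixed width, so (?<=:\s?) is rejected. One way to work around this is to alternate two fixed-width lookbehinds: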

    (?:(?<=:\s)|(?<=:))(\w{5,12})
    

    But since you are using a capturing group anyway, the lookbehind is unnecessary. Match the colon and the optional space outside the group instead:

    :\s?(\w{5,12})
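
As a quick check, here is a sketch that runs both patterns over two of the sample rows with re.findall. The rows are shortened after the fourth field, and the variable names are only illustrative. With a single capturing group, findall returns just the captured text, so the leading colon and space are not part of the results:

```python
import re

# Two rows from the question, cut off after the fourth field for brevity.
rows = [
    "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo",
    "Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje",
]

# Two fixed-width lookbehinds joined by alternation.
with_lookbehinds = re.compile(r"(?:(?<=:\s)|(?<=:))(\w{5,12})")
# Same idea without any lookbehind: consume ':' and an optional space,
# then capture only the value.
without_lookbehind = re.compile(r":\s?(\w{5,12})")

results_a = [with_lookbehinds.findall(row) for row in rows]
results_b = [without_lookbehind.findall(row) for row in rows]

for values in results_b:
    print(values)
# ['22726', '233726', '256726']
# ['AA39803', 'P_39803', '200_39803']
```

Both patterns return the same lists for these rows. The second is shorter and avoids the fixed-width restriction entirely.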