Split text using any suggested method

I have a plain text like this:

Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337
Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335 
Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336 
Original: A3N444 / Copy: A3N445 PO: 5000021340 Invoice: 3130021340
Original: A3N44C / Copy: A3N44D PO: 5000021341 Invoice: 3130021341
Original: A3N44G / Copy: A3N44H PO: 5000021342 Invoice: 3130021342
Cart ID: A3N3ZZ / Copy: A3N428 PO: A3N3ZZ01
Cart ID: A3N3ZQ / Copy: A3N3ZV PO: A3N3ZQ01
Cart ID: A3N336 / Copy: A3N337 PO: A3N33601
E3M49D / E3M49Q - PO: 4620028049
E3M49N / E3M49X
E3M49P / E3M49Y

And I need to split the cart IDs from that text, and end up in something like this:

A3N42M
A3N42P
A3N3ZW
...
E3M49N
...

These IDs are always 6-digit length, and always starts with A, E or P (AXXXXX, EXXXXX, PXXXXX, etc...).

Is there any way (using any script or any program language) that I can achieve this?

This plain text is currently on a spreadsheet table, and I need to separate these IDs to use in a SQL query for later, thanks!

Solution

Regex Data-Munging

You can perform this via regex substitutions (or find and replace). I happened to do this in python, but you could use the same regex patterns ('^[^/]*?: ', '/.*') in any language or text editor that supports them.

Regex Explanation

'^[^/]*?: ' - start and beginning of string (first ^), match multiple non-\ characters in a non greedy way ([^/]*?]), a :, then a .
'/.*' - match all /, then multiple any character (.)

Processing Example (in Python)

import re

text = '''
Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337
Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335 
Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336 
Original: A3N444 / Copy: A3N445 PO: 5000021340 Invoice: 3130021340
Original: A3N44C / Copy: A3N44D PO: 5000021341 Invoice: 3130021341
Original: A3N44G / Copy: A3N44H PO: 5000021342 Invoice: 3130021342
Cart ID: A3N3ZZ / Copy: A3N428 PO: A3N3ZZ01
Cart ID: A3N3ZQ / Copy: A3N3ZV PO: A3N3ZQ01
Cart ID: A3N336 / Copy: A3N337 PO: A3N33601
E3M49D / E3M49Q - PO: 4620028049
E3M49N / E3M49X
E3M49P / E3M49Y
'''

text = re.sub('^[^/]*?: ([]*?)', '', text, flags=re.MULTILINE)
text = re.sub('/.*', '', text)

print text

A3N42M 
A3N3ZW 
A3N3ZL 
A3N444 
A3N44C 
A3N44G 
A3N3ZZ 
A3N3ZQ 
A3N336 
E3M49D 
E3M49N 
E3M49P

EDIT

Updated Regex Explanation

Updated regex per author's request.

match any character (.*?) (non-greedy), followed by a capture group of ((...)) of a character class ([AEP]) followed by at least one character class ([0-9]+) followed by four word characters (\w{4}), followed by another capture group of any character ((.*))
All of this is matched and replaced with the capture group variables with a newline in between (\1\n\2), essentially splitting rows where IDs occur twice
Repeat steps with only the first capture group to handle the newlines containing the second ID

Updated Processing

text = re.sub(r'.*?([AEP][0-9]+\w{4})(.*)', r'\1\n\2', text, flags=re.MULTILINE)
text = re.sub(r'.*?([AEP][0-9]+\w{4}).*', r'\1', text, flags=re.MULTILINE)

print text

A3N42M
A3N42P
A3N3ZW
A3N3ZX
A3N3ZL
A3N3ZM
A3N444
A3N445
A3N44C
A3N44D
A3N44G
A3N44H
A3N3ZZ
A3N428
A3N3ZQ
A3N3ZV
A3N336
A3N337
E3M49D
E3M49Q
E3M49N
E3M49X
E3M49P
E3M49Y