python, python-3.x, python-re

How to extract all text between certain characters with Python re


I'm trying to extract all text between certain characters but my current code simply returns an empty list. Each row has a long text string that looks like this:

"[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', 'value': Decimal('381094.000000000')}\n {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2, 'script_asm': '3045022100a', 'value': Decimal('3790496.000000000')}]"

I just need the values for "spent_transaction_hash". For example, I'd like to create a new column that has a list of ['4b3e9741022d4', '0cfbd8591a3423']. I'm trying to extract the values between 'spent_transaction_hash': and the comma. Here's my current code:

my_list = []

for row in df['column']:
    value = re.findall(r'''spent_transaction_hash'\: \(\[\'(.*?)\'\]''', row)
    my_list.append(value)

This code simply returns a blank list. Could anyone please tell me which part of my code is wrong?


Solution

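  • Is this what you're looking for? 'spent_transaction_hash'\: '([a-z0-9]+)'

    Your pattern returns nothing because it expects the literal characters ([' after the colon, but in your data the hash follows the colon directly as a quoted string.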

    Test: https://regex101.com/r/cnviyS/1
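
A minimal sketch of how that pattern could be applied, using the sample row from the question. The names `row`, `HASH_PATTERN`, `extract_hashes`, and `rows` are illustrative assumptions, and a plain list stands in for df['column'] so the example runs without pandas:

```python
import re

# Sample row copied from the question. It is a plain string, not a parsed list.
row = (
    "[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', "
    "'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', "
    "'value': Decimal('381094.000000000')}\n "
    "{'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', "
    "'spent_output_index': 2, 'script_asm': '3045022100a', "
    "'value': Decimal('3790496.000000000')}]"
)

# Capture the lowercase alphanumeric value quoted right after the key.
# Assumption: hashes contain only a-z and 0-9, as in the sample data.
HASH_PATTERN = re.compile(r"'spent_transaction_hash': '([a-z0-9]+)'")


def extract_hashes(text):
    """Return every spent_transaction_hash value found in text."""
    return HASH_PATTERN.findall(text)


# Stand-in for df['column']: one list of hashes per row.
rows = [row]
my_list = [extract_hashes(r) for r in rows]
print(my_list)
```

With a real DataFrame, something like df['new_column'] = df['column'].apply(extract_hashes) should give one list per row, assuming every cell in the column is a string.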