Search code examples
sqlitereplace

Is there a way to replace only specific occurrences of keywords in a field value without replacing all occurrences?


Given a table 'updates' that contains records comprising two fields, both containing text only:

existing_value and replacement_value

This table is used to update a 2nd table 'main' with a single field:

name

whereby the name field is to be updated with a replacement_value, replacing all instances of existing_value with replacement_value. However, what's needed is not a simple update of the 'main' table replacing on the basis of main.name = updates.repleacement_value where main.name = updates.existing_value.

main.name, updates.existing_value and updates.replacement_value all contain strings that take the form of a simple value such as Joe or they can (and mostly do) contain strings that include delimiters to separate multiple values held in a single field e.g. Joe Soap\\Joe Bloggs\\Joe (yes I know field stuffing is a terrible idea, but the source data is what it is).

For the purposes of clarification let's assume an example where the following applies:

updates.existing_value = 'Joe'
updates.replacement = 'Jose'

Main has two records:

main.name = 'Joe Soap\\Joe Bloggs\\Joe'
main.name = 'Joe\\Joe Soap\\Joe\\Joe Little'

The problem I need to solve is to replace only the standalone 'Joe' in each record to 'Jose' without inadvertently affecting 'Joe Soap', 'Joe Bloggs' or 'Joe Little'.

One way to solve the problem is to load each record in 'main' into Python, turn each record into a list, process the list and recompile the string before writing it back to 'main' (a dataframe would be most efficient), but before going this route, I was wondering whether there's a way to achieve the same directly in SQLite?

I've experimented with LIKE, INSTR() and REPLACE(), but there doesn't seem to be a way to curtail REPLACE()'s scope in that it'll replace all instances of 'Joe' with 'Jose'.

Is there some other SQL method I've not considered or is the problem best tackled outside or SQLite?


Solution

  • You can delimit the search term:

    update main
    set name =
        trim(
            replace(
                replace(
                    replace(
                        '\'||name||'\',
                        '\',
                        '\\'
                    ),
                    '\'||map.old||'\',
                    '\'||map.new||'\'
                ),
                '\\',
                '\'
            ),
            '\'
        )
    from (
        select distinct
            main.rowid as id,
            updates.existing_value as old,
            updates.replacement as new
        from main inner join updates
        where instr(
            '\'||main.name||'\',
            '\'||updates.existing_value||'\'
        )
    ) as map
    where main.rowid = map.id;
    
    1. locate the rows to update
    2. prepend and append delimiter so replace can be on \Joe\
    3. double the delimiters to allow consecutive matches (eg. Joe\Joe\Joe)
    4. do the replacement for each existing_value
    5. undo the delimiter doubling from (3)
    6. trim the extra delimiters added in (2)

    Note: If multiple updates apply to the same row, the result is nondeterministic.