Search code examples
postgresqlregexp-replace

Trouble using regexp_replace command


Current command being used:

regexp_replace(b.DOWNCASE_NAME, '[^\w\s]', '', 'g') as CONDITION_NAME_DC

That will change "ovarian sertoli-leydig cell tumor" to "ovarian sertolileydig cell tumor". It will also change "hereditary breast/ovarian cancer (brca1, brca2)" to "hereditary breast ovarian cancer brca1 brca2".

I do not understand why it is removing the dash in the "sertoli-leydig" and putting it into one word instead of keeping a space in between. In the regexp_replace, if I put a space in the ' ' before the , 'g' then it also puts a double space in "hereditary breast ovarian cancer brca1 brca2"

I don't need the double space, I just want one space in there. Any ideas what to do here?


Solution

  • I do not understand why it is removing the dash in the "sertoli-leydig" and putting it into one word instead of keeping a space in between.

    Because that is what you told it to do. Why would it put a space in between when you told it not to?

    Replace punctuation with a space, but capture any spaces already around the punctuation so they get replaced as well with that single space.

    regexp_replace(b.DOWNCASE_NAME, ' *[^\w\s] *', ' ', 'g');