Search code examples
arraysregexif-statementgoogle-sheetsgoogle-sheets-formula

Google Sheets regexextract city names from string


The media team is running some ad campaigns that I need to report on. The reports need to be based on either cities or city groups. A small example of the campaign names are:

Brand-Platform-Safety-YT-DV360-Female-FTP-Non-Skippable-10Sec-Next8-Top10HHI-Android-6th jan'22
Brand-Disinfection-YT-DV360-Male-FTP-Non-Skippable-20Sec-Bengaluru-Top10HHI-Android-6th jan'22
Brand-Disinfection-YT-DV360-Female-FTP-Non-Skippable-6Sec-Bengaluru-Top10HHI-Android-6th jan'22
Brand-Disinfection-Display-DV360-Male-RTP-Display-Top10City-Top10HHI-Android-6th jan'22
Brand-Platform-Safety-YT-DV360-Female-FTP-Non-Skippable-10Sec-Next12-Top10HHI-Android-6th jan'22
Brand-Disinfection-YT-DV360-Female-FTP-Non-Skippable-6Sec-Kolkata-Top10HHI-Android-6th jan'22
Brand-Disinfection-YT-DV360-Male-FTP-Non-Skippable-20Sec-Kolkata-Top10HHI-Android-6th jan'22

I need to extract city names from these campaign names. They’ve setup more than a 100 campaigns based on different criteria so I’ll be difficult to do manually. Assuming that the campaign names are stored in column A in my spreadsheet, I can almost get the entire result with this formula:

=ARRAYFORMULA(IF(ROW(A:A)=1,"City",
IF(ISBLANK(A:A),,
REGEXEXTRACT(A:A,"Sec-(.+)-Top"))))

There are two issues with the formula. It currently relies on a string starting with “Sec-“ and ending with “-Top” and extracts the data between them.

  • The first issue is that sometimes the string will start with “Display-” instead of “Sec-“. How do I make the formula adapt to that
  • The second issue is that sometimes instead of cities like “Delhi” / “Kolkata”, the media team has used city groups like “Next8” / “Top10City”. Since the word “-Top” is used, the regexexract function just picks the entire string. How do I prevent this?

Any help is greatly appreciated!


Solution

  • try:

    =ARRAYFORMULA({"City";
     IF(ISBLANK(A2:A),,REGEXREPLACE(
     IFNA(REGEXEXTRACT(A2:A, "Sec-(.+)-Top"), 
          REGEXEXTRACT(A2:A, "Display-(.+)-Top")), ".*Display-", ))})
    

    enter image description here