Search code examples
excelexcel-formulachinese-localevba

Excel: Count all cells that have doubled (Chinese) characters within the cell


I have a list of a few thousand Chinese brand names, one brand name per cell, and I'm trying to count how many of those names use doubled characters, meaning two of the same Chinese characters one after another. For example, here's a list of 6 brand names (each one would be in its own cell):

  1. 水晶晶
  2. 衣二三
  3. 五五
  4. 淘宝
  5. 哈哈哇
  6. 拉啪拉

Numbers 1, 3 and 5 have doubled characters in them (晶晶 , 五五, 哈哈), so I would like a formula that returns the number “3”, because there are three cells that contain doubled characters. (Note that although #6 contains two of the same characters - 拉, twice - those characters are not next to each other, and therefore wouldn't return as true).

I presume the fact that the characters are Chinese doesn't really matter, the formula would be the same for counting doubled letters, like:

ABB
CC
FDDF

Characters are not separated by a space. I don't know VBA, but happy to learn how to input and run a script.


Solution

  • Not to discourage you from learning VBA (it's a great skill to have!) but I figured I'd whip up a VBA Function that you can start using with your data right away... Once we confirm that everything's working I'll give you some links to explain how this (simple) function works, along with some other good beginners' resources. :)

    I have never worked with Chinese characters but I did a bit of research. Western alphabets generally come from a set of 255 characters called ASCII. Eastern alphabets come from a set of 65533 characters called Unicode. ASCII characters each take 1 byte of storage space, whereas Unicode characters take up 2 bytes each.

    What does this mean to a text function like the one I put together? I'm not entirely sure! Apparently I am supposed to use different builtin formulae than I normally would, but my code wouldn't work with the alternate method, yet seems to work just fine the way I've always done things.

    This may be different with your "complete" data-set, combined with your computer's language settings compared to North America. Theoretically, it should work just fine, but I'd suggest that when you start using it, count a bunch of different cells manually so that you can compare the numbers that the function gives you, and let me know if there are discrepancies. (If you're list isn't "top secret", I'd kind of like to have a copy if you wouldn't mind; this is all foreign to me [pun intended] and I'd like to understand more about the differences data-wise.)

    A small group of Chinese characters are not part of the Unicode standard yet, but apparently they are rarely used anyways (usually for place names?) but you should keep an eye on that. Again, it shouldn't be an issue - in theory - because suspect that "if they aren't in Unicode, your computer won't have them either" (I think!) Here is a link to a list of characters in question.

    Anyhow, the only way to find out if it's working like it should is to try it out with your full set of data!


    How to copy the VBA functions into a module:

    1. Select the VBA Code below, and hit Ctrl+C to copy.

    2. Go to your Excel workbook, and hit Alt+F11 to open the VBA Editor (aka VBE).

    3. Click the Insert menu in the VBE, and choose Module.

    4. Hit Ctrl+V to paste the code.

    5. Click the Debug menu in the VBE, and choose **Compile Project". This checks the code for errors. Ideally "nothing" will happen, meaning it's error-free & good to go.

    6. Close the VBE window with the " " in the top-right corner.

    7. Save your workbook, and the new functions are ready to use! See the screenshot below for example usage.

    .

    Public Function cellHasDups(str_In As String) As Boolean
    'returns TRUE if there are at least 2 identical characters in a row
    
        Dim x As Integer, prevChar As String, dupFound As Boolean
        On Error GoTo dError
    
        prevChar = ""
        dupFound = False
    
        For x = 1 To Len(str_In) 'compare each character to the previous
            If Mid(str_In, x, 1) = prevChar Then dupFound = True
            prevChar = Mid(str_In, x, 1)
        Next x
    
        cellHasDups = dupFound 'return T/F to the calling cell
        Exit Function
    
    dError:
        cellHasDups = False
    End Function
    
    
    Public Function rangeHasDups(rge_In As Range) As Long
    'returns the number of cells in the specified range that have duplicate characters
    '[email protected]
    
        Dim c As Range, countDups As Long
    
        On Error GoTo dError
    
        countDups = 0
        'loop through all cells in selected range; run [cellHasDups] on each one and count "TRUE" responses
        For Each c In rge_In
            countDups = countDups + IIf(cellHasDups(c.Value), 1, 0)
        Next c
    
        rangeHasDups = countDups 'return total to the calling cell
        Exit Function
    
    dError:
        rangeHasDups = 0
    End Function
    

    The function has 2 variations. You can paste them both into a module, and use whichever one one that suits your needs (or if there is another way that would be easier, I can make changes easily at this point):

    • cellHasDups checks an individual cell and returns TRUE if the cell has 2 identical characters in a row, otherwise returns FALSE.

    • rangeHasDups checks an rane of cells (ie, A1:A20 or A5:G99 or A:A etc) and returns a Count of cells which have duplicate characters.

    ChineseDuplicateTest screen shot

    Give it a try, and let me know if you have any issues or questions!