mongodb pymongo

Remove duplicate characters from a string in MongoDB


I want to remove duplicate characters from strings in MongoDB, keeping only the first occurrence of each character. Example: Input string: xxxyzzxcdv Output string: xyzcdv


Solution

  • Query

    • $reduce over the index range of the string, range(0, length of string)
    • the accumulator keeps 2 values, {"previous": [], "string": ""} (the initial value of the reduce)
    • get the cur-char with {"$substrCP": ["$mystring", "$$this", 1]}; $$this is the current index into the string, and we take 1 character starting there
    • if the character is already in "previous", keep "string" as it is; otherwise concat to append the new character
    Example with the string "heelo":
    
    Reduce over the indexes (0 1 2 3 4), produced by `{"$range": [0, {"$strLenCP": "$mystring"}]}`.
    We start from `{"previous": [], "string": ""}`.
    
    - get 1 character starting from index 0  
     `{"$substrCP": ["$mystring", "$$this", 1]}` = "h"
    - if this character is already in previous, don't add it  
     `{"$in": ["$$cur_char", "$$value.previous"]}`
    - else add it to both previous and the string (the 2 concats in the code)
    
    Repeat for index (`$$this`) = 1
    - get 1 character starting from index 1  
     `{"$substrCP": ["$mystring", "$$this", 1]}` = "e"
    .....
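
To make the walkthrough concrete, here is a plain-Python sketch of the same reduce. It does not talk to MongoDB; the function name `remove_duplicate_chars` is only illustrative, and the comments point at the operator each line stands in for.

```python
from functools import reduce


def remove_duplicate_chars(s: str) -> str:
    """Keep the first occurrence of each character, preserving order."""

    def step(value, index):
        cur_char = s[index]                # stands in for $substrCP at "$$this", length 1
        if cur_char in value["previous"]:  # stands in for $in on "$$value.previous"
            return value                   # already seen: accumulator is unchanged
        return {
            "previous": value["previous"] + [cur_char],  # stands in for $concatArrays
            "string": value["string"] + cur_char,        # stands in for $concat
        }

    initial = {"previous": [], "string": ""}
    # range(len(s)) plays the role of {"$range": [0, {"$strLenCP": ...}]}
    return reduce(step, range(len(s)), initial)["string"]


print(remove_duplicate_chars("heelo"))       # helo
print(remove_duplicate_chars("xxxyzzxcdv"))  # xyzcdv
```

Only the "string" part of the final accumulator is returned, which is what `$getField` does in the query.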
    
    


    aggregate(
    [{"$set": 
       {"mystring": 
         {"$getField": 
           {"field": "string",
            "input": 
             {"$reduce": 
               {"input": {"$range": [0, {"$strLenCP": "$mystring"}]},
                "initialValue": {"previous": [], "string": ""},
                "in": 
                 {"$let": 
                   {"vars": {"cur_char": {"$substrCP": ["$mystring", "$$this", 1]}},
                    "in": 
                    {"$cond": 
                      [{"$in": ["$$cur_char", "$$value.previous"]},
                       "$$value",
                       {"previous": 
                         {"$concatArrays": ["$$value.previous", ["$$cur_char"]]},
                       "string": 
                       {"$concat": ["$$value.string", "$$cur_char"]}}]}}}}}}}}}])
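
Since the question is tagged pymongo, here is a rough sketch of how the same pipeline could be built and run from Python. The helper names `build_dedupe_pipeline` and `dedupe_preview` are made up for illustration, and the connection URI, database and collection names are whatever you pass in. It assumes a server that supports `$getField` (MongoDB 5.0+).

```python
def build_dedupe_pipeline(field: str = "mystring") -> list:
    """Return the aggregation pipeline from the answer as plain Python data."""
    path = "$" + field
    step = {
        "$let": {
            "vars": {"cur_char": {"$substrCP": [path, "$$this", 1]}},
            "in": {
                "$cond": [
                    # character already seen: keep the accumulator as it is
                    {"$in": ["$$cur_char", "$$value.previous"]},
                    "$$value",
                    # otherwise remember it and append it to the result
                    {
                        "previous": {"$concatArrays": ["$$value.previous", ["$$cur_char"]]},
                        "string": {"$concat": ["$$value.string", "$$cur_char"]},
                    },
                ]
            },
        }
    }
    reduced = {
        "$reduce": {
            "input": {"$range": [0, {"$strLenCP": path}]},
            "initialValue": {"previous": [], "string": ""},
            "in": step,
        }
    }
    return [{"$set": {field: {"$getField": {"field": "string", "input": reduced}}}}]


def dedupe_preview(uri: str, db_name: str, coll_name: str) -> list:
    """Run the pipeline read-only and return the transformed documents."""
    from pymongo import MongoClient  # imported here so the builder works without pymongo

    with MongoClient(uri) as client:
        return list(client[db_name][coll_name].aggregate(build_dedupe_pipeline()))
```

`aggregate` only returns the transformed documents; it does not change what is stored in the collection.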
    

    Edit

    The second query removes duplicates only for the characters we choose.

    Query

    • removes duplicates only of the characters in the array, here only ["x"]
    • I removed the $getField because it is only available in MongoDB 5+; a second $set stage extracts "string" instead
    aggregate(
    [{"$set": 
        {"mystring": 
          {"$reduce": 
            {"input": {"$range": [0, {"$strLenCP": "$mystring"}]},
              "initialValue": {"previous": [], "string": ""},
              "in": 
              {"$let": 
                {"vars": {"cur_char": {"$substrCP": ["$mystring", "$$this", 1]}},
                  "in": 
                  {"$cond": 
                    [{"$and": 
                        [{"$in": ["$$cur_char", "$$value.previous"]},
                          {"$in": ["$$cur_char", ["x"]]}]},
                      "$$value",
                      {"previous": 
                        {"$concatArrays": ["$$value.previous", ["$$cur_char"]]},
                        "string": 
                        {"$concat": ["$$value.string", "$$cur_char"]}}]}}}}}}},
      {"$set": {"mystring": "$mystring.string"}}])
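
To see what this variant does to the sample input, here is a plain-Python sketch of the same condition. It is not run against MongoDB, and the name `remove_duplicates_of` is only illustrative.

```python
from functools import reduce


def remove_duplicates_of(s: str, chars: list) -> str:
    """Drop repeated occurrences only for the characters listed in chars."""

    def step(value, index):
        cur_char = s[index]
        # same test as the $and of the two $in checks in the query
        if cur_char in value["previous"] and cur_char in chars:
            return value
        return {
            "previous": value["previous"] + [cur_char],
            "string": value["string"] + cur_char,
        }

    return reduce(step, range(len(s)), {"previous": [], "string": ""})["string"]


print(remove_duplicates_of("xxxyzzxcdv", ["x"]))  # xyzzcdv
```

The repeated "z" survives because "z" is not in the chosen array; only the extra "x" characters are dropped.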
    

    Edit

    If you need to use this aggregation as an update, you can run it as a pipeline update (MongoDB 4.2+), like this:

    update({},
    [{"$set": ......])
    

    Check your driver's documentation for how to do an update with a pipeline; in Java it looks like the above. Alternatively, run it as a database command.
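
For PyMongo, a minimal sketch of both options could look like the following. The function names are hypothetical; `collection` and `db` are assumed to be a PyMongo `Collection` and `Database`, and `pipeline` is the `[{"$set": ...}]` list from one of the queries above. Pipeline updates need MongoDB 4.2+ and a PyMongo version that accepts a list as the update (3.9 or later, as far as I know).

```python
def apply_pipeline_update(collection, pipeline: list):
    """Apply an aggregation pipeline as an update to every document."""
    # Passing a list (not a dict) as the update makes it a pipeline update.
    return collection.update_many({}, pipeline)


def apply_pipeline_update_command(db, coll_name: str, pipeline: list):
    """Same update, sent through the raw `update` database command."""
    return db.command({
        "update": coll_name,
        "updates": [{"q": {}, "u": pipeline, "multi": True}],
    })
```

The empty filter `{}` matches every document, so test on a copy of the collection first if you are not sure about the result.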