javascript arrays string similarity

JavaScript: algorithm to detect and remove similar strings


Imagine that I have a JS array like the one shown below:

0 - The Big Bang Theory - Fourth Season
1 - The Big Bang Theory - Third Season
2 - The Big Bang Theory - Second Season
3 - The Big Bang Theory - First Season
4 - The Big Bang Theory - First Season (2007)
5 - The Big Bang Theory - Fourth Season (2010)
6 - The Big Bang Theory - Second Season (2008)
7 - The Big Bang Theory - Third Season (2009)
8 - The Big Bang Theory: Access All Areas (2012)
9 - The Big Bang Theory: It All Started with a Big Bang (2012)

We know that some of the items are similar. The output should look like the array below:

0 - The Big Bang Theory - Fourth Season
1 - The Big Bang Theory - Third Season
2 - The Big Bang Theory - Second Season
3 - The Big Bang Theory - First Season
8 - The Big Bang Theory: Access All Areas (2012)
9 - The Big Bang Theory: It All Started with a Big Bang (2012)

What can I do to omit the similar items? What solution do you have?

Thanks


Solution

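  • You could strip the trailing year in parentheses from each title, put the results into a Set (which eliminates duplicates), and spread the Set back into an array. Note that this also strips the year from titles that have no duplicate, such as the last two entries: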

    movies = [...new Set(movies.map(movie => movie.replace(/\s*\(\d+\)\s*$/g, '')))];
    

    let movies = [
    'The Big Bang Theory - Fourth Season',
    'The Big Bang Theory - Third Season',
    'The Big Bang Theory - Second Season',
    'The Big Bang Theory - First Season',
    'The Big Bang Theory - First Season (2007)',
    'The Big Bang Theory - Fourth Season (2010)',
    'The Big Bang Theory - Second Season (2008)',
    'The Big Bang Theory - Third Season (2009)',
    'The Big Bang Theory: Access All Areas (2012)',
    'The Big Bang Theory: It All Started with a Big Bang (2012)'];
    
    // Strip a trailing "(year)" from each title, then de-duplicate through a Set.
    movies = [...new Set(movies.map(movie => movie.replace(/\s*\(\d+\)\s*$/g, '')))];
    
    console.log(movies);
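
If the original strings should be kept as they are, as in the desired output in the question where the last two titles retain "(2012)", one possible variant is to use the stripped title only as a comparison key and keep the first title seen for each key. This is a minimal sketch, not part of the answer above; the names `normalize`, `dedupeSimilar`, and `sample` are hypothetical, and it assumes "similar" means "identical apart from a trailing year in parentheses".

```javascript
// Hypothetical helper: builds a comparison key by stripping a trailing "(year)".
const normalize = title => title.replace(/\s*\(\d+\)\s*$/, '');

// Keeps the first title seen for each key and returns the titles unmodified.
// Assumption: two titles are "similar" when their keys are equal.
function dedupeSimilar(titles) {
  const seen = new Set();
  return titles.filter(title => {
    const key = normalize(title);
    if (seen.has(key)) return false; // a similar title appeared earlier
    seen.add(key);
    return true;
  });
}

const sample = [
  'The Big Bang Theory - Fourth Season',
  'The Big Bang Theory - Third Season',
  'The Big Bang Theory - Second Season',
  'The Big Bang Theory - First Season',
  'The Big Bang Theory - First Season (2007)',
  'The Big Bang Theory - Fourth Season (2010)',
  'The Big Bang Theory - Second Season (2008)',
  'The Big Bang Theory - Third Season (2009)',
  'The Big Bang Theory: Access All Areas (2012)',
  'The Big Bang Theory: It All Started with a Big Bang (2012)'];

console.log(dedupeSimilar(sample));
```

Because the first occurrence wins, the result depends on the input order: if a title with a year came before its year-less twin, the version with the year would be the one kept.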