Search code examples
javascriptdiff

Struggling to store character positions in array


Very sorry about the ambiguous title as I simply don't know how to describe my problem in one line.

I have a string that keeps changing at random intervals. I want to be able to track changes to that string and later be able to reconstruct the final string.

Example:

const text1 = 'Hello everyone, my name is Chris';
//12 seconds later it turns into:
const text2 = 'Hello everyone, my fame is gone';
//... Keeps changing...

I am using this outstanding library called fast-diff to compare the strings and store the diff array. Here is how it works:

const text1 = 'Hello everyone, my name is Chris';
const text2 = 'Hello everyone, my fame is gone';
const diff_array = diff(text1, text2, 1);
console.log(diff_array);

The result is:

[
  [ 0, 'Hello everyone, my ' ],
  [ -1, 'n' ],
  [ 1, 'f' ],
  [ 0, 'ame is ' ],
  [ -1, 'Chris' ],
  [ 1, 'gone' ]
]

Where 0 represents unchanged characters, 1 represents added characters and -1 represents removed characters.

Based on this diff array, I can easily reconstruct 'Hello everyone, my fame is gone' by running something like this:

diff_array.forEach(el => {
    if(el[0] !== -1)
        process.stdout.write(el[1]);
})

All I gotta do is store the array somewhere.

Problem: When dealing with large strings, the diff array, as currently constructed, stores a tremendous amount of redundancy as most times the changes to a 10,000 character strings are minimal.

Goal: What I want to do instead is store the original string AND turn the diff array from

[
  [ 0, 'Hello everyone, my ' ],
  [ -1, 'n' ],
  [ 1, 'f' ],
  [ 0, 'ame is ' ],
  [ -1, 'Chris' ],
  [ 1, 'gone' ]
]

to:

[
    [
        0,
        {
            start: 0,
            end: 19
        }
    ],
    [ -1, 'n' ],
    [ 1, 'f' ],
    [
        0,
        {
            start: 20,
            end: 27
        }
    ],
    [-1, 'Chris' ],
    [ 1, 'gone' ]

]

Where 0s instead make a reference to the start and end character positions of the unchanged original text. That will hugely lower my storage cost. I need a function that will accept the original text and the diff array and spit out a new diff array that looks like the above.

The reason I am struggling here is because it isn't a simple matter of searching the original string for 'Hello everyone, my ' and 'ame is ' and finding their positions, because those substrings can be in other places as well. I have to search it sequentially while keeping track of where the current search position is.


Solution

  • The diffs you get back from fast-diff have extra information you don't need to reconstruct from the original, there likely left in for validation purposes. So what you can do is strip them out, basically for the delete & copy action just store a number instead (no need for start/end, just length), and it's only the the add action you need to keep the string.

    eg. The example diff you showed can be just made into ->

    [[0,19],[-1,1],[1,"f"],[0,7],[-1,5],[1,"gone"]]
    

    Below is a working snippet using this technique.

    const makeShortDifs = d => 
       d.map(([action,value]) => 
       [action,action === 1 ? value: value.length]);
    
    function makeNew(o, difs) {
      let n = '', p = 0;
      for (const d of difs) {
        const [action, value] = d;
        if (action === 0) {
          n += o.slice(p, p + value);
          p += value;
        } else if (action === 1) {
          n += value;
        } else p += value;
      }
      return n;
    }
    
    
    ///Test...
    
    //difs we get from fast-diff
    const bigDifs = [
      [ 0, 'Hello everyone, my ' ],
      [ -1, 'n' ],
      [ 1, 'f' ],
      [ 0, 'ame is ' ],
      [ -1, 'Chris' ],
      [ 1, 'gone' ]
    ];
    
    //make our diffs smalll
    const smallDifs = makeShortDifs(bigDifs);
    
    //now test small diffs on original text.
    const original = 'Hello everyone, my name is Chris';
    console.log(original);
    console.log(makeNew(original, smallDifs));