architecture, frontend, backend, real-time

File manager application: best approach to synchronize multiple users' changes?


I'm working on a file manager application. It is possible to add, remove, copy and paste, rename, move and modify files and folders.

At the beginning, there was only the most basic API for that. When the client-side app opens, the whole file structure (together with all the files' contents) gets downloaded to the client, and the client works with that copy locally however they like: they can edit files, create new ones, move files and folders around, and so on.

However, an issue arose: if many people are working simultaneously, then even if they work on different files, one person can overwrite the changes made by another. For example, here's the file structure:

  • a.txt
  • b.txt

Bob opens the file a.txt and modifies its contents. Alice, at the same time, opens b.txt and modifies it. Alice saves her changes first, and they are stored correctly. Then Bob saves his state. Because he loaded the files before Alice saved her changes, his copy still contains the old b.txt, and that stale b.txt replaces the updated version saved by Alice. That's the short and easy version. Obviously, once you get to renaming and deleting files, it gets worse.

So, the back-end guys decided to release a new version of the API where, instead of saving the whole (updated) file structure as a batch, each file has to be saved individually. This makes things harder for the client-side application, because now it has to, if not calculate structural diffs, then at least know which files are "dirty" and which are not.
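To make the "dirty" bookkeeping concrete, here is a minimal sketch of one way a client could track it: remember a hash of every file as it was loaded and compare against the current contents before saving. The `Workspace` class and its method names are hypothetical and not part of either API described here; the real client is a front-end app, so Python is used only for illustration.

```python
import hashlib


def digest(content: str) -> str:
    """Stable fingerprint of a file's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class Workspace:
    """Hypothetical client-side copy that remembers what was loaded."""

    def __init__(self, loaded: dict[str, str]):
        self.files = dict(loaded)  # working copy the user edits
        self._loaded = {path: digest(text) for path, text in loaded.items()}

    def dirty_paths(self) -> list[str]:
        """Paths that are new or whose content differs from the loaded version."""
        return sorted(
            path
            for path, text in self.files.items()
            if self._loaded.get(path) != digest(text)
        )

    def deleted_paths(self) -> list[str]:
        """Paths that were loaded but no longer exist in the working copy."""
        return sorted(path for path in self._loaded if path not in self.files)


# Bob loads both files but edits only a.txt.
bob = Workspace({"a.txt": "A0", "b.txt": "B0"})
bob.files["a.txt"] = "A1 (Bob)"

print(bob.dirty_paths())    # only a.txt needs to be sent to the per-file API
print(bob.deleted_paths())  # nothing was deleted
```

In this sketch a rename shows up as one deleted path plus one new dirty path, so a client that needs real "move" semantics would have to track those separately.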

While the new version of the API obviously solves the original issue of overwriting other files, it doesn't solve the root issue itself: it's still possible to overwrite someone else's changes when two people are working on the same file. For example, if both Bob and Alice open the same a.txt and each of them introduces different changes, then whoever saves last overwrites the changes made by the other person.
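The remaining same-file problem can be reproduced in a few lines. This is only an illustration: `server` and `save_file` are hypothetical stand-ins for the per-file "New API", assumed here to replace the stored content unconditionally.

```python
# Hypothetical in-memory stand-in for the server-side storage.
server = {"a.txt": "line 1\nline 2\n"}


def save_file(path: str, content: str) -> None:
    """Assumed per-file save: replaces whatever is stored, no checks."""
    server[path] = content


# Both users load the same version of a.txt.
bob_copy = server["a.txt"]
alice_copy = server["a.txt"]

# Each edits a different line.
bob_copy = bob_copy.replace("line 1", "line 1 (Bob)")
alice_copy = alice_copy.replace("line 2", "line 2 (Alice)")

save_file("a.txt", bob_copy)
save_file("a.txt", alice_copy)  # last writer wins

print(server["a.txt"])  # Bob's edit to line 1 is gone
```

Even though the two edits touch different lines, the second save carries the old line 1 along with it, so the first edit is silently lost.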

Looks like we need some other solution.

What first came to mind is file locking. However, judging by past examples of file locking, I can tell that it's definitely not the best solution: even if it solves the original issue, it brings many other ones instead (I can list them here if someone's interested, but I guess they're not hard to imagine).

So far, we have 3 different approaches:

  1. The "Old API" - the easiest to implement, but with a huge drawback
  2. The "New API" - a bit harder to implement, and it doesn't really solve the original issue. It only narrows the issue down to a single file
  3. File Locking - completely and properly solves the original issue, but in return brings many other issues.

I decided to think about it some more and came up with analogies to how similar issues are solved in real-world applications.

First - version control.

Instead of having a single "local" copy (the word "local" is in quotes because it's actually a remote copy, loaded by the client side, but anyway) we can have a separate local copy per-user (or per-session, or even per browser-tab, or whatever). Everyone can modify their own copies, and then someone would have to resolve the conflicts and decide whose changes make it into the final saved version.

Benefits:

  • Solves the original issue
  • It's still possible to use the "Old API" which makes things easy on the client side

Drawbacks:

  • Requires implementing dynamic branching on the server side, which is quite a complicated process
  • There will be conflicts, and the client-side app would need some view where they can be resolved
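As a rough idea of what the conflict detection in the version-control approach involves, here is a minimal sketch of a three-way merge at whole-file granularity: each user's copy is compared against the common version both started from. The `merge` function and the snapshot format (a dict from path to content) are assumptions made for illustration; a real version control system also merges within a file, line by line.

```python
Snapshot = dict[str, str]  # path -> content; a missing path means "no such file"


def merge(base: Snapshot, mine: Snapshot, theirs: Snapshot):
    """Three-way merge per file. Returns (merged snapshot, conflicting paths)."""
    merged: Snapshot = {}
    conflicts: list[str] = []
    for path in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(path), mine.get(path), theirs.get(path)
        if m == t:
            result = m  # both sides agree (same edit, or both deleted)
        elif m == b:
            result = t  # only "theirs" changed this path
        elif t == b:
            result = m  # only "mine" changed this path
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "A0", "b.txt": "B0"}

# Bob and Alice touch different files: merges cleanly.
print(merge(base, {"a.txt": "A1", "b.txt": "B0"}, {"a.txt": "A0", "b.txt": "B1"}))

# Both touch a.txt: someone has to resolve it by hand.
print(merge(base, {"a.txt": "A1", "b.txt": "B0"}, {"a.txt": "A2", "b.txt": "B0"}))
```

The first case is the original Bob/Alice scenario and needs no user interaction; only the second case would have to be shown in a conflict-resolution view.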

Second - multicursor editors.

If it's not desired to have different "local" copies and there instead should be a single one (just like it is at the moment) but we still need to make sure the users can edit files simultaneously, then why don't we synchronize their changes? There are solutions currently working that way (e.g. things like Repl.it or Codesandbox, or even live collaboration in JetBrains' products), so maybe there's something ready to be used?

Benefits:

  • Solves the original issue
  • It's still possible to use the "Old API" which makes things easy on the client side
  • It's literally the same copy of the files that's being edited, regardless of how many users work on it at the same time. So, compared to the previous (git-based) idea, there's no need for conflict resolution or for keeping multiple states of the same thing (which would definitely make things difficult at some point; Git's not easy either)
  • Theoretically, it should be possible to avoid changing both the existing frontend and backend applications and only implement some kind of middleware that "extends" them

Drawbacks:

  • Requires implementing that middleware (or, who knows, maybe even a completely different app, or even both a new frontend and a new backend)
  • I have no idea how to implement it, or even where to start, and for the most part it's a task for my front-end side of the application.
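For a sense of what "synchronizing changes" means under the hood, one family of techniques used for this is operational transformation: each edit is sent as an operation, and a concurrent operation is adjusted against the ones already applied so that every copy ends up identical. The sketch below covers only concurrent text inserts in a single file; `Insert`, `apply_op` and `transform` are hypothetical names, and deletes, more than two pending operations, and the network layer are all left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str
    site: str  # who made the edit; used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust `op` so it can be applied after `against` was already applied."""
    before = against.pos < op.pos
    tie = against.pos == op.pos and against.site < op.site
    if before or tie:
        # `against` pushed the text to the right, so shift our position.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "hello world"
bob = Insert(0, ">> ", "bob")    # Bob types at the start
alice = Insert(5, ",", "alice")  # Alice types after "hello", concurrently

# Bob's copy applies his own edit first, then Alice's transformed edit.
bob_view = apply_op(apply_op(doc, bob), transform(alice, bob))
# Alice's copy applies her own edit first, then Bob's transformed edit.
alice_view = apply_op(apply_op(doc, alice), transform(bob, alice))

print(bob_view)
print(alice_view)  # same text on both sides
```

Even this toy version shows why the approach is not a quick middleware task: every operation type needs a transform rule against every other type, and all clients must agree on the tie-breaking order.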

So, the questions are:

Which approach would you choose, given that the main constraint is the time to implement? We only have about a month both to decide what to do and to actually do it.

Which approach is realistically better? The git-based one or the one with the real-time editing?

If we decide to take the real-time editing approach, are there any free solutions to use as a base? By "free" I mostly mean licensing. But if it's free in price, that's even better.

And the last one. Am I missing something? Are there other easier approaches to solve our issue that I'm not aware of?


Solution

  • Which approach would you choose, considering as the major bottleneck the time to implement? [...] The git-based one or the one with the real-time editing?

    It seems to me you would eventually be reinventing a source control (version control) system, so why not just use an existing one, such as Git? I'd suggest that even if you weren't in a hurry: Git in particular is also quite amenable to integration into operational workflows.

    Collaborative editing, on the other hand, is essentially a different issue and orthogonal to the above: even with collaborative editing, there would still be the problem of more than one group of users running editing sessions at the same time. So, while this is a cool feature to have, I'd only do it if the requirements ask for it. I would certainly not implement it and then restrict editing to one group of collaborating users at a time just to solve the source control problem.