Search code examples
oopmemorycompiler-constructionheap-memory

Implementing deep clone()


Consider some OOP language (doesn't really matter which one is it) and suppose the language creator would like to add a deep clone() method for every class. Also, it is given that the heap is implemented as a tree.

How should the language creator implement it?

So basically we can (automatically, at compilation time) implement a clone() method and adding it to the proper functions virtual table. It will malloc() the proper size which is known in run-time and copy from source to destination. Finally, it will return the pointer to the malloced memory.

If a field is a pointer to some object, we shall activate it's clone() and put the result as value at that field.

What problem would arise if the heap was not "tree-based"?
So I really don't know what the answer is. I mean, why is it relevant how the heap was implemented? I'd be glad for help here.

Thanks.


Solution

  • The algorithm you describe works fine as soon as the object structure is acyclic, i.e. forms a tree. Then indeed, it's OK to make a shallow clone of an object and then to proceed with updating fields that refer to other objects using the same technique.

    However, if there are cycles (i.e. the object structure does not form a tree), the algorithm will run forever. In the simplest case the initial object o references itself: o -> o. When a shallow clone c1 is created, the new object still references the old one: c1 -> o. Also, the old object is unchanged, i.e. we have c1 -> o -> o. According to the algorithm a clone of the object referenced from the clone should be created. Because this is a new object, it is different from c1. So, the picture becomes c1 -> c2 -> o. This is already wrong, because what a deep clone should create is c1 -> c1. The bad news is that c2 is now also a subject for processing. So, we get c1 -> c2 -> c3 -> o, etc. The algorithm loop forever (until there is no more memory to create new objects).

    To make the algorithm working, it should track what old objects have been processed and what new objects have been created for them. Then instead of blindly making new clones of referenced objects it should check whether an old object has been cloned already and, if yes, use this clone rather than create a new one.