pytorch, artificial-intelligence

Good practices for combining tasks in AI?


I'm designing a neural network (PyTorch) that accomplishes two different, but entangled, tasks. One is very difficult, the other is very easy. Training two separate models (two trainings, two sets of parameters, ...) cracks both problems independently, but when they are combined the hard task fails, no matter what I do. I would like to understand why, and whether there is a way around it.

Example: Let's say that I would like, given a picture, to:

  1. determine if there's a dog in the picture (easy)
  2. get a new picture where the dog has a red hat (if there's a dog) or a cat is placed coherently in the picture (very hard)

Neither of these problems is impossible; however, if I ask the NN to crack them simultaneously, only the easy one succeeds. Why?
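For concreteness, here is a minimal sketch of the kind of combined set-up I mean: a shared backbone with one head per task and the two losses simply summed. The architecture, layer sizes, and losses are hypothetical placeholders, not my actual model.

```python
import torch
import torch.nn as nn

class TwoTaskNet(nn.Module):
    """Hypothetical two-head network: one easy head, one hard head."""
    def __init__(self):
        super().__init__()
        # Shared feature extractor used by both tasks.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Easy task: binary "is there a dog?" classifier.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )
        # Hard task: image-to-image head that should produce the edited picture.
        self.editor = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x):
        feats = self.backbone(x)
        return self.classifier(feats), self.editor(feats)

model = TwoTaskNet()
images = torch.randn(8, 3, 64, 64)             # dummy batch
dog_logits, edited = model(images)

# Naive combination: the two losses are just added together.
dog_labels = torch.ones(8, 1)                  # dummy targets
cls_loss = nn.BCEWithLogitsLoss()(dog_logits, dog_labels)
edit_loss = nn.L1Loss()(edited, images)        # dummy target image
(cls_loss + edit_loss).backward()
```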

My guesses are the following:

  • the easy task makes the network start overfitting (as measured on the validation set) before the hard task is even handled
  • the cost function is dominated by the easy task, so the gradient signal contributed by the hard task is basically noise compared with the easy one's (a diagnostic sketch for checking this follows below).
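One way to test the second guess is to compare the gradient norm each loss produces on the shared parameters; if the easy task's gradients are orders of magnitude larger, the hard task's signal is effectively drowned out. A rough diagnostic, reusing the hypothetical `TwoTaskNet` from the sketch above:

```python
# Assumes `model` and `images` from the sketch above.
def grad_norm(loss, params):
    # Gradient of one loss w.r.t. the shared parameters only.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.sqrt(sum((g ** 2).sum() for g in grads))

shared = list(model.backbone.parameters())
dog_logits, edited = model(images)
cls_loss = nn.BCEWithLogitsLoss()(dog_logits, torch.ones(8, 1))
edit_loss = nn.L1Loss()(edited, images)

print("easy-task grad norm:", grad_norm(cls_loss, shared).item())
print("hard-task grad norm:", grad_norm(edit_loss, shared).item())
```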

I know I could just split the tasks, but that's not what is required of me at the moment.

What can I do?


Solution

  • Without knowing your task/network/data, it's difficult to give advice on your problem specifically. Generally speaking, though, the interaction between different tasks can vary: sometimes they complement each other and boost learning, sometimes not. If you're interested, three subfields of Machine Learning that investigate this are Transfer Learning (transferring knowledge from one domain to another), Curriculum Learning (transferring knowledge from one task to another in sequence), and Multitask Learning (learning multiple tasks simultaneously).

    Is the input data the same for both tasks? Does the network need to learn them in sequence or simultaneously?

    Your guesses may be right - the network might get stuck in a local minimum of the easy task and then be unable to learn the difficult one. That sounds as if they are quite different tasks and don't share overlapping solutions. If that's not true, and they should share parts of their solutions, perhaps the learning algorithm could be tuned: a slower learning rate, a different ratio between the loss functions, or a different set-up for combining the tasks (sequential, simultaneous, or semi-simultaneous); a sketch of the first two knobs follows below.
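As an illustration of those knobs, here is a hedged sketch of simultaneous training with a re-weighted loss and a slower learning rate on the shared parameters. The weight, the learning rates, and the dataloader are made-up placeholders to tune, not recommendations, and it reuses the hypothetical `TwoTaskNet` from the question.

```python
import torch
import torch.nn as nn

# Hypothetical knobs: upweight the hard task, slow down the shared backbone.
hard_task_weight = 10.0

optimizer = torch.optim.Adam([
    {"params": model.backbone.parameters(),   "lr": 1e-4},  # slower shared lr
    {"params": model.classifier.parameters(), "lr": 1e-3},
    {"params": model.editor.parameters(),     "lr": 1e-3},
])

for images, dog_labels, target_images in loader:   # hypothetical dataloader
    dog_logits, edited = model(images)
    cls_loss = nn.BCEWithLogitsLoss()(dog_logits, dog_labels)
    edit_loss = nn.L1Loss()(edited, target_images)
    loss = cls_loss + hard_task_weight * edit_loss  # re-weighted combination
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For the sequential set-up, one option is to train the classifier head first, then freeze it (e.g. `model.classifier.requires_grad_(False)`) and train the editing head; whether that helps depends on how much the two tasks actually share.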