I want to make a program which takes video and audio and merges them. Video type or audio type is not important for me. How can I make this? Does any library exist for this? I know there are many programs about this topic but I want to learn how to implement such a program.
The technical term for what you are trying to do is 'multiplexing', commonly shortened to 'muxing'.
FFmpeg is a cross-platform command-line tool that does this, and it is arguably the industry standard. Many projects wrap FFmpeg in libraries and GUIs.
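As a starting point, here is a minimal sketch of driving FFmpeg from Python. It assumes the `ffmpeg` executable is installed and on your PATH; the function names `build_mux_command` and `mux` and the file names are made up for illustration. The flags used (`-i`, `-map`, `-c copy`, `-shortest`) are standard FFmpeg options.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Return an ffmpeg argument list that muxes one video and one audio stream."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: file providing the video stream
        "-i", audio_path,   # input 1: file providing the audio stream
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c", "copy",       # copy the streams as-is, without re-encoding
        "-shortest",        # stop writing when the shorter stream ends
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    command = build_mux_command(video_path, audio_path, output_path)
    subprocess.run(command, check=True)
```

You would call it as `mux("video.mp4", "audio.m4a", "merged.mkv")`. Note that `-c copy` only works when the output container can hold the codecs of both inputs; if it cannot, FFmpeg will report an error and you would need to re-encode instead.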
FFmpeg is also open source, so you can download the code and see how it is done. That said, the codebase is very large and complex.
If you are interested in the actual mechanics of muxing separate audio and video files into a single destination file, you will need to learn a good deal about container formats and codecs.
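To give a rough feel for the core idea, here is a toy sketch of the interleaving step: a muxer takes already-encoded packets from each stream and writes them into one sequence ordered by timestamp. This is only an illustration under simplifying assumptions; the `Packet` type and `interleave` function are hypothetical, and a real container format also needs headers, indexes, and codec metadata that are not modelled here.

```python
import heapq
from typing import NamedTuple


class Packet(NamedTuple):
    pts: float      # presentation timestamp in seconds
    stream: str     # which stream the packet belongs to
    payload: bytes  # encoded data, treated as opaque by the muxer


def interleave(video_packets, audio_packets):
    """Merge two timestamp-sorted packet sequences into one, ordered by pts."""
    return list(heapq.merge(video_packets, audio_packets, key=lambda p: p.pts))


# Made-up sample packets: video at 25 fps, audio at irregular intervals.
video = [
    Packet(0.00, "video", b"v0"),
    Packet(0.04, "video", b"v1"),
    Packet(0.08, "video", b"v2"),
]
audio = [
    Packet(0.00, "audio", b"a0"),
    Packet(0.02, "audio", b"a1"),
    Packet(0.05, "audio", b"a2"),
]

for packet in interleave(video, audio):
    print(f"{packet.pts:.2f} {packet.stream}")
```

The muxer never looks inside `payload`, which is why muxing can be done without re-encoding: understanding the payload is the codec's job, while arranging packets in the file is the container's job.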