I am trying to setup a raspberry pi box with a usb camera as a IP Camera that can be viewed from a a generic android IP Camera monitor app. I've found some examples on how to get the video stream, and that works, but what I also need is two-way audio. This seems to come out of the box in standalone network cameras -- any ideas how that works? I want to set it up in a way compatible with typical network cameras so that my cam can be used by any generic ip camera viewer app.
Well, the modern cameras nowadays implement the ONVIF protocol. This protocol specifies that you have a RTSP server that streams audio and video from the camera to the pc, but it also mandates a so called audio backchannel. It's a bit long to explain how it works, check it in the specs.