Streaming media: RTP and SHOUTcast

I'm trying to create an audio streaming server in Java. There are several protocols for streaming media, such as RTP, and I'm a little confused by all of them.

What are the differences between RTP and Shoutcast? Do they use TCP, UDP or HTTP? Does anyone have a clear explanation on this point?

Solution

SHOUTcast and Icecast use a client protocol very similar to HTTP. (In fact, Icecast is compliant with HTTP as specified in RFC 2616, and most HTTP clients work with SHOUTcast without modification.) A request comes in for a stream, and the server returns the stream audio data in the same way as an HTTP response, along with some extra metadata.

GET /radioreddit/main_mp3_128k HTTP/1.1

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With
Server: AudioPump Server/0.8.1 (http://audiopump.co)
Content-Type: audio/mpeg
Cache-Control: no-cache
Pragma: no-cache
Expires: Sat, 15 Aug 2009 22:00:00 GMT
Connection: close
icy-genre: Indie,Rock,Talk
icy-name: Radio Reddit - Main
icy-pub: 1
icy-url: http://radioreddit.com
Date: Tue, 05 Aug 2014 13:40:55 GMT

In this example, the response is purely HTTP. If this were a SHOUTcast server, instead of seeing HTTP/1.1 200 OK in the status line, you would see ICY 200 OK. The headers that start with icy- are those that describe the station. Sometimes there are more, sometimes they don't exist at all. A client would be able to play MP3 data from this station as-is.
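To make the difference concrete, here is a minimal sketch in Java of how a client might tell the two status lines apart and collect the icy- headers. The class name IcyResponseHead and its fields are my own, purely for illustration, and it assumes the response head has already been read into a string.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of any library): parses the head of a
// SHOUTcast/Icecast-style response that has already been read as text.
final class IcyResponseHead {
    final boolean shoutcastStyle;         // true when the status line starts with "ICY"
    final int statusCode;
    final Map<String, String> icyHeaders; // only icy-* headers, names lower-cased

    private IcyResponseHead(boolean shoutcastStyle, int statusCode, Map<String, String> icyHeaders) {
        this.shoutcastStyle = shoutcastStyle;
        this.statusCode = statusCode;
        this.icyHeaders = icyHeaders;
    }

    static IcyResponseHead parse(String head) {
        String[] lines = head.split("\r?\n");
        String[] status = lines[0].trim().split("\\s+", 3);
        if (status.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + lines[0]);
        }
        boolean icy = status[0].equals("ICY");
        if (!icy && !status[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Unknown protocol: " + status[0]);
        }
        int code = Integer.parseInt(status[1]);

        Map<String, String> icyHeaders = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that is not "name: value"
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (name.startsWith("icy-")) {
                icyHeaders.put(name, line.substring(colon + 1).trim());
            }
        }
        return new IcyResponseHead(icy, code, icyHeaders);
    }
}
```

Calling IcyResponseHead.parse on the response above would give shoutcastStyle == false, a status code of 200, and the four icy- headers. A real client would read the head from the socket byte by byte until the blank line, since everything after it is audio data.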

Now, sometimes the client will request metadata to be sent in the stream. This allows the player to tell you what is playing. A client does this by sending an icy-metadata: 1 request header. The server will respond with a header such as icy-metaint: 8192, which means that after every 8192 bytes of audio data there will be a chunk of metadata. You can read more about the format of that metadata in a previous answer.
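As a rough illustration, here is a sketch in Java of how a client could split such a stream back into audio and metadata. It assumes the commonly described layout: after each icy-metaint bytes of audio comes a single length byte, and the metadata block is that value times 16 bytes long, padded with zero bytes. The class name IcyDemuxer, the maxBlocks limit, and the choice of UTF-8 for decoding are my own assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demuxer for a response body with interleaved ICY metadata.
final class IcyDemuxer {
    final ByteArrayOutputStream audio = new ByteArrayOutputStream();
    final List<String> metadata = new ArrayList<>();

    // Reads at most maxBlocks audio blocks so that an endless stream returns.
    void read(InputStream body, int metaInt, int maxBlocks) throws IOException {
        if (metaInt <= 0) {
            throw new IllegalArgumentException("metaInt must be positive");
        }
        DataInputStream in = new DataInputStream(body);
        byte[] audioBlock = new byte[metaInt];
        for (int i = 0; i < maxBlocks; i++) {
            try {
                in.readFully(audioBlock);
            } catch (EOFException e) {
                return; // stream ended; a trailing partial block is discarded
            }
            audio.write(audioBlock, 0, metaInt);

            int lengthByte = in.read();
            if (lengthByte < 0) {
                return; // stream ended right after an audio block
            }
            if (lengthByte == 0) {
                continue; // no metadata update at this position
            }
            byte[] meta = new byte[lengthByte * 16];
            in.readFully(meta);

            // Strip the zero padding before decoding (UTF-8 is an assumption).
            int end = meta.length;
            while (end > 0 && meta[end - 1] == 0) {
                end--;
            }
            metadata.add(new String(meta, 0, end, StandardCharsets.UTF_8));
        }
    }
}
```

A player would feed the audio bytes to its decoder instead of buffering them, and would pull the title out of metadata strings that typically look like StreamTitle='...';. A server does the reverse: it counts outgoing audio bytes and inserts a length byte (usually 0) every icy-metaint bytes.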

I should also note that this form of streaming is often called HTTP Progressive Streaming. To the client, it's no different than playing a media file as it is being downloaded... except that the file is infinite in size.

Now, RTP is a protocol typically used in conjunction with RTSP. RTP carries the actual media data, usually over UDP, while RTSP is used for control, usually over TCP. These protocols are much more complicated, as they are meant for true streaming. That is, if the client can't handle the bandwidth, it can drop down to a lower bitrate. If the client needs to control the remote server, that can be done as well. This complexity comes at a price: the servers are complicated to implement, and client compatibility isn't great.
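To give a feel for what RTP looks like on the wire, here is a small Java sketch that decodes the 12-byte fixed header defined in RFC 3550. The class name RtpHeader is my own, and this covers only the fixed header. It leaves out CSRC lists, header extensions, RTCP, and all of RTSP, which is where most of the complexity lives.

```java
import java.nio.ByteBuffer;

// Hypothetical value class for the fixed RTP header (RFC 3550, section 5.1).
final class RtpHeader {
    final int version;        // 2 bits, should be 2
    final boolean padding;    // 1 bit
    final boolean extension;  // 1 bit
    final int csrcCount;      // 4 bits
    final boolean marker;     // 1 bit
    final int payloadType;    // 7 bits
    final int sequenceNumber; // 16 bits, unsigned
    final long timestamp;     // 32 bits, unsigned
    final long ssrc;          // 32 bits, unsigned

    private RtpHeader(ByteBuffer buf) {
        int b0 = buf.get() & 0xFF;
        int b1 = buf.get() & 0xFF;
        version = b0 >>> 6;
        padding = (b0 & 0x20) != 0;
        extension = (b0 & 0x10) != 0;
        csrcCount = b0 & 0x0F;
        marker = (b1 & 0x80) != 0;
        payloadType = b1 & 0x7F;
        sequenceNumber = buf.getShort() & 0xFFFF;
        timestamp = buf.getInt() & 0xFFFFFFFFL;
        ssrc = buf.getInt() & 0xFFFFFFFFL;
    }

    static RtpHeader parse(byte[] packet) {
        if (packet.length < 12) {
            throw new IllegalArgumentException("RTP packet shorter than 12 bytes");
        }
        // ByteBuffer is big-endian by default, which matches network byte order.
        return new RtpHeader(ByteBuffer.wrap(packet));
    }
}
```

The sequence number and timestamp are there because packets sent over UDP can be lost or arrive out of order, so the receiver has to reorder and time them itself. With HTTP-style streaming, TCP takes care of that for you.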

A couple of years back, when I started to create my own streaming server, I asked myself the same question: do I implement a real streaming protocol that has not-so-great client support and will take a long time to figure out, or do I implement a protocol that everything can play and that is easy to build for? I went the HTTP route.

These days, you should also consider HLS, which is nothing more than chunking your source stream into pieces at several bitrates and serving them over HTTP. If the client wants to change bitrates due to lack of bandwidth, it can just start requesting the lower-bitrate chunks. HLS also doesn't have great client support, but it is getting better. I suspect it will surpass all others for media delivery on websites in time.
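The bitrate switching is simple enough to sketch in a few lines of Java. The example below assumes a master playlist where each #EXT-X-STREAM-INF line carries a BANDWIDTH attribute and is followed by the URI of that variant. The class name HlsVariantPicker and the selection rule (highest variant that fits, otherwise the lowest) are my own simplification, not what any particular player does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks an HLS variant for a measured bandwidth.
final class HlsVariantPicker {
    static final class Variant {
        final long bandwidth; // bits per second, from the BANDWIDTH attribute
        final String uri;

        Variant(long bandwidth, String uri) {
            this.bandwidth = bandwidth;
            this.uri = uri;
        }
    }

    // Matches BANDWIDTH=... but not AVERAGE-BANDWIDTH=...
    private static final Pattern BANDWIDTH = Pattern.compile("(?:^|[:,])BANDWIDTH=(\\d+)");

    static List<Variant> parseMaster(String playlist) {
        List<Variant> variants = new ArrayList<>();
        long pending = -1; // bandwidth waiting for its URI line
        for (String raw : playlist.split("\r?\n")) {
            String line = raw.trim();
            if (line.startsWith("#EXT-X-STREAM-INF:")) {
                Matcher m = BANDWIDTH.matcher(line);
                pending = m.find() ? Long.parseLong(m.group(1)) : -1;
            } else if (!line.isEmpty() && !line.startsWith("#") && pending >= 0) {
                variants.add(new Variant(pending, line));
                pending = -1;
            }
        }
        return variants;
    }

    // Highest variant that fits the measured bandwidth, else the lowest one.
    static Variant pick(List<Variant> variants, long measuredBitsPerSecond) {
        Variant best = null;
        Variant lowest = null;
        for (Variant v : variants) {
            if (lowest == null || v.bandwidth < lowest.bandwidth) {
                lowest = v;
            }
            if (v.bandwidth <= measuredBitsPerSecond
                    && (best == null || v.bandwidth > best.bandwidth)) {
                best = v;
            }
        }
        return best != null ? best : lowest;
    }
}
```

The client re-runs pick whenever its throughput estimate changes and simply fetches the next chunk from the chosen variant's playlist. All of the adaptation logic lives in the client; the server only serves static files over HTTP.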