The Real-time Transport Protocol (RTP), defined in RFC 3550, is an IETF standard protocol for exchanging data that must be delivered in real time, such as live audio and video. This article provides an overview of what RTP is and how it functions in the context of WebRTC.
Note: WebRTC actually uses SRTP (Secure Real-time Transport Protocol) to ensure that the exchanged data is secure and authenticated as appropriate.
Keeping latency to a minimum is especially important for WebRTC, since face-to-face communication needs to be performed with as little latency as possible. The more time lag there is between one user saying something and another hearing it, the more likely there are to be episodes of cross-talking and other forms of confusion.
Before examining RTP's use in WebRTC contexts, it's useful to have a general idea of what RTP does and does not offer. RTP is a data transport protocol, whose mission is to move data between two endpoints as efficiently as possible under current conditions. Those conditions may be affected by everything from the underlying layers of the network stack to the physical network connection, the intervening networks, the performance of the remote endpoint, noise levels, traffic levels, and so forth.
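To make "data transport" a little more concrete, here is a minimal sketch that decodes the 12-byte fixed header RFC 3550 (section 5.1) places at the start of every RTP packet. Browsers do not normally hand raw RTP packets to page script, so this is purely illustrative; the function name parseRtpHeader and the assumption that the packet arrives as a Uint8Array are not part of any WebRTC API.

```javascript
// Decode the fixed RTP header (RFC 3550, section 5.1).
// `bytes` is assumed to be a Uint8Array holding one complete RTP packet.
function parseRtpHeader(bytes) {
  if (bytes.length < 12) {
    throw new RangeError("an RTP packet carries at least 12 header bytes");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  return {
    version: first >> 6,               // 2 for RFC 3550 RTP
    padding: (first & 0x20) !== 0,
    extension: (first & 0x10) !== 0,
    csrcCount: first & 0x0f,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,
    sequenceNumber: view.getUint16(2), // DataView reads big-endian by default
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}
```

The sequence number and timestamp are what let a receiver detect loss and reorder or schedule playback, while the SSRC identifies the stream's source.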
Since RTP is simply a data transport, it is augmented by the closely related RTP Control Protocol (RTCP), which is defined in RFC 3550, section 6. RTCP adds features including Quality of Service (QoS) monitoring, participant information sharing, and the like. It isn't adequate for the purposes of fully managing users, memberships, permissions, and so forth, but provides the basics needed for an unrestricted multi-user communication session.
The very fact that RTCP is defined in the same RFC as RTP is a clue as to just how closely interrelated these two protocols are.
RTP's primary benefits in terms of WebRTC include:
RTP itself doesn't provide every possible feature, which is why other protocols are also used by WebRTC. Some of the more noteworthy things RTP doesn't include:
Where it matters for WebRTC purposes, these are dealt with in a variety of places within the WebRTC infrastructure. For example, RTCP handles QoS monitoring.
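One place this QoS information surfaces to a web page is the statistics API. The sketch below assumes you already have an RTCRtpSender and reads the "remote-inbound-rtp" entry from its stats report, which reflects what the remote peer reported back about the stream. The function name summarizeRemoteQuality is hypothetical, and which fields are populated can vary between browsers.

```javascript
// Summarize how well the remote peer is receiving what `sender` transmits.
// Returns null if the browser has not produced a "remote-inbound-rtp" entry yet.
async function summarizeRemoteQuality(sender) {
  const report = await sender.getStats();
  let summary = null;
  report.forEach((stats) => {
    if (stats.type === "remote-inbound-rtp") {
      summary = {
        packetsLost: stats.packetsLost,
        jitter: stats.jitter,               // seconds
        roundTripTime: stats.roundTripTime, // seconds
      };
    }
  });
  return summary;
}
```

Calling this periodically, for example once a second, gives a rough picture of loss and delay on the outgoing stream.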
Each RTCPeerConnection has methods which provide access to the list of RTP transports that service the peer connection. These correspond to the following three types of transport supported by RTCPeerConnection:
RTCRtpSender
RTCRtpSenders handle the encoding and transmission of MediaStreamTrack data to a remote peer. The senders for a given connection can be obtained by calling RTCPeerConnection.getSenders().
RTCRtpReceiver
RTCRtpReceivers provide the ability to inspect and obtain information about incoming MediaStreamTrack data. A connection's receivers can be obtained by calling RTCPeerConnection.getReceivers().
RTCRtpTransceiver
RTCRtpTransceiver is a pair of one RTP sender and one RTP receiver which share an SDP mid attribute, which means they share the same SDP media m-line (representing a bidirectional SRTP stream). These are returned by the RTCPeerConnection.getTransceivers() method, and each mid and transceiver share a one-to-one relationship, with the mid being unique for each RTCPeerConnection.
Because the streams for an RTCPeerConnection are implemented using RTP and the interfaces above, you can take advantage of the access this gives you to the internals of streams to make adjustments. Among the simplest things you can do is to implement a "hold" feature, wherein a participant in a call can click a button and turn off their microphone, begin sending music to the other peer instead, and stop accepting incoming audio.
Note: This example makes use of modern JavaScript features including async functions and the await expression. These greatly simplify the code that deals with the promises returned by WebRTC methods and make it far more readable.
In the examples below, we'll refer to the peer which is turning "hold" mode on and off as the local peer and the user being placed on hold as the remote peer.
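The functions in this example operate on a variable named audioTransceiver, which the article does not show being set up. One way it might be obtained, assuming the connection carries a single audio transceiver, is to search the list returned by getTransceivers(). The helper name findAudioTransceiver is hypothetical.

```javascript
// Return the first transceiver on the connection that carries audio,
// or null if the connection has none.
function findAudioTransceiver(peerConnection) {
  const match = peerConnection
    .getTransceivers()
    .find((transceiver) => transceiver.receiver.track.kind === "audio");
  return match ?? null;
}

// Possible usage once the call is established:
//   const audioTransceiver = findAudioTransceiver(peerConnection);
```

The check uses the receiver's track because a receiver always has one, whereas a sender's track may be null.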
When the local user decides to enable hold mode, the enableHold() method below is called. It accepts as input a MediaStream containing the audio to play while the call is on hold.
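async function enableHold(audioStream) {
  try {
    // Send the hold music instead of the microphone audio.
    await audioTransceiver.sender.replaceTrack(audioStream.getAudioTracks()[0]);
    // Silence the audio arriving from the remote peer.
    audioTransceiver.receiver.track.enabled = false;
    // Send only from now on; changing direction triggers renegotiation.
    audioTransceiver.direction = "sendonly";
  } catch (err) {
    /* handle the error */
  }
}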
The three lines of code within the try block perform the following steps:
1. Replace the outgoing audio track with a MediaStreamTrack containing hold music.
2. Disable the incoming audio track.
3. Switch the audio transceiver into send-only mode.
This triggers renegotiation of the RTCPeerConnection by sending it a negotiationneeded event, which your code responds to by generating an SDP offer using RTCPeerConnection.createOffer() and sending it through the signaling server to the remote peer.
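The renegotiation described above can be sketched as follows. WebRTC does not define a signaling API, so sendToServer is a hypothetical callback that delivers a message to the remote peer over whatever signaling channel your application uses; the function names sendOffer and watchForNegotiation are also assumptions.

```javascript
// Create an offer, apply it locally, and pass it to the signaling channel.
async function sendOffer(peerConnection, sendToServer) {
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  const { type, sdp } = peerConnection.localDescription;
  sendToServer({ type, sdp });
}

// Send a fresh offer whenever the connection reports that negotiation is needed.
function watchForNegotiation(peerConnection, sendToServer) {
  peerConnection.addEventListener("negotiationneeded", () => {
    sendOffer(peerConnection, sendToServer).catch((err) => {
      console.error("Negotiation failed:", err);
    });
  });
}
```

With watchForNegotiation() registered once when the connection is created, changing the transceiver's direction in enableHold() is enough to get the new offer on its way.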
The audioStream, containing the audio to play instead of the local peer's microphone audio, can come from anywhere. One possibility is to have a hidden <audio> element and use HTMLMediaElement.captureStream() to get its audio stream.
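A minimal sketch of that approach follows. It assumes the page contains an <audio> element that is already loaded with the hold music; the helper name getHoldMusicStream is hypothetical, and the fallback covers Firefox versions that expose the method under the moz prefix.

```javascript
// Obtain a MediaStream from an <audio> element that plays the hold music.
function getHoldMusicStream(audioElement) {
  if (typeof audioElement.captureStream === "function") {
    return audioElement.captureStream();
  }
  // Firefox has shipped this method with a vendor prefix.
  if (typeof audioElement.mozCaptureStream === "function") {
    return audioElement.mozCaptureStream();
  }
  throw new Error("captureStream() is not supported in this browser");
}

// Possible usage, assuming an element with the hypothetical id "hold-music":
//   const holdStream = getHoldMusicStream(document.getElementById("hold-music"));
//   await enableHold(holdStream);
```

The element generally needs to be playing for the captured stream to carry audio.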
On the remote peer, when we receive an SDP offer with the directionality set to "sendonly", we handle it using the holdRequested() method, which accepts as input the received offer in the form of a session description object.
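async function holdRequested(offer) {
  try {
    // Accept the offer that puts this peer on hold.
    await peerConnection.setRemoteDescription(offer);
    // Stop sending audio to the peer that requested the hold.
    await audioTransceiver.sender.replaceTrack(null);
    // Only receive from now on.
    audioTransceiver.direction = "recvonly";
    await sendAnswer();
  } catch (err) {
    /* handle the error */
  }
}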
The steps taken here are:
1. Set the remote description to the specified offer by calling RTCPeerConnection.setRemoteDescription().
2. Replace the audio transceiver's RTCRtpSender's track with null, meaning no track. This stops sending audio on the transceiver.
3. Set the audio transceiver's direction property to "recvonly", instructing the transceiver to only accept audio and not to send any.
4. Send the SDP answer by calling sendAnswer(), which generates the answer using createAnswer() then sends the resulting SDP to the other peer over the signaling service.
When the local user clicks the interface widget to disable hold mode, the disableHold() method is called to begin the process of restoring normal functionality.
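async function disableHold(micStream) {
  // Resume sending the microphone audio.
  await audioTransceiver.sender.replaceTrack(micStream.getAudioTracks()[0]);
  // Let the remote peer's audio be heard again.
  audioTransceiver.receiver.track.enabled = true;
  // Send and receive again; changing direction triggers renegotiation.
  audioTransceiver.direction = "sendrecv";
}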
This reverses the steps taken in enableHold() as follows:
1. The audio transceiver's RTCRtpSender's track is replaced with the specified stream's first audio track.
2. The transceiver's incoming audio track is re-enabled.
3. The audio transceiver's direction is set to "sendrecv", indicating that it should return to both sending and receiving streamed audio, instead of only sending.
Just like when hold was engaged, this triggers negotiation again, resulting in your code sending a new offer to the remote peer.
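On the remote side, something has to decide whether an incoming offer is a hold request or the end of one. The article doesn't say how; one possible sketch is to read the direction attribute from the audio section of the offer's SDP. The function name offeredAudioDirection is hypothetical, and the sketch only inspects media-level attributes of the first audio section, ignoring any session-level direction attribute.

```javascript
// Return the direction ("sendrecv", "sendonly", "recvonly" or "inactive")
// offered for the first audio section of an SDP string, or null if the
// SDP has no audio section. SDP treats a missing attribute as "sendrecv".
function offeredAudioDirection(sdp) {
  let inAudio = false;
  let sawAudio = false;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      if (inAudio) break; // reached the section after the audio one
      inAudio = line.startsWith("m=audio ");
      sawAudio = sawAudio || inAudio;
      continue;
    }
    if (inAudio) {
      const match = /^a=(sendrecv|sendonly|recvonly|inactive)$/.exec(line);
      if (match) return match[1];
    }
  }
  return sawAudio ? "sendrecv" : null;
}
```

A signaling handler could then call holdRequested() when the result is "sendonly" and holdEnded() when a "sendrecv" offer arrives while the call is on hold.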
When the "sendrecv" offer is received by the remote peer, it calls its holdEnded() method:
async function holdEnded(offer, micStream) {
  try {
    // Accept the offer that ends the hold.
    await peerConnection.setRemoteDescription(offer);
    // Resume sending the microphone audio.
    await audioTransceiver.sender.replaceTrack(micStream.getAudioTracks()[0]);
    // Send and receive again.
    audioTransceiver.direction = "sendrecv";
    await sendAnswer();
  } catch (err) {
    /* handle the error */
  }
}
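The steps taken inside the try block here are:
1. The received offer is stored as the remote description by calling setRemoteDescription().
2. The audio transceiver's RTCRtpSender's replaceTrack() method is used to set the outgoing audio track to the first track of the microphone's audio stream.
3. The transceiver's direction is set to "sendrecv", indicating that it should resume both sending and receiving audio.
4. The answer is generated and sent to the other peer by calling sendAnswer().
From this point on, the microphone is re-engaged and the remote user is once again able to hear the local user, as well as speak to them.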
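Both holdRequested() and holdEnded() rely on a sendAnswer() helper that is described above but never shown. A minimal sketch of what it might look like follows. Unlike the version called in the article, this one takes the connection and a signaling callback as parameters so that it is self-contained; sendToServer is a hypothetical function that delivers a message to the other peer.

```javascript
// Create an answer to the current remote offer, apply it locally,
// and pass it to the signaling channel.
async function sendAnswer(peerConnection, sendToServer) {
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  const { type, sdp } = peerConnection.localDescription;
  sendToServer({ type, sdp });
}
```

The peer that made the offer completes the exchange by passing the received answer to its own setRemoteDescription().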
© 2005–2018 Mozilla Developer Network and individual contributors.
Licensed under the Creative Commons Attribution-ShareAlike License v2.5 or later.
https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Intro_to_RTP