Wrangling WebRTC: Challenges and Opportunities for Real-Time Communication
At QCon New York 2013, Gustavo Garcia gave a talk on WebRTC, the new real-time communication component of HTML5. WebRTC is a set of technologies that enable real-time, low-latency communication between peers, for instance for video and audio conferencing as well as gaming. While real-time communication is typically implemented using WebSockets, WebRTC lowers latency dramatically by attempting to establish direct peer-to-peer connections, falling back to intermediate hops (tunnels) only if a direct connection cannot be established, for instance due to NAT or firewalls.
Garcia described the various components one would need to build a video or audio conference system using web technologies. Establishing one-on-one calls is perhaps the simplest use case. This would require support in the browser for:
- The ability to decode and show streaming video and audio
- The ability to capture streaming video and audio from a (web) camera
- Protocols to negotiate the call (such as signaling)
- Codecs for doing efficient audio and video encoding and decoding
- Algorithms to handle echo cancellation, noise suppression, bitrate adaptation (adapting the stream to changes in bandwidth) and many other aspects.
Many of these are now part of HTML5, including playback of streaming video and audio, and capture of video and audio via the getUserMedia API. Beyond camera and microphone capture, there are also experimental extensions, currently implemented in newer builds of Chrome, to capture the screen, individual browser tabs, or parts of web pages.
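As a rough illustration of the capture step, the sketch below builds a constraints object and passes it to getUserMedia. It assumes the promise-based form of the API exposed as navigator.mediaDevices.getUserMedia in current browsers. The helper names buildCallConstraints and captureCall, the local MediaDevicesLike type and the 1280x720 resolution are all hypothetical and are not taken from the talk.

```typescript
// Minimal local types so the sketch type-checks without the DOM library.
type TrackConstraint = boolean | { width?: number; height?: number };

interface CaptureConstraints {
  audio: boolean;
  video: TrackConstraint;
}

interface MediaDevicesLike {
  getUserMedia(constraints: CaptureConstraints): Promise<unknown>;
}

// Hypothetical helper: constraints for an audio+video call. If a width and
// height are given they are passed as the preferred camera resolution;
// otherwise any camera resolution is accepted.
function buildCallConstraints(width?: number, height?: number): CaptureConstraints {
  const video: TrackConstraint =
    width !== undefined && height !== undefined ? { width, height } : true;
  return { audio: true, video };
}

// Hypothetical helper: request a stream from a browser-like object.
// In a browser, pass navigator.mediaDevices as the argument.
async function captureCall(devices: MediaDevicesLike): Promise<unknown> {
  return devices.getUserMedia(buildCallConstraints(1280, 720));
}
```

In a page, the resolved stream would typically be assigned to the srcObject property of a video element for local preview. The browser asks the user for permission before the promise resolves.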
One notable omission in WebRTC is direct support for signaling. Signaling handles the setup of the call: who is trying to call whom, whether the callee accepts the call, and so on. Developers have to implement signaling themselves, for instance using WebSockets or HTTP long polling. Another piece of infrastructure that developers have to deploy themselves is a set of tunnel servers that act as intermediaries for users who require NAT traversal or sit behind restrictive firewalls. According to Garcia, 8% of all calls require such intermediate tunnel servers.
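Since WebRTC leaves signaling to the application, the message format is up to the developer. The following is a minimal sketch of what the call-setup exchange might look like, using an in-memory hub as a stand-in for a real signaling server. The message types ("call", "accept", "reject"), the SignalingHub class and the user names are hypothetical. A real deployment would relay the same messages over WebSockets or HTTP long polling.

```typescript
// Hypothetical message shapes; WebRTC itself does not define them.
type SignalType = "call" | "accept" | "reject";

interface SignalMessage {
  type: SignalType;
  from: string;
  to: string;
  payload?: string; // e.g. a session description serialized by the application
}

type Handler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server.
class SignalingHub {
  private handlers = new Map<string, Handler>();

  // Register the callback that receives messages addressed to userId.
  register(userId: string, handler: Handler): void {
    this.handlers.set(userId, handler);
  }

  // Deliver a message to its recipient; returns false if the user is unknown.
  send(msg: SignalMessage): boolean {
    const handler = this.handlers.get(msg.to);
    if (!handler) return false;
    handler(msg);
    return true;
  }
}

// Usage: alice calls bob, and bob accepts.
const hub = new SignalingHub();
const callLog: string[] = [];

hub.register("alice", (m) => {
  callLog.push(`alice got ${m.type} from ${m.from}`);
});
hub.register("bob", (m) => {
  callLog.push(`bob got ${m.type} from ${m.from}`);
  if (m.type === "call") {
    hub.send({ type: "accept", from: "bob", to: m.from });
  }
});

hub.send({ type: "call", from: "alice", to: "bob" });
```

Once both sides have agreed to the call through such a channel, the peers can try to connect directly. The signaling server stays out of the media path.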
Garcia described various use cases for WebRTC:
- One-on-one video and audio calls are the simplest to implement using WebRTC.
- Multi-participant calls are more difficult to implement. One option is a full mesh network where each participant connects to all other participants, but due to high CPU and bandwidth usage, this approach does not scale beyond 5-6 participants. An alternative is to use an intermediate server that aggregates all streams and broadcasts them to all other participants, either separately or merged into a single stream.
- Telephony: WebRTC can be connected to the landline phone network, making it possible to call regular landlines from the web browser.
- Gaming: real-time multi-player gaming becomes possible in combination with other HTML5 technologies such as WebGL.
- File transfers: since WebRTC supports the peer-to-peer transfer of arbitrary data, the technology can be used to transfer files between users. For instance, one could imagine building BitTorrent-like applications in this manner.
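To see why the full mesh approach for multi-participant calls stops scaling so quickly, it helps to count connections and uploads. The sketch below is simple arithmetic, not a measurement. The function names are hypothetical, and it assumes each participant sends one stream per peer in a mesh and a single stream when an intermediate server is used.

```typescript
// Full mesh: every pair of participants shares one peer-to-peer connection.
function meshConnections(participants: number): number {
  return (participants * (participants - 1)) / 2;
}

// In a mesh, each participant encodes and uploads its stream once per peer.
function meshUploadsPerParticipant(participants: number): number {
  return Math.max(participants - 1, 0);
}

// With an intermediate server, each participant uploads a single stream
// and the server takes care of distributing it to the others.
function serverUploadsPerParticipant(participants: number): number {
  return participants > 1 ? 1 : 0;
}

console.log(meshConnections(6));             // 15 connections in total
console.log(meshUploadsPerParticipant(6));   // 5 uploads per participant
console.log(serverUploadsPerParticipant(6)); // 1 upload per participant
```

The total number of connections grows quadratically with the number of participants, and each participant's upload work grows linearly. With an intermediate server, the upload cost per participant stays constant.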
While WebRTC support in browsers is still young (currently only newer versions of Chrome and Firefox support it), there are many use cases. To learn more about the technology, check out webrtc.org or the related W3C standards.