What is Web Real Time Communications?

In this article, we will reveal some of the features of using WebRTC and consider the advantages and disadvantages of this technology.

Aleksey Andruschenko

Full-Stack Developer

21 Apr 2023
13 min read

WebRTC (Web Real-Time Communications) is a standard that describes the real-time transfer of streaming audio, video, and other content between browsers (without installing plugins or other extensions) or other applications that support it. This technology turns the browser into a video conferencing terminal: to start communicating, the user simply opens the web page of the conference.

Table of contents


How WebRTC works

Benefits of WebRTC standard

Disadvantages of the standard

Codecs in WebRTC

Tricks of working with WebRTC

WebRTC for the video conferencing market

Examples of services using WebRTC

How WebRTC works

Consider how the technology works, using a browser call between two subscribers as an example:

  1. The user opens a page containing WebRTC content.
  2. The browser requests access to the webcam and microphone, if the page needs them. The devices are not used until the user grants access. When capture is not needed (for example, when watching a broadcast), no additional permissions are requested.
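
The permission step above can be sketched as a small decision function. This is only an illustration: the page modes and the name constraintsFor are hypothetical, and the local CaptureConstraints type stands in for the browser's own constraints type so that the sketch also type-checks outside a browser.

```typescript
// Local stand-in for the constraints object accepted by getUserMedia.
type CaptureConstraints = { audio: boolean; video: boolean };

// Hypothetical page modes, introduced only for this example.
type PageMode = "call" | "audio-only" | "watch-broadcast";

// Returns the constraints to request, or null when the page only
// receives media and therefore should not prompt the user at all.
function constraintsFor(mode: PageMode): CaptureConstraints | null {
  switch (mode) {
    case "call":
      return { audio: true, video: true };
    case "audio-only":
      return { audio: true, video: false };
    case "watch-broadcast":
      return null;
  }
}
```

In a browser, a non-null result would be passed to navigator.mediaDevices.getUserMedia, which returns a promise that is rejected if the user denies access.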

Benefits of WebRTC standard

  • No software installation required.

  • High-quality communication thanks to:

    • use of modern video and audio codecs;
    • automatic adjustment of stream quality to connection conditions;
    • built-in echo and noise reduction system;
    • automatic level control of participant microphones (AGC).
  • High level of security: all connections are encrypted with the DTLS and SRTP protocols. In addition, WebRTC works only over HTTPS, so the site using the technology must have a valid certificate.

  • Support for SVC technology has been added as part of the implementation of the VP9 and AV1 codecs. Although the browsers themselves do not implement it at the time of writing, TrueConf software solutions allow the use of SVC in browser clients.

  • There is a built-in mechanism for capturing content, such as the desktop.

  • Ability to implement any control interface based on HTML5 and JavaScript.

  • It is an open-source project, so you can embed it in your product or service.

  • True cross-platform: the same WebRTC application will work equally well on any operating system, desktop or mobile, provided that the browser supports WebRTC. This saves a lot of resources for software development.
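
The echo cancellation, noise reduction, and automatic gain control listed above are exposed to the page as audio track constraints. The sketch below builds such a constraints object. The property names echoCancellation, noiseSuppression, and autoGainControl are the standard constraint names; the helper name audioCaptureConstraints and the local type are assumptions made for this example.

```typescript
// Local stand-in for the audio-processing subset of the browser's
// track constraints.
type AudioProcessing = {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
};

// Builds audio-only capture constraints with all processing enabled,
// letting the caller switch individual features off.
function audioCaptureConstraints(
  overrides: Partial<AudioProcessing> = {}
): { audio: AudioProcessing; video: boolean } {
  const defaults: AudioProcessing = {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  };
  return { audio: { ...defaults, ...overrides }, video: false };
}

// Example: keep echo cancellation but leave microphone levels untouched.
const rawLevelConstraints = audioCaptureConstraints({ autoGainControl: false });
```

The browser treats these values as requests, so an application that depends on a particular setting would still need to check what the captured track actually reports.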

Disadvantages of the standard

  • WebRTC solutions are not compatible with each other out of the box, since the standard describes only how to transmit video and sound. Addressing subscribers, tracking their availability, exchanging messages and files, scheduling, and so on are left to the developer. In other words, you cannot call from one WebRTC application to another.

  • Users who are concerned about their privacy may be unpleasantly surprised to learn that WebRTC can reveal their real IP addresses. Neither a proxy nor the Tor network helps maintain anonymity here. The IP address can be hidden with a VPN service or by routing media through a TURN server. If necessary, WebRTC can be disabled in the browser.

  • WebRTC does not support remote desktop control. You can broadcast what is happening on the screen of the device, but it is the same one-way video stream as the image from the camera, and there is no way to interact with the stream source. This is done for security reasons: JavaScript code cannot control anything outside the current browser window. More features, including remote desktop control, are available in the dedicated client applications of video conferencing vendors.
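
The IP address exposure mentioned in the list above happens through ICE candidates, the text lines that peers exchange while connecting. The sketch below parses one candidate line and reports whether it carries the client's own address (host and server-reflexive candidates) or only the address of a relay. The function names are hypothetical and the addresses in the usage lines are documentation examples, not real hosts.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  address: string;
  port: number;
  transport: string;
  type: CandidateType;
}

// Parses "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> ..." and returns null for anything else.
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  const [head = "", , transport = "", , address = "", portText = "", typKeyword = "", type = ""] = parts;
  if (!head.startsWith("candidate:") || typKeyword !== "typ") return null;
  if (type !== "host" && type !== "srflx" && type !== "prflx" && type !== "relay") return null;
  const port = Number(portText);
  if (!Number.isInteger(port) || port < 0 || port > 65535) return null;
  return { address, port, transport: transport.toLowerCase(), type };
}

// Relay candidates carry the TURN server's address, not the client's.
function exposesOwnAddress(candidate: ParsedCandidate): boolean {
  return candidate.type !== "relay";
}

const reflexive = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154"
);
const relayed = parseCandidate(
  "a=candidate:1 1 UDP 5000 198.51.100.9 3478 typ relay raddr 0.0.0.0 rport 0"
);
```

An application that must not expose client addresses can set iceTransportPolicy to "relay" in the peer connection configuration, so that only relay candidates are used.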

Codecs in WebRTC

WebRTC codecs can be divided into mandatory (browsers that implement this technology must support them) and optional (not included in the standard, but added by some browsers).

Audio codecs

To compress audio traffic in WebRTC, mandatory codecs (Opus and G.711) and additional ones (G.722, iLBC, iSAC) are used.

Opus

Opus is an audio codec with low encoding latency (from 2.5 ms to 60 ms), variable bitrate support, and high compression, which makes it ideal for audio streaming over networks with variable bandwidth. It is the main audio codec for WebRTC. Opus is a hybrid solution that combines the best features of the SILK (voice compression, elimination of speech distortion) and CELT (general audio coding) codecs. The codec is freely available, and developers who use it do not need to pay royalties to copyright holders. In a number of parameters it surpasses popular low-bitrate codecs such as MP3, Vorbis, and AAC LC, and it reproduces sound closer to the original than AMR-WB and Speex.

G.711

G.711 is an obsolete high bit rate (64 kbps) voice codec that is most commonly used in traditional telephony systems. The main advantage is the minimal computational load due to the use of lightweight compression algorithms. The codec has a low level of compression of voice signals and does not introduce additional audio delay during communication between users.

G.711 is supported by a large number of devices. Systems that use this codec are simpler to operate than those based on other audio codecs (G.723, G.726, G.728, etc.). In terms of quality, G.711 scores 4.2 in MOS testing (scores of 4 to 5 form the highest band and mean good quality, comparable to or better than voice traffic in ISDN).
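
The 64 kbps figure follows directly from how G.711 samples speech: 8000 samples per second at 8 bits per sample. The short calculation below reproduces it and also derives the payload size of one packet. The 20 ms packet interval is an assumption chosen for the example, and the function names are hypothetical.

```typescript
// Bitrate of an uncompressed or companded PCM stream, in bits per second.
function pcmBitrateBps(sampleRateHz: number, bitsPerSample: number): number {
  return sampleRateHz * bitsPerSample;
}

// Payload bytes carried by one packet covering packetMs milliseconds of audio.
function payloadBytesPerPacket(bitrateBps: number, packetMs: number): number {
  return (bitrateBps * packetMs) / 8000; // divide by 8 bits and by 1000 ms
}

const g711BitrateBps = pcmBitrateBps(8000, 8); // 64000 bit/s, i.e. 64 kbps
const g711PayloadBytes = payloadBytesPerPacket(g711BitrateBps, 20); // 160 bytes
```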

G.722

G.722 is an ITU-T standard adopted in 1988 and is currently free to use. It can operate at 48, 56, and 64 kbps, providing sound quality at the level of G.711. Like G.711, it is obsolete. Supported in Chrome, Safari, and Firefox.

iLBC

iLBC (internet Low Bitrate Codec) is an open-source narrowband speech codec. Available in Chrome and Safari. Because it compresses the stream heavily, this codec increases the load on the processor.

iSAC

iSAC (internet Speech Audio Codec) is a wideband speech audio codec, formerly proprietary, which is currently part of the WebRTC project, but is not required to be used. Supported in Chrome and Safari. The implementation for WebRTC uses an adaptive bitrate from 10 to 52 kbps with a sampling rate of 32 kHz.

Video codecs

Choosing a video codec for WebRTC took developers several years. As a result, VP8 and H.264 were included in the standard. There are also implementations of optional video codecs (H.265, VP9, AV1).
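
Because a browser may offer both mandatory and optional codecs, an application sometimes wants to state which one it prefers. The sketch below reorders a codec list so that preferred MIME types come first and everything else keeps its original order. The CodecLike type is a local stand-in for the codec capability objects a browser returns, and preferCodecs is a hypothetical helper name.

```typescript
// Local stand-in for a browser codec capability entry.
interface CodecLike {
  mimeType: string;
  clockRate: number;
  sdpFmtpLine?: string;
}

// Moves codecs whose MIME type appears in `preferred` to the front,
// in the order given; all other codecs keep their relative order.
function preferCodecs<T extends CodecLike>(
  codecs: readonly T[],
  preferred: readonly string[]
): T[] {
  const rank = (codec: T): number => {
    const index = preferred.findIndex(
      (mime) => mime.toLowerCase() === codec.mimeType.toLowerCase()
    );
    return index === -1 ? preferred.length : index;
  };
  return codecs
    .map((codec, index) => ({ codec, index }))
    .sort((a, b) => rank(a.codec) - rank(b.codec) || a.index - b.index)
    .map((entry) => entry.codec);
}
```

In a browser that supports it, the reordered list could be passed to setCodecPreferences on a transceiver, with the input taken from RTCRtpReceiver.getCapabilities("video").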

VP8

VP8 is a free video codec with an open license, featuring fast video stream decoding and increased resistance to frame loss. The codec is universal and easy to implement on hardware platforms, so developers of video conferencing systems often use it in their products. Compatible with Chrome, Edge, Firefox, and Safari (12.1+) browsers.

H.264

The paid H.264 video codec appeared much earlier than VP8. It offers a high degree of video stream compression while maintaining high video quality. Its widespread use in hardware video conferencing systems is the reason it was included in the WebRTC standard. Compatible with Chrome (52+), Edge, Firefox (deprecated for Android 68+), and Safari.

VP9

VP9 is an open and free video compression standard developed in 2012 by Google. It builds on the ideas embodied in VP8, and those ideas were later extended in AV1. Compatible with Chrome (48+) and Firefox browsers.

H.265

H.265 is a paid video codec that is the successor to H.264, providing the same visual quality at half the bitrate. This is achieved with more efficient compression algorithms. This codec currently competes with the free AV1.

AV1

AV1 is an open-source video compression codec designed specifically for delivering video over the Internet. Supported in Chrome (70+) and Firefox (67+).

Tricks of working with WebRTC

Signaling server

WebRTC does not provide a way for browsers to find each other. Each browser can generate all the necessary meta-information about itself, but how does one browser learn that the other exists? How do we connect them?

A WebRTC signaling server is a server that manages the connections between peers. It is used only for signaling: it helps one peer find another in the network, negotiate the connection, reset it if needed, and close it down.

WebRTC does not specify a signaling protocol, so you need to develop one yourself or use a ready-made solution. The transport for the signaling protocol is not specified either. You can use HTTP, WebSocket, or a data channel. WebSocket is the most common choice because it is based on a persistent connection and can transmit data in close to real time.
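
Since the signaling protocol is left to the developer, the message format has to be invented. The sketch below shows one possible JSON format with four message kinds and a decoder that rejects malformed input. Everything here (the message shapes, the field names from and to, and the function names) is a hypothetical design for illustration, not part of WebRTC.

```typescript
// A hypothetical signaling vocabulary: offer/answer carry SDP text,
// candidate carries one ICE candidate line, bye ends the call.
type SignalMessage =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string }
  | { type: "bye"; from: string; to: string };

function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Returns the parsed message, or null if the text is not valid JSON
// or does not match one of the four shapes above.
function decodeSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, from, to, sdp, candidate } = data as Record<string, unknown>;
  if (typeof from !== "string" || typeof to !== "string") return null;
  if (type === "offer") {
    return typeof sdp === "string" ? { type: "offer", from, to, sdp } : null;
  }
  if (type === "answer") {
    return typeof sdp === "string" ? { type: "answer", from, to, sdp } : null;
  }
  if (type === "candidate") {
    return typeof candidate === "string" ? { type: "candidate", from, to, candidate } : null;
  }
  if (type === "bye") return { type: "bye", from, to };
  return null;
}
```

The encoded strings can travel over any of the transports mentioned above; with WebSocket, the server would typically forward each message to the connection registered under the to field.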

Working with hardware devices

We cannot get the names and characteristics of the cameras until the user has granted access to them. This matters when more than one camera is installed on the client system, as on mobile devices: until then we can only offer the user a choice of Camera 1 or Camera 2, without showing the real names of these cameras (for example “Logitech”, “Front Camera”, “FullHD Camera”).

If the client connects a new device during the session, the web application does not learn about it automatically. Unless the application listens for the devicechange event of navigator.mediaDevices and queries the device list again, a USB camera connected after the conference page was opened stays invisible until the user refreshes the page.
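
The camera naming problem described in this section can be handled with a fallback: show the real label when the browser provides one and a generic name otherwise. The sketch below assumes a device list shaped like the result of enumerateDevices (entries with kind, label, and deviceId). DeviceLike and cameraChoices are hypothetical names, and the device IDs in the usage lines are made up.

```typescript
// Local stand-in for the device entries returned by the browser.
interface DeviceLike {
  kind: string;
  label: string;
  deviceId: string;
}

// Lists the cameras with a display name, falling back to
// "Camera 1", "Camera 2", ... when the label is empty.
function cameraChoices(
  devices: readonly DeviceLike[]
): { deviceId: string; name: string }[] {
  return devices
    .filter((device) => device.kind === "videoinput")
    .map((device, index) => ({
      deviceId: device.deviceId,
      name: device.label.trim() !== "" ? device.label : `Camera ${index + 1}`,
    }));
}

const choices = cameraChoices([
  { kind: "audioinput", label: "", deviceId: "mic-1" },
  { kind: "videoinput", label: "", deviceId: "cam-a" },
  { kind: "videoinput", label: "Logitech", deviceId: "cam-b" },
]);
```

Calling the same function again after the device list changes keeps the menu in sync with the hardware.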

Capture media

For security reasons, the browser does not provide direct access to camera drivers. Therefore, we cannot configure the camera directly: resolution, frame rate, and similar parameters can only be requested, and the browser decides what the device actually delivers.

Likewise, the web application has no access to the post-processing options that are usually part of the camera driver settings, such as brightness adjustment or mirroring.

There is also no single standard solution for desktop sharing. You may have seen that in video conferencing applications, starting a desktop share often creates an additional conference participant that streams the desktop or application window you selected. The problems we face when working with cameras (the inability to show the camera name or to get the characteristics of the device) also apply to monitors when broadcasting the desktop.

P2P connection (SDP, Session Description Protocol)

The API for generating SDP is asynchronous, so there may be situations when the parameters of the media stream described in the incoming SDP packet do not correspond to what the client actually sends.

There are two SDP formats: Plan B, historically used by Chromium-based browsers, and Unified Plan, used by Firefox and adopted as the standard (current Chromium versions use it as well).

In Plan B, all media streams of the same kind share one format. If we do not use an external media server, some conference participants may not understand the format of our media stream and will not be able to display it.

Unified Plan lets you select a codec for each media stream, for example encoding the desktop broadcast with one codec and the camera broadcast with another. You can teach the signaling server to translate one SDP format into the other, but this increases the server load.
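
To see how a session description assigns codecs to streams, it helps to split the SDP text into its media sections. The sketch below does this for the m=, a=mid, and a=rtpmap lines only and ignores everything else. It is a simplified reader for illustration, not a full SDP parser, and the names MediaSection and mediaSections are hypothetical.

```typescript
interface MediaSection {
  kind: string;        // "audio", "video", ...
  mid: string | null;  // value of a=mid, if present
  codecs: string[];    // encoding names from a=rtpmap lines
}

// Walks the SDP line by line and groups attributes under the most
// recent m= line.
function mediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  let current: MediaSection | null = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      current = { kind: line.slice(2).split(" ")[0] ?? "", mid: null, codecs: [] };
      sections.push(current);
    } else if (current && line.startsWith("a=mid:")) {
      current.mid = line.slice("a=mid:".length);
    } else if (current && line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const name = line.slice("a=rtpmap:".length).split(" ")[1]?.split("/")[0];
      if (name) current.codecs.push(name);
    }
  }
  return sections;
}
```

For a description with one section per stream, the result lists each stream with its own codec set, which is what makes it possible to send a camera and a screen share with different codecs.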

RTP & SRTP(ssl) media streaming

What happens when NAT is involved, that is, when computers appear to the outside world under one IP address but know each other by different addresses inside the network? The ICE (Interactive Connectivity Establishment) framework comes to the rescue. It describes how to traverse NAT and establish a connection through it.

This framework uses a STUN server, a special server that a client can query to find out its external IP address. While establishing a P2P connection, each client sends a request to the STUN server to learn its external address, generates additional information (an IceCandidate), and exchanges this IceCandidate with the other side through the signaling mechanism. The clients then know each other's correct IP addresses and can establish a P2P connection. However, there are more complex cases, for example when the computer is hidden behind double NAT. In this case, the ICE framework falls back to a TURN server.

This is a special server that turns the client-client (P2P) connection into a client-server-client connection, that is, it acts as a relay. The good news for developers is that the API is identical in all three scenarios, whether the peers are on the same local network or need a STUN or TURN server. We simply specify the configuration of the STUN and TURN servers at the beginning, indicate how to access them, and the technology does the rest under the hood.
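
The server configuration mentioned above is a plain object handed to the peer connection when it is created. The sketch below builds such an object with one STUN entry and an optional TURN entry. The iceServers, urls, username, and credential fields follow the shape the browser expects; the helper name buildIceConfig, the local types, and the example.org host names and credentials are assumptions for illustration.

```typescript
// Local stand-ins for the browser's ICE configuration types.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
}

// Builds a configuration with a STUN server and, if given, a TURN relay.
function buildIceConfig(
  stunUrl: string,
  turn?: { url: string; username: string; credential: string }
): IceConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Hypothetical servers; replace with your own deployment.
const iceConfig = buildIceConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
```

In a browser, this object would be passed to the RTCPeerConnection constructor, after which candidate gathering proceeds the same way in all three scenarios.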

Here we face some more difficulties. The first is the need to run STUN and TURN servers, with the corresponding support and maintenance costs. Although the TURN server is a simple proxy that does not process video, it must have a high-speed Internet connection in order to relay real-time media streams to all conference participants.

WebRTC for the video conferencing market

Popularity of technology

To date, WebRTC is the second most popular video communication protocol after the proprietary Zoom protocol, ahead of the other standards (H.323 and SIP) and of the proprietary protocols (Microsoft Teams and Cisco Webex).

An increase in the number of customers

WebRTC technology has had a strong impact on the development of the video conferencing market. After the release of the first browsers with WebRTC support in 2013, the potential number of video conferencing terminals around the world immediately increased by 1 billion devices. In effect, every browser became a video conferencing terminal with basic capabilities for participating in video conferences.

Use in specialized solutions

JavaScript libraries and cloud service APIs with WebRTC support make it easy to add video to any web project. In the past, real-time data transmission required developers to learn how the protocols worked and to rely on third-party products, which usually required additional licensing and increased costs. WebRTC is already actively used for video contact centers, webinars, and similar services.

Competition with Flash

WebRTC and HTML5 dealt a death blow to Flash, which was already well past its best years. Adobe announced the end of Flash in 2017, and by early 2021 the leading browsers had removed support for it, so the technology has finally disappeared from the market.

Examples of services using WebRTC

Google Meet

Google Meet is a service for instant messaging and for video and audio calls, released by Google in 2017. Chromium-based browsers (Google Chrome and others) contain many hidden, undocumented WebRTC features that tend to appear first in Google's own Meet solution (as they did in its predecessor, Hangouts). This was the case with screen capture, background blur, and hardware encoding support on some platforms.

Jitsi Meet

Jitsi Meet is an open-source app released by 8x8. Jitsi is based on the Simulcast architecture, which means unstable operation on weak communication channels and high connection speed requirements on the server side. It runs web conferences only in a browser and has no full-fledged client applications for collaboration. Conferences with a maximum of 75 participants are supported (up to 35 with high call quality). To use Jitsi fully in a corporate environment, you need to develop and install additional software yourself.

BigBlueButton

BigBlueButton is free video conferencing software. The developers place special emphasis on distance education (with features such as an interactive whiteboard, content display, and polls). It supports web conferences with up to 100 participants.

What about Zoom?

Contrary to popular belief, Zoom does not use WebRTC to transmit and decode media data. This was done to save server resources. On the browser side, other web technologies are involved: low-level WebAssembly and WebSocket. With such non-standard approaches to transmitting a video stream, some participants may experience problems with picture quality.