WebRTC (Web Real-Time Communications) is a standard that describes the real-time transmission of streaming audio, video, and content between browsers (without installing plugins or other extensions) or between other applications that support it. The technology effectively turns a browser into a video conferencing terminal: to join a call, the user simply opens the conference's web page.
How WebRTC Works
Let's consider how the technology operates using the example of a call between two participants via a browser:
User Opens the Page: The user opens a page containing WebRTC content.
Device Access Request: The browser requests access to the webcam and microphone if they are needed. The devices aren't used until the user grants access. In cases where capture is optional (e.g., when only watching a broadcast), no additional permission is requested.
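In code, this permission prompt is triggered by a call to navigator.mediaDevices.getUserMedia(). Below is a minimal sketch of how a conference page might request the camera and microphone; the localVideo element id and the error handling are illustrative assumptions, not part of the standard flow:

```javascript
// Request camera and microphone; the browser shows its permission prompt here.
async function startLocalMedia() {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: true,
      video: true
    });
    // Attach the local stream to a <video id="localVideo" autoplay muted> element (assumed to exist).
    document.getElementById('localVideo').srcObject = stream;
    return stream;
  } catch (err) {
    // NotAllowedError: the user declined; NotFoundError: no camera or microphone present.
    console.error('Could not access media devices:', err.name);
  }
}
```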
Advantages of the WebRTC Standard
No Need for Software Installation: Users don't need to install any additional software.
High-Quality Communication thanks to:
The use of modern video and audio codecs.
Automatic adaptation of data stream quality based on connection conditions.
Built-in echo and noise reduction system.
Automatic gain control (AGC) for participants' microphone levels.
High Level of Security: All connections are secured and encrypted using the DTLS and SRTP protocols. Additionally, WebRTC operates only over HTTPS, and any website using this technology must have a valid certificate.
Support for SVC Technology: With the implementation of the VP9 and AV1 codecs, support for SVC (Scalable Video Coding) has been added. Although browsers themselves don't implement it yet, software solutions such as TrueConf make it possible to use SVC in browser clients.
Built-in Mechanism for Content Capture: This includes capabilities like desktop sharing.
Flexible Control Interface: The possibility to implement any control interface based on HTML5 and JavaScript.
Open-Source Project: You can embed it into your product or service.
True Multiplatform: The same WebRTC application will work equally well in any operating system, on a computer or mobile device, provided the browser supports WebRTC. This saves significant resources on software development.
Disadvantages of the Standard
Incompatibility Between Solutions: All WebRTC solutions are mutually incompatible because the standard only describes methods for transmitting video and audio. Implementation of methods for reaching participants, monitoring their availability, exchanging messages and files, scheduling, and other functionalities is left to the developer. In other words, it's not possible to call from one WebRTC application to another.
Privacy Concerns: For users concerned about their privacy, it's unsettling that WebRTC can reveal their real IP addresses. Neither a proxy server nor the Tor network helps maintain anonymity here. The IP address can be hidden by using a VPN service or by relaying traffic through a TURN server, and if necessary, WebRTC can be disabled in the browser entirely.
Lack of Remote Desktop Management Support: Yes, you can broadcast what's happening on the device's screen, but it will be a one-way video stream like the image transmitted from a camera, with no way to interact with the stream source. This is done for security reasons: JavaScript code cannot control anything outside the current browser window. Additional features, including remote desktop control, can be obtained using specially developed client applications from video conferencing vendors.
Codecs in WebRTC
WebRTC codecs can be divided into mandatory (browsers implementing this technology must support them) and optional (not part of the standard, but some browsers add them).
Audio Codecs
To compress audio traffic in WebRTC, mandatory codecs (Opus and G.711) and optional codecs (G.722, iLBC, iSAC) are used.
Opus: Opus is an audio codec with low encoding latency (from 2.5 ms to 60 ms), support for variable bitrates, and high compression, making it ideal for transmitting audio over networks with variable bandwidth. It's the main audio codec for WebRTC. Opus is a hybrid solution combining the best features of the SILK (voice compression, elimination of human speech distortion) and CELT (audio data encoding) codecs. The codec is freely available, and developers using it don't have to pay royalties to copyright holders. Compared with other audio codecs, Opus holds up very well: in several parameters it surpasses popular low-bitrate codecs such as MP3, Vorbis, and AAC-LC, and it restores the "image" of the sound closer to the original than AMR-WB or Speex.
G.711: G.711 is an outdated voice codec with a high transmission rate (64 kbps), most commonly used in traditional telephone systems. Its main advantage is minimal computational load due to the use of lightweight compression algorithms. The codec has a low level of voice signal compression and doesn't introduce additional audio delay during user communication.
G.722: G.722 is an ITU-T standard adopted in 1988 and now royalty-free. It operates at 48, 56, and 64 kbps and provides sound quality comparable to G.711. Like G.711, it is considered outdated. It's supported in the Chrome, Safari, and Firefox browsers.
iLBC: iLBC (internet Low Bitrate Codec) is an open-source narrowband speech codec. It's available in the Chrome and Safari browsers. Because of its high compression of the data stream, using this codec increases CPU load.
iSAC: iSAC (internet Speech Audio Codec) is a wideband speech codec, previously proprietary, currently part of the WebRTC project, but its use isn't mandatory. Supported in Chrome and Safari browsers. The WebRTC implementation uses an adaptive bitrate from 10 to 52 kbps with a sampling rate of 32 kHz.
Video Codecs
Choosing a video codec for WebRTC took developers several years; as a result, the VP8 and H.264 codecs were included in the standard. There are also implementations of optional video codecs (H.265, VP9, AV1).
VP8: VP8 is a free video codec with an open license, characterized by high video stream decoding speed and increased resistance to frame loss. The codec is versatile, easily implementable in hardware platforms, and thus often used by video conferencing system developers in their products. Compatible with Chrome, Edge, Firefox, and Safari (12.1+) browsers.
H.264: The royalty-bearing H.264 video codec appeared much earlier than its counterpart. It's a codec with a high degree of video stream compression while maintaining high video quality. Its widespread adoption in hardware video conferencing systems argued for its inclusion in the WebRTC standard. Compatible with the Chrome (52+), Edge, Firefox (deprecated for Android 68+), and Safari browsers.
VP9: VP9 is an open and free video compression standard developed by Google in 2012. It's an evolution of ideas embedded in the VP8 standard and was subsequently expanded within the AV1 standard. Compatible with Chrome (48+) and Firefox browsers.
H.265: H.265 is a paid video codec, the successor to H.264, providing the same visual quality at half the bitrate. This is achieved through more efficient compression algorithms. This codec currently competes with the free AV1 codec.
AV1: AV1 is an open-source video compression codec designed specifically for transmitting video over the internet. Supported in Chrome (70+) and Firefox (67+) browsers.
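To find out which of the codecs listed above a particular browser actually supports, and to express a preference among them, the standard WebRTC API offers capability queries and codec preferences. The sketch below asks the browser for the video codecs it can receive and, if VP9 is available, moves it to the front of the preference list; whether it's actually used still depends on SDP negotiation with the remote side:

```javascript
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('video');

// Ask the browser which video codecs it supports for receiving.
const { codecs } = RTCRtpReceiver.getCapabilities('video');
console.log(codecs.map(c => c.mimeType)); // e.g. "video/VP8", "video/H264", "video/VP9", ...

// Reorder so VP9 (if present) comes first; the other codecs remain as fallbacks.
const vp9First = [
  ...codecs.filter(c => c.mimeType === 'video/VP9'),
  ...codecs.filter(c => c.mimeType !== 'video/VP9')
];
transceiver.setCodecPreferences(vp9First);
```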
Nuances When Working with WebRTC
Signaling Server: WebRTC doesn't provide browsers a way to discover each other. While we can generate all the necessary metadata about our peers, how does one browser learn of another's existence? How to connect them? The WebRTC signaling server manages connections between peers. It helps with signaling, assisting one peer in finding another on the network, negotiating the connection, resetting it if necessary, and terminating it. WebRTC doesn't specify a signaling protocol; one must be developed or existing solutions used. The transport for the signaling protocol isn't specified either. HTTP, WebSocket, or Data Channel can be used. WebSocket is commonly used as it’s based on a persistent connection and can transmit data almost in real-time.
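As a concrete illustration, a minimal browser-side signaling sketch over WebSocket might look like the following. The message format, the wss://signaling.example.com endpoint, and the JSON field names are assumptions made for this example; WebRTC itself does not define any of them:

```javascript
const ws = new WebSocket('wss://signaling.example.com/room/42'); // hypothetical signaling endpoint
const pc = new RTCPeerConnection();

// Forward every locally gathered ICE candidate to the other peer through the relay.
pc.onicecandidate = ({ candidate }) => {
  if (candidate) ws.send(JSON.stringify({ type: 'candidate', candidate }));
};

// Handle messages relayed from the remote peer.
ws.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.type === 'offer') {
    await pc.setRemoteDescription(msg.sdp);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    ws.send(JSON.stringify({ type: 'answer', sdp: pc.localDescription }));
  } else if (msg.type === 'answer') {
    await pc.setRemoteDescription(msg.sdp);
  } else if (msg.type === 'candidate') {
    await pc.addIceCandidate(msg.candidate);
  }
};
```

The server behind such an endpoint only forwards messages between the two peers; it never touches the media streams themselves.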
Working with Hardware Devices: We can't obtain the names and characteristics of cameras until the user has granted media access. If more than one camera is installed in the client system, as on mobile devices, we can only offer the user a choice like Camera 1 or Camera 2 without naming these cameras (e.g., "Logitech," "Front Camera," "FullHD Camera"). If a new device is connected during the session, the web application won't know about it until the user refreshes the page. This means that if you've already opened the conference page and then connected a new USB camera, the application won't see it.
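This limitation shows up directly in the device enumeration API: until media access has been granted, browsers typically return empty label fields from navigator.mediaDevices.enumerateDevices(), which is why the UI can only show generic names. A minimal sketch:

```javascript
// Before getUserMedia() has been granted, the label fields are usually empty strings,
// so the best we can show the user is "Camera 1", "Camera 2", and so on.
async function listCameras() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const cameras = devices.filter(d => d.kind === 'videoinput');
  return cameras.map((cam, i) => cam.label || `Camera ${i + 1}`);
}

listCameras().then(names => console.log(names));
```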
Media Capture: For security reasons, the browser doesn't provide direct access to camera drivers. Thus, we can't insist on a specific camera, choose resolution, frame rate, etc. We also can't perform subsequent video processing, set brightness, mirror the video, and other settings typically part of the camera driver's settings. There's also no unified standard solution for desktop sharing. You might have encountered in video conferencing applications that after initiating desktop sharing, an additional conference participant is often created, streaming the selected desktop or application window. Issues we face when working with cameras (inability to specify the camera's name and obtain device characteristics) also apply to working with monitors during desktop broadcasting.
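What the page can do is express preferences through constraints, which the browser treats as hints rather than guarantees, and request screen capture through a separate call that always goes through the browser's own picker. A sketch, with the constraint values chosen purely for illustration:

```javascript
async function captureMedia() {
  // Ask for roughly 720p at 30 fps; these are hints, and the browser picks
  // the closest match the selected camera can actually deliver.
  const camStream = await navigator.mediaDevices.getUserMedia({
    video: { width: { ideal: 1280 }, height: { ideal: 720 }, frameRate: { ideal: 30 } },
    audio: true
  });

  // Screen or window capture goes through a separate call and always shows the
  // browser's own picker dialog; the page cannot choose the source by itself.
  const screenStream = await navigator.mediaDevices.getDisplayMedia({ video: true });

  return { camStream, screenStream };
}
```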
P2P Connection (Session Description Protocol, SDP): The API for generating SDP is asynchronous, so situations can arise where the media stream parameters described in the incoming SDP packet don't match what the client actually sends. There are two SDP formats: Plan B, historically used by Chromium-based browsers, and Unified Plan, used by Firefox and now the default in modern Chromium as well. In Plan B, all media streams of the same kind share the same format. If we don't use an external media server, some conference participants may not understand the format of our media stream and won't be able to display it. Unified Plan allows selecting a codec for each media stream, e.g., encoding the desktop broadcast with one codec and the camera with another. You can teach the signaling server to translate one SDP format into the other, but this increases the server's load.
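To make the asynchrony concrete, here is a minimal sketch of the caller side of the offer/answer exchange; sendToRemotePeer is a placeholder for whatever signaling transport the application uses:

```javascript
const pc = new RTCPeerConnection();

async function makeCall(sendToRemotePeer) {
  // createOffer() produces the SDP asynchronously; nothing is in effect until
  // setLocalDescription() has resolved.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // pc.localDescription now holds the SDP that must be delivered to the remote
  // peer through the signaling server.
  sendToRemotePeer({ type: 'offer', sdp: pc.localDescription });
}
```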
RTP and SRTP (SSL) Media Streaming: How do we operate when there's NAT, where computers appear under one external IP address but use different addresses internally? The ICE (Interactive Connectivity Establishment) framework comes to the rescue: it describes how to traverse NAT and establish a connection in its presence. This framework uses a STUN server, a special server you can contact to find out your external IP address. During the P2P connection establishment process, each client makes a request to this STUN server to determine its IP address, generates additional information (ICE candidates), and exchanges these candidates over the signaling mechanism. The clients then learn each other's real addresses and can establish a P2P connection. However, there are more complex cases, such as when a computer is hidden behind double NAT. In those cases the ICE framework falls back to a TURN server, a special server that turns the client-client (P2P) connection into a client-server-client connection, acting as a relay. The good news for developers is that regardless of which of the three scenarios the connection was established under (local network, STUN server, or TURN server), the API remains identical. At the start, we simply specify the configuration of the ICE (STUN and TURN) servers and how to access them, and the technology handles everything under the hood. This introduces other challenges, such as the need to run STUN and TURN servers and the associated costs of their support and maintenance. The TURN server, even though it's a simple relay and doesn't process video, must have a high-speed internet connection to forward real-time media streams for all conference participants.
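In code, the whole ICE setup boils down to passing a list of STUN and TURN servers when creating the RTCPeerConnection. The addresses and credentials below are placeholders, not real servers:

```javascript
// STUN is used to discover the public address; TURN relays media when a
// direct path cannot be established (for example, behind double NAT).
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },   // placeholder STUN server
    {
      urls: 'turn:turn.example.com:3478',     // placeholder TURN server
      username: 'demo-user',                  // illustrative credentials
      credential: 'demo-password'
    }
  ]
});

// Candidates gathered via the local network, STUN, or TURN all arrive here;
// the rest of the API is identical no matter which path ICE finally selects.
pc.onicecandidate = ({ candidate }) => {
  if (candidate) console.log(candidate.type, candidate.candidate); // host / srflx / relay
};
```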
WebRTC in the Video Conferencing Market
Technology Popularity: To date, WebRTC is the second most popular video communication protocol after the proprietary Zoom protocol, surpassing the open standards (H.323 and SIP) and other proprietary protocols (such as those used by Microsoft Teams and Cisco Webex).
Increase in Customer Numbers: The WebRTC technology has a strong influence on the development of the video conferencing market. After the release of the first browsers with WebRTC support in 2013, the potential number of video conferencing terminals worldwide immediately increased by 1 billion devices. Essentially, every browser became a video conferencing terminal with basic capabilities for participating in a video conference.
Use in Specialized Solutions: Various JavaScript libraries and cloud service APIs that support WebRTC make it easy to add video support to any web project. Previously, real-time data transmission required developers either to understand how the underlying protocols work or to build on other companies' products, which often meant additional licenses and higher costs. Currently, WebRTC is actively used for organizing video contact centers, conducting webinars, and more.
Competition with Flash: WebRTC and HTML5 dealt a fatal blow to Flash, which was already past its prime. Since 2017, leading browsers have officially ceased supporting Flash, and the technology has disappeared from the market.
Examples of Services Using WebRTC
Google Meet: Google Meet is a messaging service as well as a platform for video and audio calls, launched by Google in 2017. Chromium-based browsers (Google Chrome and others) contain many hidden, undocumented WebRTC features that regularly appear first in Google's own Meet (and previously in its predecessor, Hangouts). This was the case with screen capture, background blurring, and hardware encoding support on certain platforms.
Jitsi Meet: Jitsi Meet is an open-source application released by 8x8. Jitsi is built on a simulcast architecture, which makes it unstable on weak communication channels and demanding on server-side bandwidth. It supports web conferences only in the browser and doesn't have full-fledged client applications for collaboration. Conferences with up to 75 participants are supported (up to 35 with high call quality). For full-fledged use of Jitsi in a corporate environment, additional software needs to be developed and installed separately.
BigBlueButton: BigBlueButton is free software for video conferencing. Developers place particular emphasis on distance education (features like interactive whiteboards, content display, poll support, etc., are available). It supports web conferences for up to 100 participants.
What About Zoom?: Contrary to popular belief, Zoom doesn't use WebRTC technology for transmitting and decoding media data. This decision was made to save server resources. On the browser side, other web technologies are involved—low-level WebAssembly and WebSocket. Using these non-standard approaches to video stream transmission can lead to some participants experiencing video quality issues.