Experimental Hub

Python
TypeScript
React
WebRTC
Chart.js
A custom web conferencing platform designed specifically for research with filters, featuring a comprehensive filter framework. It can be used to conduct online experiments. I developed the backend and networking architecture as part of my bachelor's thesis.

Background

Filters are on the rise in consumer applications such as Snapchat and Instagram, as well as conferencing platforms like Zoom and Microsoft Teams. They are often used for entertainment or to enhance privacy, e.g. virtual background filters. However, their negative impacts on society are becoming more visible. Face filters in particular affect beauty standards and emotions; they can cause self-discrepancy and body dysmorphia, and their use has been linked to an increased acceptance of cosmetic surgery.
With this in mind, filters are an increasingly interesting topic for researchers. However, researching filters using existing platforms comes with privacy concerns and many limitations, in part because these platforms are not designed for research. Researchers are unable, or at least severely limited in their ability, to implement custom filters. For example, a custom filter that includes some kind of real-time analysis or machine learning algorithm would likely be impossible to realize within an existing platform. Another problem researchers face is control over participants. Existing applications leave every user in control of which filters are enabled and how they are configured. While this makes sense for their intended use, it becomes a problem when a researcher needs a participant to change a filter mid-experiment, especially if the participant does not know how to do that, e.g. due to limited technical knowledge.

Custom Conferencing Platform

To solve the problems outlined above, I developed a custom conferencing platform. It is specifically designed for researchers, addressing the challenges they face when conducting filter research with existing applications. The platform has a web frontend that allows participants to join without any additional installation (accessibility). Researchers have full control over the collected data and over participants, allowing them to change, edit and apply filters without any input from the participants. In addition, the platform features a comprehensive filter framework, which allows the implementation of custom filters without arbitrary limitations. This includes the ability to use filters for real-time analysis, to integrate machine learning algorithms, and much more.

Design

The platform differentiates between two user types: experimenters and participants. In order to connect to the server, a client provides a user type and the corresponding authentication. Experimenters must provide a password, while participants authenticate themselves with a session ID and a participant ID. Depending on their user type, users then have access to one of two different APIs. Besides the user types and APIs, the server also stores a list of planned sessions. These sessions are created by experimenters in a form page and sent to the server via the experimenter API.
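As a rough illustration, the two connection payloads could look like the following sketch. The field names are assumptions made for illustration, not the platform's actual protocol.

```python
# Hypothetical connection payloads; field names are assumptions made for
# illustration, not the platform's actual protocol.
experimenter_connect = {
    "user_type": "experimenter",
    "password": "<experimenter password>",  # grants access to the experimenter API
}

participant_connect = {
    "user_type": "participant",
    "session_id": "<session ID>",          # the planned session to join
    "participant_id": "<participant ID>",  # identifies the invited participant
}
```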
Conferencing Platform Design
Experimenters see a list of all planned sessions on their landing page, with the ability to edit, copy, delete or create sessions. Sessions have a variety of attributes, such as a title, description and date, but most importantly a list of invited participants ("Participants Data"). This list contains the name and state of every participant in a session, including their active filters and filter configurations. When it is time to conduct the actual experiment, an experimenter selects one of the sessions and starts it. This creates an experiment on the server, which users can then connect to. A session could look roughly like the sketch below.
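The attribute names in this sketch are assumptions based on the description above, not the platform's actual data model.

```python
# Hypothetical session record; attribute names are assumptions based on
# the description above, not the platform's actual data model.
session = {
    "title": "Filter Study",
    "description": "Investigating the effect of face filters.",
    "date": "2023-07-04T10:00",
    "participants": [  # "Participants Data"
        {
            "participant_id": "<participant ID>",
            "name": "Jane Doe",
            "filters": [  # active filters and their configuration
                {"type": "edge_detection", "config": {}},
            ],
        },
    ],
}
```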

User Flow

The following image shows the intended user flow. In preparation for an experiment, experimenters can manage sessions and start a session in order to create an experiment. Only after an experiment has been started can the invited participants join. Initially, the experiment is in a waiting state: participants join a waiting room, prepare their camera and can chat with the experimenter. Only the experimenter already sees all participants. During this phase, the experimenter can take notes, adjust filters and communicate with participants, for example if adjustments to a participant's webcam setup are required. When the experimenter starts the experiment, participants join the experiment room and are able to see each other.
Conferencing Platform User Flow
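The lifecycle described above can be summarized as a small state machine. A minimal sketch, with state names assumed for illustration:

```python
# Minimal sketch of the experiment lifecycle; the state names are
# assumptions for illustration, not the platform's actual implementation.
from enum import Enum, auto

class ExperimentState(Enum):
    WAITING = auto()  # participants are in the waiting room
    RUNNING = auto()  # participants are in the experiment room
    ENDED = auto()

class Experiment:
    def __init__(self) -> None:
        # created when an experimenter starts a planned session
        self.state = ExperimentState.WAITING

    def start(self) -> None:
        # started by the experimenter; participants can now see each other
        self.state = ExperimentState.RUNNING
```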

Filter Framework

In order to allow researchers and developers to implement their own custom filters, the platform features a comprehensive filter framework. To ensure compatibility with any filter idea or concept, I conducted a literature analysis and compiled a list of filter concepts. This list was extended with filter concepts from existing applications and brainstormed ideas. The concepts were then categorized as analysis, manipulation, both or other, and sorted by the number of involved participants and the expected computational complexity. A sketch of a possible filter interface follows the affinity diagram below.
  • Manipulation: Filters that modify a track. For example, a cartoon effect filter that changes the general appearance of a video track.
  • Analysis: Filters that analyze a track. For example, a face detection filter.
  • Manipulation and Analysis: Filters that both analyze and modify a track. For example, a virtual background filter, which detects a person and replaces everything except that person with a static image.
  • Other: Filters that neither analyze nor modify a track. For example, a recorder that records a track.
Filter Concepts Affinity Diagram
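To make the categories more concrete, here is a minimal sketch of what a filter interface supporting them could look like. The names and signatures are assumptions, not the platform's actual API:

```python
# Minimal sketch of a filter interface covering the categories above;
# names and signatures are assumptions, not the platform's actual API.
from abc import ABC, abstractmethod

import cv2
import numpy as np

class Filter(ABC):
    run_if_muted = False  # analysis filters may opt in to run while muted

    @abstractmethod
    async def process(self, frame: np.ndarray) -> np.ndarray:
        """Receive a frame and return the frame handed to the next filter."""

class EdgeDetectionFilter(Filter):
    """Manipulation: modifies the video track."""
    async def process(self, frame: np.ndarray) -> np.ndarray:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)
        return cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)

class FaceDetectionFilter(Filter):
    """Analysis: inspects the track but returns it unchanged."""
    run_if_muted = True  # keep analyzing even while the track is muted

    def __init__(self) -> None:
        self.cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    async def process(self, frame: np.ndarray) -> np.ndarray:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = self.cascade.detectMultiScale(gray)
        # store or forward the detection results instead of altering the frame
        return frame
```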

Filter Pipeline

The filter pipeline executes the filters on a track. Each frame received by the server passes through it, and each user and track (audio/video) has its own pipeline. In essence, the pipeline executes the filters sequentially, using the output of one filter as the input for the next one. Therefore, the order in which filters are configured can make a difference. Tracks can also be muted by the experimenter. In that case, a special mute filter produces a still image or a silent audio frame, depending on the track type, and "normal" filters are not executed by default. Individual filters can still be executed while muted by setting the run_if_muted flag, which can be used to ensure that analysis filters always run. A minimal sketch of such a pipeline is shown below.
Filter Pipeline
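This sketch reuses the hypothetical Filter interface from the sketch above; it illustrates the sequential execution and mute handling, not the platform's actual implementation:

```python
# Minimal sketch of the sequential filter pipeline; one instance would
# exist per user and track, as described above.
import numpy as np

class FilterPipeline:
    def __init__(self, filters: list[Filter], mute_filter: Filter) -> None:
        self.filters = filters          # executed in the configured order
        self.mute_filter = mute_filter  # yields a still image / silent audio
        self.muted = False

    async def run(self, frame: np.ndarray) -> np.ndarray:
        if self.muted:
            frame = await self.mute_filter.process(frame)
        for f in self.filters:
            if self.muted and not f.run_if_muted:
                continue  # skip "normal" filters while the track is muted
            # the output of one filter is the input of the next
            frame = await f.process(frame)
        return frame
```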

Networking

In order to provide a low-latency, high-quality video and audio experience, the platform is built using WebRTC. WebRTC is an open-source, standardized solution built upon existing and proven standards. It is built into modern browsers and accessible via a JavaScript API. In essence, it allows websites to access webcams, microphones and screen capture. Most importantly, it can be used to establish connections to other peers, which can be other browsers or servers.
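On the server side, WebRTC implementations such as aiortc make the same primitives available in Python. As a simplified sketch (not the platform's actual signaling code), a server could answer a client's offer like this:

```python
# Simplified sketch of answering a WebRTC offer with aiortc (Python);
# for illustration only, not the platform's actual signaling code.
from aiortc import RTCPeerConnection, RTCSessionDescription

async def handle_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        # echo every incoming track back to the client; in the real
        # platform, a filter pipeline would process the frames first
        pc.addTrack(track)

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc.localDescription.sdp
```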
For the networking architecture, I compared three common approaches for WebRTC, which can be seen below. The first is a peer-to-peer (P2P) approach. P2P has its advantages and is supported by WebRTC, requiring a signaling server only to establish, but not to maintain, a connection. However, a P2P approach would mean that clients have to run the computationally intensive filters, which would increase the hardware requirements for participants. Additionally, a P2P approach lacks a central authority that can remove misbehaving users, including possible attackers, and manage the potentially sensitive user data.
The Multipoint Control Unit (MCU) approach introduces a server to which all clients connect directly. The server acts as a central authority and can run the computationally intensive filters. It merges incoming streams and distributes the merged stream. However, this merging process takes up additional resources, limits the client's user interface (UI) in the way the streams of other users can be displayed, and can be more difficult to debug.
The Selective Forwarding Unit (SFU) approach is similar to the MCU, offering all the benefits above, but distributes streams individually instead of merging them on the server. This gives the UI full freedom in how each stream is displayed and avoids unnecessary computational costs on the server.
Conferencing Platform Possible Networking Architectures: P2P, MCU and SFU

Testing

Both Google Chrome and Firefox provide a dedicated page for analyzing and debugging WebRTC connections (chrome://webrtc-internals and about:webrtc). They show statistics and logs for all connections that currently exist within the browser instance. While they can be very helpful for low-level connection debugging, they do not allow you to interact with the connection and cannot measure higher-level statistics such as the end-to-end video latency or the message round-trip time. To fill this gap, I developed two testing tools as part of the frontend.

General Connection Test Tool

This tool serves as a general testing frontend for different aspects of the backend. First of all, different user types, passwords, or session and participant IDs can be used to connect as different users. After connecting to the server, the tool displays and sends a local stream, a local webcam & microphone track, and receives it back from the server (remote stream). The remote stream is what users see in the final frontend, since the server applies filters to it. A selection of API endpoints and filter presets can then be used.

End-to-End Video Latency & API Latency Test Tool

The second tool allows the measurement of end-to-end video latency and API latency. To measure the video latency, a QR-Code containing a timestamp is attached to each frame before it is sent to the backend. When the tool receives a frame back, it parses the QR-Code and computes the difference between the time the frame was received and the timestamp encoded in the QR-Code.
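A minimal Python sketch of this measurement idea (the actual tool is part of the TypeScript frontend):

```python
# Minimal sketch of the QR-Code latency measurement; the actual tool is
# part of the TypeScript frontend, this only illustrates the idea.
import time

import cv2
import numpy as np
import qrcode

detector = cv2.QRCodeDetector()

def stamp_frame(frame: np.ndarray) -> np.ndarray:
    """Attach a QR-Code containing the current time to a frame."""
    qr = np.array(qrcode.make(str(time.time())).convert("RGB"))
    code = cv2.cvtColor(qr, cv2.COLOR_RGB2BGR)
    h, w = code.shape[:2]
    frame[:h, :w] = code  # paste into the top-left corner (frame must be larger)
    return frame

def video_latency(received: np.ndarray) -> float | None:
    """Parse the QR-Code of a returned frame and compute the latency."""
    payload, _, _ = detector.detectAndDecode(received)
    if not payload:
        return None  # QR-Code could not be detected or decoded
    return time.time() - float(payload)
```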
The API latency is measured by sending simple PING messages with every frame and timing how long it takes until the server sends them back. Additional padding or junk data can be added to the messages to artificially increase their size.
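A sketch of this idea; the message format is an assumption, not the platform's actual API:

```python
# Minimal sketch of the PING round-trip measurement; the message format
# is an assumption, not the platform's actual API.
import json
import time

def make_ping(padding_bytes: int = 0) -> str:
    """Build a PING message, optionally padded to increase its size."""
    return json.dumps({
        "type": "PING",
        "sent": time.time(),
        "padding": "x" * padding_bytes,  # junk data to inflate the message
    })

def round_trip_time(echoed: str) -> float:
    """Compute the round-trip time once the server echoes the message back."""
    return time.time() - json.loads(echoed)["sent"]
```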
After a measurement has been conducted, the tool compiles an overview of the test. This includes information about the end-to-end video latency, API latency, framerate, QR-Code generation & parsing times and other useful statistics. Latency over time and other statistics are also visualized in graphs made with Chart.js.

