--- 8 July 2016 ---

The Implementation

The STEAK project extends the open-source software Asterisk, a widely used PBX server. The extensions make it possible to provide one virtual conferencing space per telephone conference, with 3D audio where desired and technically possible. Asterisk itself provides connectivity to almost every telephone technology, so backwards compatibility comes built in. [1]

Its friendly and active community is a plus.

Asterisk was extended with two features:

### 1. Stereo Functionality

So far, Asterisk can only handle mono signals (one channel) and must be extended to transmit spatially rendered stereo signals. This is achieved by enabling support for the speech and audio codec OPUS. Besides stereo transmission, this codec provides state-of-the-art coding quality and can also be used to transmit high-end audio content. Because Asterisk only supported mono signals until now, its internal signal processing also needs to be adapted.

### 2. Spatial Rendering for Spatial Presentation

In addition to stereo support, the default conference bridge of Asterisk (app_confbridge.c) is extended with a spatial rendering capability for binaural representation.

The Code

The source code of the STEAK project can be found on GitHub: https://github.com/SteakConferencing.

This includes:

The Details

In the following, the technical details of the STEAK project are conceptually explained. For more details, please take a look at the source code.

Adding the Stereo-capable Codec OPUS

The audio codec OPUS was selected for the implementation of the STEAK project. This codec can transmit two audio signals in one Real-time Transport Protocol (RTP) connection [2]. In contrast to using two RTP connections, which is possible but non-standard, this removes the need to synchronize the received audio streams on the client side, as the synchronization is handled by the codec itself. OPUS was chosen over alternatives (e. g., AMR-WB+) because (a) it is recommended for WebRTC, (b) it can be used to send both speech and audio content, and (c) it supports on-the-fly adjustments (e. g., of bandwidth and compression). Sadly, Asterisk does not (yet?) include OPUS due to potential patent infringement issues in the USA and the associated legal risks (see here). Nevertheless, patches that add OPUS support to Asterisk (passthrough and signal processing) are available: https://github.com/meetecho/asterisk-opus.

This modification boils down to adding the files codec/codec_opus.h and codec/ex_opus.h as well as adding the include flags and linker flags for libopus to the build process of Asterisk.

Modifying Internal Signal Processing

Like almost any telephony system, Asterisk provides signal processing only for mono signals. Adding stereo to Asterisk is, however, straightforward. Out of the box, Asterisk provides internal (mono) translation between different sampling rates and between codecs. This functionality enables Asterisk to connect telephone calls between clients that use different codecs, with Asterisk handling the codec translation.

Required changes are:

Signaling

Before an actual telephone call, a client connects to the remote party and signals its interest in establishing a call. In this so-called signaling phase, the client and the remote party inform each other about their interest, their technical capabilities (mainly supported codecs), and connection information. A client that supports sending and receiving stereo via OPUS needs to announce this capability. Asterisk was extended to announce and understand this capability as well, and to flag the connection as stereo-capable.

In Asterisk terms, the connection between a client and Asterisk is called a channel. The data structure describing the channel is extended to hold the information about the stereo capability. Specifically, the boolean value stereo is added to struct ast_trans_pvt.
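As a rough illustration of what such a flag changes downstream, here is a small Python stand-in. It is not the actual C structure: the class `TranslatorState`, its other fields, and the 20 ms frame length are assumptions made for this sketch. The flag defaults to mono, so existing behaviour is unchanged unless stereo was negotiated.

```python
from dataclasses import dataclass


@dataclass
class TranslatorState:
    """Hypothetical stand-in for the state held in struct ast_trans_pvt."""

    codec: str
    sample_rate_hz: int
    stereo: bool = False  # the added flag; False keeps the mono behaviour

    @property
    def audio_channels(self):
        """Number of audio channels (not Asterisk channels) in the stream."""
        return 2 if self.stereo else 1

    def samples_per_frame(self, frame_ms=20):
        """Total samples in one frame, counted over all audio channels."""
        return self.sample_rate_hz * frame_ms // 1000 * self.audio_channels
```

With 48 kHz audio and 20 ms frames, a mono stream needs 960 samples per frame and a stereo stream 1920, which is why code that sizes or walks buffers has to consult the flag.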

Internal Audio Signal Handling

If stereo capability was announced by both sides in the signaling phase and is going to be used, the internal signal processing of Asterisk must be aware of this. Internally, Asterisk uses buffers containing mono audio data (compressed or uncompressed, depending on the use). These buffers were not modified but can now also be filled with interleaved stereo audio data. In that case, the buffer holds the samples of the two channels (left and right) in alternation, one sample per channel.
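The interleaved layout can be sketched in a few lines of Python. The helper names `interleave` and `deinterleave` are illustrative and do not come from the STEAK source; the left-before-right order is the usual convention and is assumed here.

```python
def interleave(left, right):
    """Merge two equally long sample lists into one L, R, L, R, ... buffer."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of samples")
    buffer = [0] * (2 * len(left))
    buffer[0::2] = left   # even positions hold the left channel
    buffer[1::2] = right  # odd positions hold the right channel
    return buffer


def deinterleave(buffer):
    """Split an interleaved buffer back into (left, right) sample lists."""
    if len(buffer) % 2 != 0:
        raise ValueError("an interleaved stereo buffer has an even length")
    return buffer[0::2], buffer[1::2]
```

For example, `interleave([1, 2, 3], [10, 20, 30])` gives `[1, 10, 2, 20, 3, 30]`. A buffer of a given size therefore covers only half as much time in stereo as it does in mono.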

A translation between stereo channels and mono channels is also implemented for the sake of completeness. In general, however, this only wastes processing power, and both parties should instead negotiate a mono-only transmission.
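How STEAK converts between the two layouts is not described here. One common approach, shown below as an assumption rather than as the project's method, is to average left and right for the downmix and to duplicate each sample for the upmix. The function names are illustrative.

```python
def stereo_to_mono(buffer):
    """Downmix an interleaved L, R, ... buffer by averaging each sample pair."""
    if len(buffer) % 2 != 0:
        raise ValueError("an interleaved stereo buffer has an even length")
    # Integer division keeps the result in the input range (rounds down).
    return [(buffer[i] + buffer[i + 1]) // 2 for i in range(0, len(buffer), 2)]


def mono_to_stereo(buffer):
    """Upmix by copying every mono sample to both the left and right channel."""
    stereo = []
    for sample in buffer:
        stereo.extend((sample, sample))
    return stereo
```

Both directions touch every sample of every frame, and the upmix adds no spatial information, which illustrates why a mono-only negotiation is the cheaper choice.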

Extending the Default Conference Bridge

The default conference bridge (app_confbridge.c) mixes the incoming mono signals of all connected channels. Here, a voice activity detection algorithm is applied (i. e., non-speaking participants are not mixed), volume adjustments are applied, and the resulting mono signals are added together into one mono signal for each participant. The default conference bridge is modified so that mono channels as well as stereo channels can be connected. For stereo-capable channels, a spatial presentation is rendered via convolution using libfftw, separately for the left ear and the right ear. The rendered signals of all participants are then mixed per ear. All mono channels receive signals mixed with the same algorithm as in the default conference bridge, while all stereo-capable channels receive a spatial representation.
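The per-ear rendering and mixing can be sketched as follows. This Python example uses plain time-domain convolution for readability, whereas the project uses FFT-based convolution via libfftw. The function names and the tiny impulse responses in the usage note are made up for illustration, and voice activity detection and volume adjustment are left out.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) linear convolution of two sample lists."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def mix_into(accumulator, samples):
    """Add samples onto the accumulator in place, growing it if needed."""
    if len(samples) > len(accumulator):
        accumulator.extend([0.0] * (len(samples) - len(accumulator)))
    for i, sample in enumerate(samples):
        accumulator[i] += sample


def render_stereo_mix(sources):
    """Render one listener's stereo mix.

    sources is a list of (mono_samples, ir_left, ir_right) tuples, one per
    talker to be heard, where ir_left and ir_right are the impulse responses
    that place that talker in the virtual room for the left and right ear.
    """
    left, right = [], []
    for mono, ir_left, ir_right in sources:
        mix_into(left, convolve(mono, ir_left))    # left-ear contribution
        mix_into(right, convolve(mono, ir_right))  # right-ear contribution
    return left, right
```

Calling `render_stereo_mix` with one talker placed towards the left (`([1.0, 0.0], [1.0, 0.5], [0.5, 0.25])`) and one towards the right (`([0.0, 2.0], [0.5], [1.0])`) returns two separate per-ear signals that can then be interleaved and sent to a stereo-capable channel.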

The following limitations are introduced:

One note about scheduling:

No multi-threading is implemented for the STEAK-enhanced conference bridge (or used in libfftw), i. e., a running conference is rendered using one thread/processor only. This reduces implementation effort and, furthermore, scheduling for the whole system is handled by Asterisk itself.

That’s all.

  1. One reason for choosing Asterisk over other systems was that a lot of prior knowledge about Asterisk and its internal signal processing was available in the team.

  2. For an overview on multi-channel enabled RTP profiles see Wikipedia.