Automated real-time audio monitoring

In live broadcasting, the output must always be “broadcast safe”, meaning that it contains no inappropriate sounds or language. Identifying and dealing with unsafe audio can be very challenging, especially when there are numerous audio sources, because the operator must listen to each stream one by one. If the sound is intermittent, it can often take a minute or more to find the source, during which time the broadcast has to remain muted.

AudioWatch is an audio AI system that uses automated audio tagging to identify potentially problematic audio and highlights the offending source on a video multi-viewer. This system not only alerts an operator to the presence of unsafe audio and the type of sound, but also shows them exactly which stream it appears on. This allows them to deal with the issue within seconds rather than minutes.
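As a rough illustration of the alerting step, the sketch below takes per-stream tag probabilities and reports which streams should be highlighted and why. It is not the AudioWatch implementation: the function name `flag_streams`, the stream names, the tag labels and the threshold values are all hypothetical, and the probabilities are assumed to have already been produced by an audio tagger.

```python
def flag_streams(stream_probs, thresholds):
    """Return the streams whose tag probabilities cross a trigger threshold.

    stream_probs: {stream name: {tag label: probability}} (hypothetical layout)
    thresholds:   {tag label: trigger threshold} for the tags treated as unsafe
    """
    alerts = {}
    for stream, probs in stream_probs.items():
        # Keep only the unsafe tags that reach their own threshold.
        hits = [
            (label, p)
            for label, p in probs.items()
            if label in thresholds and p >= thresholds[label]
        ]
        if hits:
            # Most confident tag first, so the operator sees the likeliest cause.
            alerts[stream] = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return alerts


# Hypothetical example: two camera feeds and two tags treated as unsafe.
stream_probs = {
    "cam1": {"speech": 0.20, "shout": 0.05},
    "cam2": {"speech": 0.90, "shout": 0.10},
}
thresholds = {"speech": 0.5, "shout": 0.5}
alerts = flag_streams(stream_probs, thresholds)
```

In this example only `cam2` is flagged, with `speech` as the reason, which is the information a multi-viewer overlay would need in order to highlight one tile.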

Credit: BBC Studios

To create AudioWatch, we used the PANNs pre-trained classifier to estimate the probability of various unsafe sounds occurring. We then optimised the trigger thresholds to give the right sensitivity for our target use case. Because very little unsafe audio makes it to broadcast, there is very little training data on which to do the optimisation. To get around this, we used Scaper to programmatically generate thousands of hours of unsafe audio by mixing sound effects and speech into clean broadcast output. As the position of the unsafe audio in each mix was known, we could calculate a confusion matrix of true and false positives and negatives, and use it to set the right trigger points.
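The threshold optimisation can be sketched as a sweep over candidate trigger values, scoring each one against the known positions of the unsafe audio. This is a minimal illustration, not the method actually used: it assumes per-frame probabilities for a single sound class and matching 0/1 ground-truth labels, and the selection rule (fewest false positives among thresholds that reach a minimum recall) and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # unsafe frame, alarm raised
    fp: int = 0  # clean frame, alarm raised
    fn: int = 0  # unsafe frame, no alarm
    tn: int = 0  # clean frame, no alarm

    @property
    def recall(self):
        total = self.tp + self.fn
        return self.tp / total if total else 0.0


def confusion_at(probs, labels, threshold):
    """Count outcomes when an alarm fires for every frame with prob >= threshold."""
    c = Confusion()
    for p, unsafe in zip(probs, labels):
        fired = p >= threshold
        if fired and unsafe:
            c.tp += 1
        elif fired:
            c.fp += 1
        elif unsafe:
            c.fn += 1
        else:
            c.tn += 1
    return c


def pick_threshold(probs, labels, candidates, min_recall):
    """Assumed rule: among thresholds reaching min_recall, take the one with
    the fewest false positives, preferring the higher threshold on a tie."""
    scored = [(t, confusion_at(probs, labels, t)) for t in candidates]
    ok = [(t, c) for t, c in scored if c.recall >= min_recall]
    if not ok:
        return None  # no candidate is sensitive enough
    return min(ok, key=lambda tc: (tc[1].fp, -tc[0]))[0]


# Hypothetical frame probabilities and ground truth from a synthetic mix.
probs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
labels = [0, 0, 1, 1, 1, 0]
chosen = pick_threshold(probs, labels, [0.3, 0.5, 0.7], min_recall=1.0)
```

Raising `min_recall` favours catching every unsafe sound at the cost of more false alarms, while lowering it does the opposite. Where to set that trade-off depends on the use case.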

AudioWatch has been in regular use at the BBC since 2021. Most notably, it has been employed by the Natural History Unit for Springwatch, Autumnwatch, and Winterwatch. These programmes are often accompanied by 12-hour live streams, which need to be monitored. With AudioWatch, a single operator can monitor all of the streams and deal with any problems very quickly.
