This paper proposes a deep learning framework for classifying BBC television programmes using audio. The audio is first transformed into spectrograms, which are fed into a pre-trained Convolutional Neural Network (CNN) to obtain predicted probabilities of sound events occurring in the audio recording. Statistics for the predicted probabilities and detected sound events are then calculated to extract discriminative features representing the television programmes. Finally, the extracted embedded features are fed into a classifier that assigns the programmes to different genres. Our experiments are conducted over a dataset of 6,160 programmes belonging to nine genres labelled by the BBC. We achieve an average classification accuracy of 93.7% over 14-fold cross validation. This demonstrates the efficacy of the proposed framework for the task of audio-based classification of television programmes.
23rd August 2021
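As a rough illustration of the statistics step described in the abstract above, the sketch below turns frame-level sound event probabilities into a fixed-length feature vector for one programme. The function name, the choice of statistics (mean, standard deviation, maximum, and fraction of frames where the event is detected) and the 0.5 detection threshold are all assumptions made for illustration; the abstract does not state which statistics the paper uses.

```python
from statistics import mean, pstdev


def programme_features(probs, threshold=0.5):
    """Summarise frame-level sound event probabilities for one programme.

    probs: list of frames, each a list of per-event probabilities.
    threshold: hypothetical cut-off above which an event counts as detected.
    Returns a flat list with four statistics per sound event.
    """
    n_events = len(probs[0])
    features = []
    for k in range(n_events):
        # Probabilities of event k across all frames of the programme.
        column = [frame[k] for frame in probs]
        detected = sum(1 for p in column if p >= threshold)
        features.extend([
            mean(column),            # average predicted probability
            pstdev(column),          # spread of the predictions
            max(column),             # strongest single prediction
            detected / len(column),  # fraction of frames with a detection
        ])
    return features
```

The resulting vector has the same length for every programme regardless of duration, which is what allows it to be passed to a conventional genre classifier.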
Interactive Audio Augmented Reality (AAR) facilitates collaborative storytelling and human interaction in participatory performance. Spatial audio enhances the auditory environment and supports real-time control of media content and the experience. Nevertheless, AAR applied to interactive performance practices remains under-explored. This study examines how audio human-computer interaction can prompt and support actions, and how AAR can contribute to developing new kinds of interactions in participatory performance. This study investigates an AAR participatory performance based on the theater and performance practice by theater maker Augusto Boal. It draws from aspects of multi-player audio-only games and interactive storytelling. A user experience study of the performance shows that people are engaged with interactive content and interact and navigate within the spatial audio content using their whole body. Asymmetric audio cues, playing distinctive content for each participant, prompt verbal and non-verbal communication. The performative aspect was well-received and participants took on roles and responsibilities within their group during the experience.
12th February 2021
Podcasts are mostly played using screen-based devices, but despite this, podcasts do not utilize the opportunity for screen-based interaction. There have been efforts to innovate around playback interfaces through the development of “enhanced podcasts”, but there have not been any formal studies on this approach. We developed a prototype podcast player that uses interactive visual elements to enhance the audience’s experience.
In this paper, we describe a formal remote qualitative study that investigated how podcast listeners interacted with our prototype in three different environments. Participants rated chapterisation as the most important feature, followed by links, images and transcripts. The features of our prototype worked best when listening at home, but certain features were valued when used on public transport.
19th October 2020
Co-leading the BBC’s audio research team to help the BBC deliver innovative and high-quality audience experiences. Using our expertise in sound, we create and evaluate prototypes, tools, and trial productions. We transfer our research by working with industry, academia, standards bodies, and the wider BBC.
10th February 2020
Developing and steering the content for the main IBC conference programme, to ensure the technology topics covered reflect the real-world priorities and concerns of technology leaders in broadcast and media.
30th January 2020
Award received at the 16th EuroVR International Conference for our poster on “Designing an Interactive and Collaborative Experience in Audio Augmented Reality”.
25th October 2019
Audio Augmented Reality (AAR) consists of adding spatial audio entities into the real environment. Existing mobile applications and technologies open questions around interactive and collaborative AAR. This paper proposes an experiment to examine how spatial audio can prompt and support actions in interactive AAR experiences; how distinct auditory information influences collaborative tasks and group dynamics; and how gamified AAR can enhance participatory storytelling. We are developing an interactive multiplayer experience in AAR using the Bose Frames Audio Sunglasses. Four participants at a time will go through a gamified story that attempts to interfere with group dynamics. We here present our AAR platform and our collaborative game in terms of experience design. Finally, we detail the testing methodology and analysis that we will conduct to answer our research questions and to suggest methodologies for participatory storytelling in AAR.
23rd October 2019
Podcast players make poor use of the capabilities of the screen-based devices people use to listen. We present a podcast playback interface that displays charts, links, topics and contributors on an interactive transcript-based interface. We describe how we used paper prototyping to design the interface and what we learnt by doing so. We share preliminary results from a public online evaluation of the interface, which indicate that it was well-received. The new features were considered interesting, informative and useful, with charts and transcripts emerging as the most popular features.
30th April 2019
PhD thesis nominated for the British Computer Society “Distinguished Dissertations” competition.
30th April 2019
This document is based upon the previous D2.3 Interim Pilot Progress Report. It describes the final status of both phases of the ORPHEUS pilot. The first phase was divided into three stages, with the first being a live production of an interactive object-based radio drama experienced using a web browser, the second a selection of material encoded using MPEG-H and made available through an iPhone and AV receiver, and the third being an ‘as-live’ broadcast, live encoded using MPEG-H and made available over the Internet. In the second phase object-based audio productions from pilot phase 1 were enhanced with interactive functionalities for on-demand consumption.
30th May 2018
The Mermaid’s Tears is an immersive and interactive radio drama created by the BBC as part of Orpheus. Listeners can follow one of three characters and switch between them during the programme, which gives three different perspectives on the same story. Listeners can also experience the drama in stereo, surround sound and binaural. Chris will describe how the BBC produced the drama as a live object-based broadcast, the tools they developed in order to achieve this, and the feedback they received from a large-scale public trial.
15th May 2018
Radio production is a creative pursuit that uses sound to inform, educate and entertain an audience. Radio producers use audio editing tools to visually select, re-arrange and assemble sound recordings into programmes. However, current tools represent audio using waveform visualizations that display limited information about the sound. Semantic audio analysis can be used to extract useful information from audio recordings, including when people are speaking and what they are saying. This thesis investigates how such information can be applied to create semantic audio tools that improve the radio production process.
30th April 2018
The radio production workflow typically involves recording material, selecting which parts of that material to use, and then editing the desired material down to the final output. Some radio producers find this process easier with paper rather than editing directly on a screen, which makes a transcript the common denominator. However, after deciding which audio they want to use, producers then must use a digital audio workstation to manually execute those editorial decisions, which is a tedious and slow process. In this paper, the authors describe the design, development, and evaluation of PaperClip, a novel system for editing speech recordings directly on a printed transcript using a digital pen. A user study with eight professional radio producers compared editing with the digital pen to editing with a screen interface. The two interfaces each had advantages and disadvantages. The pen interface was better for fast and simple editing of familiar audio when accurate transcripts were available. The screen interface was better for more complex editing with less familiar audio and less accurate transcripts. There was no overall preference.
29th April 2018
Object-based audio is a revolutionary approach to broadcasting that enables the production and delivery of immersive, interactive and accessible listening experiences. Chris will start by presenting an overview of the opportunities and challenges of object-based audio. He will describe how BBC R&D designed and built an experimental radio studio and an end-to-end object-based broadcast chain. Finally, he will discuss how the studio was used to deliver the world’s first live interactive object-based radio drama, as part of the Orpheus collaborative project.
12th April 2018
Radio production involves editing speech-based audio using tools that represent sound using simple waveforms. Semantic speech editing systems allow users to edit audio using an automatically generated transcript, which has the potential to improve the production workflow. To investigate this, we developed a semantic audio editor based on a pilot study. Through a contextual qualitative study of five professional radio producers at the BBC, we examined the existing radio production process and evaluated our semantic editor by using it to create programmes that were later broadcast. We observed that the participants in our study wrote detailed notes about their recordings and used annotation to mark which parts they wanted to use. They collaborated closely with the presenter of their programme to structure the contents and write narrative elements. Participants reported that they often work away from the office to avoid distractions, and print transcripts so they can work away from screens. They also emphasised that listening is an important part of production, to ensure high sound quality. We found that semantic speech editing with automated speech recognition can be used to improve the radio production workflow, but that annotation, collaboration, portability and listening were not well supported by current semantic speech editing systems. In this paper, we make recommendations on how future semantic speech editing systems can better support the requirements of radio production.
22nd March 2018
This document aims to present an overview of work that has been carried out by the various partners in the ORPHEUS consortium related to the renderers for object-based broadcasting clients. Description of the demos and documentation is provided to illustrate the development related to the rendering in the web browser, in IP studio, in the pre-processor, in the AV receiver and in the iOS mobile app.
8th March 2018
This document describes the final reference architecture of ORPHEUS, a completely object-based, end-to-end broadcast and production workflow. It has been the subject of intensive discussions and several iterations over the duration of the project and has been shaped by considering typical channel-based broadcast workflows as well as the knowledge gained and lessons learned from the pilot phases. The architecture is format and interface agnostic and, as far as is possible, it should be applicable to a range of different infrastructures and ecosystems. Additionally, the pilot implementation and integration is also summarised.
7th March 2018
Two different user interfaces have been created to explore personalisation of, and interaction with, object-based broadcasts. An app on an iPhone demonstrates personalisation by enabling the listener to change the balance between different elements of a programme, or to have the sound adapted automatically to one of several pre-defined listening environments. The app also allows the user to choose a shorter or longer version of a programme, to jump to points of interest, and to see transcripts. The web browser interface developed for The Mermaid’s Tears drama allows the listener to choose between parallel narrative threads, supporting the story-telling with images. The choice of stereo, binaural, or surround sound reproduction is also offered.
12th January 2018
Object-based media is a revolutionary approach for creating and deploying interactive, personalised, scalable and immersive content. It allows media objects to be assembled in novel ways to create new and enhanced user experiences. Object-based media is flexible and responsive to user needs as well as environmental and platform-specific factors. ORPHEUS is an H2020-funded EU project involving ten European major players – broadcasters, manufacturers and research institutions. During a 30-month project, we develop, implement and validate an object-based end-to-end media chain for audio content. We are running two pilots to demonstrate both linear and non-linear audio experiences using a custom-built broadcast chain. The first pilot was a live radio broadcast with enhanced functionalities, including immersive sound, foreground/background control, language selection, and in-depth programme metadata. This paper presents initial results of the first pilot, explains the challenges in developing this system and outlines the innovative tools that were created for recording, mixing, monitoring, storing, distributing, playing-out and rendering of object-based audio. To encourage the support of the broadcast industry in adopting this new technology, ORPHEUS is working towards the publication of a reference architecture and general guidelines for successful implementation of object-based audio in a real-life broadcast environment.
1st September 2017
The move toward IP end-to-end between media producers and audiences will make the new broadcasting system vastly more agnostic to data formats and a diverse set of consumption and production devices. In this world, object-based media becomes increasingly important, i.e., delivering efficiencies in the production chain, enabling the creation of new experiences that will continue to engage the audience, and giving us the ability to adapt our media to new platforms, services, and devices. This paper describes a series of practical case studies of our work in object-based user experiences since 2014. These projects encompass speech audio, online news, and enhanced drama. In each case, we are working with production teams to develop systems, tools, and algorithms for an object-based world. These technologies and techniques enable its creation (often using traditional linear media assets) and post-production, transforming user experience for audiences and production.
1st August 2017
13th June 2017
This document describes the current status of Orpheus pilot phase 1, which has been divided into three stages. The first stage is a live production of an interactive object-based radio drama that can be experienced using a web browser. The second is a selection of material encoded using MPEG-H and made available through an iPhone and AV receiver. The third is an ‘as-live’ broadcast, live encoded using MPEG-H and made available over the Internet.
1st June 2017
When radio podcasts are produced from previously broadcast material, thumbnails of songs that were featured in the original program are often included. Such thumbnails provide a summary of the music content. Because creating thumbnails is a labor-intensive process, this is an ideal application for automatic music editing, but it raises the question of how a piece of music can be best summarized. Researchers asked 120 listeners to rate the quality of thumbnails generated by eight methods (five automatic and three manual). The listeners were asked to rate the editing methods based on the song part selection and transition quality in the edited clips, as well as the perceived overall quality. The listener ratings showed a preference for editing methods where the edit points were quantized to bar positions, but there was no preference for whether the chorus was included or not. Ratings for two automatic editing methods were not significantly different from their manual counterparts. This suggests that automatic editing methods can be used to create production-quality thumbnails.
1st June 2017
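The idea of quantizing edit points to bar positions, mentioned in the thumbnail study above, can be sketched as snapping a candidate edit time to the nearest bar line. This is only an illustration: the function name is hypothetical, the bar times are assumed to come from some upstream beat or bar tracker, and the study's actual editing methods are not described here.

```python
from bisect import bisect_left


def snap_to_bar(time_s, bar_times):
    """Return the bar position (in seconds) closest to time_s.

    bar_times must be a non-empty list sorted in ascending order.
    Ties are resolved in favour of the earlier bar.
    """
    i = bisect_left(bar_times, time_s)
    if i == 0:
        return bar_times[0]       # before the first bar
    if i == len(bar_times):
        return bar_times[-1]      # after the last bar
    before, after = bar_times[i - 1], bar_times[i]
    return before if time_s - before <= after - time_s else after
```

Snapping both the start and end of a clip this way keeps the cut aligned with the musical metre, which is the property the listeners in the study preferred.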
2nd May 2017
This deliverable describes the progress on representation, archiving and provision of object-based audio. It builds on D4.1 “Requirements for representation, archiving and provision of object-based audio”. It lists the formats which are selected for ORPHEUS and describes the interim status of the implementation of these formats. On the production side, formats like BW64, ADM, NMOS, UMCP are used and explained. BW64 is also used for archiving. For provision or distribution MPEG-H and AAC + ADM metadata are selected. Both solutions use MPEG-DASH for streaming. This deliverable also serves as documentation for milestone MS12 “Initial implementation and documentation of a format for provision of objected-based audio” which was achieved on 31/03/17.
1st May 2017
Deliverable D2.2 provides an overview about the current state of the pilot implementation architecture and its influence on the final reference architecture. Moreover, the integration activities so far are summarised. The reference architecture will be developed during the project time, based on experiences from the project pilots. A detailed explanation regarding the distinction between reference architecture and pilots is included. The planned workflow of the pilots is described as well as the current state of macroblocks and their components. This version of D2.2 includes updates, which were requested by the EC reviewer after the first submission. Hence, interfaces were identified and where necessary, described in more detail. This document will be updated once more during the project.
1st May 2017
This report of a demonstrator gives a brief overview on the live object-based production system being developed as part of the Orpheus project. The design of the fundamental technology that drives the system is discussed, and the details of the current implementation are described.
1st December 2016
Object-based audio is a revolutionary new way of broadcasting that unlocks a range of new audience experiences. In his talk, Chris will explain how it works, discuss why it matters and present a number of public-facing experiments the BBC have run. He will also take you behind-the-scenes on a next-generation radio studio the BBC are building to be able to deliver these new experiences.
6th October 2016
This document aims to identify the main technical and end-user requirements for the reception, presentation and personalised consumption of broadcast object-based content. Its purpose is to serve as guidelines for the design, implementation and assessment of technical solutions that will be developed during the project. Usability specifications are exemplified through various mock-ups and user scenarios. Several hardware and software solutions are considered for the end-user device in order to cover different audio content consumption situations (e.g. domestic use vs mobility). Personalisation and interactivity features are also listed and will require the design and development of user interfaces handling various input devices (e.g. touch screen, GPS sensors, microphone).
1st October 2016
The deliverable D2.1 provides an overview about the current state of the pilot implementation architecture and its influence on the final reference architecture. The reference architecture will be developed during the project time, based on experiences from the project pilots. A detailed explanation regarding the distinction between reference architecture and pilots is included. The planned workflow of the pilots is described as well as the current state of macroblocks and their components. This document will be updated twice during the project.
1st July 2016
Leading the BBC’s research into audio production tools. Managed the production workstream of the €4M ORPHEUS European research project on object-based broadcasting. Led the production of the world’s first object-based radio drama. As part of my PhD, I also developed a novel system for digitally editing media using paper and pen.
1st July 2016
This document presents high-level descriptions of live radio production, including workflow diagrams, roles, activities and timelines. Proposed workflows for object-based productions are also described.
1st April 2016
Received for organising the “Sound: Now and Next” conference.
1st July 2015
Audio editing is performed at scale in the production of radio, but often the tools used are poorly targeted toward the task at hand. There are a number of audio analysis techniques that have the potential to aid radio producers, but without a detailed understanding of their process and requirements, it can be difficult to apply these methods. To aid this understanding, a study of radio production practice was conducted on three varied case studies: a news bulletin, drama, and documentary. It examined the audio/metadata workflow, the roles and motivations of the producers, and environmental factors. The study found that producers prefer to interact with higher-level representations of audio content like transcripts and enjoy working on paper. The study also identified opportunities to improve the workflow with tools that link audio to text, highlight repetitions, compare takes, and segment speakers.
1st May 2015
29th April 2015
16th December 2014
Developed and evaluated a ground-breaking production tool that uses speech-to-text to allow professional producers to edit media directly using a transcript. Led the organisation of ‘Sound: Now and Next’ - a two-day event on audio technology with over 180 attendees. Worked with BBC Music to add intelligence to their 30-second music track preview generator.
1st April 2014
Promoted science, technology, engineering and maths subjects by exhibiting at science fairs and giving talks at schools to crowds of up to 600.
1st March 2014
Music emotion recognition typically attempts to map audio features from music to a mood representation using machine learning techniques. In addition to having a good dataset, the key to a successful system is choosing the right inputs and outputs. Often, the inputs are based on a set of audio features extracted from a single software library, which may not be the most suitable combination. This paper describes how 47 different types of audio features were evaluated using a five-dimensional support vector regressor, trained and tested on production music, in order to find the combination which produces the best performance. The results show the minimum number of features that yield optimum performance, and which combinations are strongest for mood prediction.
1st January 2014
In this paper we present and evaluate two semantic music mood models relying on metadata extracted from over 180,000 production music tracks sourced from I Like Music (ILM)’s collection. We performed non-metric multidimensional scaling (MDS) analyses of mood stem dissimilarity matrices (1 to 13 dimensions) and devised five different mood tag summarisation methods to map tracks in the dimensional mood spaces. We then conducted a listening test to assess the ability of the proposed models to match tracks by mood in a recommendation task. The models were compared against a classic audio content-based similarity model relying on Mel Frequency Cepstral Coefficients (MFCCs). The best performance (60% correct matches, on average) was yielded by coupling the five-dimensional MDS model with the term-frequency weighted tag centroid method to map tracks in the mood space.
1st November 2013
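One plausible reading of the term-frequency weighted tag centroid mentioned in the abstract above is sketched below: a track is placed in the mood space at the average of its tags' coordinates, with each tag weighted by how often it occurs. The function name, the data layout and the exact weighting are assumptions for illustration and may differ from the method used in the paper.

```python
def weighted_tag_centroid(tag_counts, tag_coords):
    """Place a track in a mood space from its mood tags.

    tag_counts: dict mapping tag -> number of occurrences for the track.
    tag_coords: dict mapping tag -> coordinates of that tag in the mood
                space (for example, from an MDS analysis).
    Returns the count-weighted mean of the tag coordinates.
    """
    total = sum(tag_counts.values())
    dims = len(next(iter(tag_coords.values())))
    centroid = [0.0] * dims
    for tag, count in tag_counts.items():
        for d, value in enumerate(tag_coords[tag]):
            # Each tag contributes in proportion to its term frequency.
            centroid[d] += count * value / total
    return centroid
```

Tracks mapped this way can then be compared by distance in the mood space, which is the kind of matching a recommendation task requires.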
Received for organising the BBC Audio Research Showcase event.
1st June 2013
4th May 2013
Classification of music by mood is a growing area of research with interesting applications, including navigation of large music collections. Mood classifiers are usually based on acoustic features extracted from the music, but often they are used without knowing which ones are most effective. This paper describes how 63 acoustic features were evaluated using 2,389 music tracks to determine their individual usefulness in mood classification, before using feature selection algorithms to find the optimum combination.
1st May 2013
7th March 2013
6th February 2013
Organised monthly evening lectures on audio engineering at venues around London. Began video recording lectures and created the AES UK YouTube channel.
1st January 2013
In this paper we present the design processes of a spatial audio system for Surround Video. Surround Video is a method of reproducing two simultaneous video streams captured by two cameras onto a main television screen and onto the walls of a room via a projector. Through the use of distortion software to correctly map the surround image to the geometry of the viewing room, the user experiences 180 degrees of video reproduction, immersing them in the content. The design of a spatial audio system was necessary to give 360 degree coverage of audio so that, like for the video, the viewer is immersed into the programme world. We discuss the design process and decisions made that concluded in using a mixed reproduction system of Vector Base Amplitude Panning with Ambisonics to match the audio localisation precision with the video precision; high localisation around the main monitor image whilst the surrounding audio is immersive, but with less localisation. Attributes associated with objects in the real world are discussed and methods for recreating the effect of distance, in-head panning, sound scene rotations, reverberation and movement that alter the reverberation placement are presented. The end result is an immersive video and audio system that can be used by the BBC Research & Development department to demonstrate the potential of such technologies where the audio system uses 14 loudspeakers, a subwoofer signal and a discrete ‘4D’ type effects channel.
1st December 2012
A system, Vambu Sound, was developed for BBC R&D to create a spatial audio production environment. The specification of the system is to provide good localisation around a main television screen and diffuse sound from around the listener. The developed system uses Vector Base Amplitude Panning for six loudspeakers in front of the listener and Ambisonics for eight loudspeakers in the corners of a cube configuration. The system is made four-dimensional by the incorporation of a dedicated haptic feedback channel within the audio format. The system design and implementation are presented and responses from a demonstration are evaluated.
1st March 2012
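For readers unfamiliar with Vector Base Amplitude Panning, the sketch below shows the textbook two-dimensional, pairwise form: the source direction is written as a weighted sum of the two loudspeaker direction vectors, and the weights are normalised to constant power. It is a generic illustration with hypothetical names and angles, not the Vambu Sound implementation, which uses six front loudspeakers and is not detailed here.

```python
import math


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Gains for a source panned between two loudspeakers (2D VBAP).

    Angles are azimuths in degrees. The two loudspeakers must not be
    collinear with the listener (their directions must differ by
    something other than 0 or 180 degrees).
    """
    sx, sy = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    # Solve g1 * l1 + g2 * l2 = s for (g1, g2) using Cramer's rule.
    det = l1x * l2y - l2x * l1y
    g1 = (sx * l2y - l2x * sy) / det
    g2 = (l1x * sy - sx * l1y) / det

    # Normalise so that g1^2 + g2^2 = 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

With loudspeakers at +30 and -30 degrees, a source at 0 degrees receives equal gains on both, and a source at +30 degrees is fed to the first loudspeaker only.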
1st July 2011
10th March 2011
Shared prize for an interactive audio-visual piece called “The Cut-up”, in collaboration with Charlesworth, Lewandowski & Mann.
1st October 2010
Used machine learning to develop a model that maps music to a mood, trained using 128k tracks on a supercomputing cluster. Worked with Olafur Eliasson to create an interactive art piece exhibited in Tate Modern. Led the development of a prototype singing analyser for a proposed new TV show.
1st September 2010
Helped organise and run an independent charitable project in Kilifi, Kenya, sponsored by Google with support from local charity Moving the Goalposts. The project was featured in The Guardian.
1st June 2010
Received for my research on spatial audio.
1st June 2010
As the world’s biggest broadcaster, the BBC transmits over 400 hours of audio content every day – the vast majority of which is in stereo. This paper will look at why the BBC is interested in Ambisonics, and describe recent experiences in trying out the technology in its first-order format. Two subjective listening tests are described, which attempt to discover how Ambisonics compares to current technology, and how much the height dimension contributes towards the listening experience. Finally, some suggestions are made on how to make Ambisonics more accessible, in the hope that more Ambisonic content would be created as a result.
1st May 2010
This paper considers Ambisonics from a broadcaster’s point of view: to identify barriers preventing its adoption within the broadcast industry and explore the potential advantages were it to be adopted. It treats Ambisonics as a potential production and broadcast technology and attempts to assess the impact that its adoption might have on both production workflows and the audience experience. This is done using two case studies: a large-scale music production of “The Last Night of the Proms” and a smaller scale radio drama production of “The Wonderful Wizard of Oz”. These examples are then used for two subjective listening tests: the first to assess the benefit of representing height allowed by Ambisonics and the second to compare the audience’s enjoyment of first-order Ambisonics to stereo and 5.0 mixes.
1st May 2010
19th April 2010
2nd March 2010
As part of a two-year graduate scheme, I did four placements around the BBC. I worked with IBM to investigate software-oriented architecture within back-end radio systems, used CUDA to create a real-time objective video quality assessment system for HD video, experimented with Ambisonics as a method for recording and broadcasting spatial audio, and wrote a strategic report for BBC Radio on audio codecs for contribution feeds.
1st September 2008
1st June 2008
1st October 2007
Summer student placement developing test software for Meridian’s room correction system and refurbishing the listening room.
1st July 2007
Managed a team of twelve to run PA and lighting systems for medium-sized venues, including recruitment, training, maintenance and scheduling.
1st July 2007
Setting up and operating PA and lighting systems for medium-to-large venues in and around York. Highly team-focused and late hours.
1st September 2005
1st June 2004