3D Audio Rendering App

An audio solution for Android games that support Google Cardboard, built with HRTF processing and Google's audio API.

PROBLEM

Over the last few years, the virtual reality (VR) industry has grown steadily, while its audio experience has remained limited to conventional stereo sound. This limitation restricts the listener's ability to localize where a sound is coming from, which in turn limits the ultimate virtual reality experience in gaming and entertainment. In reality, each individual perceives sound differently, so the feeling of being in a virtual environment is not truly possible without localized sound sources.



SOLUTION

To address this problem, I wanted to simplify the measurement of the HRTF and combine it with the head-tracking capabilities of Android devices. To be precise, the Head-Related Transfer Function (HRTF) is a frequency response function that characterizes how an ear receives sound from a particular source in space. Traditionally, an individual's HRTF is measured in specialized laboratory environments. With recent advances in machine learning, 3D modelling, and cloud computing, it is now possible to build an HRTF using high-quality phone cameras. Sound localization is then achieved with real-time head tracking and 3D geometry. Applying these transformations in real time on retail hardware will bring an immersive audio experience to a much larger population.
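For intuition, binaural rendering with an HRTF reduces to filtering a mono source with the left- and right-ear impulse responses for the source's direction. The sketch below shows the core operation in Kotlin; the names (hrirLeft, hrirRight) are illustrative, and in the project this filtering happens inside the audio engine rather than in app code.

```kotlin
// Direct-form convolution of a signal with an impulse response.
fun convolve(signal: FloatArray, ir: FloatArray): FloatArray {
    val out = FloatArray(signal.size + ir.size - 1)
    for (i in signal.indices) {
        for (j in ir.indices) {
            out[i + j] += signal[i] * ir[j]
        }
    }
    return out
}

// A mono source becomes a binaural stereo pair by filtering it with the
// left- and right-ear head-related impulse responses (HRIRs) for the
// source's direction in space.
fun renderBinaural(
    mono: FloatArray,
    hrirLeft: FloatArray,   // illustrative: left-ear HRIR for this direction
    hrirRight: FloatArray   // illustrative: right-ear HRIR for this direction
): Pair<FloatArray, FloatArray> =
    Pair(convolve(mono, hrirLeft), convolve(mono, hrirRight))
```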



MY ROLE

Ideation
Creating the game in Unity
Prototyping the Android app



Project Objective

The objective of Project Odyssey is to create 3D audio rendering software that uses recent advances in audio engineering around the Head-Related Transfer Function (HRTF), 3D modelling, machine learning, and real-time head tracking. The software personalizes sound for each individual, greatly expanding spatial depth and simulating sound localization. A machine learning algorithm will be developed that correlates a user's ear and head anatomy with their HRTF profile. This requires a pre-existing, publicly available HRTF database indexed by anatomical features. With it, we can synthesize an HRTF profile for an individual user that would otherwise take hours and costly special equipment to measure. The profile is then paired with real-time head tracking to create immersive 3D sound.
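As one illustration of what the correlation step could look like, the sketch below shows a simple nearest-neighbour baseline that matches a user's measured ear and head dimensions against subjects in a public anthropometric HRTF database. The data layout and feature set are assumptions for illustration, not the project's actual model.

```kotlin
// Hypothetical database entry: a subject id plus a vector of anatomical
// measurements (e.g. pinna height, head width), in a fixed order.
data class Subject(val id: String, val features: DoubleArray)

// Baseline anatomy-to-HRTF matching: return the database subject whose
// anatomical features are closest (squared Euclidean distance) to the
// query; the matched subject's measured HRTF would then be used.
fun nearestSubject(query: DoubleArray, database: List<Subject>): Subject =
    database.minByOrNull { s ->
        s.features.indices.sumOf { i ->
            val d = s.features[i] - query[i]
            d * d
        }
    } ?: error("empty HRTF database")
```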





Block Diagram and System Architecture

The initial design proposed an Android application to run the audio rendering software. It connects to Google Cardboard VR (the head-tracking subsystem), a backend service (the HRTF synthesis subsystem), and a data management subsystem. In that design the backend service and the machine learning service were coupled, which gave us less flexibility to change and improve the two components separately. I was responsible for developing the Android application and the Unity game, so I will focus on explaining those parts of the design.
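One way to picture the decoupling we wanted is to give the app a narrow interface to each subsystem, so the backend and the machine learning service can evolve independently. The interfaces below are a sketch; the names and method shapes are illustrative, not the actual API.

```kotlin
// Illustrative contract for the HRTF synthesis subsystem: the app uploads
// photos, then polls for the finished profile.
interface HrtfSynthesisService {
    suspend fun uploadPhotos(
        leftEar: ByteArray, rightEar: ByteArray, face: ByteArray
    ): String                                  // returns a job id

    suspend fun fetchProfileUrl(jobId: String): String?  // null until inference finishes
}

// Illustrative contract for the head-tracking subsystem backed by the
// Google Cardboard sensors.
interface HeadTracker {
    fun orientation(): FloatArray  // yaw, pitch, roll in radians
}
```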









The mobile subsystem comprises two major components that work together. The first is the camera module, used to take pictures. The second is the mobile application, which drives the camera module so the user can photograph their ears and head. This data is securely transmitted to the backend over the HTTP interface the backend provides, and is then stored in the data management subsystem. Once inference is complete and a predicted HRTF profile is available, the application lets the user download it. The user can also launch a second app that runs the 3D audio demo, which is explained under the head-tracking subsystem. The head-tracking subsystem includes two major components: the Google Cardboard VR component, which provides sensor data for head tracking, and the Unity API, which localizes the head relative to the environment. It fuses this sensor data with the HRTF profile to generate the 3D spatial sound.
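The upload path can be sketched with Retrofit, the library we used for backend communication (see App Functionalities below). The endpoint path and form-field names here are assumptions for illustration.

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.RequestBody.Companion.asRequestBody
import retrofit2.Response
import retrofit2.Retrofit
import retrofit2.http.Multipart
import retrofit2.http.POST
import retrofit2.http.Part
import java.io.File

// Illustrative Retrofit interface for the backend's photo-upload endpoint.
interface HrtfApi {
    @Multipart
    @POST("photos")  // assumed endpoint path
    suspend fun uploadPhoto(@Part photo: MultipartBody.Part): Response<Unit>
}

// Wraps an image file as a multipart form part and sends it to the backend.
suspend fun uploadEarPhoto(file: File, retrofit: Retrofit) {
    val body = file.asRequestBody("image/jpeg".toMediaType())
    val part = MultipartBody.Part.createFormData("photo", file.name, body)
    retrofit.create(HrtfApi::class.java).uploadPhoto(part)
}
```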





Detailed Design

 

Explanation: The screenshots show the Unity game engine setup that consumes the HRTF data for the user's ears.

In the Unity scene, the user can play a video, or switch to the next or previous video, by gazing at it for 3 seconds. The user can turn their head and jump to positions we chose in the room. At each position, the 3D audio effect changes according to the loaded SOFA file (Spatially Oriented Format for Acoustics), with the TV acting as the sound source. First, we found several FBX models (3D models saved in the Autodesk Filmbox format) online and imported them into the Unity project. Our previous design let the person walk around the room; however, once we started the project we realized that gazing is the only way to interact with this software. Since the user cannot walk, we picked a set of points in the room that the user can jump to by gazing at them for 3 seconds. This approach also avoids drift problems, because the user only turns their head and never translates. Latency is very low: during testing, we could not perceive any lag when turning our heads.

Second, to handle the HRTF SOFA files, we found that Steam Audio could do the job, so we downloaded it and imported it into the Unity project. Finally, we integrated Google's head-tracking audio API so that the sound is re-localized as the user turns their head.
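The gaze interaction boils down to a dwell timer: a target activates once it has been looked at continuously for 3 seconds. The sketch below shows that logic in Kotlin for consistency with the other examples; in the project it lives in a Unity C# script, and the string target ids are illustrative.

```kotlin
// Engine-agnostic sketch of 3-second gaze-dwell activation.
class GazeDwell(
    private val dwellSeconds: Float = 3f,
    private val onActivate: (String) -> Unit  // e.g. play video, teleport
) {
    private var currentTarget: String? = null
    private var gazeTime = 0f

    // Call once per frame with the id of the object under the gaze ray,
    // or null if the user is looking at empty space.
    fun update(targetUnderGaze: String?, deltaSeconds: Float) {
        if (targetUnderGaze != currentTarget) {
            // Gaze moved to a new target (or away): restart the timer.
            currentTarget = targetUnderGaze
            gazeTime = 0f
            return
        }
        if (currentTarget == null) return
        gazeTime += deltaSeconds
        if (gazeTime >= dwellSeconds) {
            onActivate(currentTarget!!)
            gazeTime = 0f  // require a fresh dwell before re-triggering
        }
    }
}
```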



App Functionalities





The user is first presented with a welcome page that shows how to position the Cardboard on their head, along with a picture of what to expect from the app. Next, the user is guided through pages where they take pictures of their ears and face. After each step, we use the Retrofit library to send the data to the backend; this is discussed in detail in the backend portion of the mobile app. After the user has uploaded all 3 pictures, they reach the final page, where they can get the download link for their HRTF profile and use it to launch the Unity app, which plays the 3D audio demo. In total, five activities guide the user and collect the information required to create the HRTF profile. The user is then forwarded to the main virtual environment, shown below, where they can interact with the scene and play sound from different places in the room.
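The hand-off from the capture app to the Unity demo can be done with a standard Android launch intent, assuming the Unity build is installed as a separate APK. The package name and extra key below are illustrative.

```kotlin
import android.content.Context
import android.content.Intent

// Launches the separately installed Unity demo app, passing along the
// URL of the user's synthesized HRTF profile.
fun launchUnityDemo(context: Context, hrtfProfileUrl: String) {
    val intent = context.packageManager
        .getLaunchIntentForPackage("com.example.odyssey.unitydemo") // assumed package
        ?: return  // Unity demo app not installed
    intent.putExtra("hrtf_profile_url", hrtfProfileUrl)  // assumed extra key
    context.startActivity(intent)
}
```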