Carnegie Mellon University


April 03, 2025

Protecting Audio Privacy at the Source

Kirigami Provides On-Device Speech Filtering for Audio Sensing

By Charlotte Hu

Sound is a powerful source of information.

Algorithms trained to identify distinct sound signatures can use sound to reveal what a person is doing, whether it's cooking, vacuuming or washing the dishes. And while it's valuable in some contexts, using sound to identify activities raises privacy concerns, since microphones can capture sensitive information.

To allow audio sensing without compromising privacy, researchers at Carnegie Mellon University developed an on-device filter, called Kirigami, that can detect and delete human speech segments collected by audio sensors before they're used for activity recognition.

"The data contained in sound can help power valuable applications like activity recognition, health monitoring and even environmental sensing. That data, however, can also be used to invade people's privacy," said Sudershan Boovaraghavan, who earned his Ph.D. from the Software and Societal Systems Department (S3D) in CMU's School of Computer Science. "Kirigami can be installed on a variety of sensors with a microphone deployed in the field to filter speech before the data is sent off the sensor, thus protecting people's privacy."

Many existing techniques for preserving privacy in audio sensing involve altering or transforming the data — excluding certain frequencies from the audio spectrum or training the computer to ignore human speech. While these methods are fairly effective at making conversations indecipherable to humans, generative AI has complicated matters. Speech recognition programs like Whisper by OpenAI can now piece together fragments of conversations from processed audio that was once inscrutable.

"Given the sheer amount of data these models have, some of the prior techniques would leave enough residual information, little snippets, that may help recover part of speech content," said Yuvraj Agarwal, an associate professor in S3D, the Human-Computer Interaction Institute (HCII), and the Electrical and Computer Engineering Department in the College of Engineering. "Kirigami can stop these models from having access to those snippets."

In today's world, devices like smart speakers that prioritize utility over privacy can essentially eavesdrop on everything people say. While the most aggressive privacy-preserving option would be to avoid using microphones, such an action would stop people from reaping the benefits of a powerful sensing medium. Agarwal and his collaborators wanted to find a solution for developers that would allow them to balance privacy and utility.

The researchers' intuition was to design a lightweight filter that could run on even the smallest, most affordable microcontrollers. That filter could then identify and remove likely speech content so the sensitive data never leaves the device — what's often called processing on the edge. 

The filter works as a simple binary classifier that decides whether each segment of audio contains speech. The team designed it by empirically measuring how much speech content deep-learning-based automatic speech recognition models could recover from filtered audio.

Kirigami also balances how aggressively it removes possible speech content with a configurable threshold. With an aggressive threshold, the filter prioritizes removing speech but may also clip some nonspeech audio that could be useful for other applications. With a less aggressive threshold, the filter passes more environmental and activity sounds for better application utility but increases the risk that some speech-related content makes it past the sensor.
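The core idea — score each short audio frame, then drop frames whose score crosses a configurable threshold before any data leaves the device — can be sketched in a few lines of Python. This is an illustrative stand-in, not the Kirigami implementation: it uses raw frame energy as a toy "speech score" where the real system uses a trained binary classifier, and every function name and parameter here is hypothetical.

```python
import numpy as np

def frame_signal(audio, frame_len=400, hop=160):
    """Split a 1-D audio array into overlapping frames."""
    n = 1 + max(0, len(audio) - frame_len) // hop
    return [audio[i * hop : i * hop + frame_len] for i in range(n)]

def speech_score(frame):
    """Toy per-frame 'speech likelihood' (short-time RMS energy).
    A real filter would use a learned speech/nonspeech classifier."""
    return float(np.sqrt(np.mean(frame ** 2)))

def filter_speech(audio, threshold, frame_len=400, hop=160):
    """Zero out frames scoring above the threshold, so likely-speech
    samples never leave the device; quieter ambient sound passes."""
    out = audio.copy()
    for i, frame in enumerate(frame_signal(audio, frame_len, hop)):
        if speech_score(frame) > threshold:
            out[i * hop : i * hop + frame_len] = 0.0
    return out
```

Lowering the threshold makes the filter more aggressive — energetic nonspeech frames may be clipped along with speech — while raising it lets more ambient sound through at the risk of residual speech, mirroring the tradeoff described above.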

"Kirigami cuts out most of the speech content but not the other ambient sounds that you care about for activity recognition," said Haozhe Zhou, an S3D doctoral student who led the project with Boovaraghavan. "You can still couple it with prior techniques to give you additional privacy."

Researchers are currently exploring many useful applications for activity sensing. For example, Mayank Goel, an associate professor in S3D and the HCII, uses audio sensing to remind people living with dementia of daily tasks, monitor children with attention-deficit/hyperactivity disorder for behavioral abnormalities, and assess students for signs of depression.

"These are just examples that are being done in our labs," Goel said. "You will find similar scenarios all across the world where you need noninvasive data from the person about their daily life."

As the interest in smart home infrastructure and the Internet-of-Things continues to grow, the team believes that developers could easily tweak Kirigami to suit their unique privacy needs.

Papers detailing Kirigami appeared in both the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies and ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking.