Let’s look at use cases for acoustic fingerprints beyond music recognition, such as personalized playlists and the ability to relive a concert by your favorite band.
Photo: Immo Wegmann. Source: Unsplash.com
What is an acoustic fingerprint
It is a way of presenting an audio recording in a compact form. Essentially, a fingerprint contains a set of values that describe the physical parameters of a sound.
There are different approaches to building such fingerprints. In most cases, they involve finding the frequencies with the highest amplitude in the spectrogram, but the exact algorithm is up to the application developers. There are open-source solutions, such as the .NET soundfingerprinting library, which uses locality-sensitive hashing (LSH) to measure the similarity of fingerprints. Another example is the dejavu framework, which implements a fingerprinting algorithm in Python.
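To make the idea of "frequencies with maximum amplitude" concrete, here is a minimal sketch in Python with NumPy. It is not the code of soundfingerprinting or dejavu. The function names `spectrogram` and `strongest_peaks`, the frame size, the hop length, and the number of peaks kept per frame are all assumptions chosen for illustration.

```python
import numpy as np


def spectrogram(samples, frame_size=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([
        samples[i * hop:i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # Rows are time frames, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


def strongest_peaks(spec, per_frame=2):
    """Return (frame_index, bin_index) for the loudest local maxima per frame."""
    peaks = []
    for t, frame in enumerate(spec):
        # A bin is a local maximum if it is louder than both neighbours.
        is_max = (frame[1:-1] > frame[:-2]) & (frame[1:-1] > frame[2:])
        candidates = np.flatnonzero(is_max) + 1
        # Keep only the few loudest candidates in this frame.
        loudest = candidates[np.argsort(frame[candidates])[-per_frame:]]
        peaks.extend((t, int(b)) for b in sorted(loudest))
    return peaks


# Illustrative input: one second of two sine tones (440 Hz and 1000 Hz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

peaks = strongest_peaks(spectrogram(signal))
hz_per_bin = sample_rate / 1024
print(sorted({round(b * hz_per_bin) for _, b in peaks}))
```

The resulting list of (time, frequency) points is the raw material for a fingerprint. Real systems typically add thresholds and look for maxima across neighbouring frames as well, so that the peaks survive noise and compression.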
Acoustic fingerprints can identify not only royalty-free music playing on the radio or in a shopping mall, but also a tune that someone whistles. There is a whole class of systems for this, called query by humming (QbH). In 2020, Google added this capability to its voice assistant. The music recognition service SoundHound offers a similar feature (incidentally, it was among the music projects that went public last year; we covered this in detail in a previous article).
Track recognition is one of the most common use cases for acoustic fingerprints, but there are others.
Acoustic fingerprints can be used to recognize emotions. Last year, Spotify patented technology that uses a microphone to analyze speech intonation and infer the listener’s stress level, gender, and approximate age. It also assesses the environment, for example how many people are in the room where the music is playing. The technology is meant to improve the recommendation system and personalized playlists.
In general, the Swedish company’s invention got a cool reception. Several consumer advocacy organizations and nearly two hundred performers wrote an open letter urging Spotify not to implement it. The signatories included guitarist Tom Morello of Rage Against the Machine, American rapper Talib Kweli, and Laura Jane Grace, lead singer of the punk rock band Against Me!
According to the musicians, emotion detection technology threatens listeners’ privacy. A Hacker News user expressed a similar view in a thread on the topic. Another commenter noted that curated playlists and personal collections, perhaps even on physical media, could be a way out in such circumstances. Audio cassettes and CDs are, after all, making a comeback in music culture.
In any case, having a patent doesn’t mean that a company is actually using the technology. Many firms patent the most interesting ideas simply to protect themselves from potential lawsuits.
Working with video
Engineers at Drexel University seem to believe in a speedy return to normal and are busy developing a system that allows you to “glue” together dozens of videos from a concert. To synchronize videos shot on smartphones from different angles, the authors of the project use acoustic fingerprints, which help find “overlapping” moments.
Photo: Fábio Alves. Source: Unsplash.com
The open-source dejavu framework mentioned above builds the fingerprints. It looks for frequency peaks in the spectrogram and calculates the distances between them, creating a unique pattern. Perhaps in the future, such technology, combined with AR and VR, will let you immerse yourself in the atmosphere of past performances again and again.
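As a rough illustration of how peak patterns can reveal where two recordings overlap, the sketch below pairs nearby peaks into hashes and lets matching hashes vote on a time shift. It is a simplified take on the general peak-pair idea, not the code of dejavu or of the Drexel project. The names `pair_hashes` and `estimate_offset` and the parameters `fan_out` and `max_dt` are hypothetical, and the peaks are assumed to be (frame, frequency bin) tuples that have already been extracted from each clip.

```python
from collections import Counter


def pair_hashes(peaks, fan_out=3, max_dt=50):
    """Turn (time, freq) peaks into ((f1, f2, dt), anchor_time) entries.

    Each hash combines two peak frequencies with the time gap between them,
    so it does not depend on where the clip starts.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are even farther
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes


def estimate_offset(peaks_a, peaks_b):
    """Return the shift (in frames) that best maps clip B onto clip A, or None."""
    index = {}
    for h, t_a in pair_hashes(peaks_a):
        index.setdefault(h, []).append(t_a)

    votes = Counter()
    for h, t_b in pair_hashes(peaks_b):
        for t_a in index.get(h, ()):
            votes[t_a - t_b] += 1  # matching hashes vote for a time shift

    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Made-up peaks: clip B is the part of clip A that starts 120 frames in.
clip_a = [(t, (t * 37) % 101) for t in range(0, 300, 3)]
clip_b = [(t - 120, f) for t, f in clip_a if t >= 120]
print(estimate_offset(clip_a, clip_b))
```

The winning shift is expressed in spectrogram frames. Multiplying it by the hop length and dividing by the sample rate gives the offset in seconds, which is what a video editor would need to line up two smartphone clips.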