Google Proposes New AI Metrics for Video and Audio

Google researchers have proposed a new set of metrics that can help measure the quality of AI-generated audio and video content.

October 28, 2019, by Zachary Comeau

Google has proposed new metrics for evaluating the quality of AI-generated audio and video in a bid to develop a more widely adopted method of synthesizing audio and video.

The tech behemoth said in a Tuesday blog post that it has made strides in establishing a more accurate way of measuring AI-generated content against what the machine was trained on.

To explain this, Google researchers Kevin Kilgour and Thomas Unterthiner used the example of a model that generates videos of StarCraft video game sequences.

“Clearly some of the videos shown below look more realistic than others, but can the differences between them be quantified?” the researchers wrote. “Access to robust metrics for evaluation of generative models is crucial for measuring (and making) progress in the fields of audio and video understanding, but currently no such metrics exist.”

Which is best?

To better quantify the accuracy of machine-generated content, Google proposes two new metrics: the Fréchet Audio Distance (FAD) and Fréchet Video Distance (FVD).

“We document our large-scale human evaluations using 10k video and 69k audio clip pairwise comparisons that demonstrate high correlations between our metrics and human perception,” the researchers said in a blog post.

The company also released the source code for both metrics on GitHub (FVD; FAD).

Building on the Fréchet Inception Distance

The two metrics were built on the principles of the Fréchet Inception Distance, a similar metric designed specifically for images. It takes a large number of images from both the target distribution and the generative model, and uses the Inception object recognition network to embed each image into a lower-dimensional space that captures important features.
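To make the idea concrete, here is a minimal sketch of how a Fréchet distance between two sets of embeddings can be computed. It assumes the standard formulation used for the Fréchet Inception Distance, in which a Gaussian is fitted to each set of embeddings and the two Gaussians are compared; the article itself does not spell out this step. The function name `frechet_distance` and the random arrays standing in for network embeddings are illustrative assumptions, not Google's released code.

```python
import numpy as np


def frechet_distance(emb_real, emb_gen):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input is an array of shape (num_samples, dim) with dim >= 2.
    """
    mu_r = emb_real.mean(axis=0)
    mu_g = emb_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(emb_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(emb_gen, rowvar=False))

    # tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negatives caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Random vectors stand in for embeddings (an assumption for illustration;
# in practice they would come from a pretrained network).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
generated = rng.normal(loc=0.5, size=(500, 4))

print(frechet_distance(real, real))       # close to 0: identical sets
print(frechet_distance(real, generated))  # larger: the distributions differ
```

A lower value means the generated embeddings are statistically closer to the target embeddings, which is why lower scores are read as higher quality.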

Unlike framewise metrics, FVD looks at entire videos, which avoids the drawbacks of scoring each frame in isolation. FAD is reference-free and can be used on any type of audio, whereas existing audio metrics either require a time-aligned ground truth signal or target a specific domain like speech quality, Google said.

Examples of videos of a robot arm, judged by the new FVD metric. FVD values were found to be approximately 2000, 1000, 600, 400, 300 and 150 (left-to-right; top-to-bottom). A lower FVD clearly correlates with higher video quality.

Human study

Since human judgement is the gold standard for what looks and sounds realistic, Google’s team of researchers conducted a large-scale human study to determine how well the proposed metrics align with human judgement of AI-generated audio and video.

Humans examined 10,000 video pairs and 69,000 five-second audio clips. For FAD, they compared the effect of two different distortions on the same audio clip, randomizing both the pair and the order in which they appeared.

Testers were asked which audio clip sounds most like a studio-produced recording, and the study found that FAD “correlates quite well” with human judgement.
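One simple way to check whether a metric lines up with pairwise human choices is to count how often the clip with the lower score is also the one people picked. The sketch below illustrates that idea only. The article does not describe how the researchers computed their correlation, so this is not their analysis, and the function name `pairwise_agreement` and the toy numbers are made up for illustration.

```python
def pairwise_agreement(comparisons):
    """Fraction of pairs where the lower-scoring clip matches the human pick.

    comparisons: iterable of (score_a, score_b, human_choice) tuples, where
    human_choice is "a" or "b" and a lower score is assumed to mean better.
    """
    agree = 0
    counted = 0
    for score_a, score_b, human_choice in comparisons:
        if score_a == score_b:
            continue  # the metric expresses no preference for this pair
        metric_choice = "a" if score_a < score_b else "b"
        counted += 1
        agree += metric_choice == human_choice
    return agree / counted if counted else float("nan")


# Toy, made-up comparisons: (score of clip a, score of clip b, human pick).
toy = [(1.2, 3.4, "a"), (5.0, 2.1, "b"), (0.9, 1.0, "b"), (4.0, 4.5, "a")]
rate = pairwise_agreement(toy)
print(rate)  # 0.75: the metric agrees with the human pick in 3 of 4 pairs
```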

“We are currently making great strides in generative models. FAD and FVD will help us keeping this progress measurable, and will hopefully lead us to improve our models for audio and video generation,” the team said.
