
My TechDecisions


Google Proposes New AI Metrics for Video and Audio

Google researchers have proposed a new set of metrics that can help measure the quality of AI-generated audio and video content.

October 28, 2019 | Zachary Comeau


Google has proposed two new metrics for evaluating the quality of AI-generated audio and video, in a bid to establish a widely adopted way of measuring progress in audio and video synthesis.

In a Tuesday blog post, the tech giant said it has made strides toward a more accurate way of measuring how closely AI-generated content matches the data the model was trained on.

To illustrate the problem, Google researchers Kevin Kilgour and Thomas Unterthiner used the example of a model that generates video sequences from the StarCraft video game.

“Clearly some of the videos shown below look more realistic than others, but can the differences between them be quantified?” the researchers wrote. “Access to robust metrics for evaluation of generative models is crucial for measuring (and making) progress in the fields of audio and video understanding, but currently no such metrics exist.”

Which is best?

To better quantify the quality of machine-generated content, Google proposes two new metrics: the Fréchet Audio Distance (FAD) and the Fréchet Video Distance (FVD).

“We document our large-scale human evaluations using 10k video and 69k audio clip pairwise comparisons that demonstrate high correlations between our metrics and human perception,” the researchers said in a blog post.

The company also released the source code for both metrics on GitHub (FVD; FAD).

Building on the Fréchet Inception Distance

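The two metrics build on the principles of the Fréchet Inception Distance (FID), a similar metric designed specifically for images. FID takes a large number of images from both the target distribution and the generative model, and uses the Inception object recognition network to embed each image into a lower-dimensional space that captures its important features. It then fits a multivariate Gaussian to each set of embeddings and reports the Fréchet distance between the two; the lower the distance, the closer the generated samples are to the real ones.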
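To make the FID recipe concrete, here is a minimal sketch of the distance step in Python with NumPy. It assumes each set of samples has already been turned into an (N, d) array of embeddings; the embedding network itself (Inception for FID) is out of scope. The sketch fits a Gaussian to each array and evaluates the standard closed form for the Fréchet distance between two Gaussians, d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½), which is the quantity FID reports. The function names are illustrative and are not taken from Google's released code.

```python
import numpy as np

# Illustrative sketch only; names are hypothetical, not Google's FVD/FAD code.


def gaussian_stats(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit a multivariate Gaussian to a (num_samples, dim) array of embeddings."""
    mu = embeddings.mean(axis=0)
    sigma = np.cov(embeddings, rowvar=False)  # rows are samples, columns are features
    return mu, sigma


def _sqrtm_psd(matrix: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals Tr((S1^(1/2) S2 S1^(1/2))^(1/2)), and the inner
    # matrix is symmetric PSD, so its eigenvalues are real and non-negative.
    root1 = _sqrtm_psd(sigma1)
    middle = root1 @ sigma2 @ root1
    middle = (middle + middle.T) / 2.0  # re-symmetrize against round-off
    trace_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    # Two toy 2-D Gaussians: means differ by (3, 4), covariances are I and 4I.
    # Closed form: 3**2 + 4**2 + 2 * (2 - 1)**2 = 27.
    print(frechet_distance(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), 4 * np.eye(2)))
```

Computing the trace term through the symmetric matrix Σ₁^½ Σ₂ Σ₁^½ keeps every step in real arithmetic, which avoids the small imaginary parts a general matrix square root can produce. In a real pipeline, the two arrays passed to `gaussian_stats` would be embeddings of real and generated clips from the same pretrained network.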

According to Google, FVD differs from other popular metrics in that it evaluates entire videos, avoiding the drawbacks of frame-by-frame metrics. FAD, meanwhile, is reference-free and can be used on any type of audio, unlike existing metrics that either require a time-aligned ground-truth signal or target a specific domain such as speech quality.

Examples of videos of a robot arm, judged by the new FVD metric. FVD values were found to be approximately 2000, 1000, 600, 400, 300 and 150 (left-to-right; top-to-bottom). A lower FVD clearly correlates with higher video quality.

Human study

Since human judgement is the gold standard for what looks and sounds realistic, Google’s team of researchers conducted a large-scale human study to determine how well the proposed metrics align with human judgement of AI-generated audio and video.

Human raters judged 10,000 pairs of videos and 69,000 pairs of five-second audio clips. For FAD, each comparison applied two different distortions to the same audio clip, with both the choice of distortion pair and the order in which the two versions were played randomized.


Testers were asked which of the two clips sounded more like a studio-produced recording, and the study found that FAD “correlates quite well” with human judgement.
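The article does not spell out how agreement between a metric and the raters is scored. One simple way to do it for pairwise data, sketched here purely as an illustration, is to let the metric "vote" for whichever version has the lower score and count how often that vote matches the human choice. The `PairJudgement` record and `agreement_rate` function are hypothetical, and this plain agreement rate is not necessarily the correlation statistic the researchers report.

```python
from dataclasses import dataclass


@dataclass
class PairJudgement:
    """One hypothetical pairwise trial: two versions of a clip and which one the rater preferred."""
    metric_a: float  # metric value (e.g. FAD) for the distortion applied to version A; lower = better
    metric_b: float  # metric value for the distortion applied to version B
    human_prefers_a: bool  # True if the rater judged version A closer to a studio recording


def agreement_rate(trials: list[PairJudgement]) -> float:
    """Fraction of non-tied trials in which the metric's preferred version matches the rater's."""
    agreed = 0
    counted = 0
    for t in trials:
        if t.metric_a == t.metric_b:
            continue  # the metric expresses no preference, so the trial is skipped
        metric_prefers_a = t.metric_a < t.metric_b
        agreed += int(metric_prefers_a == t.human_prefers_a)
        counted += 1
    return agreed / counted if counted else float("nan")


if __name__ == "__main__":
    trials = [
        PairJudgement(1.2, 4.5, True),   # metric and rater both prefer A
        PairJudgement(3.0, 2.0, False),  # both prefer B
        PairJudgement(0.5, 0.9, False),  # metric prefers A, rater prefers B
        PairJudgement(2.0, 2.0, True),   # tie: ignored
    ]
    print(agreement_rate(trials))  # 2 of the 3 counted trials agree
```

Skipping ties keeps the rate from being inflated or deflated by pairs where the metric cannot distinguish the two distortions at all.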

“We are currently making great strides in generative models. FAD and FVD will help us keeping this progress measurable, and will hopefully lead us to improve our models for audio and video generation,” the team said.


Tagged With: Artificial Intelligence, Google, Machine Learning
