Rare is the chance for us editors to share with you what it is really like to do our jobs. But a recent article from Venture Beat on photo caption AI caught our attention — and we found something in it we can really relate to.
Something you probably wouldn’t know unless you’re a journalist is that we editors constantly have to come up with captions for images. It may seem like a small thing to complain about, but when we’re already writing and rewriting so many things all day, coming up with yet another way to rehash what we already said, just so a picture has a label, can become monotonous.
Fortunately for us lazy editors, IBM AI is here to help.
Photo caption AI could create “humanlike” captions
According to the Venture Beat article, a research paper presented at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR) by a team of IBM AI researchers describes a model that could craft “diverse, creative, and convincingly humanlike captions.”
“Architecting the system required addressing a chief shortcoming of automatic captioning systems: sequential language generation resulting in syntactically correct — but homogeneous, unnatural, and semantically irrelevant — structures. The coauthors’ approach gets around this with an attention captioning model, which allows the captioner to use fragments of scenes in the photos it’s observing to compose sentences. At every generating step, the team’s AI model has the choice of attending to either visual or textual cues from the last step.” — original Venture Beat article
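For readers who want a feel for what “attending to either visual or textual cues” could look like, here is a minimal sketch in Python. It is not the IBM team’s model: the function names (`attend`, `softmax`, `dot`), the two-dimensional toy vectors, and the use of plain dot-product scoring are all assumptions made for illustration. The idea shown is only that, at each step, the captioner weighs several image regions alongside one textual cue from the previous step and blends them according to those weights.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(state, region_features, text_cue):
    """Hypothetical attention step: score each image region and the
    textual cue against the captioner's current state, then blend them
    into a single context vector using the resulting weights."""
    candidates = region_features + [text_cue]  # visual cues first, textual cue last
    weights = softmax([dot(state, c) for c in candidates])
    context = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(state))
    ]
    return weights, context

# Toy inputs (made up): two image regions and one textual cue.
state = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 2.0]]
text_cue = [0.5, 0.5]

weights, context = attend(state, regions, text_cue)
print("attention weights:", weights)
print("context vector:", context)
```

With these toy numbers the first image region lines up best with the captioner’s state, so it receives the largest weight, and the textual cue lands in between the two regions.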
But the IBM AI researchers wanted to ensure that the captions didn’t sound robotic, so they used generative adversarial networks (GANs): two-part neural networks in which a generator produces samples and a discriminator attempts to distinguish between the generated samples and real examples.
This means the captioner is trained against the discriminator’s feedback on the captions it generates.
Another discriminating function scores the “naturalness” of sentences using a model that matches parts of the image with the generated words, allowing the AI to judge the image and sentence as a pair.
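The adversarial setup described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the researchers’ discriminator: `pair_score`, `discriminator_loss`, and `generator_loss` are hypothetical names, the embeddings are made-up two-dimensional vectors, and the score is simply a sigmoid over a dot product. The losses are the standard GAN cross-entropy losses, used here only to show how judging image and sentence pairs can feed back into training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_score(image_vec, sentence_vec):
    """Toy discriminator: estimated probability that the sentence is a
    real, human-written caption for this image."""
    return sigmoid(sum(i * s for i, s in zip(image_vec, sentence_vec)))

def discriminator_loss(real_score, fake_score):
    """The discriminator is rewarded for scoring real pairs high
    and generated pairs low."""
    return -(math.log(real_score) + math.log(1.0 - fake_score))

def generator_loss(fake_score):
    """The captioner (generator) is rewarded when its caption
    is scored as real."""
    return -math.log(fake_score)

# Made-up embeddings for one image and two candidate captions.
image = [1.0, 0.5]
human_caption = [2.0, 1.0]
generated_caption = [-1.0, 0.0]

real = pair_score(image, human_caption)
fake = pair_score(image, generated_caption)

print("real pair score:", real)
print("generated pair score:", fake)
print("discriminator loss:", discriminator_loss(real, fake))
print("generator loss:", generator_loss(fake))
```

In this toy case the generated caption scores poorly, so the generator’s loss is high. In an actual training loop, that signal would push the captioner toward sentences the discriminator finds harder to tell apart from human ones.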
Researchers say their photo caption AI achieves “good” performance overall. According to the article, they believe their work paves the way for new computer vision systems, which they also hope to explore.