Microsoft’s Research On Captioning Photos Automatically

Posted on 5/29/2015 by Jim Pickerell

Microsoft recently published an article about the advances it is making in technology that can automatically caption pictures.

However, from the point of view of finding images on the Internet, there is one big flaw in where they are headed. Captions are not what’s needed. What’s needed is a way to find an image that fulfills a specific need among a huge pile of images that might all be appropriately described with the same words.

In most cases there will be a huge number of choices that can reasonably have the same caption. Nobody wants to take the time to look through all of them. Image users (customers) want (need) someone to narrow the search for them.

One of Microsoft’s examples is “A woman holding camera in a crowd.” To get there they first identify elements in the picture with words. Then they use the words to create sentences. And finally they rank the sentences and pick the one that seems most logical for the caption.

In the example shown, the computer vision software thought the woman’s hair was a cat.
Someone searching for just the word “cat” would not be happy with this picture. Fortunately, the program decided that “A woman holding a cat” was not the proper caption for this image, but because it thinks there is a cat in the picture, “cat” becomes a legitimate keyword to include in a list of tags.
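Microsoft’s article does not publish its code, so the following is only a minimal sketch of the three steps described above (detect words, build sentences, rank them), under made-up assumptions: the detected words, their confidence values, the single sentence template, the summed-confidence ranking rule and the tag threshold are all hypothetical stand-ins, not Microsoft’s system. It illustrates how a misread word like “cat” can lose the caption ranking and still end up as a tag.

```python
from itertools import product

# Step 1 (hypothetical): words a vision model might detect, with made-up confidences.
# The low-confidence "cat" stands in for the hair the software misread.
DETECTED = {"woman": 0.95, "camera": 0.90, "crowd": 0.80, "cat": 0.40}

# Hypothetical slots for a single sentence template.
SUBJECTS = ["woman"]
OBJECTS = ["camera", "cat"]
SETTINGS = ["crowd", None]  # None means "no setting phrase"


def candidate_captions(detected):
    """Step 2: turn detected words into candidate sentences."""
    for subj, obj, setting in product(SUBJECTS, OBJECTS, SETTINGS):
        words = [w for w in (subj, obj, setting) if w is not None]
        if not all(w in detected for w in words):
            continue  # only use words the detector actually produced
        sentence = f"A {subj} holding a {obj}"
        if setting:
            sentence += f" in a {setting}"
        yield sentence + ".", words


def score(words, detected):
    """Toy ranking rule (an assumption): sum the confidences of the words used,
    which favors sentences that explain more of the picture with confident words."""
    return sum(detected[w] for w in words)


def best_caption(detected):
    """Step 3: rank the candidate sentences and keep the top one."""
    ranked = sorted(candidate_captions(detected),
                    key=lambda cand: score(cand[1], detected),
                    reverse=True)
    return ranked[0][0]


def tags(detected, threshold=0.3):
    """Every detected word at or above a (hypothetical) threshold becomes a keyword tag."""
    return sorted(w for w, conf in detected.items() if conf >= threshold)


print(best_caption(DETECTED))  # A woman holding a camera in a crowd.
print(tags(DETECTED))          # ['camera', 'cat', 'crowd', 'woman']
```

In this toy version the sentence built around “cat” scores lower than the one built around “camera,” yet “cat” still clears the tag threshold, which is how a picture of a woman in a crowd could turn up in a search for “cat.”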

Searching For Images

While I didn’t expect to find this specific picture, I would hope that if I used basically the same words I would find something similar. I went to, searched for “woman holding camera crowd” and got 151 images. Most are either “woman holding camera” or “woman in crowd,” not all four elements together. Most aren’t at all like the image I have in my mind’s eye.

At iStock I got only 56 returns, but many of them are not appropriate. Some are of men with a camera. At I got 138 images that seem to have little or no relationship to what I am looking for. Then I went to Flickr and started my search with “woman camera crowd” because I don’t think many image searchers would use the word “holding.” I got 5,983 returns. That is far more images than anyone would want to review, and most of the early returns were not at all similar to the image I was looking for. When I added “holding” it narrowed the search to 672 images, but nothing like what I wanted.
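None of these sites documents how it matches keywords, so the following is only an illustration with a made-up five-image index. It contrasts two hypothetical matching styles: a strict “every word must be tagged” filter, and a looser one that returns any image sharing a word, ranked by how many query words it shares. The function names, image ids and keyword sets are all invented for the example.

```python
# Hypothetical keyword index: image id -> set of keywords (made-up data).
INDEX = {
    "img1": {"woman", "holding", "camera"},
    "img2": {"woman", "crowd"},
    "img3": {"woman", "holding", "camera", "crowd"},
    "img4": {"man", "holding", "camera", "crowd"},
    "img5": {"woman", "camera", "crowd", "festival"},
}


def match_all(query, index):
    """Strict AND: keep only images tagged with every query word."""
    terms = set(query.split())
    return sorted(i for i, kw in index.items() if terms <= kw)


def match_any_ranked(query, index):
    """Loose OR: keep any image that shares a query word,
    ranked by the number of shared words (ties broken by id)."""
    terms = set(query.split())
    hits = [(len(terms & kw), i) for i, kw in index.items() if terms & kw]
    return [i for _, i in sorted(hits, key=lambda h: (-h[0], h[1]))]


print(match_all("woman camera crowd", INDEX))                 # ['img3', 'img5']
print(match_all("woman holding camera crowd", INDEX))         # ['img3']
print(match_any_ranked("woman holding camera crowd", INDEX))  # ['img3', 'img1', 'img4', 'img5', 'img2']
```

Adding a word shrinks the strict result set, much as adding “holding” did on Flickr, while the loose ranking fills out the list with partial matches, including the man with a camera. Neither style can tell whether the one image that matches every word is the picture the searcher actually has in mind.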

Also, it seems to me that “brown” is just as dominant a color as “purple” in the original picture. Flickr makes it easy to narrow a search by dominant color, but neither color narrowed the search in any useful way. I also got a lot of pictures that had either no camera or no crowd in them.

Finally I went to and. The results of both of these searches were even more disappointing.

The key is not just writing generic captions for pictures you’ve found or have in hand. The key is finding pictures among a host of other images of wide-ranging quality and relevance that legitimately use the same words to describe their elements.

If this is the best technology has to offer, it will be a long time before there is anything that will help those searching the Internet for images narrow their searches, and a long time before customers stop needing editors to help them find useful images.

Copyright © 2015 Jim Pickerell. The above article may not be copied, reproduced, excerpted or distributed in any manner without written permission from the author. All requests should be submitted to Selling Stock at 10319 Westlake Drive, Suite 162, Bethesda, MD 20817, phone 301-461-7627, e-mail: wvz@fpcubgbf.pbz

Jim Pickerell is founder of, an online newsletter that publishes daily. He is also available for personal telephone consultations on pricing and other matters related to stock photography. He occasionally acts as an expert witness on matters related to stock photography. For his current curriculum vitae go to:  


  • Sheron Resnick Posted May 29, 2015
    Hi Jim, Can you try this search (woman holding camera crowd) on

    We and they have put a lot of effort into the keywording technology that runs the searches and I often find their search results to be significantly better than the "big boys".

    You'll see a few images that are glaringly mis-keyworded, and some that are of a crowd taking pictures of a woman -- but the majority of pictures in the result-set are of women taking pictures in a crowd.


