214
ARRIBA VISTA
April 13, 1999
In November 1998 Arriba Soft Corporation launched the Arriba Vista Image Searcher
on the Web at www.arribavista.com .
To date it has cataloged about 1.5 million images. The company estimates it has
crawled about 30% of the web, which means it might be able to catalog 5 million
images once it has searched the entire web. This does not include images on web
sites that exclude its crawler.
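The 5 million figure is a simple extrapolation from the two numbers above. A minimal sketch of that arithmetic, assuming images are spread evenly across the crawled and uncrawled parts of the web (an assumption the estimate depends on; the variable names are ours):

```python
# Figures quoted in the article.
cataloged_images = 1_500_000   # images cataloged to date
fraction_crawled = 0.30        # share of the web crawled so far (company estimate)

# Assumes the uncrawled 70% holds images at the same density as the crawled 30%.
estimated_total = cataloged_images / fraction_crawled

print(f"Estimated images on the whole web: {estimated_total:,.0f}")
```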
In one sense this should be what individual photographers have been waiting for
because it can help make their individual sites become known to the world at large.
Arriba Vista had about 3 million page hits in February and based on growth curves
they expect that to reach 10 million page hits per month in the near future.
On the other hand, many photographers who have heard anything about this site are up
in arms over its existence and the fact that Arriba Vista stores thumbnails of their
images on the Arriba Vista web site without the photographer's copyright notice.
We will try to explain how the Arriba Vista site works and the pros and cons of such
sites. One of the things to recognize is that Arriba Vista is not the only search
engine to use this technique. Alta Vista has the "AV Photo Finder" that works in
more or less the same way.
Note: Alta Vista has signed an agreement with Corbis. In exchange for allowing
their images to be shown on the AV Photo Finder site, Corbis is guaranteed that
their images will appear first on any search. This may be great for Corbis
photographers, but the images of all other photographers will appear so far down in
the pile that the chances of anyone ever looking at them are slim.
Getting Images Seen
The big problem for photographers selling stock through individual sites is in
letting potential customers know that their site exists. They can send postcards or
e-mail to their regular clients. For photographers who are primarily interested in
developing assignment business that may be perfectly satisfactory. But, stock
photographers need to cast a wider net because many of the people who will be
interested in using their images will be people with whom the photographer has had
no previous contact.
In the past one solution for many photographers was to get listed on the major text
search engines. That isn't always satisfactory from a user point of view for two
reasons. Text listings tell so little about the specifics on a given site that it
is often difficult to know what you will find when you go there. In addition, if
after reading the text the user thinks there might be something of interest at the
site the user still has to click again in order to see an image. This makes
searching for images deadly slow and frustrating. Anyone who is under any type of
time pressure is simply unlikely to do it. It is here that Arriba Vista and other
sites like it offer a major breakthrough in searching for information in general,
and for images in particular.
Professional users are quickly turned off if they have to open lots of sites just to
discover they offer nothing useful. If the photographer has a tight specialization,
a text listing may work, but photographers with broad general files have trouble
explaining the specifics of their file in the few words of description that are
found on one of the major search engines.
Given the large number of sites to choose from there is little likelihood that a
user will bother to open the johnjonesimages site unless the user knows something
about John Jones' work already.
Advantages Over Text Sites
The advantage for the user in using the Arriba Vista site is that it focuses just on
images, not everything related to a particular subject. Instead of getting a
general description of what is on a site, the user sees thumbnails of individual
images based on the keyword search.
The eye can then determine by looking at a thumbnail image whether there is likely
to be relevant information on that site. This visual information enables the user
to make much more rapid and accurate decisions about which sites to visit than is
possible using a text search system.
Moreover, if the user is looking for a picture or illustration he or she doesn't
have to wade through thousands of articles on the subject in order to find images.
When the user clicks on the thumbnail they are taken directly to the page where that
photo appears. For most photographer sites that page may be the image itself with
some caption or copyright information connected to it. However, for the vast
majority of web sites where images are found the image is inserted in a page of text
that relates to the image.
Peter Spicer, Chief Technology Officer at Arriba Vista, points out that he received
a call recently from a mother who had been helping her son produce a paper on
Abraham Lincoln. She liked the Arriba Vista site, because there were "fewer" items
to look through and she got the information she needed faster. She started out on
the text search engines and got too much information related to the keywords
"Abraham Lincoln."
At Arriba Vista she felt she was able to get some very good information based on
pictures of Abraham Lincoln, and the text associated with those pictures, rather
than being overwhelmed by all the options at a text site. She still got
inappropriate hits, but possibly not quite as many of them as she would get on the
text search sites.
Searching on Arriba Vista for Abraham Lincoln we got 635 hits, most of which were
pictures of Abraham Lincoln. However, we also got pictures of the
"Lincoln cent," "Lincoln Park," people whose first name was Abraham and a picture of
a Curtis biplane. This last was really interesting as it appeared in the airplane
section of a book called "Practical Mechanics For Boys." This picture comes up when
you search for Abraham Lincoln because the caption under the biplane picture is
"Lincoln Beachey in a Curtis Biplane."
By way of comparison, if we search for Abraham Lincoln on Alta Vista's main text
search area we get 45,234 pages. On their "AV Photo Finder" Abraham Lincoln gets
2985 hits.
It should be recognized that the vast majority of users of the Arriba Vista site are
consumers, not professional users. Thus, the chances of commercially licensing
rights to your images as a result of their appearance on Arriba Vista are slim.
Media Commerce
Arriba Vista acknowledges that the site in its current format may be of little value to
professional photographers trying to license rights to their work, or to
professional researchers who are trying to find images to license. The professional
researcher has to go through too much chaff in order to find a few useful bits of
information and thus is likely to be turned off.
To meet the needs of these two groups Arriba Vista is developing a companion site
directed toward Media Commerce. Initially this site will focus on licensing rights
to royalty free images for fixed prices. Later, they will develop a Rights
Protected section of this site where usage fees can be negotiated for individual
uses.
PhotoSphere will be one of the first participants on the site. Arriba Vista is
talking to other RF producers now to try to get others to sign on.
According to Sue Clemons, formerly with Superstock, they will start with RF because
it is easier to get that side of their business going since they don't have to deal
with negotiating sales. On the Rights Protected site they may allow individual
agencies to handle their own negotiations, but that has yet to be decided.
Every participant on the site will have to specifically request that their images be
included and initially they will only work with agencies. At some later date they
may accept images from individual photographers. All aspects of the Rights
Protected site are still in the initial planning stages.
If they eventually accept images from individual photographers it could make it
possible for many photographers with individual sites to overcome the marketing
hurdles of a personal web site. This could be extremely important to photographers
who have been unable to get agent representation.
How The Site Works In Brief
The following are the basics of how the current site works.
They use a "spider" to search the net for image files with extensions like
.jpeg .tiff .gif. This is a continuous process of looking for new information and
can either be random, or targeted at specific sites by the webmaster. Individuals
can request that their site be indexed by sending a message to the webmaster at
Arriba Vista. Recent studies indicate that the largest engine, Alta Vista, has
probably crawled no more than 50% of the Web although no one is really certain given
the phenomenal growth of the web.
The software captures each image found, creates a thumbnail which is stored on
the Arriba Vista site along with the path back to the original site.
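The first two steps can be pictured with a short sketch. It is not Arriba Vista's software, only an illustration under stated assumptions: the page is already downloaded, images are found through img tags, ".jpg" is added to the extensions named above, and the names ImageLinkCollector and index_page are our own. Thumbnail creation is left out; the sketch only records each image with the path back to the page it came from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Extensions named in the article, plus ".jpg" (our assumption).
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".tiff", ".gif"}


class ImageLinkCollector(HTMLParser):
    """Collects absolute URLs of <img> sources that end in an image extension."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        absolute = urljoin(self.page_url, src)
        extension = posixpath.splitext(urlparse(absolute).path)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            self.image_urls.append(absolute)


def index_page(page_url, html):
    """Return one record per image: the image URL and the page it was found on."""
    collector = ImageLinkCollector(page_url)
    collector.feed(html)
    return [{"image_url": url, "source_page": page_url}
            for url in collector.image_urls]


# Hypothetical page used only for illustration.
sample_html = '<html><body><img src="pics/lincoln.gif"><img src="/logo.png"></body></html>'
records = index_page("http://www.example.com/history/page.html", sample_html)
```

The ".png" logo is skipped because its extension is not on the list, which is the kind of filtering the article describes.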
It creates keywords for the image by using Meta data found on the site. Meta
tags are hidden words that appear in the head element of an HTML document and can be
extracted by servers/clients for use in identifying, indexing and cataloging
specialized documents. They are used to advertise the contents of the document.
All pictures on the Arriba Vista site will be keyworded with all the words selected
from the Meta tags, even though some of these words may not directly apply to
specific pictures. Arriba Vista's goal is to ensure that keywords have semantic
relevancy to the image but that often can not be accomplished without human
intervention to view the image. Currently there is no way to attach meta tags to
individual .jpeg files.
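To see why page-level Meta tags produce loose keywording, consider this minimal sketch. It assumes the keywords come from a meta tag named "keywords" and that every image on the page simply inherits the full list; the class and function names and the sample page are hypothetical, and the real system also weighs nearby text.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects the page's meta keywords and the src of every image on it."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords = [w.strip().lower() for w in content.split(",") if w.strip()]
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def keyword_images(html):
    """Attach every page-level keyword to every image found on the page."""
    scanner = PageScanner()
    scanner.feed(html)
    return {src: list(scanner.keywords) for src in scanner.images}


# Hypothetical page: one keyword list, two unrelated pictures.
page = (
    '<html><head><meta name="keywords" content="shoes, air, jordan"></head>'
    '<body><img src="sneaker.jpg"><img src="storefront.jpg"></body></html>'
)
tagged = keyword_images(page)
```

Both pictures end up with the same three words, even though "air" describes neither of them.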
In addition to the Meta tags which typically do not supply enough detail,
Arriba Vista employs a nine (9) step relevancy ranking of other text data in context
with the images. In particular it looks at headlines, sub heads and captions that
are close to the image.
This can work for editorial images when the spider finds them inserted in a
document. It is unlikely to work well for concept images used on a commercial site
because the preponderance of the text will probably not relate to the images. It
will be of little help at all at most photographer sites because they tend not to
use text to describe or amplify their images.
On average the spider software generates about 10 keywords per image. Many
images have fewer words. Given the automated system for collecting these keywords
many of the words are inappropriate to describing the specific image. When
searching for a particular word string the user gets a high percentage of inaccurate
hits.
Arriba Vista also uses a manual process where a human looks at some of the
images to validate the (semantic) relevancy of keyword/image pairs. Manual review
is extremely expensive compared with the automated system. It is unclear what
percentage of the images get this manual review, but from looking at the number of
inappropriate keywords attached to the images in most searches it would appear that
there is very little manual review at this time.
There is a "connection list" of words used at the search level that tries to
capture the intent of the user from the words used. For example if the user enters
"ecology" the search engine will also look for words like "air," "water," etc. This
connection list was developed by Arriba Vista and is proprietary.
To get an idea of the confusing connections this list can produce consider the
following. A shoe store site promotes the fact that they have "Air Jordan" shoes in
their META tags. They show pictures of all their shoes on their site. Someone
searches for "ecology". In Arriba Vista's "connection list", which the search engine
automatically uses on every search, the words "air," "water" as well as many other
things are attached to "ecology." Thus, in the search for ecology the search engine
pulls all the images that have the keyword "air" and gets a whole bunch of shoes.
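The shoe store problem can be reproduced in a few lines. The real connection list is proprietary, so this sketch assumes only the two connections mentioned above ("air" and "water") and a made-up three-image index; all names here are illustrative.

```python
# Only the two connections the article mentions; the real list is proprietary.
CONNECTION_LIST = {"ecology": {"air", "water"}}

# Hypothetical index: image name -> keywords attached to it.
INDEX = {
    "shoe-store/air-jordan.jpg": {"air", "jordan", "shoes"},
    "park/river.jpg": {"water", "river"},
    "lab/ecology-chart.gif": {"ecology"},
}


def expand(term):
    """Return the search term plus every word connected to it."""
    return {term} | CONNECTION_LIST.get(term, set())


def search(term):
    """Return every image that carries the term or any connected word."""
    terms = expand(term.lower())
    return sorted(image for image, keywords in INDEX.items() if keywords & terms)


hits = search("ecology")
```

Because the expansion is applied on every search, the shoe picture is returned for "ecology" purely on the strength of the word "air".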
They currently allow boolean searches (AND, OR, + and -) according to the Help
menu. This is a useful tool, but it does not seem to be fully operational. John
Treacy, VP of Marketing, says,
"Today, you must enter a + sign to produce correct 'phrases'."
The example used in the Help menu is "Cat+Dog". That didn't work for me, but "Cat +
Dog" did. There must be a space on either side of the "+" sign. I couldn't get
"and" or "or" to work and I cannot explain why. This is an extremely important
function in refining searches.
It is also interesting to note that at first glance when doing the "Cat + Dog"
search you would think that it is not working because you get a lot of pictures of
single animals. This happens because one of the sources of these images is
"petcraft.com" which is a pet store catering to all kinds of pets. Their Meta tags
include: "african, bird, canary, cat, cichlids, dog, friends, kitten, meet,
petcraft, pig, potbelly, puppy, red, send, sheridan, want and world." All these
words are attached to every image that comes from this site. Consequently, the
picture of a pig has the keywords "cat" and "dog".
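A short sketch shows why the search is technically "working" even though it returns single animals. It uses the petcraft.com Meta tag list quoted above, three made-up image names, and an assumed rule that terms are separated by " + " with a space on each side, which matches the behavior we observed but may not be how the engine is actually written.

```python
# Meta tag words quoted in the article for petcraft.com.
PETCRAFT_META = [
    "african", "bird", "canary", "cat", "cichlids", "dog", "friends", "kitten",
    "meet", "petcraft", "pig", "potbelly", "puppy", "red", "send", "sheridan",
    "want", "world",
]

# Hypothetical image names; every image inherits the full site-wide list.
index = {name: set(PETCRAFT_META) for name in ("pig.jpg", "canary.jpg", "kitten.jpg")}


def search_all(query):
    """Return images carrying ALL terms; terms must be separated by ' + '."""
    terms = [term.strip().lower() for term in query.split(" + ")]
    return sorted(image for image, keywords in index.items()
                  if all(term in keywords for term in terms))


with_spaces = search_all("Cat + Dog")
without_spaces = search_all("Cat+Dog")
```

With spaces, all three images match, including the pig. Without spaces, "cat+dog" is treated as a single word and nothing matches.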
All searches look for "exact phrase matching" of any keyword.
Thus, if the keyword attached to an image is "Catskills" and you search on
"Catskill" you won't find it. The important thing to note here is that if you are
putting Meta tags on your site in the hopes of being indexed by Arriba Vista you
should put in both the singular and plural forms of important words. Many searchers
tend to use the plural when they are looking for a singular subject and vice versa.
Some search engines automatically look for plurals. At this stage, this one does
not.
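The effect of exact matching, and of adding both forms yourself, can be sketched as follows. The plural rule here is deliberately naive (add or drop a trailing "s") and is our assumption; it will mangle words such as "glass", so treat it as an illustration rather than a keywording tool.

```python
def exact_match(query, keywords):
    """True only if the query equals one of the keywords (ignoring case)."""
    return query.lower() in {keyword.lower() for keyword in keywords}


def with_simple_plurals(keywords):
    """Return the keywords plus a naive singular/plural twin of each one."""
    expanded = set()
    for word in keywords:
        lowered = word.lower()
        expanded.add(lowered)
        # Naive rule: drop a trailing "s" if present, otherwise add one.
        expanded.add(lowered[:-1] if lowered.endswith("s") else lowered + "s")
    return expanded


site_keywords = ["Catskills"]
found_before = exact_match("Catskill", site_keywords)
found_after = exact_match("Catskill", with_simple_plurals(site_keywords))
```

The first lookup fails and the second succeeds, which is the whole argument for listing both forms in your Meta tags.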
Currently all images on the site are sequenced on a first
on, first to be pulled up basis depending on the search criteria. This means that
any new images are likely to be at the bottom of the pack and there is a good chance
that users will not look beyond the first couple hundred images for any search
criteria. Some search engines use a reverse order process where the newest images
added (in date order) are looked at first. This system has more appeal to image
providers, but in a subject area where lots of new images are added even relatively
new images will work their way down into the pile very rapidly. Arriba Vista is
considering changing the way they order images.
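The difference between the two orderings is just the direction of a sort on the date an image was added. A minimal sketch, with made-up image names and dates:

```python
from datetime import date

# Hypothetical search hits with the date each image entered the index.
hits = [
    {"image": "b.jpg", "added": date(1999, 1, 15)},
    {"image": "c.jpg", "added": date(1999, 4, 1)},
    {"image": "a.jpg", "added": date(1998, 11, 20)},
]


def first_on_first_up(results):
    """Current behavior as described: oldest entries are shown first."""
    return [hit["image"] for hit in sorted(results, key=lambda hit: hit["added"])]


def newest_first(results):
    """Reverse date order: the most recently added images are shown first."""
    return [hit["image"]
            for hit in sorted(results, key=lambda hit: hit["added"], reverse=True)]


current_order = first_on_first_up(hits)
alternative_order = newest_first(hits)
```

Under the current scheme the image added in April 1999 is last; under the alternative it is first, until newer images push it down.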
When they first started, Arriba Vista was going directly to the image file at
the image owner's site once someone clicked on the thumbnail. In many cases this
was removing the context of the rest of the information that the site creator had
put on his page around the picture. Photographers complained and Arriba Vista
listened. They adjusted their search parameters so the page acquired when the
thumbnail is clicked is one step back from the actual image file itself. This
usually gives all of the textual information that relates to the image and thus all
the context is preserved. If, in a photographer's site, his or her copyright
information appears on the screen next to the preview size image it should now be
preserved.
Advantages And Disadvantages For Photographers
The advantage is that photographers are charged nothing to participate. Arriba
Vista earns all their revenue by selling ad space on their site. They expect that
to be their sole source of income. Some photographers have complained that Arriba
Vista is profiting by using their images, but it seems to me that what they are
doing differs
little from what all the major text search engines like Yahoo, Alta Vista, Excite,
Infoseek, Lycos, etc are doing. The main difference is that they provide a more
efficient method for users to search for certain types of data.
The disadvantage is that in its current form it seems unlikely that it will aid
photographers in earning revenue. It may bring more non-revenue traffic to certain
pages within their site.
It seems likely that a high percentage of visits will be from those wanting to make
small personal uses of the information. Thus far no one has worked out a successful
model for collecting for these uses. Arriba Vista says they hope to find ways to
charge fees for consumer use. If they can work out a system the photographers who
created the images will receive a share of any fees collected.
There is also a fear that this increased traffic from the general consumer
population will lead to more misuse. Most stock photographers would like to find
ways to draw increased traffic from professional users and keep the consumer interest
in their sites to a minimum. It's called target marketing. The goal of Arriba Vista
is to reach all potential users and not target any specific group.
Some photographers are concerned that visitors to the site will think these pictures
are free to use for any purpose and will not recognize that some of them are
copyrighted. Arriba Vista has placed the following rights notice under each
thumbnail once it is selected from a group of thumbnails returned from a search:
Arriba Vista provides a visual mechanism to search the Web using images instead of
text. Users are directed to the originating web site on which the images are
located. Should you wish to use any image, photo or artwork you see during the
search process, you must obtain the appropriate permission from the owner of the
material.
Problems
As I see it there are both technological and philosophical problems. The
technological problems may be relatively easy to solve in time. They include:
--Lack of fully operational boolean searches.
--Developing a better system for making new images on the site available to the
user, rather than having them always fall at the bottom of the pack. This system
might involve putting all images acquired at the top based on the date of
acquisition.
--Improved Natural Language technology.
--Improving the quality of the connection list and the thesaurus.
--A system for attaching specific meta words to specific image files at the
photographers site thus making it possible for the creator of the web site to
provide more accurate data about each individual image. In some cases they are
already accepting keywords from a few photographers.
However, implementing new and improved versions of the "natural language," the
"connection list," and to a great extent the "boolean" searches will result in
little improvement in the site unless more accurate and extensive keywording is
provided for each image.
The keywording issue is a difficult one to overcome. The people at Arriba Vista
believe they can automate this process to a great extent. I am skeptical. The
inappropriate words on the current site would tend to justify my skepticism. Better
keywording can be achieved in one of two ways. Arriba Vista could hire humans to
look at each image and the text related to it and make judgements about what words
should be added or deleted from the list automatically produced. This is time
consuming, and probably not cost effective.
The second way is for those who created the web site to provide appropriate keywords
and captions. Peter Spicer says they are already accepting keywords and image files
supplied on disc from some photographers. However, it is my belief that there is
not enough commercial incentive for the vast majority of people who created the
pages Arriba Vista is currently indexing to go to this trouble. Most have no
interest in licensing rights to their images. They will not perceive that they will
achieve enough benefit from increased eyeballs to their site to justify the expense
of this keywording. They will spend their promotional dollars in other ways.
Professional image sellers will recognize the value of keywording and will go to the
trouble. But, these people will want to be on the Media Commerce site, not the
current search engine.
Image sellers need to think carefully about how best to market their images. The
people at Arriba Vista seem to believe that the more eyeballs looking at your images the
better, but for the professional photographer that may not necessarily be the case.
It may be more important to have the right kind of eyeballs, not just more. The
Media Commerce site will probably provide the "right kind of eyeballs," the current
site doesn't. Some individual photographers whose sites have been promoted on
other search engines, but have not yet benefited from the increased traffic Arriba
Vista might generate, are already finding that they have to spend too much time
fielding requests from individuals who do not want to pay enough for the use of an
image to justify handling the transaction. These photographers do not need more of
this type of traffic.
Even if school children began to show an interest in actually paying to use images
there is no guarantee that they would be willing to pay enough to offset the cost of
supplying the service.
Such payments might be enough for the consolidator (Arriba Vista) to make a profit,
but not enough for the thousands of individual suppliers of images to individually
make enough to justify participation in the project.
Based on the experiences so far it seems that everyone who has tried to reach the
consumer market finds it costs much more to service than the revenue generated.
Individual photographers certainly don't want to have to field calls or e-mails from
customers who want to buy rights to an image for $1.00 or $2.00.
Because Arriba Vista goes after every image regardless of quality or demand for the
subject matter they clutter the site with a huge amount of imagery for which there
is little or no demand. They are getting about 3 million hits per day (thumbnails
served), but that is probably heavily weighted toward the educational market not the
commercial market. As we pointed out earlier the site may be helpful for those
looking for a little general information about a topic because it narrows the search
to only those pages that include pictures.
Getting Images From Sites Where We Have Licensed Use
A major problem for photographers will arise when engines like this capture our
images from sites where we have licensed legal use of our images. Here's how that
will work.
Since Arriba Vista searches for all URLs with the .jpeg or .gif extensions they
also find all images at magazine or newspaper sites. These sites could prevent
their images from being picked up if they use a "robots.txt" file to prevent robots
from indexing their site. Some use this file, but many want their site indexed and
listed by as many search engines as possible so users can find them.
Thus, if you have allowed your pictures to be used on magazine or newspaper sites,
or licensed a use on some commercial site, there is a good chance your images are
already in the Arriba Vista index. If that is the case, in all likelihood your name
will not be attached to the image because either your name did not appear at all on
the site, or if it did it was not included in the information that Arriba Vista's
spider picks up.
Even if the photographer's name is listed, anyone who finds that image will be
referred back to the URL where the image was used, not to the photographer's URL.
The first contact with that other company will be the webmaster, not with the person
with whom the photographer or stock agent negotiated the deal. At this point there
are several ways this whole thing can fall apart as far as the photographer is
concerned. The webmaster probably has no idea what agreements were negotiated. He
may say "OK" as long as his company is credited because his goal is to get his
company's message out to as many eyeballs as possible. If the use is for another
web site the size of the file on the web is probably perfectly satisfactory.
If the webmaster wants to check on clearance he may have no idea who within his
company he should go to. The chances that the request will get back to the
photographer are slim. It seems to me that in all likelihood there will be a huge
amount of misuse resulting from this system.
This is a problem, not just for photographers with their personal sites, but for The
Image Bank, Tony Stone Images, The Stock Market, Photodisc, Corbis and all the rest
of the major image suppliers. At present, there may not be that many web
uses of images that will be available to be sucked up by the search engine spiders,
but there are strong indications that this usage is going to grow quite rapidly in
the next few years.
Lessons Learned
This site demonstrates the degree to which images on the web can be randomly located
and cataloged using automated systems. It demonstrates that thumbnails can be
created of any image found on the web and stored somewhere other than your site. It
is also clear that search engines can capture your images, without your knowledge,
and use them in connection with their own advertising, unless you use great care in
how you set up your site.
Arriba Vista is trying to be a responsible web partner and will not upload images of
anyone who requests that their images not be included in the index. They will also
remove images that have been uploaded. There is no assurance that other site
operators will be as responsible.
The important thing to recognize is not so much what this specific company, Arriba
Vista, is doing, but what it is possible to accomplish with today's technology.
Others will be doing the same type of thing in the near future.
Protecting Yourself From Spiders
There are a variety of ways to protect your images on the web.
One is to embed your copyright information and a contact number into any image file.
You can do this by opening the image in Photoshop, adding a bar above or below the
image and placing your visible notice within that file. To see a sample of this you
can look at the preview images on www.workbook.com. Many of these images also have
a visible watermark on the image itself.
This way the copyright information will always travel with the image. The downside
is that this bar, or anything within the image itself, will probably be edited out
by any client who licenses usage of the image.
At Arriba Vista all watermarks and copyright management information (CMI) embedded
in the image are maintained. There is no capability in their software to tamper with
embedded CMI.
That said, many copyright notices etc. are placed nearby as HTML text. Depending on
how the Web page is constructed, that text file may reside in a completely different
file from any Meta tag information or other textual context. Sometimes, it can even
appear on different servers for various types of ASP (Active Server Pages) pages.
In cases like this, the crawler may not capture the text based copyright notice.
It is becoming increasingly important to make sure something like the invisible
Digimarc is embedded in every image file that is licensed to a client for use on the
web. Image producers ought to also take a look at Digital Object Identifiers (DOI)
(www.doi.org) as a way to ensure that any image can always be traced to its owner's
current address and contact information.
How To Avoid Getting Picked Up
If you have a site it is probably a good idea to contact Arriba Vista and either ask
them to index your site, or tell them specifically that you don't want your site
indexed. They will honor either request.
You can also use Robot Exclusion Protocols which Arriba Vista and other search
engines honor. For more information about how robots and search engines in general
work you might want to look at www.searchenginewatch.com. They have a comprehensive
site and they list the names of many of the crawlers used by the search engines.
John Treacy of Arriba Vista supplied the following information:
The robots.txt file allows for the exclusion of all crawlers or specific
crawler(s). This method should be used if you have access to the root directory of
a web site and know specific directories you want excluded. The robots.txt file
MUST be located in the root directory of a given web site.
The following are the procedures for setting up a robots.txt file or Meta Tags to
exclude the Arriba Vista web crawler. If you have any further questions please
contact Arriba Vista.
Robots.TXT and Meta Tag Procedures
There are two ways to exclude the Arriba Vista (ArribaPacketRat) robot:
Method 1 (the robots meta tag)
The meta tag system is ideal for excluding specific pages or for users who do not
have access to the root directory of the web site and want all robots excluded the
same. The robots meta tag is not fully supported by all crawlers, but it is
supported by Arriba Vista. To exclude Arriba Vista through this method, place a
meta tag in the head of your html document with the name "robots" and place
restrictions in the content space of the meta tag.
Supported Restrictions:
noindex - Don't index this page
nofollow - Don't follow links off of this page
nomediaindex - Don't index media on this page (Specific to Arriba Vista)
Separate restrictions may be grouped together in one tag as in the
following example: <meta name="robots" content="noindex,nofollow">. The tag must
be enclosed in angle brackets and placed inside the head element of the HTML
document.
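If you want to confirm that a page really carries the restrictions you intended, a short script can read them back. This is a minimal sketch using Python's standard HTML parser; the function name page_restrictions and the sample page are ours, and it shows only how a crawler could read the tag, not how Arriba Vista's crawler does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the restrictions listed in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.restrictions = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.restrictions.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def page_restrictions(html):
    """Return the set of robots restrictions found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.restrictions


# Hypothetical page carrying the example tag from the text.
page = ('<html><head><meta name="robots" content="noindex,nofollow"></head>'
        '<body>My portfolio</body></html>')
found = page_restrictions(page)
```

An empty result means the tag is missing or misspelled, in which case a crawler would treat the page as unrestricted.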
Method 2 (robots.txt)
This method is the easiest if you have access to the root directory of a web site
and know specific directories you want excluded. According to the standards for
robot exclusion, the robots.txt file MUST be located in the root directory of a
given web site which is difficult for people who don't have their own domain. For
Arriba Vista (http://www.arribavista.com), the robots.txt file would be located at
http://www.arribavista.com/robots.txt.
An example of an invalid robots.txt location is
http://www.arribavista.com/foo/robots.txt. This file would not be looked at. The
contents of the robots.txt file allow for the exclusion of specific robot(s) or all
robots.
To exclude all robots from the entire site the contents of the robots.txt file would
be:
# anything on a line after a # sign is ignored
User-agent: * # This excludes all crawlers (any text after the # sign is ignored)
Disallow: /
To exclude only the Arriba Vista crawler from the entire site the contents of the
robots.txt file would be:
User-agent: DittoSpyder
Disallow: /
Alternatively, if you wanted to exclude the Arriba Vista crawler from specific
directories, you could add a Disallow line for each directory you do not want
indexed.
User-agent: DittoSpyder # Arriba Vista Image Search
Disallow: /personal
Disallow: /images
Disallow: /bar
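Before relying on a robots.txt file it is worth checking that it blocks what you think it blocks. The following sketch uses the robots.txt parser in Python's standard library to test the directory example above. The example.com URLs and the crawler name "SomeOtherBot" are placeholders, and the result shows how a standards-following crawler should read the file, not a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The directory-exclusion example from the text.
ROBOTS_TXT = """\
User-agent: DittoSpyder # Arriba Vista Image Search
Disallow: /personal
Disallow: /images
Disallow: /bar
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URLs on a hypothetical site.
blocked = parser.can_fetch("DittoSpyder", "http://www.example.com/images/photo.jpg")
allowed = parser.can_fetch("DittoSpyder", "http://www.example.com/index.html")
other_bot = parser.can_fetch("SomeOtherBot", "http://www.example.com/images/photo.jpg")

print("DittoSpyder may fetch /images/photo.jpg:", blocked)
print("DittoSpyder may fetch /index.html:", allowed)
print("SomeOtherBot may fetch /images/photo.jpg:", other_bot)
```

Note that the file names only one crawler, so every other robot is still free to index the /images directory. Use the "User-agent: *" form if you want to exclude them all.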