Arriba Vista

Posted on 4/13/1999 by Jim Pickerell

April 13, 1999

In November 1998 Arriba Soft Corporation launched the Arriba Vista Image Searcher

on the Web at www.arribavista.com.

To date it has cataloged about 1.5 million images. They estimate they have crawled

about 30% of the web which means they might

be able to catalog 5 million images once they search the entire web. This does not

include images on web sites that exclude their crawler.
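
The 5 million figure is a straight proportional extrapolation from the two numbers given. A minimal sketch of that arithmetic follows; it assumes images are spread evenly between the crawled and uncrawled parts of the web, and the function name is only illustrative.

```python
def estimate_total_images(cataloged: int, fraction_crawled: float) -> int:
    """Scale the images cataloged so far up to the whole web."""
    if not 0 < fraction_crawled <= 1:
        raise ValueError("fraction_crawled must be greater than 0 and at most 1")
    return round(cataloged / fraction_crawled)


# 1.5 million images found after crawling about 30% of the web
print(estimate_total_images(1_500_000, 0.30))  # 5000000
```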

In one sense this should be what individual photographers have been waiting for

because it can help make their individual sites become known to the world at large.

Arriba Vista had about 3 million page hits in February and based on growth curves

they expect that to reach 10 million page hits per month in the near future.

On the other hand, many photographers who have heard anything about this site are up

in arms over its existence and the fact that Arriba Vista stores thumbnails of their

images on the Arriba Vista web site without the photographer's copyright.

We will try to explain how the Arriba Vista site works and the pros and cons of such

sites. One of the things to recognize is that Arriba Vista is not the only search

engine to use this technique. Alta Vista has the "AV Photo Finder" that works in

more or less the same way.

Note: Alta Vista has signed an agreement with Corbis. In exchange for allowing

their images to be shown on the AV Photo Finder site, Corbis is guaranteed that

their images will appear first on any search. This may be great for Corbis

photographers, but the images of all other photographers will appear so far down in

the pile that the chances of anyone ever looking at them are slim.

Getting Images Seen

The big problem for photographers selling stock through individual sites is in

letting potential customers know that their site exists. They can send postcards or

e-mail to their regular clients. For photographers who are primarily interested in

developing assignment business that may be perfectly satisfactory. But, stock

photographers need to cast a wider net because many of the people who will be

interested in using their images will be people with whom the photographer has had

no previous contact.

In the past one solution for many photographers was to get listed on the major text

search engines. That isn't always satisfactory from a user point of view for two

reasons. Text listings tell so little about the specifics on a given site that it

is often difficult to know what you will find when you go there. In addition, if

after reading the text the user thinks there might be something of interest at the

site the user still has to click again in order to see an image. This makes

searching for images deadly slow and frustrating. Anyone who is under any type of

time pressure is simply unlikely to do it. It is here that Arriba Vista and other

sites like it offer a major breakthrough in searching for information in general,

and for images in particular.

Professional users are quickly turned off if they have to open lots of sites just to

discover they offer nothing useful. If the photographer has a tight specialization,

a text listing may work, but photographers with broad general files have trouble

explaining the specifics of their file in the few words of description that are

found on one of the major search engines.

Given the large number of sites to choose from there is little likelihood that a

user will bother to open the johnjonesimages site unless the user knows something

about John Jones' work already.

Advantages Over Text Sites

The advantage for the user in using the Arriba Vista site is that it focuses just on

images, not everything related to a particular subject. Instead of getting a

general description of what is on a site the user sees thumbnails of individual

images based on the keyword search.

The eye can then determine by looking at a thumbnail image whether there is likely

to be relevant information on that site. This visual information enables the user

to make much more rapid and accurate decisions about which sites to visit than is

possible using a text search system.

Moreover, if the user is looking for a picture or illustration he or she doesn't

have to wade through thousands of articles on the subject in order to find images.

When the user clicks on the thumbnail they are taken directly to the page where that

photo appears. For most photographer sites that page may be the image itself with

some caption or copyright information connected to it. However, for the vast

majority of web sites where images are found the image is inserted in a page of text

that relates to the image.

Peter Spicer, Chief Technology Officer at Arriba Vista, points out that he received

a call recently from a mother who had been helping her son produce a paper on

Abraham Lincoln. She liked the Arriba Vista site, because there were "fewer" items

to look through and she got the information she needed faster. She started out on

the text search engines and got too much information related to the keywords

"Abraham Lincoln."

At Arriba Vista she felt she was able to get some very good information based on

pictures of Abraham Lincoln, and the text associated with those pictures, rather

than being overwhelmed by all the options at a text site. She still got

inappropriate hits, but possibly not quite as many of them as she would get on the

text search sites.

Searching on Arriba Vista for Abraham Lincoln we got 635 hits which were a

preponderance of pictures of Abraham Lincoln. However, we also got pictures of the

"Lincoln cent," "Lincoln Park," people whose first name was Abraham and a picture of

a Curtis biplane. This last was really interesting as it appeared in the airplane

section of a book called "Practical Mechanics For Boys." This picture came up in

our search for Abraham Lincoln because the caption under the biplane picture was

"Lincoln Beachey in a Curtis Biplane."

By way of comparison, if we search for Abraham Lincoln on Alta Vista's main text

search area we get 45,234 pages. On their "AV Photo Finder" Abraham Lincoln gets

2985 hits.

It should be recognized that the vast majority of users of the Arriba Vista site are

consumers, not professional users. Thus, the chances of commercially licensing

rights to your images as a result of their appearance on Arriba Vista are slim.

Media Commerce

Arriba Vista acknowledges that the site in its current format may be of little value to

professional photographers trying to license rights to their work, or to

professional researchers who are trying to find images to license. The professional

researcher has to go through too much chaff in order to find a few useful bits of

information and thus is likely to be turned off.

To meet the needs of these two groups Arriba Vista is developing a companion site

directed toward Media Commerce. Initially this site will focus on licensing rights

to royalty free images for fixed prices. Later, they will develop a Rights

Protected section of this site where usage fees can be negotiated for individual

uses.

PhotoSphere will be one of the first participants on the site. Arriba Vista is

talking to other RF producers now to try to get others to sign on.

According to Sue Clemons, formerly with Superstock, they will start with RF because

it is easier to get that side of their business going since they don't have to deal

with negotiating sales. On the Rights Protected site they may allow individual

agencies to handle their own negotiations, but that has yet to be decided.

Every participant on the site will have to specifically request that their images be

included and initially they will only work with agencies. At some later date they

may accept images from individual photographers. All aspects of the Rights

Protected site are still in the initial planning stages.

If they eventually accept images from individual photographers it could make it

possible for many photographers with individual sites to overcome the marketing

hurdles of a personal web site. This could be extremely important to photographers

who have been unable to get agent representation.

How The Site Works In Brief

The following are the basics of how the current site works.

  • They use a "spider" to search the net for image files with extensions like

    .jpeg, .tiff and .gif. This is a continuous process of looking for new information and

    can either be random, or targeted at specific sites by the webmaster. Individuals

    can request that their site be indexed by sending a message to the webmaster at

    Arriba Vista. Recent studies indicate that the largest engine, Alta Vista, has

    probably crawled no more than 50% of the Web although no one is really certain given

    the phenomenal growth of the web.

  • The software captures each image found, creates a thumbnail which is stored on

    the Arriba Vista site along with the path back to the original site.

  • It creates keywords for the image by using Meta data found on the site. Meta

    tags are hidden words that appear in the head element of an HTML document and can be

    extracted by servers/clients for use in identifying, indexing and cataloging

    specialized documents. They are used to advertise the contents of the document.

    All pictures on the Arriba Vista site will be keyworded with all the words selected

    from the Meta tags, even though some of these words may not directly apply to

    specific pictures. Arriba Vista's goal is to ensure that keywords have semantic

    relevancy to the image but that often can not be accomplished without human

    intervention to view the image. Currently there is no way to attach meta tags to

    individual .jpeg files.

  • In addition to the Meta tags which typically do not supply enough detail,

    Arriba Vista employs a nine (9) step relevancy ranking of other text data in context

    with the images. In particular it looks at headlines, sub heads and captions that

    are close to the image.

    This can work for editorial images when the spider finds them inserted in a

    document. It is unlikely to work well for concept images used on a commercial site

    because the preponderance of the text will probably not relate to the images. It

    will be of little help at all at most photographer sites because they tend not to

    use text to describe or amplify their images.

  • On average the spider software generates about 10 keywords per image. Many

    images have fewer words. Given the automated system for collecting these keywords

    many of the words are inappropriate to describing the specific image. When

    searching for a particular word string the user gets a high percentage of inaccurate

    hits.

  • Arriba Vista also uses a manual process where a human looks at some of the

    images to validate the (semantic) relevancy of keyword/image pairs. Manual review

    is extremely expensive compared with the automated system. It is unclear what

    percentage of the images get this manual review, but from looking at the number of

    inappropriate keywords attached to the images in most searches it would appear that

    there is very little manual review at this time.

  • There is a "connection list" of words used at the search level that tries to

    capture the intent of the user from the words used. For example if the user enters

    ecology the search engine will look for the words like "air," "water," etc. This

    connection list was developed by Arriba Vista and is proprietary.

    To get an idea of the confusing connections this list can produce consider the

    following. A shoe store site promotes the fact that they have "Air Jordan" shoes in

    their META tags. They show pictures of all their shoes on their site. Someone

    searches for "ecology". In Arriba Vista's "connection list", which the search engine

    automatically uses on every search, the words "air," "water" as well as many other

    things are attached to "ecology." Thus, in the search for ecology the search engine

    pulls all the images that have the keyword "air" and gets a whole bunch of shoes.

  • The site currently allows Boolean searches (AND, OR, + and -) according to the Help

    menu. This is a useful tool, but it does not seem to be fully operational. John

    Treacy, VP of Marketing, says,

    "Today, you must enter a + sign to produce correct 'phrases'."

    The example used in the Help menu is "Cat+Dog". That didn't work for me, but "Cat +

    Dog" did. There must be a space on either side of the "+" sign. I couldn't get

    "and" or "or" to work and I can not explain why. This is an extremely important

    function in refining searches.

    It is also interesting to note that on first glance when doing the "Cat + Dog"

    search you would think that it is not working because you get a lot of pictures of

    single animals. This happens because one of the sources of these images is

    "petcraft.com" which is a pet store catering to all kinds of pets. Their Meta tags

    include: " african, bird, canary, cat, cichlids, dog, friends, kitten, meet,

    petcraft, pig, potbelly, puppy, red, send, sheridan, want and world" All these

    words are attached to every image that comes from this site. Consequently, the

    picture of a pig has the keywords "cat" and "dog".

  • All searches look for "exact phrase matching" of any keyword.

    Thus, if the keyword attached to an image is "Catskills" and you search on "Catskill" you won't

    find it. The important thing to note here is that if you were putting Meta tags on

    your site in the hopes of being indexed by Arriba Vista you should put in both the

    singular and plural forms of important words. Many searchers tend to use the plural when

    they are looking for a singular subject and vice versa.

    Some search engines automatically look for plurals. At this stage, this one does

    not.

  • Currently all images on the site are sequenced on a first

    on, first to be pulled up basis depending on the search criteria. This means that

    any new images are likely to be at the bottom of the pack and there is a good chance

    that users will not look beyond the first couple hundred images for any search

    criteria. Some search engines use a reverse order process where the newest images

    added (in date order) are looked at first. This system has more appeal to image

    providers, but in a subject area where lots of new images are added even relatively

    new images will work their way down into the pile very rapidly. Arriba Vista is

    considering changing the way they order images.

  • When they first started, Arriba Vista was going directly to the image file at

    the image owner's site once someone clicked on the thumbnail. In many cases this

    was removing the context of the rest of the information that the site creator had

    put on his page around the picture. Photographers complained and Arriba Vista

    listened. They adjusted their search parameters so the page acquired when the

    thumbnail is clicked is one step back from the actual image file itself. This

    usually gives all of the textual information that relates to the image and thus all

    the context is preserved. If, in a photographer's site, his or her copyright

    information appears on the screen next to the preview size image it should now be

    preserved.
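
To make the keywording and search behavior described in the list above easier to follow, here is a minimal sketch. It is not Arriba Vista's code: the data structures, the sample file names and the tiny connection list are all hypothetical (the real connection list is proprietary). It only illustrates three points from the list: every image inherits all of a page's Meta keywords, the connection list widens a search, and keywords are matched exactly with no plural handling.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedImage:
    url: str
    keywords: set = field(default_factory=set)


def index_page(image_urls, page_meta_keywords):
    """Give every image on a page the full set of page-level Meta keywords."""
    words = {word.strip().lower() for word in page_meta_keywords}
    return [IndexedImage(url, set(words)) for url in image_urls]


# Hypothetical connection list: a search term is widened to related words.
CONNECTION_LIST = {"ecology": {"air", "water"}}


def search(index, term):
    """Exact keyword match on the term and its connected words.

    Results come back in the order the images were indexed (first on, first up).
    """
    term = term.strip().lower()
    wanted = {term} | CONNECTION_LIST.get(term, set())
    return [image.url for image in index if image.keywords & wanted]


index = []
index += index_page(["shoes/jordan.jpg"], ["air", "jordan", "shoes"])
index += index_page(["pets/pig.jpg"], ["cat", "dog", "pig", "potbelly"])
index += index_page(["travel/hills.jpg"], ["catskills", "mountains"])

print(search(index, "ecology"))    # ['shoes/jordan.jpg']  shoes, via "air"
print(search(index, "cat"))        # ['pets/pig.jpg']      a pig tagged "cat"
print(search(index, "catskill"))   # []                    no plural matching
print(search(index, "catskills"))  # ['travel/hills.jpg']
```

The shoe and pig results are not bugs in the sketch. They follow directly from attaching page-level words to every image, which is why better per-image keywords matter more than a better search layer.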

    Advantages And Disadvantages For Photographers

    The advantage is that photographers are charged nothing to participate. Arriba

    Vista earns all their revenue by selling ad space on their site. They expect that

    to be their sole source of income. Some photographers have complained that Arriba

    Vista is

    profiting by using their images, but it seems to me that what they are doing differs

    little from what all the major text search engines like Yahoo, Alta Vista, Excite,

    Infoseek, Lycos, etc are doing. The main difference is that they provide a more

    efficient method for users to search for certain types of data.

    The disadvantage is that in its current form it seems unlikely that it will aid

    photographers in earning revenue. It may bring more non-revenue traffic to certain

    pages within their site.

    It seems likely that a high percentage of visits will be from those wanting to make

    small personal uses of the information. Thus far no one has worked out a successful

    model for collecting for these uses. Arriba Vista says they hope to find ways to

    charge fees for consumer use. If they can work out a system the photographers who

    created the images will receive a share of any fees collected.

    There is also a fear that this increased traffic from the general consumer

    population will lead to more misuse. Most stock photographers would like to find

    ways to draw increased traffic from professional users and keep the consumer interest

    in their sites to a minimum. It's called target marketing. The goal of Arriba Vista

    is to reach all potential users and not target any specific group.

    Some photographers are concerned that visitors to the site will think these pictures

    are free to use for any purpose and will not recognize that some of them are

    copyrighted. Arriba Vista has placed the following rights notice under each

    thumbnail once it is selected from a group of thumbnails returned from a search:

    Arriba Vista provides a visual mechanism to search the Web using images instead of

    text. Users are directed to the originating web site on which the images are

    located. Should you wish to use any image, photo or artwork you see during the

    search process, you must obtain the appropriate permission from the owner of the

    material.

    Problems

    As I see it there are both technological and philosophical problems. The

    technological problems may be relatively easy to solve in time. They include:

      --Lack of boolean searches.

      --Developing a better system for making new images on the site available to the

      user, rather than having them always fall at the bottom of the pack. This system

      might involve putting all images acquired at the top based on the date of

      acquisition.

      --Improved Natural Language technology.

      --Improving the quality of the connection list and the thesaurus.

      --A system for attaching specific meta words to specific image files at the

      photographer's site thus making it possible for the creator of the web site to

      provide more accurate data about each individual image. In some cases they are

      already accepting keywords from a few photographers.

    However, implementing new and improved versions of the "natural language," the

    "connection list," and to a great extent the "boolean" searches will result in

    little improvement in the site unless more accurate and extensive keywording is

    provided for each image.

    The keywording issue is a difficult one to overcome. The people at Arriba Vista

    believe they can automate this process to a great extent. I am skeptical. The

    inappropriate words on the current site would tend to justify my skepticism. Better

    keywording can be achieved in one of two ways. Arriba Vista could hire humans to

    look at each image and the text related to it and make judgements about what words

    should be added or deleted from the list automatically produced. This is time

    consuming, and probably not cost effective.

    The second way is for those who created the web site to provide appropriate keywords

    and captions. Peter Spicer says they are already accepting keywords and image files

    supplied on disc from some photographers. However, it is my belief that there is

    not enough commercial incentive for the vast majority of people who created the

    pages Arriba Vista is currently indexing to go to this trouble. Most have no

    interest in licensing rights to their images. They will not perceive that they will

    achieve enough benefit from increased eyeballs to their site to justify the expense

    of this keywording. They will spend their promotional dollars in other ways.

    Professional image sellers will recognize the value of keywording and will go to the

    trouble. But, these people will want to be on the Media Commerce site, not the

    current search engine.

    Image sellers need to think carefully about how best to market their images. The

    people at Arriba Vista seem to believe that the more eyeballs looking at your images the

    better, but for the professional photographer that may not necessarily be the case.

    It may be more important to have the right kind of eyeballs, not just more. The

    Media Commerce site will probably provide the "right kind of eyeballs," the current

    site doesn't. Some individual photographers whose sites have been promoted on

    other search engines, but have not yet benefited from the increased traffic Arriba

    Vista might generate, are already finding that they have to spend too much time

    fielding requests from individuals who do not want to pay enough for the use of an

    image to justify handling the transaction. These photographers do not need more of

    this type of traffic.

    Even if school children began to show an interest in actually paying to use images

    there is no guarantee that they would be willing to pay enough to offset the cost of

    supplying the service.

    Such payments might be enough for the consolidator (Arriba Vista) to make a profit,

    but not enough for the thousands of individual suppliers of images to individually

    make enough to justify participation in the project.

    Based on the experiences so far it seems that everyone who has tried to reach the

    consumer market finds it costs much more to service than the revenue generated.

    Individual photographers certainly don't want to have to field calls or e-mails from

    customers who want to buy rights to an image for $1.00 or $2.00.

    Because Arriba Vista goes after every image regardless of quality or demand for the

    subject matter they clutter the site with a huge amount of imagery for which there

    is little or no demand. They are getting about 3 million hits per day (thumbnails

    served), but that is probably heavily weighted toward the educational market not the

    commercial market. As we pointed out earlier the site may be helpful for those

    looking for a little general information about a topic because it narrows the search

    to only those pages that include pictures.

    Getting Images From Sites Where We Have Licensed Use

    A major problem for photographers will be when engines like this capture our images from

    sites where we have licensed legal use of our images. Here's how that will work.

    Since Arriba Vista searches for all URLs with the .jpeg or .gif extensions they also find all

    images at magazine or newspaper sites. These sites could prevent their images from

    being picked up if they use a "robots.txt" file to prevent robots from indexing

    their site. Some use this file, but many want their site indexed and listed by as

    many search engines as possible so users can find them.
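
As a rough illustration of how a spider can pick up images from any page it visits, the sketch below scans a page's HTML for links that end in an image extension and records each one with the page it came from. This is an assumption about the general technique, not Arriba Vista's actual crawler. The class name, the sample page and the example.com address are hypothetical, and ".jpg" is added to the extensions named in this article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions named in the article, plus ".jpg" as an assumption.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".tiff")


class ImageLinkCollector(HTMLParser):
    """Collect (image URL, page URL) pairs from one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "a"):
            return
        attr_name = "src" if tag == "img" else "href"
        target = dict(attrs).get(attr_name)
        if not target:
            return
        absolute = urljoin(self.page_url, target)
        if urlparse(absolute).path.lower().endswith(IMAGE_EXTENSIONS):
            # Keeping the page URL lets a thumbnail link back to the page
            # that gives the image its context, not just to the image file.
            self.found.append((absolute, self.page_url))


def collect_images(page_url, html):
    collector = ImageLinkCollector(page_url)
    collector.feed(html)
    return collector.found


sample_page = (
    '<html><body><h1>Lincoln</h1>'
    '<img src="/photos/lincoln.jpeg">'
    '<a href="story.html">More</a>'
    '<img src="logo.png">'
    '</body></html>'
)
print(collect_images("http://www.example.com/news/index.html", sample_page))
```

Nothing in this process knows or cares who owns the picture, which is why a licensed image on a magazine site is as easy to collect as one on the photographer's own site.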

    Thus, if you have allowed your pictures to be used on magazine or newspaper sites,

    or licensed a use on some commercial site, there is a good chance your images are

    already in the Arriba Vista index. If that is the case, in all likelihood your name

    will not be attached to the image because either your name did not appear at all on

    the site, or if it did it was not included in the information that Arriba Vista's

    spider picks up.

    Even if the photographer's name is listed, anyone who finds that image will be

    referred back to the URL where the image was used, not to the photographer's URL.

    The first contact with that other company will be the webmaster, not with the person

    with whom the photographer or stock agent negotiated the deal. At this point there

    are several ways this whole thing can fall apart as far as the photographer is

    concerned. The webmaster probably has no idea what agreements were negotiated. He

    may say "OK" as long as his company is credited because his goal is to get his

    company's message out to as many eyeballs as possible. If the use is for another

    web site the size of the file on the web is probably perfectly satisfactory.

    If the webmaster wants to check on clearance he may have no idea who within his

    company he should go to. The chances that the request will get back to the

    photographer are slim. It seems to me that in all likelihood there will be a huge

    amount of misuse resulting from this system.

    This is a problem, not just for photographers with their personal sites, but for The

    Image Bank, Tony Stone Images, The Stock Market, Photodisc, Corbis and all the rest

    of the major image suppliers. At present, there may not be that many web

    uses of images that will be available to be sucked up by the search engine spiders,

    but there are strong indications that this usage is going to grow quite rapidly in

    the next few years.

    Lessons Learned

    This site demonstrates the degree to which images on the web can be randomly located

    and cataloged using automated systems. It demonstrates that thumbnails can be

    created of any image found on the web and stored somewhere other than your site. It

    is also clear that search engines can capture your images, without your knowledge,

    and use them in connection with their own advertising, unless you use great care in

    how you set up your site.

    Arriba Vista is trying to be a responsible web partner and will not upload images of

    anyone who requests that their images not be included in the index. They will also

    remove images that have been uploaded. There is no assurance that other site

    operators will be as responsible.

    The important thing to recognize is not so much what this specific company, Arriba

    Vista, is doing, but what it is possible to accomplish with today's technology.

    Others will be doing the same type of thing in the near future.

    Protecting Yourself From Spiders

    There are a variety of ways to protect your images on the web.

    One is to embed your copyright information and a contact number into any image file.

    You can do this by opening the image in PhotoShop, adding a bar above or below the

    image and placing your visible notice within that file. To see a sample of this you

    can look at the preview images on www.workbook.com. Many of these images also have

    a visible watermark on the image itself.

    This way the copyright information will always travel with the image. The downside

    is that this bar, or anything within the image itself, will probably be edited out

    by any client who licenses usage of the image.

    At Arriba Vista all watermarks and copyright management information (CMI) embedded

    in the image are maintained. There is no capability in their software to tamper with

    embedded CMI.

    That said, many copyright notices etc. are placed nearby as HTML text. Depending on

    how the Web page is constructed, that text file may reside in a completely different

    file from any Meta tag information or other textual context. Sometimes, it can even

    appear on different servers for various types of ASP (Active Server Pages) pages.

    In cases like this, the crawler may not capture the text based copyright notice.

    It is becoming increasingly important to make sure something like the invisible

    Digimarc is embedded in every image file that is licensed to a client for use on the

    web. Image producers ought to also take a look at Digital Object Identifiers (DOI)

    (www.doi.org) as a way to ensure that any image can always be tracked to their

    current address and contact information.

    How To Avoid Getting Picked Up

    If you have a site it is probably a good idea to contact Arriba Vista and either ask

    them to index your site, or tell them specifically that you don't want your site

    indexed. They will honor either request.

    You can also use Robot Exclusion Protocols which Arriba Vista and other search

    engines honor. For more information about how robots and search engines in general

    work you might want to look at www.searchenginewatch.com. They have a comprehensive

    site and they list the names of many of the crawlers used by the search engines.

    John Treacy of Arriba Vista supplied the following information:

    The robots.txt file allows for the exclusion of all crawlers or specific

    crawler(s). This method should be used if you have access to the root directory of

    a web site and know specific directories you want excluded. The robots.txt file

    MUST be located in the root directory of a given web site.

    The following are the procedures for setting up a Robots.TXT file or Meta Tags to

    exclude the Arriba Vista web crawler. If you have any further question please

    contact Arriba Vista.

    Robots.TXT and Meta Tag Procedures

    There are two ways to exclude the Arriba Vista (ArribaPacketRat) robot:

    Method 1 (the robots meta tag)

    The meta tag system is ideal for excluding specific pages or for users who do not

    have access to the root directory of the web site and want all robots excluded the

    same. The robots meta tag is not fully supported by all crawlers, but it is

    supported by Arriba Vista. To exclude Arriba Vista through this method, place a

    meta tag in the head of your html document with the name "robots" and place

    restrictions in the content space of the meta tag.

    Supported Restrictions:

    noindex - Don't index this page

    nofollow - Don't follow links off of this page

    nomediaindex - Don't index media on this page (Specific to Arriba Vista)

    Separate restrictions may be grouped together in one tag as in the

    following example: <meta name="robots" content="noindex,nofollow">. The tag

    should be placed inside the head element of the HTML document, after the

    opening <head> tag.
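
If you want to check what a page is actually telling robots, the following sketch reads the robots meta tag out of an HTML document and lists the restrictions it finds. It uses only Python's standard html.parser module. The class and function names are hypothetical, and this is a way to inspect your own pages, not a description of how any particular crawler reads them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the restrictions listed in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.restrictions = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.restrictions.add(part.strip().lower())


def robots_restrictions(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.restrictions


page = (
    '<html><head><title>Portfolio</title>'
    '<meta name="robots" content="noindex,nofollow">'
    '</head><body></body></html>'
)
found = robots_restrictions(page)
print("index this page:", "noindex" not in found)
print("follow its links:", "nofollow" not in found)
print("index its media:", "nomediaindex" not in found)
```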

    Method 2 (robots.txt)

    This method is the easiest if you have access to the root directory of a web site

    and know specific directories you want excluded. According to the standards for

    robot exclusion, the robots.txt file MUST be located in the root directory of a

    given web site which is difficult for people who don't have their own domain. For

    Arriba Vista (http://www.arribavista.com), the robots.txt file would be located at

    http://www.arribavista.com/robots.txt.

    An example of an invalid robots.txt location is

    http://www.arribavista.com/foo/robots.txt. This file would not be looked at. The

    contents of the robots.txt file allow for the exclusion of specific robot(s) or all

    robots.

    To exclude all robots from the entire site the contents of the robots.txt file would

    be:

    # anything on a line after a # sign is ignored

    User-agent: * # This excludes all crawlers (any text after the # sign is ignored)

    Disallow: /

    To exclude only the Arriba Vista crawler from the entire site the contents of the

    robots.txt file would be:

    User-agent: DittoSpyder

    Disallow: /

    Alternatively, if you wanted to exclude the Arriba Vista crawler from specific

    directories, you could add a Disallow line for each directory you do not want

    indexed.

    User-agent: DittoSpyder # Arriba Vista Image Search

    Disallow: /personal

    Disallow: /images

    Disallow: /bar
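
Before relying on a robots.txt file it is worth testing what it really blocks. The sketch below feeds the directory example above to Python's standard urllib.robotparser module and asks whether given URLs may be fetched. The example.com addresses and the "SomeOtherBot" name are placeholders, and a real crawler may interpret the file somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: DittoSpyder # Arriba Vista Image Search
Disallow: /personal
Disallow: /images
Disallow: /bar
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching it from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is kept out of the listed directories only.
print(parser.can_fetch("DittoSpyder", "http://www.example.com/images/photo.jpg"))   # False
print(parser.can_fetch("DittoSpyder", "http://www.example.com/index.html"))         # True

# A crawler not named in the file is not restricted by these rules.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/images/photo.jpg"))  # True
```

Note that the rules only bind the crawler named in the User-agent line. This parser also matches Disallow values as path prefixes, so "/bar" would cover a directory such as "/barn" too.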


  • Copyright © 1999 Jim Pickerell. The above article may not be copied, reproduced, excerpted or distributed in any manner without written permission from the author. All requests should be submitted to Selling Stock at 10319 Westlake Drive, Suite 162, Bethesda, MD 20817, phone 301-461-7627, e-mail: wvz@fpcubgbf.pbz

    Jim Pickerell is founder of www.selling-stock.com, an online newsletter that publishes daily. He is also available for personal telephone consultations on pricing and other matters related to stock photography. He occasionally acts as an expert witness on matters related to stock photography. For his current curriculum vitae go to: http://www.jimpickerell.com/Curriculum-Vitae.aspx.  
