Arriba Vista

Posted on 4/13/1999 by Jim Pickerell


ARRIBA VISTA

April 13, 1999

In November 1998 Arriba Soft Corporation launched the Arriba Vista Image Searcher

on the Web at www.arribavista.com.

To date it has cataloged about 1.5 million images. They estimate they have crawled

about 30% of the web which means they might

be able to catalog 5 million images once they search the entire web. This does not

include the images they are not allowed to include because they are excluded from

certain web sites.

In one sense this should be what individual photographers have been waiting for

because it can help make their individual sites become known to the world at large.

Arriba Vista had about 3 million page hits in February and based on growth curves

they expect that to reach 10 million page hits per month in the near future.

On the other hand, many photographers who have heard anything about this site are up

in arms over its existence and the fact that Arriba Vista stores thumbnails of their

images on the Arriba Vista web site without the photographer's copyright notice.

We will try to explain how the Arriba Vista site works and the pros and cons of such

sites. One of the things to recognize is that Arriba Vista is not the only search

engine to use this technique. Alta Vista has the "AV Photo Finder" that works in

more or less the same way.

Note: Alta Vista has signed an agreement with Corbis. In exchange for allowing

their images to be shown on the AV Photo Finder site, Corbis is guaranteed that

their images will appear first on any search. This may be great for Corbis

photographers, but the images of all other photographers will appear so far down in

the pile that the chances of anyone ever looking at them are slim.

Getting Images Seen

The big problem for photographers selling stock through individual sites is in

letting potential customers know that their site exists. They can send postcards or

e-mail to their regular clients. For photographers who are primarily interested in

developing assignment business that may be perfectly satisfactory. But, stock

photographers need to cast a wider net because many of the people who will be

interested in using their images will be people with whom the photographer has had

no previous contact.

In the past one solution for many photographers was to get listed on the major text

search engines. That isn't always satisfactory from a user's point of view for two

reasons. Text listings tell so little about the specifics on a given site that it

is often difficult to know what you will find when you go there. In addition, if

after reading the text the user thinks there might be something of interest at the

site the user still has to click again in order to see an image. This makes

searching for images deadly slow and frustrating. Anyone who is under any type of

time pressure is simply unlikely to do it. It is here that Arriba Vista and other

sites like it offer a major breakthrough in searching for information in general,

or looking for images.

Professional users are quickly turned off if they have to open lots of sites just to

discover they offer nothing useful. If the photographer has a tight specialization,

a text listing may work, but photographers with broad general files have trouble

explaining the specifics of their file in the few words of description that are

found on one of the major search engines.

Given the large number of sites to choose from there is little likelihood that a

user will bother to open the johnjonesimages site unless the user knows something

about John Jones' work already.

Advantages Over Text Sites

The advantage for the user in using the Arriba Vista site is that it focuses just on

images, not everything related to a particular subject. Instead of getting a

general description of what is on a site the user sees thumbnails of individual

images based on the keyword search.

The eye can then determine by looking at a thumbnail image whether there is likely

to be relevant information on that site. This visual information enables the user

to make much more rapid and accurate decisions about which sites to visit than is

possible using a text search system.

Moreover, if the user is looking for a picture or illustration he or she doesn't

have to wade through thousands of articles on the subject in order to find images.

When the user clicks on the thumbnail they are taken directly to the page where that

photo appears. For most photographer sites that page may be the image itself with

some caption or copyright information connected to it. However, for the vast

majority of web sites where images are found the image is inserted in a page of text

that relates to the image.

Peter Spicer, Chief Technology Officer at Arriba Vista, points out that he received

a call recently from a mother who had been helping her son produce a paper on

Abraham Lincoln. She liked the Arriba Vista site, because there were "fewer" items

to look through and she got the information she needed faster. She started out on

the text search engines and got too much information related to the keywords

"Abraham Lincoln."

At Arriba Vista she felt she was able to get some very good information based on

pictures of Abraham Lincoln, and the text associated with those pictures, rather

than being overwhelmed by all the options at a text site. She still got

inappropriate hits, but possibly not quite as many of them as she would get on the

text search sites.

Searching on Arriba Vista for Abraham Lincoln we got 635 hits, most of which were

pictures of Abraham Lincoln. However, we also got pictures of the

"Lincoln cent," "Lincoln Park," people whose first name was Abraham and a picture of

a Curtis biplane. This last was really interesting as it appeared in the airplane

section of a book called "Practical Mechanics For Boys." This picture came up when

we searched for Abraham Lincoln because the caption under the biplane picture was

"Lincoln Beachey in a Curtis Biplane."

By way of comparison if we search for Abraham Lincoln on Alta Vista's main text

search area we get 45,234 pages. On their "AV Photo Finder" Abraham Lincoln gets

2985 hits.

It should be recognized that the vast majority of users of the Arriba Vista site are

consumers, not professional users. Thus, the chances of commercially licensing

rights to your images as a result of their appearance on Arriba Vista are slim.

Media Commerce

Arriba Vista acknowledges that the site in its current format may be of little value to

professional photographers trying to license rights to their work, or to

professional researchers who are trying to find images to license. The professional

researcher has to go through too much chaff in order to find a few useful bits of

information and thus is likely to be turned off.

To meet the needs of these two groups Arriba Vista is developing a companion site

directed toward Media Commerce. Initially this site will focus on licensing rights

to royalty free images for fixed prices. Later, they will develop a Rights

Protected section of this site where usage fees can be negotiated for individual

uses.

PhotoSphere will be one of the first participants on the site. Arriba Vista is

talking to other RF producers now to try to get others to sign on.

According to Sue Clemons, formerly with Superstock, they will start with RF because

it is easier to get that side of their business going since they don't have to deal

with negotiating sales. On the Rights Protected site they may allow individual

agencies to handle their own negotiations, but that has yet to be decided.

Every participant on the site will have to specifically request that their images be

included and initially they will only work with agencies. At some later date they

may accept images from individual photographers. All aspects of the Rights

Protected site are still in the initial planning stages.

If they eventually accept images from individual photographers it could make it

possible for many photographers with individual sites to overcome the marketing

hurdles of a personal web site. This could be extremely important to photographers

who have been unable to get agent representation.

How The Site Works In Brief

The following are the basics of how the current site works.

They use a "spider" to search the net for image files with extensions like

.jpeg, .tiff and .gif. This is a continuous process of looking for new information and

can either be random, or targeted at specific sites by the webmaster. Individuals

can request that their site be indexed by sending a message to the webmaster at

Arriba Vista. Recent studies indicate that the largest engine, Alta Vista, has

probably crawled no more than 50% of the Web although no one is really certain given

the phenomenal growth of the web.

The software captures each image found, creates a thumbnail which is stored on

the Arriba Vista site along with the path back to the original site.
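
As a rough illustration of the crawl step described above, the sketch below scans one HTML page for image references with the listed extensions and records each image URL together with the page it was found on. It is only a sketch under stated assumptions: the class and function names and the example.com addresses are hypothetical, the ".jpg" extension is added for convenience, thumbnail creation is left out, and nothing here is Arriba Vista's actual software.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions named in the article; ".jpg" is added here as an assumption.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".tiff", ".gif")


class ImageLinkFinder(HTMLParser):
    """Collects the src of every <img> tag that has an image extension."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.found = []  # list of (image_url, source_page_url) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative paths against the page the image was found on.
        image_url = urljoin(self.page_url, src)
        if urlparse(image_url).path.lower().endswith(IMAGE_EXTENSIONS):
            self.found.append((image_url, self.page_url))


def find_images(page_url, html_text):
    """Return (image URL, source page URL) pairs for one page."""
    finder = ImageLinkFinder(page_url)
    finder.feed(html_text)
    return finder.found


page = "http://www.example.com/gallery/index.html"  # hypothetical page
html_text = '<html><body><img src="lincoln.gif"><img src="/logo.png"></body></html>'
records = find_images(page, html_text)
```

Each record keeps the path back to the page where the image appeared, which is the piece of information a thumbnail would need in order to send the user back to the original site.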

It creates keywords for the image by using Meta data found on the site. Meta

tags are hidden words that appear in the head element of an HTML document and can be

extracted by servers/clients for use in identifying, indexing and cataloging

specialized documents. This is used to advertise the contents of the document.

All pictures on the Arriba Vista site will be keyworded with all the words selected

from the Meta tags, even though some of these words may not directly apply to

specific pictures. Arriba Vista's goal is to ensure that keywords have semantic

relevancy to the image but that often can not be accomplished without human

intervention to view the image. Currently there is no way to attach meta tags to

individual .jpeg files.
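
To make the effect of page-level Meta tags concrete, here is a minimal sketch that reads the keywords Meta tag from a page and attaches every word to every image on that page. The class name, function name, sample page and image names are all hypothetical, and this is an assumption about the general approach, not Arriba Vista's code.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Reads <meta name="keywords"> and lists the <img> tags on a page."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [w.strip().lower() for w in content.split(",") if w.strip()]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def keyword_index(html_text):
    """Attach every page-level keyword to every image on the page."""
    parser = MetaKeywordParser()
    parser.feed(html_text)
    return {src: set(parser.keywords) for src in parser.images}


# Hypothetical page: one set of Meta keywords, two unrelated pictures.
html_text = """
<html><head>
<meta name="keywords" content="bird, cat, dog, pig, puppy">
</head><body>
<img src="pig.jpeg"><img src="canary.jpeg">
</body></html>
"""
index = keyword_index(html_text)
```

Because the keywords belong to the page and not to any one file, the picture of the pig ends up carrying "cat" and "dog" as well.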

In addition to the Meta tags which typically do not supply enough detail,

Arriba Vista employs a nine (9) step relevancy ranking of other text data in context

with the images. In particular it looks at headlines, sub heads and captions that

are close to the image.

This can work for editorial images when the spider finds them inserted in a

document. It is unlikely to work well for concept images used on a commercial site

because the preponderance of the text will probably not relate to the images. It

will be of little help at all at most photographer sites because they tend not to

use text to describe or amplify their images.

On average the spider software generates about 10 keywords per image. Many

images have fewer words. Given the automated system for collecting these keywords

many of the words are inappropriate to describing the specific image. When

searching for a particular word string the user gets a high percentage of inaccurate

hits.

Arriba Vista also uses a manual process where a human looks at some of the

images to validate the (semantic) relevancy of keyword/image pairs. Manual review

is extremely expensive compared with the automated system. It is unclear what

percentage of the images get this manual review, but from looking at the number of

inappropriate keywords attached to the images in most searches it would appear that

there is very little manual review at this time.

There is a "connection list" of words used at the search level that tries to

capture the intent of the user from the words used. For example if the user enters

"ecology" the search engine will look for words like "air," "water," etc. This

connection list was developed by Arriba Vista and is proprietary.

To get an idea of the confusing connections this list can produce consider the

following. A shoe store site promotes the fact that they have "Air Jordan" shoes in

their META tags. They show pictures of all their shoes on their site. Someone

searches for "ecology". In Arriba Vista's "connection list", which the search engine

automatically uses on every search, the words "air," "water" as well as many other

things are attached to "ecology." Thus, in the search for ecology the search engine

pulls all the images that have the keyword "air" and gets a whole bunch of shoes.
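
The shoe store problem can be reproduced with a few lines of code. The sketch below is a guess at the general mechanism only: the real connection list is proprietary, so the list contents beyond "air" and "water", the image names and the function names are all hypothetical.

```python
# Hypothetical connection list; the real one is proprietary and not public.
CONNECTION_LIST = {"ecology": ["air", "water"]}

# Hypothetical index: image name -> keywords taken from the site's Meta tags.
IMAGE_KEYWORDS = {
    "rainforest.jpeg": {"ecology", "forest", "rain"},
    "river.jpeg": {"water", "river"},
    "sneaker.jpeg": {"air", "jordan", "shoes"},
}


def expand_query(term):
    """Return the search term plus any words connected to it."""
    term = term.lower()
    return {term, *CONNECTION_LIST.get(term, [])}


def search(term):
    """Return images that carry the term or any connected word."""
    wanted = expand_query(term)
    return sorted(name for name, words in IMAGE_KEYWORDS.items() if words & wanted)


hits = search("ecology")
```

In this sketch the search for "ecology" returns the sneaker picture alongside the rainforest and the river, simply because "air" appears in both the connection list and the shoe store's keywords.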

Arriba Vista currently allows boolean searches (AND, OR, + and -) according to the Help

menu. This is a useful tool, but it does not seem to be fully operational. John

Treacy, VP of Marketing, says,

"Today, you must enter a + sign to produce correct 'phrases'."

The example used in the Help menu is "Cat+Dog". That didn't work for me, but "Cat +

Dog" did. There must be a space on either side of the "+" sign. I couldn't get

"and" or "or" to work and I can not explain why. This is an extremely important

function in refining searches.

It is also interesting to note that at first glance when doing the "Cat + Dog"

search you would think that it is not working because you get a lot of pictures of

single animals. This happens because one of the sources of these images is

"petcraft.com" which is a pet store catering to all kinds of pets. Their Meta tags

include: "african, bird, canary, cat, cichlids, dog, friends, kitten, meet,

petcraft, pig, potbelly, puppy, red, send, sheridan, want and world." All these

words are attached to every image that comes from this site. Consequently, the

picture of a pig has the keywords "cat" and "dog".
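
The following sketch shows one way the "Cat + Dog" result could arise. The keyword list is the one quoted above for the pet store site; the image names, the function name and the rule of splitting the query on " + " (with spaces) are assumptions made for illustration, not a description of how the site is actually built.

```python
# Meta tag words quoted in the article for the pet store site.
SITE_KEYWORDS = {
    "african", "bird", "canary", "cat", "cichlids", "dog", "friends",
    "kitten", "meet", "petcraft", "pig", "potbelly", "puppy", "red",
    "send", "sheridan", "want", "world",
}

# Hypothetical image names; every image inherits the full site keyword set.
IMAGES = {name: SITE_KEYWORDS for name in ("pig.jpeg", "canary.jpeg", "kitten.jpeg")}


def and_search(query):
    """Match images whose keywords include every term joined by ' + '."""
    terms = [t.strip().lower() for t in query.split(" + ")]
    return sorted(name for name, words in IMAGES.items() if all(t in words for t in terms))


hits = and_search("Cat + Dog")
```

Every image from the site matches, including the pig and the canary, because each one carries both "cat" and "dog". Under the same assumption, "Cat+Dog" without spaces is treated as one unknown word and matches nothing.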

All searches look for "exact phrase matching" of any keyword.

Thus, if the word entered was "Catskills" and you search on "Catskill" you won't

find it. The important thing to note here is that if you were putting Meta tags on

your site in the hopes of being indexed by Arriba Vista you should put in both the

singular and plural forms of important words. Many searchers tend to use plural when

they are looking for a singular subject and vice versa.

Some search engines automatically look for plurals. At this stage, this one does

not.
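
A short sketch of why exact matching misses "Catskill", and of the workaround of listing both forms in your Meta tags. The function names are hypothetical, and the plural handling is deliberately naive (it only adds or removes a trailing "s").

```python
def exact_match(keywords, query):
    """Exact matching: the query must equal a stored keyword."""
    return query.lower() in {k.lower() for k in keywords}


def with_simple_plurals(keywords):
    """Add a naive singular or plural form for each keyword (an assumption:
    only a trailing 's' is handled, not irregular plurals)."""
    expanded = set()
    for word in keywords:
        word = word.lower()
        expanded.add(word)
        expanded.add(word[:-1] if word.endswith("s") else word + "s")
    return expanded


tags = ["Catskills", "mountain"]
found_before = exact_match(tags, "Catskill")
found_after = exact_match(with_simple_plurals(tags), "Catskill")
```

With only "Catskills" in the tags the search for "Catskill" fails; once both forms are present it succeeds.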

Currently all images on the site are sequenced on a first-on,

first-to-be-pulled-up basis depending on the search criteria. This means that

any new images are likely to be at the bottom of the pack and there is a good chance

that users will not look beyond the first couple hundred images for any search

criteria. Some search engines use a reverse order process where the newest images

added (in date order) are looked at first. This system has more appeal to image

providers, but in a subject area where lots of new images are added even relatively

new images will work their way down into the pile very rapidly. Arriba Vista is

considering changing the way they order images.
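
The two orderings discussed above differ only in the direction of a sort on the date an image was added. The catalog entries and dates below are made up for illustration.

```python
from datetime import date

# Hypothetical catalog entries: (image name, date added to the index).
catalog = [
    ("lincoln_memorial.jpeg", date(1999, 1, 15)),
    ("lincoln_portrait.gif", date(1998, 11, 20)),
    ("lincoln_cent.gif", date(1999, 3, 30)),
]

# First on, first pulled up: the oldest entries appear at the top of the results.
first_in_first_shown = sorted(catalog, key=lambda entry: entry[1])

# Reverse date order: the newest entries appear at the top instead.
newest_first = sorted(catalog, key=lambda entry: entry[1], reverse=True)
```

Either way, an image's position depends on when it was indexed and not on how well it fits the search.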

When they first started, Arriba Vista was going directly to the image file at

the image owner's site once someone clicked on the thumbnail. In many cases this

was removing the context of the rest of the information that the site creator had

put on his page around the picture. Photographers complained and Arriba Vista

listened. They adjusted their search parameters so the page acquired when the

thumbnail is clicked is one step back from the actual image file itself. This

usually gives all of the textual information that relates to the image and thus all

the context is preserved. If, in a photographer's site, his or her copyright

information appears on the screen next to the preview size image it should now be

preserved.

Advantages And Disadvantages For Photographers

The advantage is that photographers are charged nothing to participate. Arriba

Vista earns all their revenue by selling ad space on their site. They expect that

to be their sole source of income. Some photographers have complained that Arriba

Vista is

profiting by using their images, but it seems to me that what they are doing differs

little from what all the major text search engines like Yahoo, Alta Vista, Excite,

Infoseek, Lycos, etc. are doing. The main difference is that they provide a more

efficient method for users to search for certain types of data.

The disadvantage is that in its current form it seems unlikely that it will aid

photographers in earning revenue. It may bring more non-revenue traffic to certain

pages within their site.

It seems likely that a high percentage of visits will be from those wanting to make

small personal uses of the information. Thus far no one has worked out a successful

model for collecting for these uses. Arriba Vista says they hope to find ways to

charge fees for consumer use. If they can work out a system the photographers who

created the images will receive a share of any fees collected.

There is also a fear that this increased traffic from the general consumer

population will lead to more misuse. Most stock photographers would like to find

ways to draw increased traffic from professional users and keep the consumer interest

in their sites to a minimum. It's called target marketing. The goal of Arriba Vista

is to reach all potential users and not target any specific group.

Some photographers are concerned that visitors to the site will think these pictures

are free to use for any purpose and will not recognize that some of them are

copyrighted. Arriba Vista has placed the following rights notice under each

thumbnail once it is selected from a group of thumbnails returned from a search:

Arriba Vista provides a visual mechanism to search the Web using images instead of

text. Users are directed to the originating web site on which the images are

located. Should you wish to use any image, photo or artwork you see during the

search process, you must obtain the appropriate permission from the owner of the

material.

Problems

As I see it there are both technological and philosophical problems. The

technological problems may be relatively easy to solve in time. They include:

--Developing a better system for making new images on the site available to the

user, rather than having them always fall at the bottom of the pack. This system

might involve putting all images acquired at the top based on the date of

acquisition.

--Improved Natural Language technology.

--Improving the quality of the connection list and the thesaurus.

--A system for attaching specific meta words to specific image files at the

photographer's site thus making it possible for the creator of the web site to

provide more accurate data about each individual image. In some cases they are

already accepting keywords from a few photographers.

However, implementing new and improved versions of the "natural language," the

"connection list," and to a great extent the "boolean" searches will result in

little improvement in the site unless more accurate and extensive keywording is

provided for each image.

The keywording issue is a difficult one to overcome. The people at Arriba Vista

believe they can automate this process to a great extent. I am skeptical. The

inappropriate words on the current site would tend to justify my skepticism. Better

keywording can be achieved in one of two ways. Arriba Vista could hire humans to

look at each image and the text related to it and make judgements about what words

should be added or deleted from the list automatically produced. This is time

consuming, and probably not cost effective.

The second way is for those who created the web site to provide appropriate keywords

and captions. Peter Spicer says they are already accepting keywords and image files

supplied on disc from some photographers. However, it is my belief that there is

not enough commercial incentive for the vast majority of people who created the

pages Arriba Vista is currently indexing to go to this trouble. Most have no

interest in licensing rights to their images. They will not perceive that they will

achieve enough benefit from increased eyeballs to their site to justify the expense

of this keywording. They will spend their promotional dollars in other ways.

Professional image sellers will recognize the value of keywording and will go to the

trouble. But, these people will want to be on the Media Commerce site, not the

current search engine.

Image sellers need to think carefully about how best to market their images. The

people at Arriba Vista seem to believe that the more eyeballs looking at your images the

better, but for the professional photographer that may not necessarily be the case.

It may be more important to have the right kind of eyeballs, not just more. The

Media Commerce site will probably provide the "right kind of eyeballs," the current

site doesn't. Some individual photographers whose sites have been promoted on

other search engines, but have not yet benefited from the increased traffic Arriba

Vista might generate, are already finding that they have to spend too much time

fielding requests from individuals who do not want to pay enough for the use of an

image to justify handling the transaction. These photographers do not need more of

this type of traffic.

Even if school children began to show an interest in actually paying to use images

there is no guarantee that they would be willing to pay enough to offset the cost of

supplying the service.

Such payments might be enough for the consolidator (Arriba Vista) to make a profit,

but not enough for the thousands of individual suppliers of images to individually

make enough to justify participation in the project.

Based on the experiences so far it seems that everyone who has tried to reach the

consumer market finds it costs much more to service than the revenue generated.

Individual photographers certainly don't want to have to field calls or e-mails from

customers who want to buy rights to an image for $1.00 or $2.00.

Because Arriba Vista goes after every image regardless of quality or demand for the

subject matter they clutter the site with a huge amount of imagery for which there

is little or no demand. They are getting about 3 million hits per day (thumbnails

served), but that is probably heavily weighted toward the educational market not the

commercial market. As we pointed out earlier the site may be helpful for those

looking for a little general information about a topic because it narrows the search

to only those pages that include pictures.

Getting Images From Sites Where We Have Licensed Use

A major problem for photographers will be when engines like this capture our images from

sites where we have licensed legal use of our images. Here's how that will work.

Since Arriba Vista searches for all URLs with the .jpeg or .gif extensions they also find all

images at magazine or newspaper sites. These sites could prevent their images from

being picked up if they use a "robots.txt" file to prevent robots from indexing

their site. Some use this file, but many want their site indexed and listed by as

many search engines as possible so users can find them.

Thus, if you have allowed your pictures to be used on magazine or newspaper sites,

or licensed a use on some commercial site, there is a good chance your images are

already in the Arriba Vista index. If that is the case, in all likelihood your name

will not be attached to the image because either your name did not appear at all on

the site, or if it did it was not included in the information that Arriba Vista's

spider picks up.

Even if the photographer's name is listed, anyone who finds that image will be

referred back to the URL where the image was used, not to the photographer's URL.

The first contact with that other company will be the webmaster, not with the person

with whom the photographer or stock agent negotiated the deal. At this point there

are several ways this whole thing can fall apart as far as the photographer is

concerned. The webmaster probably has no idea what agreements were negotiated. He

may say "OK" as long as his company is credited because his goal is to get his

company's message out to as many eyeballs as possible. If the use is for another

web site the size of the file on the web is probably perfectly satisfactory.

If the webmaster wants to check on clearance he may have no idea who within his

company he should go to. The chances that the request will get back to the

photographer are slim. It seems to me that in all likelihood there will be a huge

amount of misuse resulting from this system.

This is a problem, not just for photographers with their personal sites, but for The

Image Bank, Tony Stone Images, The Stock Market, Photodisc, Corbis and all the rest

of the major image suppliers. At present, there may not be that many web

uses of images that will be available to be sucked up by the search engine spiders,

but there are strong indications that this usage is going to grow quite rapidly in

the next few years.

Lessons Learned

This site demonstrates the degree to which images on the web can be randomly located

and cataloged using automated systems. It demonstrates that thumbnails can be

created of any image found on the web and stored somewhere other than your site. It

is also clear that search engines can capture your images, without your knowledge,

and use them in connection with their own advertising, unless you use great care in

how you set up your site.

Arriba Vista is trying to be a responsible web partner and will not upload images of

anyone who requests that their images not be included in the index. They will also

remove images that have been uploaded. There is no assurance that other site

operators will be as responsible.

The important thing to recognize is not so much what this specific company, Arriba

Vista, is doing, but what it is possible to accomplish with today's technology.

Others will be doing the same type of thing in the near future.

Protecting Yourself From Spiders

There are a variety of ways to protect your images on the web.

One is to embed your copyright information and a contact number into any image file.

You can do this by opening the image in Photoshop, adding a bar above or below the

image and placing your visible notice within that file. To see a sample of this you

can look at the preview images on www.workbook.com. Many of these images also have

a visible watermark on the image itself.

This way the copyright information will always travel with the image. The downside

is that this bar, or anything within the image itself, will probably be edited out

by any client who licenses usage of the image.

At Arriba Vista all watermarks and copyright management information (CMI) embedded

in the image are maintained. There is no capability in their software to tamper with

embedded CMI.

That said, many copyright notices etc. are placed nearby as HTML text. Depending on

how the Web page is constructed, that text may reside in a completely different

file from any Meta tag information or other textual context. Sometimes, it can even

appear on different servers for various types of ASP (Active Server Pages) pages.

In cases like this, the crawler may not capture the text based copyright notice.

It is becoming increasingly important to make sure something like the invisible

Digimarc is embedded in every image file that is licensed to a client for use on the

web. Image producers ought to also take a look at Digital Object Identifiers (DOI)

(www.doi.org) as a way to ensure that any image can always be tracked to their

current address and contact information.

How To Avoid Getting Picked Up

If you have a site it is probably a good idea to contact Arriba Vista and either ask

them to index your site, or tell them specifically that you don't want your site

indexed. They will honor either request.

You can also use Robot Exclusion Protocols which Arriba Vista and other search

engines honor. For more information about how robots and search engines in general

work you might want to look at www.searchenginewatch.com. They have a comprehensive

site and they list the names of many of the crawlers used by the search engines.

John Treacy of Arriba Vista supplied the following information:

The robots.txt file allows for the exclusion of all crawlers or specific

crawler(s). This method should be used if you have access to the root directory of

a web site and know specific directories you want excluded. The robots.txt file

MUST be located in the root directory of a given web site.

The following are the procedures for setting up a robots.txt file or Meta Tags to

exclude the Arriba Vista web crawler. If you have any further questions please

contact Arriba Vista.

robots.txt and Meta Tag Procedures

There are two ways to exclude the Arriba Vista (ArribaPacketRat) robot:

Method 1 (the robots meta tag)

The meta tag system is ideal for excluding specific pages or for users who do not

have access to the root directory of the web site and want all robots excluded in the

same way. The robots meta tag is not fully supported by all crawlers, but it is

supported by Arriba Vista. To exclude Arriba Vista through this method, place a

meta tag in the head of your html document with the name "robots" and place

restrictions in the content space of the meta tag.

Supported Restrictions:

noindex - Don't index this page

nofollow - Don't follow links off of this page

nomediaindex - Don't index media on this page (Specific to Arriba Vista)

Separate restrictions may be grouped together in one tag as in the

following example: <meta name="robots" content="noindex,nofollow">. The tag must

be enclosed in HTML angle brackets as shown and should be placed inside the head

element of the HTML document.
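
If you want to check what a crawler that honors the robots meta tag would see on one of your pages, a small script can read the tag back. This is a minimal sketch: the class and function names and the sample page are hypothetical, and it only reports the restrictions it finds, it does not show how any particular crawler acts on them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the restrictions listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.restrictions = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.restrictions.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_restrictions(html_text):
    """Return the set of restrictions a page declares for robots."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.restrictions


# Hypothetical page with the meta tag placed inside the head element.
page = """<html><head>
<title>Portfolio</title>
<meta name="robots" content="noindex,nofollow">
</head><body></body></html>"""
restrictions = robots_restrictions(page)
may_index = "noindex" not in restrictions
may_follow = "nofollow" not in restrictions
```

The same check works for "nomediaindex", which the article lists as specific to Arriba Vista.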

Method 2 (robots.txt)

This method is the easiest if you have access to the root directory of a web site

and know specific directories you want excluded. According to the standards for

robot exclusion, the robots.txt file MUST be located in the root directory of a

given web site which is difficult for people who don't have their own domain. For

Arriba Vista (http://www.arribavista.com), the robots.txt file would be located at

http://www.arribavista.com/robots.txt.

An example of an invalid robots.txt location is

http://www.arribavista.com/foo/robots.txt. This file would not be looked at. The

contents of the robots.txt file allow for the exclusion of specific robot(s) or all

robots.

To exclude all robots from the entire site the contents of the robots.txt file would

be:

# anything on a line after a # sign is ignored

User-agent: * # This excludes all crawlers (any text after the # sign is ignored)

Disallow:/

To exclude only the Arriba Vista crawler from the entire site the contents of the

robots.txt file would be:

User-agent: DittoSpyder

Disallow:/

Alternatively, if you wanted to exclude the Arriba Vista crawler from specific

directories, you could add a Disallow line for each directory you do not want

indexed.

User-agent: DittoSpyder # Arriba Vista Image Search

Disallow:/personal

Disallow:/images

Disallow:/bar
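
Before uploading a robots.txt file it can be worth testing the rules locally. The sketch below feeds the directory rules from the last example to the robots.txt parser in Python's standard library and asks which addresses the named crawler may fetch. The www.example.com addresses and "SomeOtherBot" are placeholders, and the result shows how this parser reads the rules, which is not a guarantee of how any given crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The directory rules from the article's last example.
rules = """\
User-agent: DittoSpyder # Arriba Vista Image Search
Disallow:/personal
Disallow:/images
Disallow:/bar
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# www.example.com is a placeholder site, not one named in the article.
blocked = parser.can_fetch("DittoSpyder", "http://www.example.com/images/photo.jpeg")
allowed = parser.can_fetch("DittoSpyder", "http://www.example.com/index.html")
other = parser.can_fetch("SomeOtherBot", "http://www.example.com/images/photo.jpeg")
```

Here the named crawler is kept out of the /images directory but may still read the home page, and a crawler that is not named in the file is not restricted at all.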
