Monday 25 October 2010

The known and the unknown - keywording for visibility

Why is everyone talking about keywording? People in the image industry are scratching their heads about ways to keyword their images. Now that the web is buzzing with visual material, words need to be deployed intelligently to ensure images can be found by the people who need them.

Technology offers other ways of finding images, you may say, which don't require so much human input. Visual recognition techniques do offer clever ways to look for images, but computers can only learn from the way humans keyword the images in the first place, and they are not very clever at understanding abstract concepts. How do you explain to a computer all the different ways of expressing the idea of freedom, for example? Can love only be expressed by the shape of a heart, or a smile between two people? Human interpretation is still needed, and computers are still taking baby steps at recognising 'things in the picture' like trees and tables, never mind the more abstract and subtle signifiers found in visual material. So what we are looking at, for some time to come, is human tagging of images to make them findable.

The problem anyone keywording images faces is this: language is a wonderful, expressive tool for communication, but there are many ways to say the same thing, and words often have more than one meaning. The word I use to tag my image may not be the one used by the person searching for it. They may use the plural where I used the singular. They may use a different spelling, or a different variety of the language, such as American rather than UK English. And then there are the requirements of multilingual searches.

Any good tagging system needs to scoop up all the variations of a word so that whatever word the searcher uses, they will find their way to the image. Words need to be uniquely defined, so you can tell the difference, for example, between orange, the colour, and orange, the fruit. There may also be broader terms than the one you first thought of, so your image of a train should also appear under a search for transport.

The way to achieve consistency, and to scoop up all the appropriate terms, is to create a controlled vocabulary. Built from a set of preferred terms and their synonyms, the vocabulary is usually structured in a hierarchical way to include broader and narrower terms. Vocabularies for use with images have been informed by work done on text search in the library sector, but they have developed further to include concepts specific to visual material. One of the big advantages is that properly controlled vocabularies can be translated - just once - so that searches can be made in different languages.
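As a rough illustration of that structure, here is a minimal Python sketch. The term IDs, field names, sample entries and French labels are all invented for this example; they are not taken from any real vocabulary or product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """One uniquely defined concept in a hypothetical controlled vocabulary."""
    term_id: str                                   # unique ID keeps the two senses of "orange" apart
    preferred: str                                 # preferred term shown to the keyworder
    synonyms: Set[str] = field(default_factory=set)       # plurals, spellings, synonyms
    broader: Optional[str] = None                  # ID of the broader (parent) term, if any
    labels: Dict[str, str] = field(default_factory=dict)  # translations, keyed by language code


# Illustrative entries only (assumed for this sketch).
VOCABULARY = {
    "transport": Term("transport", "transport", {"transportation"}, None, {"fr": "transport"}),
    "train": Term("train", "train", {"trains", "railway train"}, "transport", {"fr": "train"}),
    "orange_colour": Term("orange_colour", "orange (colour)", {"orange", "orange color"}, "colour", {"fr": "orange"}),
    "orange_fruit": Term("orange_fruit", "orange (fruit)", {"orange", "oranges"}, "fruit", {"fr": "orange"}),
    "colour": Term("colour", "colour", {"color", "colours", "colors"}, None, {"fr": "couleur"}),
    "fruit": Term("fruit", "fruit", {"fruits"}, None, {"fr": "fruit"}),
}


def lookup(word: str, lang: str = "en") -> List[Term]:
    """Return every term whose label in the given language matches the word."""
    needle = word.strip().lower()
    matches = []
    for term in VOCABULARY.values():
        if lang == "en":
            names = {term.preferred.lower()} | {s.lower() for s in term.synonyms}
        else:
            label = term.labels.get(lang)
            names = {label.lower()} if label else set()
        if needle in names:
            matches.append(term)
    return matches


def broader_chain(term_id: str) -> List[str]:
    """Walk up the hierarchy and list the preferred labels of all broader terms."""
    chain = []
    current = VOCABULARY[term_id].broader
    while current is not None:
        chain.append(VOCABULARY[current].preferred)
        current = VOCABULARY[current].broader
    return chain
```

In this sketch a search for "trains" or "color" resolves to a single term, while "orange" returns both senses so that software could ask the user which one is meant. Because translations hang off the term ID rather than the English word, each concept only needs translating once.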

Can a single vocabulary describe the entire world, the universe, and everything in it? Yes, if it has top level terms broad enough to cover everything, a logical structure, and sufficient depth to reach down to a granular level.

How does it help in practice? The vocabulary is embedded in the software at both the keywording and the search stage, creating automatic links between words and effectively automating much of the keywording effort. With a well-designed controlled vocabulary and good software, the keyworder can concentrate on interpreting the image for the user audience. That's the part the machine can't do.
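To make the two stages concrete, here is a small Python sketch of how software might apply a vocabulary when tagging and when searching. It is an assumption about one possible design, not a description of any particular product; the terms, file names and function names are hypothetical.

```python
# Hypothetical vocabulary: preferred term -> its variants, plus broader-term links.
SYNONYMS = {
    "train": {"trains", "railway train"},
    "transport": {"transportation"},
    "freedom": {"liberty"},
}
BROADER = {"train": "transport"}


def to_preferred(word):
    """Map any known variant to its preferred term, or None if the word is unknown."""
    w = word.strip().lower()
    for preferred, variants in SYNONYMS.items():
        if w == preferred or w in variants:
            return preferred
    return None


def expand_keywords(chosen):
    """Keywording stage: normalise each word and add its broader terms automatically."""
    result = set()
    for word in chosen:
        term = to_preferred(word)
        while term is not None:      # unknown words are simply skipped in this sketch
            result.add(term)
            term = BROADER.get(term)
    return result


def search(query, index):
    """Search stage: normalise the query, then return matching image names, sorted."""
    term = to_preferred(query)
    if term is None:
        return []
    return sorted(image for image, tags in index.items() if term in tags)


# The keyworder types one word per image; the software fills in the rest.
index = {
    "img_001.jpg": expand_keywords(["trains"]),
    "img_002.jpg": expand_keywords(["liberty"]),
}
```

Here the keyworder only typed "trains", yet `search("transportation", index)` still finds `img_001.jpg`, because the broader term was added at tagging time and the query was normalised at search time. Deciding that the second image expresses freedom remains a human judgement.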

People in the stock image industry have been working on this for decades, and have come up with some pretty good systems for keywording, led by teams in large agencies like Getty and Corbis. Now it's time for everyone else to sign up for productive and accurate keywording, learning, where possible, from experience already gained in the industry on keywording and customer behaviour. The benefits will be felt not only by smaller picture agencies and photographers, but also in the wider world. Imagery is playing an ever greater part in company DAM systems, where the level and quality of retrieval is what justifies the investment in this area. A picture may be worth 1000 words, but without words, a picture may be lost forever.

At Electric Lane we have been increasingly involved in creating vocabularies for image collections. We are also working with the standards body IPTC on a project to create a standard vocabulary to help collections of all sizes raise their keywording standards and make their data more interoperable.

We are offering a one day course, Keywording, on December 7 in London, run by Electric Lane Associate Liisa Kaakinen, a stock image industry keywording and controlled vocabulary expert. The course covers professional keywording techniques and the vocabularies that lie behind them, applied to still and moving images. For those wondering what to do about keywording, this session provides an essential step to understanding the process, the gains, the resources needed, and how to maximise productivity.

For further enquiries about course content contact sarah@electriclane.co.uk, tel 020 7607 1415.

See also
Is Language a moving target
Multilingual Keywording
IPTC Mirror on IPTC Controlled Vocabulary Initiative
Google is not Perfect, Fran Alexander

3 comments:

  1. Good overview of the issues. For those looking for examples of Controlled Vocabularies, take a look at both http://www.controlledvocabulary.com/examples.html and http://www.controlledvocabulary.com/products/index.html

    David
