
Bulletin of The American Society for Information Science

Vol. 26, No. 2

December / January 2000


Annual Meeting Coverage

Track 2:

Classification and Representation


by Jessica Milstead

When I began to gather my thoughts for this overview of classification and representation for the 1999 Annual Meeting, I was immediately struck by a sense of déjà vu. It was just about 25 years ago that I wrote a chapter on "Document Description and Representation" for the Annual Review of Information Science and Technology, and I spent a few minutes entertaining myself with thoughts of how things had changed – and how they hadn't.

I would like to share some of those thoughts with you, as a device to bring some perspective on what I think of as the problems of making information findable.

    • Then we focused on documents, mostly in print or print-like forms. Now we focus on "resources" or "knowledge," but we're still trying to get at information that has been put away in a package of some kind or another.

    • Then we talked about the problems of "nonprint"; now we are trying to figure out both the kinds of information stored in "images," and how to get at it.

    • Then we talked about cataloging and indexing; today we talk about metadata.

    • Then we built thesauri; today we still build thesauri, but we also build semantic networks and ontologies, and we expect machine assistance in developing the structure.

    • Then we were trying to organize information; today we try to synthesize knowledge.

    • Then we had SMART, developed by Gerry Salton, for which a few hundred documents were a substantial test universe. Today we routinely use a variety of linguistic and statistical techniques on terabytes of data. Some of the statistical techniques actually owe quite a bit to SMART. This is part of the information retrieval track, but it's important to keep in mind that the reason we organize and tag information is so that we can get at it later.

In my view, the basic problem has not changed. We are still tackling the task of organizing the results of past human intellectual labor so that others can find the particular tidbits they need at some point in the indefinite future. What has changed is the scale of the problem and our basic assumptions about how we're going to deal with it. Today the Internet is a basic fact of life, and that has changed everything, not just in volume, but in instability. Where we used to worry about "gray literature," such as technical reports and preprints, now we face the fact that the information we see on a Web site today may be changed tomorrow – or even be gone completely. Fortunately, we have tools to aid us today that weren't imagined then either.

The papers and presentations at this conference are the best possible evidence of the changes that have occurred, and rather than indulge in more nostalgia, I would like to consider what the presentations we are going to hear this week tell us about today's issues. The papers and sessions in this track can be organized into five overlapping areas:

    • knowledge management and organization

    • tools for development of knowledge organization structures

    • visualization of information

    • metadata

    • digital libraries

Knowledge management and organization

Knowledge is inherently unmanageable. And any organization we impose on it is strictly temporary and had better be adaptable to change. I've lost count of the number of times that I have commiserated with indexers who were bemoaning the problems of dealing with new kinds of information. I usually point out that if knowledge were static we would all be out of jobs. If it were possible to organize knowledge and develop a structure for it once and for all there would be no need for this conference. Fortunately, that's not the way the world is, and therein lie our challenges and what keeps our professional lives interesting.

One of the most exciting aspects of knowledge management today is the way in which so many disparate groups are finding common ground. Museums, archives, libraries and even Web site developers are discovering that they have problems in common and can learn from each other. While much of the discussion of "knowledge management" still revolves around "information management," knowledge management is truly much broader.

Knowledge involves the synthesis of information: what people do with the facts and interpretations they acquire. Thus, knowledge management requires understanding, not just of the data, but of the patterns of communicating and using those data. User needs and behavior have traditionally been a concern of information science, but today the interaction of users and technology is fundamental.

A great deal of knowledge management is still focused on technology or on the management of stored information, and these areas have not lost their importance. If the information isn't organized, no amount of understanding of user interactions will enable its use. And if the available technologies are overwhelmed by the task, then integration and application of information as knowledge will be severely hampered. But, if the ways that people use information, both individually and in groups, are not part of the mix, then the organization schemes and technologies will not pay off as they should.

The information to be found in images has always been especially problematic in both description and interpretation. The information in an image can be conceptualized in three major layers: an image is "of" something, for instance a camping site on a mountainside with various objects lying around. That image may be "about" one of the camps used by climbers on Mount Everest, where they discard their packaging and worn-out supplies. The deeper meaning may be that of human arrogance in trashing the Earth.

The first of these layers is the easiest, but even selecting which elements of an image to describe is a major challenge. Determining the "aboutness" requires background information that may not be present in the image, but is often available. However, determining the deeper meaning for purposes of indexing is inherently unreliable. The image I described may show an example of trashing the Earth to one person. But to another it may show how much technological support is necessary for humans to go where they're not adapted.

Tools for development of knowledge organization structures

To a great extent the tools that are available determine what we do, or at least how we do it. This is not in the sense of a child with a hammer who thinks everything looks like a nail. Rather, technologies are enablers, making certain things feasible that would be impossible otherwise.

To take just one example, in the days when building a thesaurus meant organizing and reorganizing thousands of individual cards, as well as manually checking and revising both sides of cross references, very few thesauri were built and those were usually not updated regularly. Today software, both custom and off-the-shelf, routinely automates the repetitive procedures, and the only limitation on thesaurus development is the human intellectual resources required. This is not a trivial limitation, of course, but even aids to the intellectual effort are becoming available.
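As a rough illustration of the kind of repetitive checking that software now handles, the sketch below looks for cross references that are missing their reverse side. It is a minimal sketch, not a description of any particular thesaurus package: the dictionary layout, the function name missing_reciprocals and the sample terms are all assumptions made for the example, and the BT/NT/RT labels simply follow common thesaurus practice.

```python
# Hypothetical toy thesaurus: each term lists its broader (BT),
# narrower (NT) and related (RT) terms. The layout is an assumption.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(thesaurus):
    """Return (term, relation, target) triples that would need to be
    added so that every cross reference has its reverse side."""
    missing = []
    for term, relations in thesaurus.items():
        for relation, targets in relations.items():
            inverse = RECIPROCAL[relation]
            for target in targets:
                # References already recorded on the other side, if any.
                back = thesaurus.get(target, {}).get(inverse, [])
                if term not in back:
                    missing.append((target, inverse, term))
    return sorted(missing)


thesaurus = {
    "indexing": {"BT": ["knowledge organization"], "RT": ["cataloging"]},
    "knowledge organization": {"NT": ["indexing"]},
    "cataloging": {},
}

problems = missing_reciprocals(thesaurus)
for term, relation, target in problems:
    print(f"{term}: add {relation} {target}")
```

Here the only gap reported is the related-term reference from "cataloging" back to "indexing". The intellectual work of deciding which terms and relationships belong in the thesaurus is untouched by a check like this.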

There was a period when indexing was often denigrated on the grounds that throwing a search engine at full text was all that was needed. Now the fact that intellectual analysis of resources contributes to effective retrieval is being rediscovered – only this time the goal is to develop synergy between people and machines, to make what is inherently a labor-intensive process more productive.

Visualization of information

As for visualization of information, I'm going to make a confession up front. Give me text any time; I'm one of those people for whom list structures make immediate sense, while a set of bubbles with different kinds of lines connecting them tends to make my eyes glaze over.

Which only goes to show what we already know – that individuals' primary modes of perceiving differ, and we had better be able to present information in ways that are meaningful to different people. Furthermore, some kinds of information are simply not amenable to presentation in text form. We have been able to present structured text reasonably effectively for a long time, but the power required to present dynamic structures visually has been lacking until recently.

And it's the word dynamic that is key here. It is possible to develop a visual structure without immense computing power and powerful algorithms. However, the process is laborious, and the structure is inherently static. There has been no convenient way to modify the structure to reflect changes in knowledge, let alone to customize it to a particular individual's interests at a specific point in time.

Metadata

Yes, metadata is more than just new jargon for cataloging and indexing. It is that, but it's more too. "Data about data" has become a cliché, but it's hard to find a more concise way to describe metadata. Metadata is described by the DESIRE project as "data associated with objects which relieves their potential users of having to have full advance knowledge of their existence and characteristics." This is a very broad definition, going far beyond the traditional recording of names and subjects associated with documents – just as the kinds of information resources accessible via the Web go far beyond traditional document repositories.

Non-text databases abound, and attributes such as coordinates of latitude and longitude, or data describing the forces of a tornado, must be accessed. There are infrared images of earth resources and images from elsewhere in the solar system. And then there are museum objects or musical themes. These are all kinds of content, or attributes that indicate the content of a resource. But there is metadata that goes even beyond this. Systems such as PICS – the Platform for Internet Content Selection – aid in rating and filtering. Then there is information about authenticity or availability, date of creation of a resource, its original format and any modifications that may have been made.

To put it much too simply, there are two basic issues of concern with metadata: getting it applied to resources and assuring that what is applied is usable. Third parties cannot possibly assign even limited metadata to all of the resources that are out there – nor even to all of the useful ones. So how do we get metadata?

And even if metadata is assigned to resources, it could simply contribute to the chaos. Suppose we forget about spamming and assume that everyone who assigns metadata to a resource does so with the goal of providing honest information about that resource. Still, if there are fields tagged "creator," "author" or "composer," and there is no map to bring them together as a single set of concepts, the value of the information is reduced. Similarly, if the content of metatags is not standardized, then a myriad of variants will also hinder the process of locating resources.
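The "map" in question can be very small. The sketch below is one possible illustration, assuming a hand-built lookup table that folds the variant field names into a single concept; the names FIELD_MAP and normalize, the sample records and the choice of "creator" as the shared label are all hypothetical.

```python
# Hypothetical crosswalk: variant field names mapped to one shared concept.
FIELD_MAP = {"creator": "creator", "author": "creator", "composer": "creator"}


def normalize(record):
    """Merge fields that the crosswalk maps to the same concept.
    Fields not in the crosswalk pass through under their own name."""
    merged = {}
    for field, value in record.items():
        key = FIELD_MAP.get(field.strip().lower(), field)
        merged.setdefault(key, []).append(value)
    return merged


# Illustrative records only; the values are placeholders.
records = [
    {"author": "Person A", "title": "Report"},
    {"composer": "Person B", "title": "Sonata"},
    {"Creator": "Person C", "title": "Web page"},
]

normalized = [normalize(r) for r in records]
creators = [r["creator"][0] for r in normalized]
print(creators)
```

With the map in place, a search on the single "creator" concept reaches all three records. The map does nothing about the second problem, unstandardized content within the fields, which still calls for agreed forms of names and terms.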

Digital libraries

Digital libraries could probably have been assigned to almost any one of the tracks at this conference. What is a digital library anyway? The term is used in various ways, but they all seem to share two common elements: the resources are electronic and their physical location is not particularly relevant to users. It may even be unknown to them.

The whole paradigm of a digital library is different from that of a traditional library. Even though libraries for many years have readily gone beyond their own walls for resources required by their users, they have still been dependent on locating and providing physical copies of these resources. Furthermore, the library had to build or house the access tools to permit finding the resources.

A digital library, on the other hand, may not even have a physical location. Instead, it becomes a set of electronic tools for access to whatever resources its users are entitled to use. It facilitates the location of the resources and the bringing together of resource and user, but the resource itself may never pass through the "library," wherever that is. Thus, instead of issues of locating and acquiring materials, and cataloging and housing them, the digital library goes directly into hyperspace. Its problems become those of financing access and developing architectures that enable users to find their way through the mass of available resources.

While formal recognition of digital libraries is relatively new, libraries have been evolving in this direction ever since they started offering access to online databases, roughly in the 1970s. Any library that provided access to Dialog, Orbit or other search services was serving as a digital library for those resources.

Conclusion

The classification and representation track at this conference offers something for a wide range of interests. Issues of effectively labeling and representing information have always been fundamental to information science. How to turn that information into knowledge has also always been a concern, but in a more dispersed way. Some of us have worked at representing the information. Others have worked at trying to figure out how people use it – or would use it if they could find it. Still others looked at individual cognitive and group dynamic processes to see how information could actually effect change. Today, all these disparate strands are unified under the term knowledge management, though it appears that most workers in the field are still concentrating on one or another of the traditional aspects.

Jessica Milstead is principal with The JELEM Company. She can be reached at P.O. Box 5063, Brookfield, CT 06804-5063; 203/740-2433; milstead@jelem.com.


© 2000, American Society for Information Science