Bulletin of the American Society for Information Science and Technology, Vol. 27, No. 4, April / May 2001

ASIST SIG/CR Classification Workshop 2000: Classification for User Support and Learning

by Dagobert Soergel

    Editor's Note: This report from the ASIST 11th Classification Research Workshop, presented by ASIST SIG/CR, was prepared by Dagobert Soergel, with contributions from the session rapporteurs Edie Rasmussen, Corinne Jörgensen, Linda Rudell-Betts, Jian Quin and Barbara Kwasnik.

The 11th Classification Research Workshop of the ASIST Special Interest Group in Classification Research (SIG/CR) was held on Sunday, November 12, 2000, as part of the 62nd ASIST Annual Meeting. The ASIST SIG/CR 2000 co-chairs were Dagobert Soergel, Padmini Srinivasan and Barbara Kwasnik. A highly competitive selection process brought together papers under the theme Classification for User Support and Learning. The program is given in Figure 1.
 

Figure 1.  Program ASIST SIG/CR 2000:  Classification for User Support and Learning

 Main program

Introduction and foundation

David Jonassen (invited), School of Information Science and Learning Technologies, University of Missouri. "Knowledge is complex: accommodating human ways of knowing"

Session 1.  Developing user-oriented classifications

Marianne Lykke Nielsen, The Royal School of Library and Information Science, Institute of Information Studies. "Domain analysis, an important part of thesaurus construction.  Methodologies and approaches"

Stephanie W. Haas and Carol A. Hert, School of Information and Library Science, University of North Carolina, Chapel Hill, and School of Information Studies, Syracuse University, "Terminology development and organization in multi-community environments: the case of statistical information"

Session 2.  Classification in the user interface

Nina Wacholder, Judith Venuti, Michael Krauthammer and Pat Molholt, Columbia University (Office of Scholarly Resources; Center for Research on Information Access; Department of Anatomy and Cell Biology; Department of Medical Informatics). "Accessing and browsing 3D anatomical images with a navigational ontology"

Susan Dumais (invited) and Ed Cutrell, Microsoft Research, and Hao Chen, University of California, Berkeley. "Use of classified displays of Web search results"

Winfried Schmitz-Esser, University of Applied Science, Hamburg, Germany. "SERUBA - A new search and learning technology for the Internet and intranets"

Session 3.  Automatic creation of representations

Susanne M. Humphrey, Thomas C. Rindflesch, and Alan R. Aronson, Lister Hill National Center for Biomedical Communications, National Library of Medicine. "Automatic indexing by discipline and high-level categories: Methodology and potential applications"

Hidetsugu Nanba, Noriko Kando, and Manabu Okumura, Japan Advanced Institute of Science & Technology and National Institute of Informatics. "Classification of research papers using citation links and citation types: Toward automatic review article generation"

Idea mart

Marcia Lei Zeng and Pat Molholt, Kent State University and Columbia University. "Knowledge organization scheme for cross-cultural and cross-language information systems -- issues and challenges"

Yi-Fang Wu, School of Information Science & Policy, State University of New York at Albany. "Automatic concept hierarchies development:  A revised subsumption approach"

Tony Tse, College of Information Studies, University of Maryland. "Identifying and characterizing a Health Consumer Vocabulary"

Laura Slaughter, College of Information Studies, University of Maryland. "Interfaces for understanding: Improving access to consumer health information"

Elin Jacob, Elisabeth Davenport, Uta Priss. "The world of Pokémon: A dynamic ecological classification system"

Elisabeth Davenport, Napier University Business School, and Howard Rosenbaum and Uta Priss, SLIS, Indiana University. "Ethological classification: a model for ordering the commercial workplace that draws on collective practice"

Peiling Wang, University of Tennessee. "Comparing cognitive maps using graph algorithms"

Stephen Paling, School of Information Studies, Syracuse University. "Information cartography: A proposed model for access to heterogeneous  end-user databases"

Alejandro Jaimes¹, Ana B. Benitez¹, Corinne Joergensen², and Shih-Fu Chang¹, ¹ Columbia University, ²State University of New York at Buffalo. "Experiments in indexing multimedia data at multiple levels"

Jack Andersen, Royal School of Library and Information Science. "Document theory and knowledge organization: An approach based on epistemology and sociology of knowledge"

Some of the papers are available on the workshop website at

http://uma.info-science.uiowa.edu/sigcr/

Final versions of the papers will be published mid-2001 by Information Today as Advances in Classification Research, v. 11.

The first part of this report gives short synopses of the papers; the second part lists themes and research questions that emerged.

Introduction and Foundation

The leadoff speaker was David Jonassen, Distinguished Professor, School of Information Science and Learning Technologies, University of Missouri. He provided a perspective underlying the workshop in his talk, "Knowledge is complex: accommodating human ways of knowing." The paper's main message: we need classifications for different kinds of knowledge that users hold and seek, particularly types of knowledge that are intimately tied to doing.  The types of knowledge he outlined are shown in Table 1.

Table 1.  Jonassen: Types of Knowledge

1. Ontological (domain-specific) knowledge types

    1.1. Declarative knowledge
    1.2. Structural knowledge
    1.3. Conceptual knowledge

2. Epistemological (task-specific) knowledge types

    2.1. Situational knowledge
    2.2. Procedural knowledge
    2.3. Strategic knowledge

3. Phenomenological knowledge types

    3.1. Tacit (implicit) knowledge
    3.2. Compiled (automated) knowledge
    3.3. Sociocultural knowledge
    3.4. Experiential (episodic) knowledge
    3.5.  World knowledge

Session 1: Developing user-oriented classifications

Following the leadoff presentation, Session 1 covered a wide range of methodological tools for constructing thesauri/classifications/ontologies.  There were two papers.

In her paper, "Domain analysis, an important part of thesaurus construction: methodologies and approaches," Marianne Lykke Nielsen introduced and illustrated domain analysis. Domain analysis is a multi-pronged method to discover users' task approaches, resulting information needs, conceptual frameworks, and terminology as the basis for constructing a truly user-oriented thesaurus, exemplified by a thesaurus for a pharmaceutical company.  Domain analysis focuses on the following factors:

  • the nature of the professionals (background, work tasks, information needs, information use, language use, searching behavior, search problems);
  • the subject field (topics, concepts, vocabulary);
  • the literature (type, level, quantity); and
  • the available resources for indexing and thesaurus construction (competence, time).

It uses the following methods:

  • group interviews to obtain an understanding of the work domain and its users;
  • content analysis and discourse analysis of user requests to investigate the perspective and aspects from which the users approach the particular subject field; and
  • word association tests to identify language use and approaches to the subject field.

In the second paper, "Terminology development and organization in multi-community environments: the case of statistical information," Stephanie Haas and Carol Hert presented a conceptual framework and methodology for discovering concepts, concept relationships and terminologies used by different user communities concerned with the same subject matter. Their example concerned statistical data.  The method consists of three parts:

  • constructing a conceptual map of the expert terms;
  • expanding the expert terms by adding synonyms from a thesaurus; and
  • identifying user terms from query logs and matching them with the expanded expert terms, automatically where possible and manually where necessary.

The experts who create a statistics website can use the results to construct a crosswalk from the search terminology of its "lay" users to the expert terminology.
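The three-part method above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the sample statistical terms, the synonym lists and the query-log entries are all hypothetical, and matching is reduced to exact lookup after simple normalization. Terms that fail to match automatically are collected for manual review, mirroring the "manually where necessary" step.

```python
from typing import Dict, List, Optional


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def expand_expert_terms(expert_terms: List[str],
                        thesaurus: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each expert term and each of its thesaurus synonyms to the expert term."""
    lookup: Dict[str, str] = {}
    for term in expert_terms:
        lookup[normalize(term)] = term
        for synonym in thesaurus.get(term, []):
            # setdefault keeps the first mapping if a synonym is listed twice.
            lookup.setdefault(normalize(synonym), term)
    return lookup


def build_crosswalk(user_terms: List[str],
                    lookup: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Match user terms automatically; None marks a term left for manual matching."""
    return {user_term: lookup.get(normalize(user_term)) for user_term in user_terms}


# Hypothetical sample data (not taken from the paper).
expert_terms = ["unemployment rate", "consumer price index"]
thesaurus = {
    "unemployment rate": ["jobless rate"],
    "consumer price index": ["CPI", "cost of living index"],
}
query_log_terms = ["Jobless  rate", "cpi", "inflation"]

lookup = expand_expert_terms(expert_terms, thesaurus)
crosswalk = build_crosswalk(query_log_terms, lookup)
needs_manual_review = [term for term, target in crosswalk.items() if target is None]
```

In this toy run, "Jobless  rate" and "cpi" are matched automatically, while "inflation" is left for a human to map.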

Session 2: Classification in the User Interface (Reported by Corinne Jörgensen)

Session 2 dealt with using classification for enhanced searching and display. It brought to the fore issues of user interaction and navigation with graphical and text-based information and the role that a thesaurus or structured display can play in these areas.

Nina Wacholder et al., in "Accessing and browsing 3D anatomical images with a navigational ontology," presented the Vesalius Anatomy Browser: http://cpmcnet.columbia.edu/vesalius/.

The Browser is an elegant system for searching and displaying anatomical images that is based on an ontology of body systems and body parts using several types of relationships.

The presentation elucidated the problem of learning how to make massive amounts of data in a visual display useful and comprehensible. The solution taken by the project team was to add explicit conceptual information to the system; otherwise the displayed information is only meaningful to an expert, in this case an anatomist. Their "navigational ontology" supports restricted inference and restricted relationships. In this case, the anatomically significant relationships are conceptual, functional and spatial. The two major types of relationships in the system are part-of (component-structure) and is-a (taxonomic). Is-a and part-of, however, are not simple relationships; in a visual environment their complexity becomes more obvious, as there are matters of granularity and scale and multiplicity of types. For example, in addition to component structure there are other kinds of part-whole relationships, such as region and marker relationships; some things only make sense as part of a larger structure.

One important point that holds implications for thesaurus display is that not all combinations of structures have names; therefore the system is designed to show relationships among structures, enabling the user to choose non-named sets or groups.
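To make the idea of typed relationships concrete, here is a minimal sketch of an ontology in which part-of and is-a links are stored separately and followed one type at a time. It does not reproduce the Vesalius data model: the class name, the method names and the handful of anatomical entries are assumptions chosen for illustration, and "restricted inference" is interpreted here simply as never mixing relationship types during traversal.

```python
from typing import Dict, List, Set


class NavigationalOntology:
    """Concepts linked by typed relationships such as "part-of" and "is-a"."""

    def __init__(self) -> None:
        # relation type -> concept -> directly related "parent" concepts
        self.links: Dict[str, Dict[str, List[str]]] = {}

    def add(self, concept: str, relation: str, parent: str) -> None:
        self.links.setdefault(relation, {}).setdefault(concept, []).append(parent)

    def ancestors(self, concept: str, relation: str) -> Set[str]:
        """Follow a single relation type upward, transitively."""
        by_concept = self.links.get(relation, {})
        found: Set[str] = set()
        stack = [concept]
        while stack:
            for parent in by_concept.get(stack.pop(), []):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


# Illustrative entries only.
ontology = NavigationalOntology()
ontology.add("left ventricle", "part-of", "heart")
ontology.add("mitral valve", "part-of", "heart")
ontology.add("heart", "part-of", "cardiovascular system")
ontology.add("heart", "is-a", "organ")
ontology.add("organ", "is-a", "anatomical structure")

wholes = ontology.ancestors("left ventricle", "part-of")
kinds = ontology.ancestors("heart", "is-a")
# Wholes shared by two structures: a group the user can select without it having a name.
shared = ontology.ancestors("left ventricle", "part-of") & ontology.ancestors("mitral valve", "part-of")
```

Keeping each relationship type in its own table makes it straightforward to add further types, such as the region or marker relationships mentioned above, without changing the traversal.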

One question is whether, in a visual navigational ontology, there are other relationships besides is-a and component structure that need to be added. Wacholder stated that currently more spatial relationships are needed, such as "nearby" or "part of two systems," enabling the user to search on more combinations. Adding 3-D creates another set of relationships, and the major issue becomes one of classification, not interface design.

What other non-visual relationships may need to be added in the domain of anatomy? And when does one start adding other relationships outside the scope of anatomy but of interest in a wider medical research domain, such as similar biochemical processes? One can see far in the future a system encompassing many types of knowledge about the functioning of the human body and capable of displaying not only non-named "things" but also facilitating new discoveries by relating previously unrelated structures, processes and outcomes.

Susan Dumais et al., in "Use of classified displays of Web search results," presented empirical evidence that classified displays of Web search results are indeed useful. They perform better than simple ranked-list displays both for user tasks and in user preference. Figure 2 is an example of a category display that has been abridged and simplified. All titles are hyperlinks.

 Figure 2.  Dumais et al: A Category Display

  QUERY: JAGUAR
  Automotive
      Jaguar Club of Florida
      A&L Luxury Car Center - Jaguar Main Page
  Computers & Internet
      Atari Jaguar System
      Jag-Lovers Jaguar Cars Windows Wallpaper Page
  Entertainment & Media
      The Jaguar Photo Gallery
  Travel & Vacation
      Welsh Jaguar Classic Car Museum

Here, automatic techniques map search results to a pre-established scheme of categories.  The advantages of this approach are that a user can quickly know the structure of the information and that users easily understand this type of display. In contrast, clustering techniques are used primarily to discover structure. In a retrieval interface clustering is slow, and it is hard for the user to interpret the resulting unlabeled groups.
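The display logic itself is simple once each result carries its category labels. The sketch below groups results under pre-established categories and renders a display in the style of Figure 2. It is not the system described in the paper: the function names are hypothetical, the automatic classifier is replaced by hand-assigned labels, and the second category given to the museum page is an invented example of an item listed in more than one place.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def category_display(results: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Group result titles under every category they were assigned to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for title, categories in results:
        for category in categories:
            grouped[category].append(title)
    return dict(grouped)


def render(query: str, grouped: Dict[str, List[str]]) -> str:
    """Produce an indented text display with one block per category."""
    lines = [f"QUERY: {query.upper()}"]
    for category in sorted(grouped):
        lines.append(category)
        lines.extend(f"    {title}" for title in grouped[category])
    return "\n".join(lines)


# Titles and categories follow Figure 2; the labels stand in for classifier output.
results = [
    ("Jaguar Club of Florida", ["Automotive"]),
    ("A&L Luxury Car Center - Jaguar Main Page", ["Automotive"]),
    ("Atari Jaguar System", ["Computers & Internet"]),
    ("Jag-Lovers Jaguar Cars Windows Wallpaper Page", ["Computers & Internet"]),
    ("The Jaguar Photo Gallery", ["Entertainment & Media"]),
    # Hypothetical cross-listing: one item shown under two categories.
    ("Welsh Jaguar Classic Car Museum", ["Travel & Vacation", "Automotive"]),
]

grouped = category_display(results)
display = render("jaguar", grouped)
print(display)
```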

The user study reported here confirmed the advantages of a category display over a list display, both in terms of search times and user satisfaction. Interestingly, the researchers found that users could tolerate some ambiguity and "fuzziness" in the display. Items could be in multiple places. In the study they could be placed in up to 13 categories. Subjects noticed this cross-listing and liked it. The automatic classification is not perfect, and users noticed errors. However, the errors did not bother them.

These results raise the question of how "perfect" a classification process should be. While a large amount of error will cause users to distrust a system, greater accuracy requires larger amounts of time and, thus, money. To what extent should we strive for "perfection" in classification of heterogeneous documents in very large databases such as the Web?  Since the standard for a worthwhile improvement is generally taken to be a 10% increase in precision, techniques that create only small incremental improvements may not be worth the time and money invested in their creation. Related to this argument is the idea that while in creating a classification system tailored to the needs of a particular user community we reinforce domain boundaries, classification of large heterogeneous collections would seem to need some permeability across these boundaries. As the work of Dumais, Cutrell and Chen shows, improvement in display of search results can minimize the impacts of "imperfect" categorization.

In "SERUBA - A new search and learning technology for the Internet and intranets," Winfried Schmitz-Esser gave a preview of a Web search system that uses a thesaurus with a rich set of relationship types to help the user explore her search topic.  The relationships used are

    Abstract/generic
    Partitive (physical and theoretical)
    Partitive (habits, law and jurisdiction)
    Partitive (geographical, topographical)
    Instrumental
    Cause/effect
    Beneficial
    Detrimental
    Process applied
    Derivative

When the user enters a search term, the system uses synonym relationships to identify the corresponding concept and then displays other concepts in an array arranged by type of relationship. An abridged display is shown in Figure 3, where each referenced concept is a hyperlink.

Figure 3.  Schmitz-Esser: An abridged display of relationships for the query "telecommuting"

  telecommuting
  is a narrower concept of    labor
                              new ways of working and living
  is a broader concept of     mobile telecommuting
                              alternating telecommuting
  is instrumental for         organizing work effectively
  causes                      flexible work time
                              energy conservation
  is beneficial for           virtual organizations
                              combining family and work
  is detrimental to           face-to-face contacts
  by instruments              telecommuting workplaces
                              online technology
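The lookup behind such a display can be sketched as two steps: resolve the search term to a concept through synonym relationships, then group the related concepts by relationship type. The code below is an assumption-laden illustration, not SERUBA's implementation: the relationship data is copied from Figure 3, while the data structures, the function name and the synonym "telework" are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Synonym table: search term -> concept. "telework" is a hypothetical entry.
SYNONYMS: Dict[str, str] = {
    "telecommuting": "telecommuting",
    "telework": "telecommuting",
}

# Typed relationships for one concept, as shown in Figure 3.
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "telecommuting": [
        ("is a narrower concept of", "labor"),
        ("is a narrower concept of", "new ways of working and living"),
        ("is a broader concept of", "mobile telecommuting"),
        ("is a broader concept of", "alternating telecommuting"),
        ("is instrumental for", "organizing work effectively"),
        ("causes", "flexible work time"),
        ("causes", "energy conservation"),
        ("is beneficial for", "virtual organizations"),
        ("is beneficial for", "combining family and work"),
        ("is detrimental to", "face-to-face contacts"),
        ("by instruments", "telecommuting workplaces"),
        ("by instruments", "online technology"),
    ],
}


def explore(search_term: str) -> Optional[Tuple[str, Dict[str, List[str]]]]:
    """Resolve a term to its concept and group related concepts by relationship type."""
    concept = SYNONYMS.get(search_term.strip().lower())
    if concept is None:
        return None
    grouped: Dict[str, List[str]] = {}
    for relation, target in RELATIONS.get(concept, []):
        grouped.setdefault(relation, []).append(target)
    return concept, grouped


result = explore("Telework")
```

Because each related concept would itself be a key in the relationship table of a full system, following a hyperlink amounts to calling the same function again on the chosen concept.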

The system displays results using its Basic Semantic Reference Structure, a frame whose slots can be seen in Table 2.

Table 2. Schmitz-Esser: Basic Semantic Reference Structure

  What?                        | Who?                                         | Event?               | Where? | When?      | How?
  Universal, concept, theme    | Person       | Corporate body                | Name of event        | Space  | Time       | Aspect
  General manager              | Mike Osborne | Asia Trading Co., Vancouver   |                      | Canada | >1998 11-1 | Definition
  Planting of St. John's trees |              | Ministry of Agriculture, Lima | El Algarrobo project | Peru   | >1984      | Propagation

Session 3: Automatic Creation of Representations

Session 3 dealt with automated methods to create the knowledge structures necessary for good user support.

Susanne Humphrey et al. described in "Automatic indexing by discipline and high-level categories: Methodology and potential applications" a system developed for automatically indexing documents with broad descriptors that express the general nature and orientation of the document and thus are useful complements to specific descriptors. Two types of broad descriptors are assigned:

  • a broad scheme of 127 descriptors from the National Library of Medicine's Medical Subject Headings (MeSH), such as Drug therapy, Antibiotics, or Pulmonary disease (specialty), used at NLM to categorize journals by subject; and
  • the 134 semantic types defined in the Unified Medical Language System (UMLS), such as Spatial concept, Therapeutic or preventive procedure, or Medical device.

Rules for assigning journal descriptors were developed based on statistical association of document features such as title words with journal descriptors assigned to documents in a training set. The rules for assigning semantic types rely on a more complex indirect method.
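As a rough illustration of what "statistical association of title words with journal descriptors" can mean, the sketch below counts how often each title word co-occurs with each descriptor in a training set and then ranks descriptors for a new title by the average of those conditional frequencies. This is a generic co-occurrence scheme chosen for brevity, not the rule set used at NLM; the training titles, the scoring formula and all function names are assumptions. Only the descriptor names come from the examples above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(title: str) -> List[str]:
    """Lowercase words longer than two characters (a crude stand-in for stopword removal)."""
    return [word for word in title.lower().split() if len(word) > 2]


def train(training_set: List[Tuple[str, List[str]]]):
    """Count documents per word and word/descriptor co-occurrences."""
    word_counts: Counter = Counter()
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for title, descriptors in training_set:
        for word in set(tokenize(title)):
            word_counts[word] += 1
            for descriptor in descriptors:
                pair_counts[word][descriptor] += 1
    return word_counts, pair_counts


def rank_descriptors(title: str, word_counts, pair_counts) -> List[Tuple[str, float]]:
    """Score each descriptor by the mean of P(descriptor | word) over known title words."""
    known = [word for word in set(tokenize(title)) if word in word_counts]
    scores: Counter = Counter()
    for word in known:
        for descriptor, count in pair_counts[word].items():
            scores[descriptor] += count / word_counts[word]
    return sorted(((d, s / len(known)) for d, s in scores.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical training documents: (title, descriptors of the journal it appeared in).
training_set = [
    ("Antibiotic treatment of pneumonia", ["Antibiotics", "Pulmonary disease (specialty)"]),
    ("Antibiotic resistance in hospitals", ["Antibiotics"]),
    ("Asthma and lung function", ["Pulmonary disease (specialty)"]),
    ("Drug dosing in asthma", ["Drug therapy", "Pulmonary disease (specialty)"]),
]

word_counts, pair_counts = train(training_set)
ranked = rank_descriptors("Antibiotic resistance in pneumonia", word_counts, pair_counts)
```

With this toy data the new title is ranked under Antibiotics first and Pulmonary disease (specialty) second; a production system would need far more training data and a more careful weighting.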

In the second paper of the session, "Classification of research papers using citation links and citation types: Toward automatic review article generation," Hidetsugu Nanba et al. presented a tool box for the automated or computer-assisted generation of reviews based on analyzing citation relationships. The three tools would each be useful individually. They are as follows:

  • a tool to identify and demarcate areas in a document that are concerned with reference to and discussion of a cited document;
  • a tool for determining the type of citation relationships; and
  • a tool for automatic classification of a document or document passages based on typed citation relationships.

The citation area tool starts from the sentence containing the citation and adds sentences preceding or following based on the occurrence of cue words that indicate text cohesion. The citation type tool is also based on cue words and assigns a citation to one of three types: (1) shows other researchers' theories and methods; (2) points out problems or gaps in related works; and (3) other. The paper discusses both word-based and citation-based approaches to automatic classification.
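A toy version of the first two tools shows the general mechanism. The sketch grows a citation area outward from the citing sentence while neighboring sentences open with a cohesion cue, then assigns one of the three types from cue phrases found in that area. The cue lists, the rule that only sentence-initial cues count, the function names and the sample passage are all assumptions made for illustration; the paper's actual cue words and rules are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical cue words signalling that a sentence continues the previous one.
COHESION_CUES = ("however", "they", "their", "this", "these", "in addition", "moreover")
# Hypothetical cue phrases for the citation types.
PROBLEM_CUES = ("however", "does not", "problem", "limitation")
BASIS_CUES = ("based on", "following", "we adopt", "we use")


def starts_with_cue(sentence: str) -> bool:
    """True if the sentence opens with a cohesion cue followed by a non-letter."""
    lowered = sentence.lower().lstrip()
    return any(lowered.startswith(cue) and not lowered[len(cue):len(cue) + 1].isalpha()
               for cue in COHESION_CUES)


def citation_area(sentences: List[str], cite_index: int) -> Tuple[int, int]:
    """Return (start, end) indexes, inclusive, of the area around the citing sentence."""
    start = end = cite_index
    # A cue at the start of the current first sentence ties it to the one before.
    while start > 0 and starts_with_cue(sentences[start]):
        start -= 1
    # A cue at the start of the next sentence ties it to the area so far.
    while end + 1 < len(sentences) and starts_with_cue(sentences[end + 1]):
        end += 1
    return start, end


def citation_type(area_text: str) -> int:
    """1 = shows others' theories and methods, 2 = points out problems or gaps, 3 = other."""
    lowered = area_text.lower()
    if any(cue in lowered for cue in PROBLEM_CUES):
        return 2
    if any(cue in lowered for cue in BASIS_CUES):
        return 1
    return 3


# Hypothetical passage; sentence 1 contains the citation.
sentences = [
    "Automatic summarization has a long history.",
    "Smith [3] proposed a clustering method for abstracts.",
    "Their method relies on hand-built word lists.",
    "However, it does not scale to large collections.",
    "We describe our corpus in the next section.",
]

start, end = citation_area(sentences, 1)
ctype = citation_type(" ".join(sentences[start:end + 1]))
```

Here the area covers sentences 1 through 3, and the "however" and "does not" cues place the citation in type 2.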

The Idea Mart

In the middle of the day an "idea mart" was held.  It was devoted to extensive discussion of emergent research ideas or projects in small groups in five parallel sessions covering two topics each. For a list of presenters and topics, see Figure 1.

This experiment turned out very well, producing many useful suggestions for the researchers who presented at the sessions. As the final section to this paper we report the themes that emerged from the papers and discussions.  Some themes are clearly tied to one paper while others emerged in several papers. We list the 8 themes followed by brief summaries and/or outlines of the points developed.

Theme 1: Expanded use of classifications
Theme 2:  Requirements for diversity in classification
Theme 3: The quest for unity--multi-purpose classifications, reuse
Theme 4:  Types of knowledge covered in classifications
Theme 5: Orientation of classification: users' conceptual structures or intrinsic logic of the domain
Theme 6:  Types of relationships in a thesaurus / classification / ontology
Theme 7:  Display and user interaction issues
Theme 8:  Practical issues

Theme 1: Expanded use of classifications

Several presentations called into question the restricted role that classification schemes have played, being used primarily for organizing information for retrieval. Other roles that need to be explored more fully include learning (such as the use of the visual anatomist for training and education), exploration and browsing, creativity, discourse, problem solving, and information.

Question: How can we build classification systems that would enable us to discover and see relationships that have not yet been established?

Theme 2: Requirements for diversity in classification

Classifications should serve a given purpose for a given user community.  Language – terms and their relationships – is complex; it shows differences not only across domains but also across user groups in the same domain. This introduces many sources of diversity in the design of classifications:

  • knowledge is complex (title of the first talk but an underlying theme of all);
  • many types of knowledge;
  • many (discourse) communities / communities of use;
  • multiple perspectives (for example, "standard" medicine and "alternative" medicine).  Problem of our inability to incorporate all perspectives into one structure or scheme, no matter how richly articulated it is, yet we know that any one perspective limits what we see or learn, and perspectives evolve;
  • multiple situations/contexts; and
  • many different uses of knowledge.

Implications

  • One scheme or many?
  • One representation vs. multiple representations
  • Limitations on mapping between schemes

Role of classification in bridging diversity

Classification should honor diversity by reflecting different perspectives, etc.  But classification should also bridge diversity by mediating between different points of view, different knowledge and cultural systems.  For example, a classification of concepts in "alternative" medicine could include scope notes and relationships that relate its concepts to concepts in "standard" medicine.  By elaborating concepts, concept relationships and conceptual structure in different realms, classification can help identify commonalities and differences and the nature of differences, supporting an effort at sharing and mutual refinement of conceptual structures.

Theme 3: The quest for unity – multipurpose classifications, reuse

Classifications require considerable intellectual investment, so one would like to reuse them.  Tension with diversity! Some thoughts from the discussion:

Can a thesaurus be reorganized for multiple purposes?

Classification modules that can be used in different schemes: How do we build modular ontologies to better represent dynamic domains? These would be ontologies that could flexibly extend the working ontology, for example extending the ontology of basic business processes by adding a module about auctions.

How can we build classification schemes that store basic-level (mid-level) attributes that are neither too abstract nor overly specified so that they can be used effectively by people in a variety of contexts, when we know neither who the people are nor what the contexts are?

The mapping of ontologies one to another must include more than just terms and their relationships; it must also include information about the context/situation.

Is it possible to reorganize an existing thesaurus into a "navigational ontology" to support searching and browsing? Or does such a tool have to be created initially with these goals in mind (re question one)? Can one thesaurus be reorganized in different ways to serve multiple purposes, such as searching, navigation, instruction, "stimulation" (creativity)?

Theme 4: Types of knowledge covered in classifications

  • Role and importance of all knowledge types
  • Most classifications deal with (static) domain knowledge
  • Additional approaches are needed to support users, such as
      • Problem schemas as organizing principle: A classification of problems by problem type, such as fix a device (fix a car, fix a washing machine), buy something, write a computer program, giving for each problem a schema that specifies aspects to be considered in solving the problem; information, people, material needed for solving the problem; procedural steps for solving the problem.
      • Functions as organizing principle, for example, technical components classified by all the functions they could serve
      • Classification of cases for case-based reasoning or for education and learning
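The problem-schema idea in the list above lends itself to a simple record structure. The sketch below is one possible reading of it, with slots for aspects, required information, people and material, and procedural steps, plus a grouping of concrete problems under their problem type. The class, the field names and every slot value are hypothetical; only the problem types and the two "fix a device" examples come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemSchema:
    """One schema per problem type, listing what solving such a problem involves."""
    problem_type: str
    aspects: List[str] = field(default_factory=list)
    information_needed: List[str] = field(default_factory=list)
    people_needed: List[str] = field(default_factory=list)
    material_needed: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)


def classify_by_type(problems: Dict[str, str]) -> Dict[str, List[str]]:
    """Group concrete problems (key) under their problem type (value)."""
    grouped: Dict[str, List[str]] = {}
    for problem, problem_type in problems.items():
        grouped.setdefault(problem_type, []).append(problem)
    return grouped


# Slot values are invented for illustration.
fix_device = ProblemSchema(
    problem_type="fix a device",
    aspects=["symptoms", "safety", "cost of repair vs. replacement"],
    information_needed=["service manual", "parts catalog"],
    people_needed=["technician"],
    material_needed=["tools", "replacement parts"],
    steps=["diagnose the fault", "obtain parts", "repair", "test"],
)

problems = {
    "fix a car": "fix a device",
    "fix a washing machine": "fix a device",
    "write a computer program": "write a computer program",
}
by_type = classify_by_type(problems)
```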

Implications

Importance of stepping back from what we "know" about building an ontology based on domain knowledge.

Theme 5: Orientation of classification

Should classification reflect

  • the users' conceptual structures?
  • the intrinsic logic of the domain (as elaborated by the classifier), on which AI inferences could be based and from which users could learn?

How can a classification be constructed that mediates between these two orientations?

Theme 6: Types of relationships in a thesaurus / classification / ontology

Traditional thesauri use just BT/NT and RT (broader term/narrower term and related term) as conceptual relationships. However,

    • Do we need a richer set of relationship types (as in SERUBA or the Vesalius ontology)?  How are the relationships beyond the standard hierarchical relationships determined and how far can they be taken?
    • In a visual environment, such as anatomical images, what other types of relationships besides those discussed in the Vesalius navigational ontology could be developed? Are these limited by specific visual domains as well?
    • How successful is the idea of a navigational ontology in a non-visual environment? This implementation draws upon the ideas of structure and function for navigation; text-based thesauri rely heavily on hierarchical relationships (structure) with function (related terms) being an unstructured grab bag, so to speak. Is it possible to transfer the idea of conceptual navigation incorporating both structure and function to a strictly text-based domain?
    • In a non-visual environment, should the multiplicity of existing RT types be made explicit to the user? Making RT types explicit may enable people to recognize relationships that they may have otherwise omitted from their search. Related to this, to what extent are RTs bounded by a particular domain? And do specific types of RTs occur more frequently in a particular domain?
    • What are the cultural issues between languages - are some of the relationships more apparent in some languages than in others? (The diversity theme)

Theme 7: Display and user interaction issues

The following points were noted:

  • Classified display of search results is useful.
  • A wide range of methods for displaying classifications is available.
  • Should users interact
      • with the classification structure (concepts and their relationships);
      • with a categorized list of results; or
      • with a combination?
  • Display of relationships among categories: Would there be a benefit to users from displaying relationships among categories rather than just displaying category names? Would this add too much complexity? How should concept relationships be displayed (concept maps etc.)?

Theme 8: Practical issues

Classified displays are useful but

  • constructing classifications manually is expensive.
  • indexing items manually is expensive.

What can be automated? (Session 3)

Dagobert Soergel, professor in the College of Library and Information Services, can be reached at 4105 Hornbake Library, University of Maryland - College Park, College Park, MD  20742-4345; 301/405-2037; e-mail: ds52@umail.umd.edu

American Society for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail: asis@asis.org

Copyright © 2001, American Society for Information Science and Technology