Bulletin, October/November 2006

Contextual Analysis for the Design of Controlled Vocabularies

by Jens-Erik Mai

Jens-Erik Mai is associate dean and associate professor in the Faculty of Information Studies, University of Toronto. He can be reached at jens-erik.mai<at>fis.utoronto.ca

It is a truism that context, use and actors are important components of analysis in the design of almost anything in the information field. Controlled vocabularies (CVs) are no exception, and it has often been argued in both scholarly and professional literature that the design of CVs needs to be based on a solid understanding of context and actors. The challenge is to decide which factors need to be considered and how to outline an approach for including analyses of context, use and actors. In this paper I will show how the cognitive work analysis (CWA) framework offers one possible approach to the design of controlled vocabularies.

A controlled vocabulary can be defined as “a list of terms that have been enumerated explicitly” (ANSI/NISO, 2005, p. 5) for the purpose of organizing and representing information to facilitate information retrieval. Controlled vocabularies vary in complexity from simple alphabetic lists of terms to classification schemes and taxonomies that show semantic relationships, and finally, to complex thesauri that also show associative relationships between terms. The steps that a designer of controlled vocabularies can take have been well described in the literature. These steps are often represented as some version of the following:

  1. Analyze literature, needs, actors, tasks, domains, activities, etc.
  2. Collect, sort and merge terms
  3. Select descriptors and establish relationships
  4. Construct the classified schedules
  5. Prepare the final product
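
Steps 2 and 3 in the list above can be sketched in code. The following is a minimal illustration, not a prescribed method: the function and class names, the normalization rule and the sample terms are all assumptions made for the example. USE, BT/NT and RT are the conventional thesaurus labels for equivalence, hierarchical and associative relationships.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants merge (assumed rule)."""
    return " ".join(term.lower().split())


def merge_terms(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Step 2: collect candidate terms, merge duplicates, sort alphabetically.

    Each merged term keeps the set of sources that contributed it.
    """
    merged: dict[str, set[str]] = defaultdict(set)
    for source, terms in sources.items():
        for term in terms:
            merged[normalize(term)].add(source)
    return dict(sorted(merged.items()))


class Vocabulary:
    """Step 3: descriptors plus the relationships established among them."""

    def __init__(self) -> None:
        self.use: dict[str, str] = {}  # entry term -> preferred descriptor
        self.broader: dict[str, set[str]] = defaultdict(set)   # BT
        self.narrower: dict[str, set[str]] = defaultdict(set)  # NT
        self.related: dict[str, set[str]] = defaultdict(set)   # RT

    def add_equivalent(self, entry_term: str, descriptor: str) -> None:
        self.use[entry_term] = descriptor

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # Hierarchical links are recorded in both directions (BT and NT).
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def resolve(self, term: str) -> str:
        """Follow a USE reference to the descriptor, if one exists."""
        return self.use.get(term, term)


# Hypothetical candidate terms from two hypothetical sources.
sources = {
    "literature": ["Search engines", "query  logs"],
    "interviews": ["search engines", "ranking"],
}
merged = merge_terms(sources)

vocab = Vocabulary()
vocab.add_equivalent("web search engines", "search engines")
vocab.add_hierarchy("information retrieval systems", "search engines")
vocab.add_related("search engines", "ranking")
```

The sketch covers only the mechanics. Which terms become descriptors, and which relationships are worth recording, still depends on the outcome of step 1.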

The latter steps – 2 through 5 – are well-prescribed and worked out in great detail in several standards, well-established textbooks and best practices. These steps deal with technical aspects of the design and construction of controlled vocabularies, including guidelines and rules-of-thumb for how, for instance, to determine the appropriate form of the terms, clarify the meaning of terms, factor compound terms or determine the relationship between terms. While these steps are important aspects and techniques that must be mastered by developers of controlled vocabularies, design decisions throughout these steps must be guided by the outcome of the first step. However, the first step – analysis of literature, needs, actors, tasks, domains, activities and similar aspects – has been somewhat neglected in the literature.

The advice given for the first step is often limited to either simply mentioning that the designer needs knowledge about the context of the controlled vocabulary or suggesting that a list of potential terms is to be drawn up by subject experts or to be selected or extracted from the content objects. 

To facilitate access that is as transparent and convenient as possible, the designers’ selection of terms to be included in a CV and their determination of the relationships among them must be informed by the actors’ usage of the information. There are many factors that potentially could influence the actors’ use of information and their information-seeking strategies and choices. Each of these factors might influence the actors’ perception and understanding of the information. However, it is almost impossible to account for such individuality in the design of CVs that are used by many people. How, then, are we to select which factors to consider?

The CWA framework provides an answer by focusing on the constraints that shape the actors’ behavior and thereby limiting the number of possible variables that need to be considered. Constraints are factors external to individuals but common to all individuals within the same context or domain. The goal is to identify the constraints that actually shape individuals’ information-seeking behavior and not just their specific preferences, perceptions and experiences. The constraints that shape the behavior of actors in particular situations are the parts of the context that limit and enable the actors to perform their work. It is important to recognize this duality of constraints; constraints limit and enable actions at the same time. For instance, a scholarly domain’s history, schools of thought and paradigms both limit and enable actors in the particular domain – the constraints thereby shape possible information needs. A domain’s history, for instance, enables actors to formulate questions and inquiries about the particular phenomena that the domain studies by providing a narrative of the evolution of the knowledge about the phenomena. Simultaneously, the domain’s history limits the kinds of questions and inquiries actors can pose about the phenomena by providing the current, consensual understanding of it. The exact nature and types of constraints vary from domain to domain and can be uncovered using CWA. 

Understanding the behavior-shaping constraints gives designers insight into the context of actors’ work and provides an understanding that facilitates systems design. The outcome is not a prescription of what actors should do (a normative approach) or a detailed description of what they actually do (a descriptive approach), but an analysis of the constraints that shape the domain and context.

Rather than enumerating the factors that influence actors, CWA defines a set of dimensions (Figure 1) that holds constraints, as explained in the introduction to this Bulletin section. The issues to be addressed at each dimension vary from domain to domain, depending on the type of domain, goals of the CV and the activities that the system supports. Below, I define and discuss each dimension and demonstrate how it contributes to the analysis of information behavior to support design of CVs. 

Figure 1. Dimensions of Cognitive Work Analysis

1) Environment. Actors in a given scholarly domain, for instance, are constrained by such aspects as the domain’s discourse, history, schools of thought, paradigms, research fronts and activities. These constraints limit and enable the types of information needs actors have in the particular scholarly domain. Likewise, a commercial R&D division that is engaged in the development of Web search engines is constrained by the previous research and current state of knowledge in information retrieval. This context shapes the kinds of questions that are asked and addressed by the R&D team and the creation of a CV for their intranet is influenced by the tradition of research in information retrieval. 

2) Work domain analysis. The work domain provides the framework in which the actors operate and actors generate their information needs in this context. For instance, actors in information retrieval research are constrained by the goals, priorities, functions, processes and resources of their particular work domain. While researchers in the domain share some of the constraints in the environment, their particular work domain presents other constraints that are unique to the work domain. Actors in a commercial R&D division work under constraints that are significantly different from actors in a university setting. While actors in these two work domains may work on the same problem, the fact that they operate in different work domains – with different goals, priorities, functions, processes and resources – will cause them to approach the problem differently. This difference affects their information needs and how they search for information, which should determine how the information is to be indexed.

3) Organizational analysis. Workplaces are analyzed in terms of their organizational structures, management styles, organizational culture, nature of the organization and allocation of roles. The organizational analysis gives the designer an understanding of how the domain is structured both explicitly and implicitly. While actors in research workplaces might have a high degree of autonomy in their work and their information needs might therefore develop relatively independently of the organizational structure, actors in more structured organizations, like an insurance company, develop their information needs in accordance with their particular tasks. Actors in such organizations are often assigned particular tasks, and they develop information needs to react to these assigned tasks. We therefore need an understanding of the organization to gain an insight into how work is delegated, assigned or otherwise acquired. 

4) Activity analysis. Activity analysis examines what users do to achieve their tasks. The CWA framework divides the analysis of activities into three separate parts: 

4a) Activity analysis in work domain terms: Actors are constrained not only by the environment, work domain and organizational structure, but also by their activities. The activity analysis in work domain terms teases out the nature of the actors’ tasks to understand how, where and when they need information. Actors in an insurance company, for instance, might want information such as claims, police reports or photos of damaged material for their work. However, these needs develop in response to specific activities the actors perform. We could ask whether the documents are needed to address a specific issue in a class action suit or in response to a retention schedule. Designers of CVs need to understand actors’ work activities to understand the difference between these two types of information needs and to make decisions about the organization and representation of the material. Likewise, when faculty members at universities search for information in relation to their scholarly activities, they may need information for their classes, their research or their service activities. We could ask whether a scholar is interested in a document in preparation for a class presentation or to confirm specific ideas when reviewing a colleague’s manuscript. These activities constrain the type of information they are interested in and the type of information system they will use. Without an understanding of these activities and constraints, designers would not know how to design useful indexing systems.

4b) Activity analysis in decision-making terms: The purpose of this analysis is to clarify what information users need to make decisions, what information is actually available and what information is desirable but not available. Researchers in a commercial R&D division might need information about a specific functionality in a search engine, and they might be able to find this information, for instance, in their personal files, on their intranet or in public digital libraries. Their search for this information is constrained by the decision they have to make. Thus, depending on whether they are exploring issues related to functionality or are searching for design requirements of a search engine, they will need different types of information. This difference in types of information needs could influence the design of the indexing systems for this workplace.

4c) Activity analysis in terms of strategies that can be used: The search strategies employed by actors in current systems can be good indicators of preferences in their search situations and might be valuable to understand as background for the formulation of search strategies in future systems. However, actors’ current search behavior might not be relevant in future information systems, and the analysis of actors’ activities in terms of strategies should focus on possibilities for searching and not be limited to descriptions of current practice. The analysis of strategies should therefore ask questions about possible strategies that actors can take, independently of whether actors actually use those strategies today. To identify possible strategies, the analysis would examine which strategies an actor could use to find specific information in an effective way. For instance, could the actor search by using index terms, browsing the system or going directly to sources that are known to him/her?

5) Analysis of actors’ resources and values. The purpose of this analysis is to gain insight into the actors’ cognitive resources and values, such as their knowledge of the subject matter dealt with in the domain, their preferences for information sources and formats and their values in terms of objectivity vs. subjectivity in the representation of information. For instance, while designers of systems for actors in a scholarly domain might expect a certain level of subject knowledge, the information sources used in scholarly domains might vary among different user groups. An analysis might find that senior researchers in the domain prefer short conference papers while students prefer review articles and monographs. Such a finding should have an impact on the design of the indexing systems. Likewise, such an analysis might reveal that researchers in a commercial R&D division prefer more recent information in digital formats that contains many graphical representations.
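
One way a designer might keep track of the constraints uncovered along the dimensions above is a simple record per domain. The sketch below is illustrative only: the CWA framework does not prescribe any such data structure, and the class names, method names and sample constraints (drawn loosely from the R&D example in the text) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    """The dimensions discussed above, with activity analysis split in three."""
    ENVIRONMENT = "environment"
    WORK_DOMAIN = "work domain"
    ORGANIZATION = "organization"
    ACTIVITY_WORK_DOMAIN = "activity in work domain terms"
    ACTIVITY_DECISION = "activity in decision-making terms"
    ACTIVITY_STRATEGIES = "activity in terms of strategies"
    ACTOR_RESOURCES_VALUES = "actors' resources and values"


@dataclass
class ConstraintAnalysis:
    """Constraints recorded so far for one domain, grouped by dimension."""
    domain: str
    constraints: dict[Dimension, list[str]] = field(default_factory=dict)

    def record(self, dimension: Dimension, constraint: str) -> None:
        self.constraints.setdefault(dimension, []).append(constraint)

    def unexamined(self) -> list[Dimension]:
        """Dimensions for which no constraint has been recorded yet."""
        return [d for d in Dimension if not self.constraints.get(d)]


# Hypothetical notes for the commercial R&D example used in the text.
analysis = ConstraintAnalysis(domain="commercial R&D division")
analysis.record(
    Dimension.ENVIRONMENT,
    "previous research and current state of knowledge in information retrieval",
)
analysis.record(
    Dimension.WORK_DOMAIN,
    "goals, priorities, functions, processes and resources of the division",
)
analysis.record(
    Dimension.ACTOR_RESOURCES_VALUES,
    "preference for recent information in digital, graphics-rich formats",
)

remaining = analysis.unexamined()
```

Here `remaining` lists the dimensions that still need analysis before design decisions are made. The record is a bookkeeping aid, not a substitute for the analysis itself.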

By moving the focus from descriptions of what actors do to an analysis of the constraints under which actors operate, studies of human-information interaction can become useful for design. Such studies are more useful because the design of CVs cannot be based on knowledge about the behavior of individuals; it is better served by analyses of the constraints under which actors operate. These constraints remain relatively stable over time and among different actors and therefore serve as better guides for potential information needs.

CWA provides a powerful framework for analyzing information behavior for the purpose of designing controlled vocabularies. While factors that can affect human-information interaction are almost unlimited, the CWA framework offers a number of dimensions along which one can identify various constraints that influence actors’ information needs. 

Each dimension contributes to the designer’s understanding of the domain, the work and activities in the domain and the actors’ resources and values. The analyses ensure that designers bring the relevant attributes, factors and variables to design work. While analysis of each dimension does not directly result in design recommendations, these analyses rule out many design alternatives and offer a basis from which designers can create systems for particular domains. To complete the design, designers need expertise in the advantages and disadvantages of different types of indexing languages, the construction and evaluation of indexing languages and approaches to and methods of subject indexing. 

For Further Reading
Aitchison, J., Gilchrist, A., & Bawden, D. (2000). Thesaurus construction and use: A practical manual (4th ed.). Chicago: Fitzroy Dearborn.

American National Standards Institute/National Information Standards Organization (ANSI/NISO). (2005). Guidelines for the construction, format, and management of monolingual controlled vocabularies: Z39.19-2005. Bethesda, MD: NISO Press.

Rasmussen, J., Pejtersen, A.M., & Goodstein, L.P. (1994). Cognitive systems engineering. New York: Wiley.

Rosenfeld, L., & Morville, P. (2002). Information architecture for the World Wide Web (2nd ed.). Sebastopol, CA: O’Reilly.

Vicente, K. (1999). Cognitive Work Analysis: Toward safe, productive, and healthy computer-based work. Mahwah, NJ: Lawrence Erlbaum Associates.