Feature

Electronic Records Research
Working Meeting, May 28-30, 1997:
A Report from the Archives Community

by David Bearman and Jennifer Trant

Issues of digital preservation have caught the attention of the digital libraries community in recent years, not least because of the work of the Task Force on Archiving of Digital Information (1). However, concerns in the archives and records management communities about electronic records have been quite different from those expressed by the library and preservation communities. Collaboration between these communities is essential if we are going to design systems that ensure the long-term preservation of electronic records.

A decade ago, most archivists thought about electronic records issues much the way that librarians do today, as a problem of documenting and preserving data files in specialized repositories. Since then, networked computing has transformed the mechanisms of business communications. Archivists have increasingly adopted the view that the fundamental issues in records capture and retention, whether in paper or electronic form, are identifying records, classifying them by provenance and retaining them in their context of use so that they can be understood. Only when these challenges have been successfully met will questions of how or where to keep records or how to provide access to them arise. Thus the “archives” as files in need of retention and the “archives” as repository are issues only after what are currently the most difficult challenges of day-to-day recordkeeping have been satisfied.

Librarians and the preservation community still focus their attention on electronic objects prepared and published as coherent entities to reside in repositories. Thus they generally ignore the very real problem of acquiring coherent records from disparate business information systems not designed to keep records and rife with undocumented software and hardware dependencies. Nor do they usually deal with objects whose content is frequently splattered with proprietary, personal, private and legally troublesome non-public data. So when archivists go to meetings of librarians and preservationists focused on keeping electronic “archives” they generally find the discussion overlooks the front end of the issue, where records “happen.” Librarians and preservationists, meanwhile, find it hard to understand how archivists can seemingly shrug off the back-end, long-term retention issues as not terribly interesting and dependent on technology developments very much out of the hands of either community.

In May 1997, a working meeting of international researchers and practitioners of the archival approach to electronic recordkeeping was organized in Pittsburgh by Archives & Museum Informatics. This meeting focused primarily on the issues at the “front end,” before records can be brought together to become the problem of any repository. The following summary, however, is directed to the larger community. Issues of electronic record creation and capture are shared by all those who have become dependent upon technological systems to support their business processes.

Background

The 1997 meeting was modeled on similar sessions in 1994 and 1991. The purpose of the meeting was to bring researchers familiar with work being done worldwide together to define a set of clearly articulated research questions that were the logical “next steps” for the field. Participants were treated to a healthy dose of background reading to make it possible for presenters to assume familiarity with the state of ongoing projects worldwide (2).

The meeting confirmed the degree to which common ground has been reached in the past several years. However, much research has focused on particular portions of the problem: many solutions which appear independent are actually interdependent. Tensions are emerging between practitioners who want to “just get on with it,” and researchers who seem to be “peeling an onion.” These tensions reveal a critical juncture in the development of solutions for electronic records management. After a long period of developing models, agreeing on terminology and defining problems, we seem ready to begin serious testing of proposed solutions. Much research remains, though. The following themes were explored in presentations and breakout group discussions:

  1. What makes an electronic record? How are records defined and what metadata ensures their “recordness”?
  2. Does policy adoption contribute to more effective electronic records management? If so, what policies can best ensure electronic accountability and integrity of records?
  3. What business events generate records? How are these events recognized?
  4. How can metadata about electronic records be captured? How can they best be stored and maintained in relation to records?
  5. How can records be maintained? What are the requirements for using them?

I. Definition of Records

The first session dealt with an issue that is crucial to both the law and the technology of electronic records: the definition of an electronic record. Archivists distinguish between records and information or data; not all information or data is a record. Records are those which were created in the conduct of business and communicated between parties to that business. Some archivists believe records must be “set aside” in the course of business to be considered records. In any case, being transacted in a particular business context is crucial to a record; thus an adequate record will contain evidence of the context of its creation. The consensus, largely developed since 1990, is that

Research into the definition of records has been concentrated in two major groups of researchers, at the University of Pittsburgh and the University of British Columbia. Both were asked to summarize their findings about what makes a record a record. Presentations by Luciana Duranti, Maria Guercio, Richard Cox and Wendy Duff focused on the source of authority for, and universality of, records metadata requirements. Driven by pragmatism, the University of Pittsburgh team looked for “warrant” in the sources considered authoritative by the practitioners of ancillary professions on whom archivists rely – lawyers, auditors, IT personnel, etc. (See Duff, Wendy M., “Compiling Warrant in Support of Functional Requirements,” Bulletin of the American Society for Information Science, June/July 1997, pp. 12-13.) In the European tradition, the UBC team examined the authority of diplomatics, a discipline grounded in the juridical systems of early modern Europe. To many, their differences on sources of authority (a more philosophical issue about the nature of truth) were overshadowed by their apparent agreement on basic characteristics and most concrete metadata requirements of electronic records.

Subsequent discussions demonstrated that neither definition is adequate for those responsible for managing electronic records, nor does either provide the algorithmic specificity that systems need to recognize records when they are created by business events. The definitions put forward need to be synthesized, and the common core elements of an electronic record must be identified in a high-level definition useful across systems and communities. Variable sets of metadata drawn from the warrant of different juridical, business, organizational and procedural contexts could supplement this core. In combination with an architecture to express content, context and structure, a shared definition would provide a model that maps the differing concepts and languages of the research projects. This common semantic would enable collaboration across the discipline and would provide a means of communication with record creators, users and researchers in other disciplines.

A tension was inherent in the discussion of definitions of electronic records. While a more generalized framework was seen as necessary to bridge the philosophical differences of the researchers, it would not serve the needs of those who are building systems. There, concrete expressions of both the semantics and the syntax of electronic records and their associated metadata are required urgently. The utility of the definition is the basic issue.

II. Policy

The second session dealt with electronic record policy formulation at an institutional, national and international level. Presenters included Luisa Moscato of the Records Management Office of New South Wales and Greg O’Shea of the Australian Archives, who presented the process by which they formulated a coordinated Australian professional policy, policies at the state and national archives, and the cross-sectoral Australian Records Management Standard, AS4390 (3). Peter Horsman of the Dutch National Archives has worked within the Dutch civil service to formulate policies in the Netherlands. Both efforts in policy development have served as a major vehicle for clarifying the roles and values of archival organizations, an unanticipated benefit, regardless of whether the policies themselves result in better recordkeeping.

The presenters agreed that broad frameworks directing people and organizations to keep electronic records need to be accompanied by specific performance standards, monitoring/reporting mechanisms, rewards and penalties. The presentations reinforced the view that records result from business processes and are the responsibility of process managers. Policy is a strategic, and not fundamentally a technological, issue. But as yet we know little about the acceptance of or adherence to policies, the costs of implementing (or even developing) them or the appropriate level of granularity in implementation.

Discussion focused on the feasibility of implementing electronic records management policies. If much of the responsibility for the creation and retention of records is shifted to the desktop of individuals, how do we maintain the quality of records? What are viable strategies in terms of hardware and software implementation? Can we develop a generic set of specifications? What role can professional “best practices” play, and how do we train people to meet these new requirements?

Changes in policy require changes in accountability structures as well. Can policies be enforced? Which mechanisms work? How can project managers, whose output is measured in other business terms, be held accountable for records management? Some organizations respond more readily to policy changes than others. What kinds of organizations respond best to policy? Which to design? Which to implementation? Which to standards? What strategies are available as alternatives in less formal working environments? Are there identifiable and measurable differences between industries and between the public and private sectors?

III. Recognizing Record-Creating Events

The third session explored the questions: What business events generate records? and How are these events recognized? Most archivists believe that few if any electronic information systems existing in organizations create records, or at least create records which are adequate to serve as evidence of business transactions. Since organizations participate in far too many record-creating events, in too distributed a fashion, to assign the responsibility for making record-creating decisions to an office of recordkeepers, systems communicating records must somehow implement a decision to create records.

Groups presenting in this session included Artificial Intelligence Atlanta, a team engaged in research with the Department of Defense, and ASTRA (a Swedish pharmaceutical firm) and the Swedish National Archives, jointly involved in research to develop methods for electronic recordkeeping in the pharmaceutical industry (an industry well represented at this meeting because its firms are both heavily regulated and have huge long-term liabilities that can be defended only with their now largely electronic, scientific records). Both teams are attempting to find methods to identify a record-creating event, or a business transaction, that requires a record to be created. How can a system recognize a “trigger” event? The ASTRA team used STEP (4) to model the business process and identify such events (which they have termed causa), while the DoD team tried to develop a set-based logic to identify events and provide “automated decision support for classification” to a human records classifier. Both acknowledged that models of types of actions don’t necessarily conform to actions as conducted; matching the process model to real events has proven difficult. Unfortunately, the archival rules to which the business model would relate, if it were a success, are also not as formal as they need to be. Expressions in set theory proposed by AIA look highly algorithmic, but in fact are too vague in operation.

Research questions focus on distinguishing creator vs. organizational requirements. A tension was recognized between the creation of functional and efficient business systems and the implementation of full electronic records capture functionality. For those in the group who felt that one of the primary characteristics of an electronic record was that it was “set aside,” classification became a key moment in the process (5). Much work has focused on how to classify documents consistently. Work-flow systems that position the creation of a record within a function and link that function to a pre-defined classification were seen as promising. Another tack would be to identify functions assigned to personnel classifying a record in order to narrow the possibilities available to them and improve accuracy. Both of these approaches suggest the creation of a structured electronic workspace where work is done within functional areas as an aid in the record capture process. Such a space enables system implementation methodologies that can test for rigorous adherence.

A reliance upon an understanding of the business processes carried out by organizations raised questions regarding modeling of workflow itself. What data is required about the function being performed and how is its location in workflow related to a captured record? Clear models for functional requirements specification are needed. But what is the role of the archivist within an interdisciplinary team that is creating new systems to support electronic work? Communicating recordkeeping requirements to systems designers and implementors is a major challenge that would be aided by a consistent and unambiguous model of events and activities. The model should establish a synthesis between the various models proposed and the business processes and functions identified.

IV. Capturing Records

The fourth session continued the exploration of the relationship between business processes, business transactions, record creation, record capture and recordkeeping systems. If the record-creating event and the requirements of “recordness” are both known, focus shifts to capturing the metadata and binding it to the record contents. The National Archives of Canada team, represented by John McDonald, has been exploring interfaces in the work environment constructed to enable the capture of electronic records. David Bearman, of Archives & Museum Informatics, has been building models of how the metadata captured in record creation can best be structured for future use and how to ensure its inviolability and its readability over time. If a record is comprised of both its metadata and its content, how can these two facets be bound together? Bearman is exploring reference models which might provide a generic record metadata structure and examining how these models relate to other metadata standardization activities, such as the Dublin Core.
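The binding question raised above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the reference model under discussion at the meeting: the function names `seal_record` and `verify_record`, the JSON layout and the choice of a SHA-256 digest are all hypothetical. The idea shown is that a single digest computed over both the metadata and the content makes later changes to either one detectable.

```python
import hashlib
import json


def seal_record(content: bytes, metadata: dict) -> dict:
    """Bind content and metadata in one envelope (hypothetical layout)."""
    # Serialize metadata canonically (sorted keys) so that the same
    # metadata always produces the same bytes, and so the same digest.
    canonical_meta = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(canonical_meta.encode("utf-8"))
    digest.update(b"\x00")  # separator between metadata and content
    digest.update(content)
    return {
        "metadata": metadata,
        "content_hex": content.hex(),
        "seal": digest.hexdigest(),
    }


def verify_record(envelope: dict) -> bool:
    """Recompute the digest; False means metadata or content has changed."""
    resealed = seal_record(
        bytes.fromhex(envelope["content_hex"]), envelope["metadata"]
    )
    return resealed["seal"] == envelope["seal"]


# Example use with made-up metadata fields.
envelope = seal_record(
    b"Purchase order approved.",
    {"creator": "j.smith", "business_function": "procurement"},
)
print(verify_record(envelope))
```

A digest of this kind only detects alteration. It does not by itself address readability over time or software dependencies, which remain open questions in the discussion that follows.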

McDonald reported on a vision developed at the National Archives of Canada where recordkeeping is transparent, incorporated into an overall IT strategy and integrated into tools and technology. But what does “transparent” mean, and what does this world look like? How do we articulate the relationships between programs, work processes and activities within an organization? What is required in order to specify built-in capture and retention rules (to enable automated disposition)? How can systems be designed that support the relatively unstructured environment in the modern office, where work processes are complex, ad hoc and dynamic? Can recordkeeping be made invisible? Or should those responsible for record creation be made aware of their actions?

Even if systems could be designed and implemented to automate the capture of electronic records, research is still needed into the required metadata. How do we model recordkeeping systems that enable records and their metadata to remain meaningful over time? How can we ensure the integrity of a record through time? Will metadata have to be “registered”? What metadata is required to support future re-use? How does metadata required for electronic records map to that for other functions – information discovery for example?

If an encapsulated object approach is taken, what are the characteristics of a good envelope? Are there existing technologies or standards that can be adapted or implemented? Are there standard syntaxes that are “good enough” for some situations? Can we assign value metrics around the capture, management, retention and migration of electronic records? What are the costs vs. the benefits of various strategies?

Test-bed projects are needed to benchmark and cost various approaches to the capture and retention of electronic records and their associated metadata. The semantics and syntax of a generic attribute set need to be designed and tested against the functionality required. The effectiveness of metadata in reducing software dependencies must be evaluated and tested in a variety of circumstances.

V. Maintaining Records Over Time

Consensus exists that exact replication of digital objects is rarely feasible or cost effective, and that migration should replace technology refreshment as a preservation strategy. Migration, however, is inherently imperfect: implementation dependency choices have their costs downstream, and the gap between functional (semi-active) and non-functional (representations) is, from a practical migration perspective, absolute.

Researchers in this session included Margaret Hedstrom of the University of Michigan, Anne Marie Makerenko from Babson College Archives, and Alan Murdock, representing a team from Pfizer Ltd., a British pharmaceutical company. Their practical research questions focused on the costs and mechanics of maintaining electronic archives. How can we model event-driven records retention scheduling? What are migration cost elements? What risks arise from what loss under what circumstances? And can models be developed and/or partners be found in highly regulated industries where long-term retention of electronic records is a legislated mandate?

It is evident to the researchers that much remains to be determined before scalable solutions are available. Though practitioners keep asking for “core” definitions and implementable procedures, it is not yet clear that “cores” are workable. The last mile is proving hard to travel because frameworks aren’t good at the detailed semantics, because functional requirements are far from specifications and because the real costs of migrations depend on so many local variables. Concrete implementations are necessary to build our understanding of these factors, but comparative analysis and detailed reporting on choices made and the rationales for them will be critical to building shared strategies.

As Margaret Hedstrom observed, we need to improve our knowledge of alternatives to exact replication. What strategies are appropriate to different types of records and different preservation goals? How much functionality must be maintained in an archival electronic record? What is acceptable information loss? Could we consider the preservation of surrogates? Can we reconstruct context and structure? We need criteria for the creation and evaluation of surrogates as preservation tools.

Again, implementation became a major theme. How can we devise migration programs without a detailed understanding of the costs and benefits of particular approaches to migration? How do we assess the risks involved in information loss? We are unable to ensure that particular methods will work in all situations; how do we support local decision-making to enable the best conclusions for a particular situation? What are the project management and quality assurance techniques that will be most effective throughout the process?

Besides the need to maintain more explicit contextual metadata, it remains unclear whether or how the requirements for long-term preservation of records are fundamentally different from the requirements for the preservation of other types of digital information. If they are, then how are they different? Where can we collaborate with the broader community, and where must specific archival solutions be developed?

Toward a Research Agenda

Outside of Australia, where strong community leadership is creating an environment (now standards and law driven) that requires action on electronic records, the archival community remains technically and economically ill-prepared to step up to this formidable challenge. Archivists have not yet found a way to enlist others in an ongoing fashion to help solve problems that cannot be addressed by archivists alone. This articulation of open issues may aid in convincing others to join the research.

For the near term, the most promising areas for research seem to require greater specificity and granularity in their focus. In the definition of records, we need concrete risks associated with different definitions in different circumstances and an executable specification of recordness. In policy, we need to define the concrete costs and benefits of specific policies and their implementation through organizational, national and international mechanisms. To understand record creation, we need testable models of the kinds of records created by different business processes. In the arena of capturing records, we need tests of registry mechanisms for software and hardware dependency metadata and for business context metadata, and we need to test proposed structures for the inviolable storage of metadata and records’ content. For the maintenance of records over time, we need comparative migration data, equivalent measures of the effectiveness of different systems architectures and strategic solutions for the universal retention of records (obviating the need for each institution to invest in its own migration of dependencies). Finally, we need very detailed and granular research into the needs of users and how they are articulated so that metadata on the content and context of records will support the research process.

None of these problems is going to be easy to solve. The research agenda meeting in the spring of 1997 articulated a full set of open questions which will provide grist for researchers and practitioners for a long time to come. The archivists participating looked forward to the interdisciplinary collaboration necessary to move beyond open questions to workable solutions.


David Bearman and Jennifer Trant are with Archives & Museum Informatics in Pittsburgh, Pennsylvania. Participants in the meeting discussed in this article have contributed to the Proceedings, which are published in Archives and Museum Informatics: the cultural heritage informatics quarterly, Vol. 11, no. 3-4, available from Kluwer Academic Publishers kapis.www.wkap.nl

Notes

  1. Commission on Preservation and Access and the Research Libraries Group, Task Force on Archiving of Digital Information, Preserving Digital Information, Washington, DC: Commission on Preservation and Access, May 1, 1996.
  2. This material is available on CD-ROM from Archives & Museum Informatics, under the title Electronic Records Research Resources, 1997.
  3. Australian Records Management Standard, AS4390, Australian Standards Institute, Australian Council on Archives. Keeping Electronic Records: Policy for Electronic Recordkeeping in the Commonwealth Government
    www.aa.gov.au/AA_WWW/AA_Issues/KER/KeepingER.html

    Corporate Memory in the Electronic Age. (Sydney: Australian Council of Archives, 23 October 1995).
    www.aa.gov.au/AA_WWW/ProAssn/ACA/Corpmenw.htm

    and Australian Archives, Managing Electronic Messages as Records, May 1997
    www.aa.gov.au/AA_WWW/AA_Issues/EMcontents.HTM
  4. STEP, the Standard for the Exchange of Product Model Data, is the familiar name for ISO 10303, developed by ISO TC184/SC4 (Industrial-Automation Systems and Integration/Industrial Design). See "STEP on a Page"
    www.nist.gov/sc5/soap
  5. See, for example, the Activity Models for Applying IDEF Methodology to Represent Archival Science Concepts available at
    www.slais.ubc.ca/users/duranti/