Special Section

Organizing Internet Resources: Metadata and the Web

by Efthimis N. Efthimiadis & Allyson Carlyle, guest editors

Editor's Note: This issue was prepared in cooperation with Special Interest Group/Classification Research (SIG/CR) by guest editors Efthimis Efthimiadis, the current chair of the SIG, and Allyson Carlyle. They have assembled an excellent and wide-ranging sample of current activities associated with cataloging the Internet. This subject is especially timely, given its crucial role in Digital Collections, the topic of the upcoming ASIS Annual Meeting. I wish to thank all of the authors and the editors who contributed heroically under tight deadlines. I urge the SIGs and the chapters to keep the Bulletin in mind when planning next year's activities. I will be happy to work with you on special topic issues or by following up with potential authors in your area of interest.

On another subject, you will also note the increasing number of references to World Wide Web sites in the Bulletin. In the past, the Bulletin had few references of any sort in keeping with its informal style. However, recent topics and changes in patterns of information distribution are making a wide variety of materials available on the Web. Some of these, such as sites in Australia, allow instant access to resources that would take American readers weeks to acquire by mail. Particularly since the Bulletin is not an archival journal, the immediate benefits of the liberal use of Web citations seem to outweigh their disadvantages -- lack of stability or assurance of future availability. Also, for topics such as Internet cataloging, citation of the Web is unavoidable, since many key resources are primarily maintained there. We hope that you will find these citations valuable and be willing to bear with the (currently) inevitable problems. I welcome your comments on this and other policies concerning the Bulletin.

Irene Travis

Organizing resources on the Internet is both an old and a radically new problem. Describing and organizing information resources for retrieval has a long history of professional practice with a wide repertory of tools and experience to back it up. But the Internet is overwhelming in the variety, transience and sheer numbers of resources presented. The various communities of producers, users and value-added agencies, such as libraries, are scrambling to cope with this phenomenon by any means available.

In its initial years the Internet has relied heavily on tools and methods, such as Web Crawlers, that require little or no human intervention or systematization. But it has long (in Internet Time) been apparent that an approach based only on the full-text indexing of the contents of Internet sites is not a complete or fully adequate solution for providing access to these resources. We need means to augment and enrich the "self-description" of materials and to encourage creators and third-party agencies to engage in this task. Adding information about a resource, or "metadata," is an essential basis for better organization of resources. Metadata can enhance the probability that a pertinent resource will be retrieved, provide a clearer overview of a subject area and improve the user's ability to discriminate among similar sources.

Metadata is used to document information about resources, such as Web sites, and often provides an "index" or "directory" to the resource. It may reside as a header to a resource or be linked to it by other means. It provides a user (human or machine) with a means to discover that the resource exists and how it might be obtained or accessed. It can cover many aspects, such as subject content, creators, publishers, quality, structure, history, access rights and restrictions, relationship to other works or appropriate audience.
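As a rough illustration of metadata residing "as a header to a resource," the sketch below builds a small record and renders it as HTML meta tags. The field names, the sample values and the function name render_meta_header are assumptions made for this example only; they are not drawn from any standard or project discussed in this issue.

```python
from html import escape


def render_meta_header(record):
    """Render a metadata record as HTML <meta> lines for a page header.

    `record` maps a (hypothetical) field name to a list of values,
    since a resource may have several subjects, creators and so on.
    """
    lines = []
    for name, values in record.items():
        for value in values:
            lines.append(
                '<meta name="{}" content="{}">'.format(
                    escape(name, quote=True), escape(value, quote=True)
                )
            )
    return "\n".join(lines)


# Hypothetical record covering a few of the aspects named above:
# subject content, creators, audience and access rights.
record = {
    "subject": ["metadata", "Internet cataloging"],
    "creator": ["Example Author"],
    "audience": ["librarians"],
    "rights": ["freely accessible"],
}

header = render_meta_header(record)
print(header)
```

A record of this kind could equally be stored apart from the resource and linked to it; the header form is shown only because it is the simplest to picture.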

But such an undertaking raises many problems. What is worth cataloging? Who will provide the descriptions? How can the needs of different communities for different kinds of metadata be accommodated? Can or should the extraordinarily heterogeneous resources themselves be placed within a single framework? At what level, both of detail and structure, should such descriptions be standardized? When and by whom? How can we ensure that resources, once described, can be located throughout their lives? How do we deal with the dynamic contents of many of these resources?

During the past two years, the Internet and library communities have explored these issues intensively and arrived at some answers. This issue reports a sample of these activities from many different perspectives -- with an emphasis on practice and understanding of current developments.

In the first article, Erik Jul summarizes the history of the standard library cataloging of Internet resources. He discusses various issues that confront this approach: what Internet resources are worth cataloging, whether current standards are sufficient and what problems are engendered by transient Uniform Resource Locators (URLs).

Stuart Weibel introduces the Dublin Core, a set of 15 metadata elements developed in an international, cross-disciplinary effort. He describes its current role in resource description for WWW documents as well as its potential role as a core set of descriptors -- a meta-metadata element set. He also briefly introduces the Warwick Framework that resulted from the second Metadata Workshop in Warwick, UK, in 1996, and the Resource Description Framework (RDF) being developed by the World Wide Web Consortium (W3C).

Ray Schwartz reviews current efforts to create stable identifiers, Uniform Resource Names (URNs) and Uniform Resource Characteristics (URCs) for World Wide Web resources.

Sherry Vellucci addresses the coexistence of various types of metadata (e.g., MARC and Dublin Core) in the electronic environment, including local library catalogs and other electronic "catalogs," such as InterCat. She foresees "metacatalogs" that will be able to handle records and documents coded in a wide variety of metadata formats.

Carmel Maguire provides an informative presentation of the very active work on metadata in Australia.

Stuart Sutton and Sam Oh describe the use of metadata on the Gateway to Educational Materials (GEM) project sponsored by the National Library of Education and the Department of Education. GEM will provide the nation's teachers with "one-stop" access to lesson plans, curricula and other Internet-based educational resources. The paper discusses the development of the GEM metadata and the extensions and use of the Dublin Core Element Set and the Warwick Framework.

From OCLC, Diane Vizine-Goetz reports on ongoing work to exploit the content and structure of standard cataloging tools, such as the Library of Congress Subject Headings and the Dewey Decimal Classification, to organize Internet resources. This work employs both human-constructed and automatically generated descriptions and is linked to the creation of innovative searching and indexing systems such as NetFirst, Dewey ExTended Concept (ETC) Trees and WordSmith. This special section concludes with Keith Shafer's look at one particular OCLC project, Scorpion, in more detail.


Efthimis N. Efthimiadis is associate professor in the Graduate School of Library and Information Science (GSLIS) at the University of Washington, Box 352930, Seattle, WA 98195-2930; 206/543-1794. He can be reached by e-mail at efthimis@u.washington.edu. Allyson Carlyle, also in GSLIS at the University of Washington, can be reached by e-mail at acarlyle@u.washington.edu.