Metadata and the Web

Uniform Resource Identifiers and the Effort to Bring "Bibliographic Control" to the Web: An Overview of Current Progress

by Ray Schwartz

One standard through which the World Wide Web operates is the Uniform Resource Locator (URL). Each resource on the Web is linked via a URL, which in turn can be linked to another resource. The URL is the standard for addressing the location of a networked resource. The URL, though very powerful, does have serious limitations. In using the Web, inevitably one finds expired links, confusion between names and addresses, difficulty in distinguishing between various versions of a resource and many duplicates to sift through. Unlike the Online Catalog, the Web does not offer an infrastructure for bibliographic control. In 1992, to deal with these inadequacies, the Internet Engineering Task Force (IETF) chartered the Uniform Resource Identifiers (URI) Working Group to discuss and develop standards for naming, describing and addressing Internet resources.

URIs, URLs, URNs, URCs: The Efforts of the IETF

The Uniform Resource Identifier (URI) is meant to be an all-encompassing concept and syntax to include and coordinate all forms of Uniform Resource standards that would be needed for addressing, naming and describing networked resources. The URL, though developed and implemented before the URI concept, is a URI standard that addresses the location of a networked resource. Presently the URL is the only URI standard implemented. After three years of effort, the URI working group found that its chartered mission was much larger than it could handle, and in July 1995 it was disbanded by the IETF with the recommendation that its tasks be divided among future working groups. Among the accomplishments of the group are the URI syntax and two additional proposed forms of URIs. The proposed forms are Uniform Resource Names (URN) and Uniform Resource Characteristics (URC). The URN would deal with the issue of unique identifiers for networked resources. The URC would contain metadata on a networked resource. In other words, a URC standard could provide a syntax for a "bibliographic" description of a networked resource.

Uniform Resource Names

Since the dissolution of the URI working group, two draft charters have been drawn up: one for a URC working group and the other for a URN group. In July 1996, the IETF formally established the URN working group. By the end of the year, all participating parties had agreed upon a number of key issues. The agreed framework includes the requirements for URNs, naming schemes and resolution systems (as well as their independence from each other) and a URN registry service.

The functional requirements of a URN, as agreed upon and defined by Sollins and Masinter's Request for Comment 1737 (1994), are as follows:

  1. Global scope: a name would have the same meaning everywhere.
  2. Global uniqueness: the same name will never be assigned to two different resources.
  3. Persistence: the name is intended to be permanent -- even beyond the lifetime of the resource.
  4. Scalability: names can be assigned to any resource.
  5. Extensibility: any scheme must allow for future extensions to itself.
  6. Independence: any given name issuing authority sets the requirements under which it will issue a name.

A naming scheme is a mechanism for creating and assigning unique identifiers that conform to a particular syntax. The participants agreed that the URN syntax should allow developers to utilize existing naming schemes without the fear of future development forcing them to modify existing numbers and/or schemes. In other words, a URN may include any naming scheme (also referred to as a namespace), be it ISBN, ISSN, LC call numbers, CNRI handles and so on. The syntax is as follows:

URN:<NID>:<NSS>

NID is the Namespace Identifier and NSS is the Namespace Specific String. Each namespace (a.k.a. naming scheme) defines the structure of its NSS, and the Namespace Identifier is used to interpret the Namespace Specific String. An example for an ISBN would be

URN:ISBN:0872873625
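
The syntax above can be illustrated with a short Python sketch that splits a URN into its two parts. The function name parse_urn and its error handling are assumptions made for illustration; they do not come from any IETF document.

```python
def parse_urn(urn):
    """Split a string of the form URN:<NID>:<NSS> into (NID, NSS)."""
    # Split on the first two colons only, so that a Namespace Specific
    # String containing colons of its own is kept in one piece.
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].upper() != "URN":
        raise ValueError("not a URN: %r" % urn)
    nid, nss = parts[1], parts[2]
    if not nid or not nss:
        raise ValueError("URN needs both a NID and an NSS: %r" % urn)
    return nid, nss


nid, nss = parse_urn("URN:ISBN:0872873625")
print(nid, nss)  # prints: ISBN 0872873625
```

The parser deliberately does not inspect the NSS, since each namespace defines that structure for itself.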

These names would be stored on a networked server for "resolution" to one or more locations. In the resolution process, when a user clicks on a URN, a request is sent to a server. The server resolves the request by sending back to the user a list of URLs that match the URN. A similar system that has been in effect since the beginning of the Internet is the Domain Name System (DNS). Domain names (e.g., rci.rutgers.edu) are used to identify computer systems on the Internet. When a user sends a request to another computer on the Internet, the request is forwarded to a Domain Name Server, which resolves the domain name from the request into an Internet Protocol address (e.g., 128.60.86.36). The request is then forwarded on to the specified destination. Many URN implementations are modeled on domain names and DNS.

Unlike the DNS, however, this concept of a URN system is intended to deploy more than one naming scheme. In order to accommodate any number of schemes, a URN naming scheme service and a resolution service must be independent of each other. To implement this requirement, an additional service would have to be added -- the URN Registry. A registry would contain information about naming schemes, naming authorities and resolution systems and could point a user to those resolution services that would resolve the URN. For example, when a user clicks on a URN, the request will be sent to a registry to determine which namespace is represented by the string contained in the NID field -- in this case "ISBN." Once the namespace has been determined, the request is mapped and forwarded to an appropriate resolution server that can resolve namespace specific strings -- in our previous example, "ISBN numbers." The resolution server maps the ISBN number to specific URLs. The set of URLs is then forwarded back to the user. An example of how a URN retrieval would work is shown in Figure 1.
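
The two-step lookup described above (registry first, then a namespace-specific resolver) can be sketched in Python as follows. This is a toy model only: the tables are held in memory rather than on networked servers, and the names REGISTRY, ISBN_RESOLVER and resolve, as well as the example.org and example.net URLs, are hypothetical.

```python
# Hypothetical resolution service for the ISBN namespace: maps a
# Namespace Specific String to the URLs where the resource can be found.
ISBN_RESOLVER = {
    "0872873625": [
        "http://library.example.org/books/0872873625",
        "http://mirror.example.net/isbn/0872873625",
    ],
}

# Hypothetical registry: maps a Namespace Identifier to the resolution
# service responsible for that namespace.
REGISTRY = {"ISBN": ISBN_RESOLVER}


def resolve(urn):
    """Return the list of URLs known for a URN (empty if none are known)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].upper() != "URN":
        raise ValueError("not a URN: %r" % urn)
    nid, nss = parts[1], parts[2]
    # Step 1: ask the registry which service handles this namespace.
    resolver = REGISTRY.get(nid.upper())
    if resolver is None:
        raise LookupError("no resolution service for namespace %r" % nid)
    # Step 2: ask that service to map the NSS to a set of URLs.
    return list(resolver.get(nss, []))


for url in resolve("URN:ISBN:0872873625"):
    print(url)
```

Because the registry only points to a resolver, a new naming scheme could be added by registering another table without touching the existing one, which is the independence the framework calls for.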

Several prototypes are now in use. Among them are the Handle System from the Corporation for National Research Initiatives (CNRI) and the x-dns-2 scheme developed by Paul E. Hoffman of Proper Publishing and Ron Daniel, Jr., of Los Alamos National Laboratory. Several in the IETF community feel that the combined implementation of CNRI's Handle System and OCLC's Persistent URL (PURL) system would constitute a workable URN system now.

Uniform Resource Characteristics

The would-be URC Working Group charter has been in a perpetual state of revision for over a year. Although the IETF has not officially formed a URC working group, work on URC standards goes on within many informal meetings of the IETF. Though the proposed charter addresses the need for descriptive data types, the most extensive development of a set of data elements has taken place outside the IETF, under the auspices of OCLC's Dublin Core Metadata Workshop Series. The URC concept is still largely undefined. However, a good description of what URCs could be is in Daniel and Mealling's March 1995 Internet draft titled URC Scenarios and Requirements. They propose that the URC would be the binding between the URN and the URL. Daniel and Mealling quote from Sollins and Masinter's Functional Requirements for Uniform Resource Names, Request for Comment 1737 (1994) that "the purpose or function of a URC is to provide a vehicle or structure for the representation of URIs and their associated meta-information." They describe what a URC service would do within the overall context of URIs and how the URC could bind URNs and URLs.

Several scenarios of how users would interact with a URC service are presented. For example, a user clicks on a URN. The browser connects to a URC service and sends it the URN. The service maps the URN to a set of URLs, which is returned to the user's browser. The list of URLs can be sorted by the data elements contained in the resources' URCs (e.g., title, authors, abstract, subject, version, date, location of resource, medium, veracity, price and so on). The retrieved set could be sorted either by a default specification in the user's browser or by the user's choice. The user would click on one of the URLs to retrieve the resource. If an attempt was not successful, the browser would automatically move to the next URL, and so on. The hypothetical browser in this scenario has considerable capability. Less capable browsers could be designed to display the URC data in a variety of formats, such as annotated subject lists.
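
The sort-then-retry behavior in this scenario can be sketched in Python. Everything here is hypothetical: the record layout, the function names fetch_first_available and fake_fetch, and the example URLs are assumptions for illustration, and the "retrieval" is simulated rather than performed over a network.

```python
def fetch_first_available(records, sort_key, fetch):
    """Try URLs in order of one URC data element; return the first success.

    Returns a (url, content) pair, or None if every URL fails.
    """
    for record in sorted(records, key=lambda r: r[sort_key]):
        try:
            return record["url"], fetch(record["url"])
        except OSError:
            continue  # expired or unreachable link: move on to the next URL
    return None


# Hypothetical URC records returned by a URC service for one URN.
records = [
    {"url": "http://mirror.example.net/report", "version": 2, "price": 0},
    {"url": "http://library.example.org/report", "version": 1, "price": 0},
]


def fake_fetch(url):
    # Stand-in for a real retrieval; the library copy is treated as expired.
    if "library" in url:
        raise OSError("link expired")
    return "contents of " + url


result = fetch_first_available(records, "version", fake_fetch)
print(result)
```

Sorting by "version" puts the library copy first; because that link fails, the sketch falls through to the mirror, mirroring the automatic move to the next URL described above.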

Another scenario would be that a user examining a Web page would see a list of links that might be of interest. The user could click on the link with the right mouse button to display a pop-up menu. From the pop-up menu the user selects "More info." The browser displays a box with the resource's bibliographic description and other metadata (e.g., copyright information). The user decides whether to select the resource based on the metadata.

Conclusion

The consensus around a set of URN standards has taken more than five years to accomplish. Much progress has been made. However, even with the achievements to date, URNs still have a long way to go before deployment, given the many controversies over administering such a system. One need only look at the present difficulties with the Domain Name System to understand that this is a major challenge. URCs are still a vague notion. Though there has not been as much URC progress within the IETF, many accomplishments (similar to the goals of the URC) are being made through the Dublin Core Workshop Series and other metadata projects, as well as through the World Wide Web Consortium's work on a metadata transport syntax called Platform for Internet Content Selection (PICS).


Ray Schwartz is multimedia/reference librarian at the John Cotton Dana Library, Rutgers University, 185 University Avenue, Newark, NJ 07102. He can be reached by phone at 973/353-5917; or e-mail at rps@newark.rutgers.edu