BULLETIN

of the American Society for Information Science


Volume 25, No. 5


June / July 1999


The State of the Dublin Core Metadata Initiative:
April 1999


 by Stuart Weibel

Editor's note: This article contains extracts from a much longer one published in D-Lib Magazine in April 1999 <http://www.dlib.org/dlib/april99/04weibel.html>. To preserve as much discussion as possible, given the Bulletin's space constraints, I have included only a few items from the many references to electronic resources related to the Dublin Core in D-Lib. Interested readers should refer to the D-Lib parent article for fuller information. I wish to thank Stuart Weibel and D-Lib Magazine for permission to publish this version.

One hundred and one experts in resource description convened in Washington, D.C., November 2 through November 4, 1998, for the sixth Dublin Core Metadata Workshop (DC-6). The registrants represented 16 countries on 4 continents, and many disciplines. As with previous workshops, many new issues were opened, and vigorous debate was a hallmark of the event.

Unlike previous workshops, the focus of DC-6 was not to resolve questions in plenary meetings, but rather to identify unresolved issues and assign them to formal working groups for resolution. The result of this process was an ambitious work plan for 1999. This report summarizes that work plan, highlights the progress that has been made on it, and identifies a few significant projects that exemplify this progress.

The Dublin Core Metadata Initiative in 1998

Prior to DC-6, the Dublin Core could be characterized as 15 unstructured elements with text-string values. (See Table.) The only widely deployed syntax option for encoding these elements was the <META> tag dot syntax that has been in use since 1996. Implementations in many countries and languages and in many disciplines testify to the widely perceived need for such a metadata element set, and the Dublin Core is the leading candidate for achieving the goal of simple resource description for Internet resources.
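Before DC-6, a Dublin Core record could be modeled as nothing more than element names paired with text strings. The Python below is a minimal sketch of that flat model, not an official DCMI data structure. It assumes capitalized element names taken from the Table at the end of this article and assumes each element may be omitted or repeated, so every value is a list of strings. The function name `validate_dc10` is hypothetical.

```python
# The 15 Dublin Core 1.0 element names, as listed in the Table of elements.
DC_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})


def validate_dc10(record: dict[str, list[str]]) -> list[str]:
    """Return the problems found in a flat DC 1.0 record (an empty list if none).

    Elements are assumed optional and repeatable, so each value is a list of
    plain strings; no structure or qualifiers are modeled at this level.
    """
    problems = []
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            problems.append(f"unknown element: {element}")
            continue
        for value in values:
            if not isinstance(value, str):
                problems.append(f"{element}: value is not a string: {value!r}")
    return problems


record = {
    "Title": ["The State of the Dublin Core Metadata Initiative: April 1999"],
    "Creator": ["Weibel, Stuart"],
    "Date": ["1999"],
}
print(validate_dc10(record))                          # []
print(validate_dc10({"Author": ["Weibel, Stuart"]}))  # ['unknown element: Author']
```

The simplicity is the point: any application can read such a record, but nothing in it says how to interpret a date or a subject term.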

The basic definitions of the 15 elements of Dublin Core 1.0 have been stable since December 1996, reflecting confidence in the consensus that has been developed about core description elements over the previous four years. However, few applications have found that the 15 elements satisfy all their needs. This is unsurprising: the Dublin Core is intended to be just what its name implies, a core element set, to be augmented on the one hand by extension, adding elements of local importance, and on the other by refinement through the use of qualifiers. There are many possible approaches to qualifying or refining the elements to meet such needs. Standardization of the semantics and methods for qualification of the basic elements is necessary if such qualification is to be widely interoperable.

DC-6 and the 1999 Work Plan

Several important issues emerged from the DC-6 Workshop. These issues reflect a cross-section of concerns, from process to pragmatics, syntax to theory. Each has a place in the agenda of the community. The following is a summary of the major areas that emerged from discussions just before and during DC-6.

  • Formalization of a process for the Dublin Core. How will the evolution of the Dublin Core be managed so as to reflect the broad interests of diverse stakeholders?
  • Standardization. What documents will be standardized and by whom?
  • HTML encoding. A formal specification is necessary to replace the informal convention that has guided the community to date.
  • Qualification Mechanisms. Can an underlying data model provide for consistent mechanisms for refining Dublin Core elements? How are qualifiers used? Are there DC-recommended qualifiers?
  • RDF role. What is the role of the Resource Description Framework (RDF) in Dublin Core metadata?
  • Relationships to other metadata models. How can differences among metadata models be minimized to promote interoperability?

A Maintenance Agency for the Dublin Core Initiative

The Dublin Core Metadata Initiative began informally as an interdisciplinary workshop on resource description. As it has attracted broader international and interdisciplinary interest, it has become necessary to develop greater formality around the process. Providing explicit process and structure for decision making is critical for sustaining community confidence. Steps toward this goal were initiated in 1998 with the formation of the Dublin Core Directorate, a Policy Advisory Committee (PAC), and a Technical Advisory Committee (TAC).

The Directorate is hosted by the OCLC Office of Research and maintains the Dublin Core Home Page (http://purl.org/dc), the repository of official documents and other information about the Dublin Core Metadata Initiative. The Directorate also administers the activities of Dublin Core working groups and plans DC Workshops.

The Policy Advisory Committee comprises representatives of major stakeholder communities and serves a liaison role between these communities and the Dublin Core Directorate. The Technical Advisory Committee is made up largely of working group chairs and provides a forum for the discussion and ratification of proposals concerning the Dublin Core. A subcommittee of the two groups has taken on the task of preparing a document to codify the process and provide for stable transition of membership on the advisory committees.

The goal is to achieve a stable procedural foundation for the Dublin Core that retains the interdisciplinary, international consensus-building culture that has grown up around the Dublin Core initiative.

Dublin Core Working Groups. Working Groups are formed to address particular problems or clusters of problems. Working Groups have charters and scheduled deliverables and are expected to go out of business officially at the end of each workshop cycle (approximately one year). Each Working Group has a mail server to support electronic discussions among its members, and all Working Groups are open to enrollment by any interested parties. All DC mailing lists use the Mailbase system (http://www.mailbase.ac.uk), an electronic discussion forum that supports higher education in the UK. Special thanks are due to Paul Miller of UKOLN, who has willingly assumed the challenging responsibility of maintaining the numerous DC mailing lists.

Ratification Process. While administrative details are not entirely resolved at this time, the ratification process will work approximately as follows:

  • Work items emerge from Working Groups, Workshops, or the DC-General mail server (http://www.mailbase.ac.uk/lists/dc-general/)
  • Work items are assigned to an appropriate Working Group (or one is created)
  • Discussion of potential solutions leads to formal proposals
  • Proposals are submitted to the community for comments (the DC-General mail server)
  • Revised proposals are submitted to DC TAC and DC PAC for discussion and ratification or rejection.

This structured procedure is intended to meet the requirements for stability for a standard such as the Dublin Core while providing broad representation of stakeholder communities and supporting the need for measured evolution.

Standardization of the Dublin Core

Standardization is taking place along several parallel pathways. The first is the Internet Engineering Task Force (IETF), appealing because it has the least formal structure and useful because it establishes a publicly accessible repository of informational documents that are widely recognized in the Internet world as having formal standing.    

RFC 2413 is the first formal expression of Dublin Core semantics. This RFC describes what has become known as DC 1.0: the semantics of the 15 elements of the Dublin Core. RFC stands for Request for Comments, but in this case, it is more in the nature of a Request for Cooperation.

The next stage of Dublin Core standardization will involve a polishing and slight restructuring of RFC 2413 and submission to NISO (National Information Standards Organization) and CEN (Comité Européen de Normalisation, the European Committee for Standardization).

The planned modifications fall into two categories. The first is a review of the element definitions to improve clarity and thereby promote more consistent deployment. Working groups have been established to review the definitions and propose changes where necessary, with the proviso that such changes are limited to the purposes of clarification.

The second proposed change is simply to format the Dublin Core specification according to a standard description template for metadata elements, ISO 11179. ISO 11179 is an international standard for formally expressing the semantics of data elements in a consistent manner.

At this writing, element definition reviews are nearing completion, and formal proposals will be available for public comment and submitted to the Dublin Core Technical Advisory Committee for review and validation. It is expected that formal documents will be submitted to NISO and CEN in 1999.

Encoding Dublin Core in HTML

An Internet Draft authored by John Kunze has recently been released that articulates the specification of how Dublin Core can be encoded in HTML. An early convention for this has been in place since 1996, but changes in HTML and a general need for greater formalization make this Internet Draft an important step forward for the community.
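As a rough illustration of the kind of HTML encoding at issue, the Python sketch below renders a flat record as `<meta>` tags using the dot syntax (`DC.Title`, `DC.Subject`, and so on). It is a sketch under stated assumptions, not the Internet Draft's specification: the capitalization of names is assumed, the function `dc_meta_tags` is hypothetical, and any schema declaration or other details the draft may require are omitted.

```python
from html import escape


def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render a flat Dublin Core record as HTML <meta> tags in the 'DC.' dot syntax."""
    lines = []
    for element, values in record.items():
        for value in values:  # a repeated element becomes repeated <meta> tags
            name = escape(f"DC.{element}", quote=True)
            content = escape(value, quote=True)  # escapes & < > " and '
            lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


print(dc_meta_tags({
    "Title": ["The State of the Dublin Core Metadata Initiative"],
    "Subject": ["metadata", "resource description"],
}))
```

Escaping attribute values matters because titles and descriptions routinely contain quotation marks and ampersands that would otherwise break the markup.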

Qualification of Dublin Core Metadata

It has been recognized from the outset that most applications require mechanisms to refine or qualify metadata elements or their values. There are several reasons to do so:

1.  Increased semantic specificity.  Use of domain-specific controlled vocabularies or classification schemes helps to add descriptive precision. The Dewey Decimal Classification (DDC), Medical Subject Headings (MeSH) and the Library of Congress Subject Headings (LCSH) are common examples.

2.  Specification of encoding rules. Identifying a formal encoding standard can make an otherwise ambiguous value useful. Date values are a good example: only by specifying a set of encoding rules can a string specifying a date be parsed reliably.

3.  Defining formal substructure.  It is often desirable to assign a compound value to an element. For example, the value of a Creator element is in its simplest form a name. Many applications have a need to associate additional information with such a value, such as affiliation, e-mail address and title. Specifying the value of a Creator element as a compound value that includes this information as structured sub-elements is useful, but requires a mechanism for specifying the substructure: a scheme qualifier.

4.  Authority control.  Authority records, used by many communities, are examples of structured records that provide authoritative values that help to uniquely identify a person, corporation or place name.
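The first three reasons above can be made concrete with a small data structure. In this hypothetical Python sketch (not a DCMI-defined model), a `QualifiedValue` carries the string value, an optional scheme qualifier, and optional structured parts. The scheme labels (`"DDC"`, `"ISO8601"`, `"hypothetical-person-structure"`) and the class number are illustrative, and the date parser assumes the full `YYYY-MM-DD` form of ISO 8601.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class QualifiedValue:
    """One element value plus an optional scheme qualifier saying how to read it."""
    value: str
    scheme: Optional[str] = None                          # e.g. "DDC", "ISO8601"
    parts: dict[str, str] = field(default_factory=dict)   # optional structured sub-elements


def parse_date(q: QualifiedValue) -> Optional[date]:
    """Parse a Date value only when a scheme says how; otherwise refuse to guess."""
    if q.scheme == "ISO8601":
        return date.fromisoformat(q.value)  # expects the YYYY-MM-DD form
    return None  # "04/05/1999" may be April 5 or 4 May: ambiguous without a scheme


# 1. Increased specificity: a class number tied to a named classification scheme.
subject = QualifiedValue("025.3", scheme="DDC")  # illustrative value

# 2. Encoding rules: the same kind of string is parseable only when qualified.
issued = QualifiedValue("1999-06-01", scheme="ISO8601")
unqualified = QualifiedValue("04/05/1999")

# 3. Formal substructure: a Creator value carrying structured sub-elements.
creator = QualifiedValue(
    "Weibel, Stuart",
    scheme="hypothetical-person-structure",
    parts={"affiliation": "OCLC", "email": "weibel@oclc.org"},
)

print(parse_date(issued))       # 1999-06-01
print(parse_date(unqualified))  # None
```

Note that the plain `value` remains usable on its own; the qualifier and parts add precision without hiding the simple string.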

Implications of Metadata Qualification for Interoperability.  The range of possible qualifiers for Dublin Core metadata is limitless. If applications are to interoperate, it is desirable to constrain these possibilities. When possible, it is recommended that applications use externally maintained schemes (e.g., the Dewey Decimal Classification and the ISO 8601 date profile). Doing so leverages the substantial investments that such schemes represent, as well as improving the chances for interoperability.
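One way an application might act on this recommendation is sketched below in Python. The set `RECOGNIZED_SCHEMES`, the function `for_exchange` and the policy itself are hypothetical: a qualifier naming a recognized external scheme travels with its value, while an unrecognized local scheme is dropped so that the receiving application still gets a usable plain string.

```python
from typing import Optional

# Hypothetical set of externally maintained schemes two applications have agreed on.
RECOGNIZED_SCHEMES = {"DDC", "LCSH", "MeSH", "ISO8601"}


def for_exchange(value: str, scheme: Optional[str]) -> tuple[str, Optional[str]]:
    """Prepare a (value, scheme) pair for another application.

    A recognized external scheme travels with its value; an unrecognized
    (for example, locally invented) scheme is dropped, leaving the plain string.
    """
    if scheme is None or scheme in RECOGNIZED_SCHEMES:
        return value, scheme
    return value, None


print(for_exchange("Medical Informatics", "MeSH"))        # ('Medical Informatics', 'MeSH')
print(for_exchange("Rare Books Room", "LocalShelfList"))  # ('Rare Books Room', None)
```

The more applications rely on shared external schemes, the less information a policy like this has to discard.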

A review of existing DC community practice is currently under way, with the goal of identifying qualifiers now in use and proposing a set of qualifier values that may be adopted to promote interoperability. This review is being conducted by element-specific working groups and is scheduled to be completed in time for the next Dublin Core Workshop in October 1999, at Die Deutsche Bibliothek in Frankfurt.

Qualification and the Dublin Core Data Model.  The Data Model working group has been engaged in the task of identifying a common structural expression of qualifiers such that qualification objectives may be accomplished; a formal report of their efforts is scheduled for release in May of 1999.

Relationship of the Dublin Core to Other Metadata Efforts

Among the significant events of the DC-6 workshop was the participation of representatives of parallel metadata efforts, including the Digital Object Identifier (DOI) Metadata Workgroup, the INDECS project, the Government Information Locator Service (GILS) and the Instructional Management System (IMS). Each of these efforts has similarities to and differences from the Dublin Core, and each has important constituencies, all of whom will benefit from convergence.

Among the principles of the Warwick Framework is the notion that different varieties of metadata will be elaborated by stakeholder communities, and that a metadata architecture should support the snapping together of metadata modules, just as Lego blocks are snapped together to form compound structures. The goal, in other words, is to support a broad diversity of metadata semantics within a common syntactic and structural framework. The Resource Description Framework (RDF) was developed specifically with this objective in mind. RDF makes it easier to harmonize various metadata schemas, but by no means assures that they will be usable with Lego-like modularity. It is still necessary to identify the common aspects of the data models that underlie various metadata sets and work towards harmonizing them.
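The Lego analogy can be illustrated with RDF's basic unit, a statement of the form (resource, property, value). In the hypothetical Python sketch below, each community's module is a set of such triples whose property names are qualified by a namespace URI. The Dublin Core namespace `http://purl.org/dc/elements/1.0/` is assumed here, and the rights vocabulary and its `permittedUse` property are invented for illustration.

```python
DC = "http://purl.org/dc/elements/1.0/"                        # assumed DC namespace URI
RIGHTS = "http://example.org/hypothetical-rights-vocabulary#"  # invented for illustration

Triple = tuple[str, str, str]  # (resource, property URI, value)


def module(namespace: str, resource: str, pairs: dict[str, str]) -> set[Triple]:
    """Express one community's metadata about a resource as namespaced triples."""
    return {(resource, namespace + name, value) for name, value in pairs.items()}


doc = "http://www.dlib.org/dlib/april99/04weibel.html"
descriptive = module(DC, doc, {"title": "The State of the Dublin Core Metadata Initiative"})
rights = module(RIGHTS, doc, {"permittedUse": "non-commercial"})

# Because every module shares the triple structure, modules combine by set union,
# and namespace-qualified property names keep one module's terms from colliding
# with another's.
combined = descriptive | rights
for resource, prop, value in sorted(combined):
    print(prop, "=", value)
```

Merging is trivial only at the syntactic level; deciding whether two modules' properties mean the same thing still requires harmonizing the underlying data models.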

The first effort to harmonize different varieties of metadata has been undertaken by representatives of the Dublin Core Data Model working group and the INDECS project. INDECS is a project to explore the common functional metadata requirements necessary to support electronic commerce for a number of content industries (publishing, music and visual arts). The functional requirements of managing intellectual property rights include the ability to encode descriptive data at a high level of precision. The description requirements for resource discovery are generally less precise, and production environments often cannot bear the cost of achieving that precision. Nonetheless, harmonizing the underlying data models will have long-term benefits.

An early report on the expected benefits and problems of this effort has been published by David Bearman et al. in the January 1999 issue of D-Lib Magazine (http://www.dlib.org).

What's this I hear about DC 2.0? Discussions during and after DC-6 raised the possibility of changes in the underlying structure of Dublin Core metadata. Are Creator, Contributor and Publisher just specific (and sometimes misleading) ways of expressing the more general notion of an agent that plays a role in the life cycle of a resource? Is the Source element simply a particular variety of Relation? Is it helpful to view elements such as Date as facets of events that occur in the life cycle of an information resource (for example, a resource is published by a particular agent on a particular date)? Discussions revolving around these questions suggest that the 15 Dublin Core elements might be more coherently expressed if they were related to an underlying logical model such as that expressed in the Functional Requirements for Bibliographic Records (FRBR) of the International Federation of Library Associations and Institutions (IFLA). This model treats information resources as having logical states (an abstract work or a physical item, for example) that have relationships to each other and to other resources.
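The agent/role and event questions can be sketched in a few lines. The Python below is a hypothetical model, not a DC 2.0 proposal: each `Event` records an agent acting in a role, optionally with a date, and `flat_view` derives the familiar Creator, Contributor, Publisher and Date elements from those events. The role names, and the choice to take Date from the publication event (following the DC 1.0 definition of Date as when the resource was made available), are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A life-cycle event: an agent acting in some role, optionally on a date."""
    role: str                   # hypothetical role names: "create", "contribute", "publish"
    agent: str
    date: Optional[str] = None


# Hypothetical mapping from event roles back to the flat DC 1.0 agent elements.
ROLE_TO_ELEMENT = {"create": "Creator", "contribute": "Contributor", "publish": "Publisher"}


def flat_view(events: list[Event]) -> dict[str, list[str]]:
    """Derive a flat, 15-element-style view from an event-based description."""
    view: dict[str, list[str]] = {}
    for e in events:
        element = ROLE_TO_ELEMENT.get(e.role)
        if element:
            view.setdefault(element, []).append(e.agent)
        if e.role == "publish" and e.date:
            # DC 1.0 Date: "the date the resource was made available in its present form"
            view.setdefault("Date", []).append(e.date)
    return view


events = [
    Event("create", "Weibel, Stuart"),
    Event("publish", "American Society for Information Science", "1999"),
]
print(flat_view(events))
# {'Creator': ['Weibel, Stuart'], 'Publisher': ['American Society for Information Science'], 'Date': ['1999']}
```

Because the flat view can be recomputed from the richer model, users could still search for names and dates as before.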

What implications do these discussions have for Dublin Core in its present state?  Exploration of the issues is ongoing, and if they prove fruitful, then the results will be embodied in a proposal for a version of the Dublin Core that is being referred to as Dublin Core 2.0.

It is unclear how these issues will be played out, but the following can be asserted with confidence:

  • DC1.0 enjoys wide recognition as a basic building block for Internet resource discovery. The Dublin Core Directorate is committed to supporting the viability of the 15-element Dublin Core while accommodating the change that is inevitable in a rapidly maturing Web metadata ecology.
  • Changes in any underlying data model for the Dublin Core will have little direct impact on users, who will want to search for names, subjects, corporate bodies, dates and titles (among other things). No proposals under discussion will require substantive changes in what applications show to users, unless perhaps to make clearer the relationships among elements.
  • There are currently several means for representing Dublin Core metadata, including embedded HTML, raw XML and XML-encoded RDF. The current consensus on DC elements can be seen as a semantic view that can be represented in a variety of ways. Those interested in exploring the implications of this are urged to read the DC-Schema Discussion Paper on views of the Dublin Core and their relationship to an underlying data model (http://www.oclc.org/~emiller/dc/documents/wd-dc-schema.html) and to participate in the ongoing discussions on these issues.

  • Any proposal for restructuring the Dublin Core will be considered only with a strong commitment to sustaining the investment in legacy applications and with the full participation of the community that has forged the most broadly based consensus on resource description on the Web.
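To illustrate the idea of one semantic view with several representations, the Python sketch below serializes a flat record as RDF/XML using the standard library's `xml.etree.ElementTree`. The RDF namespace is the one defined by the W3C RDF Model and Syntax specification; the Dublin Core namespace URI and the lower-case property names (`dc:title`, `dc:creator`) are assumptions for this sketch, and `to_rdf_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.0/"  # assumed Dublin Core namespace URI

# Make the serializer emit readable rdf: and dc: prefixes instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(about: str, record: dict[str, list[str]]) -> str:
    """Serialize a flat DC record as a single rdf:Description in RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about})
    for element, values in record.items():
        for value in values:
            # One property element per value; lower-case names are an assumption.
            prop = ET.SubElement(desc, f"{{{DC_NS}}}{element.lower()}")
            prop.text = value
    return ET.tostring(root, encoding="unicode")


print(to_rdf_xml(
    "http://www.dlib.org/dlib/april99/04weibel.html",
    {"Title": ["The State of the Dublin Core Metadata Initiative"], "Creator": ["Weibel, Stuart"]},
))
```

The same dictionary could equally be handed to an HTML or plain-XML writer; only the serialization changes, which is the sense in which the element set is a view rather than a syntax.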

RDF Is a W3C Recommendation

The Resource Description Framework (RDF) became a W3C Recommendation at the end of February 1999. The formalization of RDF as a standard should encourage the development of supporting tools that will make many of our implementation challenges easier.

RDF is a set of conventions for expressing metadata that uses eXtensible Markup Language (XML) as an encoding standard and provides a framework for exchanging metadata of many varieties. RDF constrains the expression of metadata, allowing assertions to be made only according to a standard set of constructs, thereby making it easier for any given application to make use of them.

Putting aside the issue of software, the underlying ideas of RDF provide a conceptual foundation for the efforts of the Data Model Working Group and hence have influenced much of the work on qualification of Dublin Core. The deployment of Dublin Core metadata is not, however, dependent on the deployment of RDF. Useful systems have been, and will continue to be, developed using simpler syntactic expressions (HTML or raw XML, for example).

Why add the additional complexity of RDF? The answer has to do primarily with the additional constraints that RDF imposes on the expression of metadata (the grammar of the metadata assertions). Without these conventions, metadata grammars would be so varied and complex as to preclude the development of general tools to support management and interchange of metadata sets.

The ability to specify metadata schemas in RDF will make it possible for applications to access a particular schema from a publicly accessible registry on the Web and retrieve the parsing structure and semantics of the element set. This does not ensure either searching or interchange interoperability among metadata sets, but it makes the job of achieving it easier. [For a discussion of RDF, see Eric Miller's article, "An Introduction to the Resource Description Framework," originally published in D-Lib Magazine in May 1998 and reprinted in the October/November 1998 issue of the Bulletin of the American Society for Information Science.]
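The registry idea can be sketched as a lookup from a property URI to the schema that declares it. In the hypothetical Python below, `REGISTRY` is an in-memory stand-in for a schema retrieved from the Web, holding two definitions copied from the Table of elements at the end of this article; the namespace URI is assumed. A property that resolves to a declared definition can be interpreted, but the lookup alone does not say how it maps onto another schema's elements, which is the remaining interoperability work.

```python
DC_NS = "http://purl.org/dc/elements/1.0/"  # assumed namespace URI

# Hypothetical in-memory registry: schema namespace -> {element name: definition}.
# In practice the schema would be retrieved from a registry on the Web.
REGISTRY: dict[str, dict[str, str]] = {
    DC_NS: {
        "title": "The name given to the resource by the CREATOR or PUBLISHER.",
        "date": "The date the resource was made available in its present form.",
    },
}


def split_property(uri: str) -> tuple[str, str]:
    """Split a property URI into (namespace, local name) after its last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


def describe(property_uri: str) -> str:
    """Look up the definition of a property in the schema that declares it."""
    namespace, name = split_property(property_uri)
    schema = REGISTRY.get(namespace)
    if schema is None:
        return f"unknown schema: {namespace}"
    return schema.get(name, f"{name} is not declared by {namespace}")


print(describe(DC_NS + "title"))
print(describe(DC_NS + "author"))
print(describe("http://example.org/vocab#shelfmark"))
```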

Internationalization

Among the most important indicators of the impact of the Dublin Core is the continuing propagation of the element set in multiple languages. To date, the Dublin Core Element Set has been translated into 18 languages. The realization of international metadata that will globalize resource discovery is far more complex than simply translating element definitions. Interested readers should see the article "Languages for Dublin Core" by Thomas Baker in D-Lib Magazine, December 1998, for further background and discussion.

Projects of Note

The Dublin Core has progressed not so much on its foundations in ontology as on the pragmatics of making useful systems to solve real needs of information seekers. Highlighting a handful of important Dublin Core applications gives a flavor of the directions and progress of the Dublin Corps -- the pioneers who are laying down the tracks on the frontiers. The following are a few of the many projects that exemplify the commitment of a community of people with a passion for making information more accessible and the fortitude to act in the face of uncertainty.

  • The CIMI Interoperability Testbed enters Phase II
    This has been the most ambitious interoperability project to date, with 14 distinct museums creating records for more than 200,000 resources in only a few months. The project has just entered Phase II, with the objective of broadening the scope and adding qualification to the basic Dublin Core schema used for Phase I.
  • The CORC project: Dublin Core and MARC in the same system
    CORC (Cooperative Online Resource Catalog) is a research project at OCLC exploring the cooperative creation and use of metadata, primarily for online resources [CORC]. Currently the system provides for creation and editing of metadata records in MARC and Dublin Core. All records are available (and can be exported) in either view.
  • Finland adopts Dublin Core for government information
    The Government of Finland is joining the governments of Australia and Denmark in adopting the Dublin Core as the basis for description of official government documents at state and regional levels. The Finnish metadata format will be a superset of Dublin Core with additional elements and qualifiers.
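CORC's actual mapping between Dublin Core and MARC is not described here, so the Python below is only an illustrative sketch of what presenting one record "in either view" involves. `DC_TO_MARC` is a hypothetical partial crosswalk covering five elements. The MARC field meanings in the comments (245 title statement, 260 publication, 520 summary, 856 electronic location) are standard, but the element-to-field choices are assumptions, and a real crosswalk would handle all fifteen elements, indicators and repeated fields.

```python
# Hypothetical partial crosswalk (not CORC's): DC element -> (MARC tag, subfield code).
# Field meanings: 245 title statement, 260 publication (imprint),
# 520 summary note, 856 electronic location and access ($u = URI).
DC_TO_MARC: dict[str, tuple[str, str]] = {
    "Title": ("245", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
    "Description": ("520", "a"),
    "Identifier": ("856", "u"),
}


def marc_view(record: dict[str, list[str]]) -> list[str]:
    """Render the mappable part of a flat DC record as 'tag $code value' lines."""
    lines = []
    for element, values in record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            continue  # this sketch silently skips elements it does not map
        tag, code = target
        lines.extend(f"{tag} ${code} {value}" for value in values)
    return sorted(lines)  # three-digit tags sort into tag order


print(marc_view({
    "Title": ["The State of the Dublin Core Metadata Initiative"],
    "Identifier": ["http://www.dlib.org/dlib/april99/04weibel.html"],
    "Date": ["1999"],
}))
```

The elements a crosswalk cannot place are exactly where a superset format, like the Finnish one with its added elements and qualifiers, must make its own decisions.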

Conclusion

The fourth year of the Dublin Core has been as tumultuous as the first, marked by controversy and vigorous debate. Broadened interest in metadata in general, and the Dublin Core in particular, combined with closer interaction with other metadata communities, has sharpened debate and made cooperation both more difficult and more urgent.

Nonetheless, the year has witnessed important strides forward on many fronts, including standardization, the formalization of syntax alternatives, a deeper understanding of data modeling issues and a refinement of the semantics of the elements and their qualifiers. The Dublin Core continues to attract broad international interest, continues to see new projects in many disciplines and sectors, and has begun to formalize a process that will ensure stability and representation across the broad spectrum of its constituency. 

This progress, and expectations for further growth, all hinge on the hard work and good will of a diverse, often contentious, always dedicated cadre from around the world, who have found in the Web an unprecedented opportunity for improving information access and have found in themselves the commitment to realize this opportunity through cooperative action.


THE DUBLIN CORE ELEMENTS

1. TITLE  The name given to the resource by the CREATOR or PUBLISHER.

2. CREATOR  The person(s) or organization(s) primarily responsible for creating the intellectual content of the resource.

3. SUBJECT  The topic of the resource: keywords or phrases that describe the subject or content of the resource, including controlled vocabularies or classification schemes.

4. DESCRIPTION  A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

5. PUBLISHER  The entity responsible for making the resource available in its present form, such as a publisher, a university department or a corporate entity.

6. CONTRIBUTOR  Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers and illustrators).

7. DATE  The date the resource was made available in its present form.

8. TYPE  The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that TYPE will be chosen from an enumerated list of types.

9. FORMAT  The data representation of the resource, such as text/html, ASCII, PostScript file, executable application or JPEG image.

10. IDENTIFIER  String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, would also be candidates for this element.

11. SOURCE  The work, either print or electronic, from which this resource is derived, if applicable.

12. LANGUAGE  Language(s) of the intellectual content of the resource.

13. RELATION  Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves.

14. COVERAGE  The spatial and temporal characteristics of the resource. Formal specification of COVERAGE is currently under development.

15. RIGHTS  The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement or perhaps a service that would provide such information dynamically.

See http://purl.org/metadata/dublin_core_elements for further information.



Stuart Weibel is affiliated with OCLC Online Computer Library Center, Inc. He can be reached by e-mail at weibel@oclc.org.

© 1999, American Society for Information Science