Bulletin of the American Society for Information Science and Technology, Vol. 27, No. 5, June/July 2001


The Design of Metadata Interchange for Chinese Information and Implementation of Metadata Management System

by Chao-chen Chen, Hsueh-hua Chen, Kuang-hua Chen

With the rapid development of the Internet, research on digital libraries and museums has received worldwide attention, and all developed countries are supporting this research with great enthusiasm. Taiwan has a rich cultural heritage with a wide range of world-class treasures. In addition, many organizations and research institutions in Taiwan possess abundant collections of rare books, historical remains, artifacts and documents of Taiwanese culture. In the past, they were not open to the public due to preservation considerations. Now, through the power of the Internet, we will be able to present these valuable resources on the World Wide Web. Besides increasing public exposure, the Web will allow us to preserve the physical resources that might otherwise be deteriorating.

In Taiwan, major institutions that have digitized their rare collections include National Taiwan University, Academia Sinica, the National Central Library, the National Palace Museum, the National Museum of History and the National Museum of Natural Science. To digitize these valuable resources and present them on the Web is their primary task. However, it is much more important to organize these resources based on their characteristics so that users may retrieve and use them effectively. It is obvious that metadata is vital to digital library/museum systems.

From the perspective of its users, a digital system should contain the basic functions of retrieval, browsing and linkage to other related Web resources. Usually a digital library system with a large volume of data will apply database management systems to manage its bibliographic records, digital objects and Web links.

At present, retrieval technology can be classified into full-text search and field-based search. Full-text search does not require metadata description of the resource, but it yields a lower precision ratio. For non-textual images, sound or multimedia information, full-text search cannot be used; therefore, manual creation of metadata to establish field-based bibliographic data for various types of digital objects is a pivotal step for digital libraries.

The first step in formulating a metadata system is to understand the user demand and object characteristics. The second step is to consider the interoperability among information systems, which depends on the use of standards. For example, developers may apply and adapt an internationally accepted metadata format.

Currently, regardless of the metadata format they choose, most digital library/museum systems use XML or SGML as their metadata syntax. In particular, the Internet community is vigorously promoting the XML syntax, a true subset of SGML, which can extend the more limited capabilities of HTML. This paper will discuss issues related to the development of metadata and introduce the features, structures, functions and use of Metalogy, an XML/Metadata general-purpose system developed under the Digital Museum Project funded by the National Science Council, Taiwan.

The Digital Museum Project

The National Science Council (NSC) of Taiwan launched "Greeting a New Millennium: A Cross-Century Technology Development Program with Concern for the Humanities" in May 1998, with the intention of strengthening research in humanities, social science and science education. The Digital Museum Project (DMP) is a part of this theme project. Its main goals are to integrate and establish a digital museum with an emphasis on Taiwanese culture for the Chinese people and to develop educational contents on the Internet. Establishing and promoting educational culture/art/science content on the Web allows the public to use the Internet to retrieve or browse information freely and consequently experience its enrichment and enjoy lifelong learning. Furthermore, by promoting digital collections, the NSC hopes to stimulate the technological development of multimedia and the growth of a content industry.

Since then, the DMP has progressed to its second phase. During the first year, the NSC invited experts and scholars with experience in digital collections to form a collaborative mechanism to promote digital museum research. Projects can be categorized into two types: topic-based and technical support. In addition, DMP Extension (DMPE) is responsible for training and promotion, serving as a bridge for the library and museum communities, teachers, industries and DMP staff.

Topic-based DMP projects in the first phase include two comprehensive projects on local culture: Discovery of the Tamsui River, and Taiwanese Aborigines: The Ping-pu Race. There are also two projects on natural science and environmental ecology: Butterfly Ecology, and Native Plants and Fishes of Taiwan. On traditional culture there are three projects: Traditional Thoughts and Literatures (the Four Books, Lou-Chuang, Poems of the Tang dynasty); An Immortal Palace: Han Dynasty Culture and Burials; and Firearms and Ming-Ching Dynasty Warfare.

System technical support projects in the first phase include the Electronic Cultural and Natural Resource Atlas (establishing a common coordination system for time, space and language symbols) and Understanding Ancient Texts: The Written Knowledge Network. In addition, there are three information technology projects: Resources, Organization and Searching Specification (establishing a metadata interchange format for Chinese information, a thesaurus and searching specifications so that individual topic-prototype systems may have international interoperability), Digital Collection System Technology Development and Research on a System Evaluation Standard. This latter project used the Discovery of the Tamsui River development effort as a testbed for empirical studies on developing an evaluation standard and system evaluation methods to enhance quality and discrimination.

DMPE was established in August 1998. Its goals are to train staff in digital collection skills and to promote research results to various communities in our society. Through seminars, training courses for professionals, training sessions for teachers, e-news and news articles in mass media, DMPE has increased the knowledge of collecting institutions and industry about digital libraries and museums, enhanced public interest, improved the Web resource utilization skills of elementary and middle-school teachers, and trained professionals for the digital library/museum projects.

Currently the DMP is in its second phase (January - December 2000), which is open to all interested participants. Among nearly 90 proposals, the following 12 were funded:

    1. Treasures of the National Palace Museum
    2. The World of Xuanzang and the Silk Route
    3. Discovery of the Tamsui River
    4. Native Artist Digital Museum: Yu-Yu Yang Art Research Center
    5. Historical Photos of Taiwan
    6. Architectural History of Taiwan
    7. Mystery of the Human Body
    8. Web maintenance of Taiwanese Aborigines: the Ping-pu Group
    9. Ancient Texts and Popular Songs of the Tang and Sung Dynasties (II)
    10. Native Freshwater Fishes of Taiwan (II)
    11. Chinese Medicine and Acupuncture
    12. Biology-Cultural Diversification of Orchid Island

Of the 12, four (numbers 3, 8, 9 and 10) continue work begun in the first year.

Technical support projects have been reduced to two: Implementation for Resources Organization and Searching Specification in Digital Museums and Technology Development of Digital Watermarking and Software Tools.

Resources Organization and Searching Specification

In March 1997, before the NSC launched the DMP, the authors and their colleagues formed a metadata research team, ROSS (http://ross.lis.ntu.edu.tw), under the National Taiwan University Digital Library/Museum (NTUDL/M) Project to study metadata interchange for Chinese information (MICI). Its research scope includes the following:

  • to understand the history and features of collections;
  • to study various metadata formats both domestically and internationally;
  • to understand relations among the metadata, the database and the system framework; and
  • to understand the information demand and retrieval behavior of potential users.

ROSS held that our metadata should be able to

  • describe attributes of the collections;
  • provide users with the mandatory access points;
  • enhance interoperability among different digital libraries to exchange information; and
  • take the quality of cataloging into account.

At that time most digital collections of NTUDL/M were historical documents. After studying the characteristics of historical documents, ROSS made in-depth studies of the metadata for similar types of collections, including CIMI (Computer Interchange of Museum Information, http://cimi.org), which describes museum art collections, and EAD (Encoded Archival Description), which describes archival information. However, due to cultural differences and the distinct characteristics of the materials, we concluded that these metadata formats were not sufficient to describe Chinese special collections. Hence it was necessary to focus on research on Chinese metadata, which is the main goal of ROSS, as we described in our 1999 paper "Metadata Interchange for Chinese Information" in IT and Global Digital Library Development (pp. 65-74), published by MicroUse Information. We have also been awarded the two NSC grants for Resources Organization and Searching Specification mentioned above, the first in 1998, and the second, for implementation, in 2000. Research in the second-year project focuses on issues related to information organization and retrieval in Chinese digital libraries and museums, which include data storage and management system design, user demand and information retrieval behaviors, and integration among different systems.

Besides historical documents, ROSS began work on metadata for other resource types (objects, ancient maps, photos/pictures and butterfly specimens) in November 1998. During the process of metadata development, in addition to frequent discussions with experts and scholars, we studied how similar digital museums record their collections. In the first year ROSS was responsible for metadata development for two of the NSC topic-based prototypes, Discovery of the Tamsui River and Butterfly Ecology. In the second year, the main task of ROSS has been to develop a management system capable of handling various types of metadata for all topic-based prototype projects. This system is called "Metalogy."

Metadata Interchange for Chinese Information

The metadata system we developed, Metadata Interchange for Chinese Information (MICI), adopts the 15 elements of the Dublin Core Metadata Element Set (DCMES) as its basic structure. However, in order to describe the attributes of our rich cultural heritage and be more precise on the semantics of the collection descriptions, element qualifiers were added to the appropriate elements based on the attributes of collections. Although it extends the scope of its application, MICI remains compatible with international standards. This set of DCMES-based MICI with self-defined qualifiers is called "MICI-DC."

MICI-DC has been used to catalog various types of resources: historical documents, maps, photos/pictures, calligraphies, objects and Buddhist scriptures/paintings. In addition to the DCMES official qualifiers, an individual institution may define its own based on the attributes of its collections. Users may choose DCMES elements and qualifiers and adjust the order of these elements according to their needs. This approach remains compatible with international standards while allowing users great flexibility in meeting local requirements. In order to make it easier for users to catalog resources using MICI-DC, a tagging guide was compiled with explanations and examples. Thus, users may implement their MICI-DC projects without further assistance. For details on MICI-DC, please see Appendix I.
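As a rough illustration of the idea of DCMES elements refined by locally defined qualifiers, the following Python sketch builds one record in which some elements carry a qualifier. The `record` wrapper, the `qualifier` attribute name, the qualifier values and the sample data are all assumptions made for illustration; the actual MICI-DC qualifiers are those documented in Appendix I.

```python
import xml.etree.ElementTree as ET

# The 15 elements of the Dublin Core Metadata Element Set.
DCMES = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def build_record(fields):
    """Build a <record> from (element, qualifier, value) triples.

    The qualifier may be None. Element order follows the input order,
    mirroring the idea that users may reorder elements as needed.
    """
    record = ET.Element("record")
    for element, qualifier, value in fields:
        if element not in DCMES:
            raise ValueError(f"not a DCMES element: {element}")
        node = ET.SubElement(record, element)
        if qualifier:
            node.set("qualifier", qualifier)  # hypothetical attribute name
        node.text = value
    return record

# Hypothetical sample values, not taken from a real catalog record.
sample = build_record([
    ("title", None, "Map of the Tamsui River"),
    ("date", "created", "1895"),
    ("coverage", "place", "Tamsui"),
])
xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
```

Because every element name is still one of the 15 DCMES elements, a system that ignores the qualifier attribute can continue to read such a record as plain Dublin Core.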

Metalogy, an XML/Metadata System

With the advantages of SGML, but free of many of its complications, XML is being widely applied on the Web. At the same time it may provide much of the flexibility and precision that HTML lacks. The Internet and database communities have promoted the XML syntax with great enthusiasm. Thus, when ROSS was about to design a metadata management system, we decided to use XML syntax as a basis for information interchange among databases. However, in addition to syntax, we also needed to consider semantics. Currently there are various types of metadata formats, and many communities are developing their own to fulfill domain-specific needs. Thus, flexibility is essential for a metadata management system. In short, one cannot develop a system based only on one particular type of metadata; it should provide users the freedom to choose their own metadata types. Therefore, developing a general-purpose XML/Metadata system is our main concern. The design concept and structure are described below.

The Features and Structures of the Metalogy (Version 1.0) System. Metalogy may be used to develop databases for any digital museum, digital library or digital archive in various subjects. Its functions include database set-up using XML document type definitions (DTDs), metadata editing, authority file (thesaurus) editing, retrieval (including both Microsoft Windows and Web interfaces), and import-export of XML files. Incorporating the following features, the system

  • is driven by a schema that is mainly based on the input DTD;
  • allows co-existence of different types of DTD;
  • is capable of retrieving different data in different formats at the same time;
  • allows the users of the metadata management functions to adjust the element format designated by the DTD and the access restrictions based on the schema;
  • allows these users to define their hyperlink, index and retrieval and display elements with a user-friendly interface;
  • provides data import-export that conforms to its DTD format;
  • is capable of determining whether the imported data conform to the designated DTD format and checks for duplication of input data;
  • is capable of processing structured elements, multimedia and texts;
  • contains management functions such as access control and transaction logging; and
  • has a Web search capability that allows end-users to retrieve information from the database via WWW interface.

The structure of the Metalogy system is shown in Figure 1.

System Development Tools and Their Functions. The Metalogy development environment is Delphi 5.0, and the Web search interface is written in ASP. The backend database system can be either Oracle or SQL Server. The following functions have been developed so far:

  • Input a DTD to set up a database

Simply by importing any type of XML DTD, users may set up a corresponding database and access a cataloging display screen.

  • Define the system schema

The DTD declaration does not define such things as the data format, conversion specifications, maximum length of input characters, authority control or the index file. As a result, although the schema will be automatically generated by the system as the DTD is imported, it is also necessary to check and modify the schema manually.
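The two steps just described (generate a default schema from the DTD, then adjust it by hand) can be sketched roughly as follows in Python. This is not Metalogy's implementation, which is written in Delphi; the miniature DTD, the schema property names (`data_format`, `max_length`, `authority_file`, `indexed`) and the default values are assumptions chosen only to show why a DTD alone is not enough.

```python
import re

# Hypothetical miniature DTD; a real metadata DTD would be far larger.
DTD_TEXT = """
<!ELEMENT record (title, creator*, date?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT creator (#PCDATA)>
<!ELEMENT date (#PCDATA)>
"""

# Captures the element name and its content model from each declaration.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.+?)\s*>", re.DOTALL)

def default_schema(dtd_text):
    """Derive one default schema entry per text-only element in the DTD."""
    schema = {}
    for name, content_model in ELEMENT_DECL.findall(dtd_text):
        if content_model == "(#PCDATA)":
            schema[name] = {
                "data_format": "text",
                "max_length": 255,
                "authority_file": None,
                "indexed": False,
            }
    return schema

schema = default_schema(DTD_TEXT)

# Manual adjustments of the kind the DTD itself cannot express.
schema["creator"].update(authority_file="name_authority", indexed=True)
schema["date"].update(data_format="year", max_length=4)

def validate(field, value):
    """Check a value against the length limit recorded in the schema."""
    return len(value) <= schema[field]["max_length"]

print(sorted(schema))
print(validate("date", "1998"), validate("date", "19980"))
```

The regular expression handles only simple element declarations; a production system would need a full DTD parser to cover attribute lists, entities and nested content models.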

  • Catalog using the chosen metadata

After choosing the cataloging Meta type, users may add, correct or delete any of these records. While editing a record, users may duplicate, delete or insert a sub-element based on the mapping. If this element is input by code or authority control, which are available in addition to direct input, users may browse or retrieve through the display of code or authority screens. In addition, users may access the retrieval function directly from the cataloging screen to check immediately for needed records.

  • Establish thesauri and authority files

Users may construct a thesaurus or authority file by the same process used to create the metadata cataloging records.

  • Manage and describe digital objects

Users may put digital objects, including multimedia objects, into the database with brief descriptions that will allow them to be located and identified when they are to be linked to metadata records. Batch processing is recommended for importing large volumes of multimedia files.

  • Search metadata records

The search function may be carried out on a single Meta field or on all the Meta fields in the database. It may be conducted by exact or fuzzy search.
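A minimal Python sketch of single-field versus all-field search, with exact and fuzzy modes, is given below. The article does not say how Metalogy defines a fuzzy match; here it is assumed to mean either a substring match or a high string-similarity score, and the sample records and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical records keyed by metadata field.
RECORDS = [
    {"title": "Discovery of the Tamsui River", "type": "map"},
    {"title": "Butterfly Ecology", "type": "specimen"},
    {"title": "Historical Photos of Taiwan", "type": "photo"},
]

def matches(value, query, fuzzy, threshold):
    """Compare one field value with the query, ignoring case."""
    value, query = value.lower(), query.lower()
    if not fuzzy:
        return value == query
    if query in value:
        return True
    return SequenceMatcher(None, value, query).ratio() >= threshold

def search(records, query, field=None, fuzzy=False, threshold=0.8):
    """Search one field, or every field when field is None."""
    hits = []
    for record in records:
        fields = [field] if field else list(record)
        if any(matches(record.get(f, ""), query, fuzzy, threshold)
               for f in fields):
            hits.append(record)
    return hits

exact_hits = search(RECORDS, "photo")                      # all fields, exact
fuzzy_hits = search(RECORDS, "Buterfly Ecology",           # misspelled query
                    field="title", fuzzy=True)
print(len(exact_hits), len(fuzzy_hits))
```

A linear scan like this is only workable for small collections; with a database backend the same two modes would normally be pushed down to indexed queries.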

  • Search the authority file

The same search functionality may be applied to searching the authority files.

  • Import XML files

Metalogy is capable of importing XML files from other systems (Figure 2). After the XML DTD is imported to Metalogy, it will accept XML files with one or more records conforming to that DTD. Users may set decision rules beforehand so that, during the import of XML files, the system will determine whether a particular record is already in the system database.
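The import step with a duplicate-detection rule might look roughly like the Python sketch below. The rule shown (two records are the same item when their `identifier` and `title` agree), the element names and the sample batch are assumptions for illustration. Note also that Python's `xml.etree.ElementTree` checks only well-formedness, so validation against the DTD is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical decision rule: two records are the same item
# when all of these fields agree.
KEY_FIELDS = ("identifier", "title")

def record_key(record):
    """Build the comparison key for one <record> element."""
    return tuple((record.findtext(f) or "").strip() for f in KEY_FIELDS)

def import_records(xml_text, database):
    """Add new records to `database`, a dict keyed by record_key.

    Returns (imported, skipped) counts. ET.fromstring raises
    ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    imported = skipped = 0
    for record in root.iter("record"):
        key = record_key(record)
        if key in database:
            skipped += 1          # already present: treat as duplicate
        else:
            database[key] = record
            imported += 1
    return imported, skipped

BATCH = """<records>
  <record><identifier>A001</identifier><title>First item</title></record>
  <record><identifier>A002</identifier><title>Second item</title></record>
  <record><identifier>A001</identifier><title>First item</title></record>
</records>"""

db = {}
result = import_records(BATCH, db)
print(result)
```

Setting the rule before the import begins, as the text describes, keeps the decision consistent across a whole batch; here that corresponds to fixing `KEY_FIELDS` up front.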

  • Export XML files

Through XML, Metalogy can exchange metadata with other systems and export well-formed XML files for user access. Users may export a certain number of records or a batch of records through search, and they may set up variables to define the fields for export.
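Exporting only selected fields as well-formed XML can be sketched as follows in Python. The function name, the `records`/`record` wrapper elements and the sample rows are hypothetical; the point is simply that a serializer, rather than string concatenation, takes care of escaping so the output can be parsed again by another system.

```python
import xml.etree.ElementTree as ET

def export_records(records, fields):
    """Serialize the chosen fields of each record as one XML document."""
    root = ET.Element("records")
    for record in records:
        node = ET.SubElement(root, "record")
        for field in fields:
            if field in record:
                ET.SubElement(node, field).text = record[field]
    # ElementTree escapes &, < and > in text, so the output stays well-formed.
    return ET.tostring(root, encoding="unicode")

# Hypothetical rows, e.g. the result of a search.
rows = [
    {"title": "Poems & Songs", "creator": "Anonymous", "note": "internal"},
    {"title": "Old Map", "creator": "Unknown"},
]
exported = export_records(rows, fields=("title", "creator"))

# Round-trip check: the exported text parses back into the same values.
reparsed = ET.fromstring(exported)
print(exported)
```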

  • Establish access controls

The system provides management functions for establishing basic user information and access control restrictions. When starting the system, users must enter their user names and passwords. After verification, the system grants the appropriate level of access to Metalogy.

  • Manage error messages

System managers can edit error messages based on their needs without re-compiling the system. Descriptions, icons and buttons may be customized in order to avoid user misunderstanding.

  • Web search function

Metalogy provides the same search functions for Web search as for internal search.

Metalogy User's Manual

The Metalogy User's Manual was compiled to guide information managers in using this system. It includes step-by-step diagrams and descriptions of how to install, set up, use and operate the system. With this manual users may manage the system themselves without further assistance.

Metadata and the DTD Instance

While designing a digital library, museum or archive, one must develop the metadata specification based on the data requirements of the application. In addition, Metalogy requires an XML DTD to express the metadata format. Because formulating a metadata specification and the DTD to express it requires an in-depth understanding of the resource characteristics (which is rather time-consuming) and must take into account interoperability among systems, it is better to use some kind of pre-defined DTD. In 2000 the authors designed several types of metadata specifications with DTDs for the National Palace Museum, including the metadata and DTD for calligraphies, objects, scriptures, exhibitions, references, name authority files, title authority files, a geographical names thesaurus and a time thesaurus with tagging guides and examples. We hope these may be of use in other applications.

Conclusion

Metadata technology is the core of digital library systems, and XML is the most popular syntax for metadata. Major types of metadata formats in use include EAD, GILS, FGDC, MARC, CIMI, TEI, DC and others. In addition, many formats are extensions of the above. Furthermore, one institution may hold different types of resources and use different types of metadata formats, which is a major difference between digital libraries and traditional ones. Thus, when designing a metadata management system, one should not base it on a particular format. Rather, it is more appropriate for the system designers to use XML as a core that would be capable of handling various metadata formats. This concept is behind the development of Metalogy. For the time being, Metalogy is available upon request and is free of charge. User feedback and comments are welcomed for further improvement of the system.

This work was partially supported by the National Science Council of Taiwan under grant NSC 89-2750-P-002-012. For further readings on this project in Chinese see the National Science Council website at http://www.nsc.gov.tw.

Chao-chen Chen is with the Department of Adult and Continuing Education, National Taiwan Normal University, and is head of the Department of Public Services, National Central Library, Taipei, 10642, Taiwan, ROC; e-mail: cc4073@tpts1.seed.net.tw.

Hsueh-hua Chen is with the Department of Library and Information Science, National Taiwan University, Taipei, 10660, Taiwan, ROC; e-mail: sherry@ccms.ntu.edu.tw.

Kuang-hua Chen is with the Department of Library and Information Science, National Taiwan University, Taipei, 10660, Taiwan, ROC; e-mail: khchen@ccms.ntu.edu.tw.



Copyright 2001, American Society for Information Science and Technology