Distributed Institutional Repository: White Paper
 
 

Vision
One of the major missions of Purdue University is to facilitate and disseminate research. The goal of an institutional repository is to bring together the university’s research output in one place and make it available to appropriate groups. To do so in today’s digital world, the Libraries recognize that a robust distributed repository is needed to house, curate, manage, access and update locally generated research. The Purdue Institutional Repository will include everything from reports to articles, but will focus especially on collecting research data (datasets or databases). Because of the diversity of users, sources and formats of research, building this repository presents challenges in gathering the data and information, as well as in organizing it, providing access to it, and facilitating its use. Libraries faculty and professional staff, partnering with ITaP, will develop and implement the infrastructure to achieve these goals. The repository environment will be used to investigate and resolve problems related to archiving, curating, accessing, retrieving and securing locally developed databases and datasets. It will address collaborative use and interoperability so that cross-disciplinary and multidisciplinary works can be accessed through a uniform interface, better facilitating interdisciplinary research.

Purpose
As the collections and information systems of the Libraries evolve in response to technological advances, the needs of our patrons and of interdisciplinary research, and new models for disseminating information, librarians face both challenges and opportunities to engage and partner with the Purdue research community and our faculty to widen and improve electronic access to the University’s body of work. The distributed institutional repository (DIR) will play a vital role in accomplishing this goal by establishing a framework for describing and linking these digital materials to foster federated search and retrieval, interoperability of disparate systems, and the repurposing of information in exciting new ways.

By collecting, organizing, managing, and providing access to data and information, the institutional repository will fulfill goals of archiving and disseminating research for local, remote, current, and future communities. Currently, the Libraries provide collection management and library science expertise to help strengthen knowledge acquisition at Purdue. As demonstrated leaders of information access and facilitation on campus, the Libraries are well suited to this initiative, which aligns with their mission and strengths.

Model
The Libraries will manage a unified portal that will act as a high-level gateway to search and access the DIR by harvesting metadata and providing various levels of description and linkage to these distributed information systems (see diagram). At a minimum, the description would include the fifteen core elements of the Dublin Core metadata standard and the linkage would be a URL. Enhanced metadata and delivery of information will improve the levels of both description and linkage. The unified portal is not intended to replace the native interfaces and tools of participating information systems; instead, it will make them easier to discover and provide an architecture to support future interoperability. It is important to recognize that while some information systems may be operated locally by the Libraries, many will be hosted remotely and/or administered elsewhere on campus or on the Internet.
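
As a rough illustration of this model, the following Python sketch shows a portal that holds only harvested descriptions and links, leaving the content in the participating systems. The class names, system names, and example URLs are hypothetical and do not describe an actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One harvested description plus a link back to its native system."""
    title: str
    description: str
    identifier: str  # URL of the item in its native interface (hypothetical)
    source: str      # name of the participating system (hypothetical)


@dataclass
class Portal:
    """Holds harvested metadata only; the content stays in each system."""
    records: list = field(default_factory=list)

    def harvest(self, source, items):
        """Copy descriptive metadata from one participating system."""
        for item in items:
            self.records.append(Record(source=source, **item))

    def search(self, term):
        """Return records whose title or description mentions the term."""
        term = term.lower()
        return [
            r for r in self.records
            if term in r.title.lower() or term in r.description.lower()
        ]


portal = Portal()
portal.harvest("dataset-archive", [
    {"title": "Soil moisture readings",
     "description": "Sensor dataset",
     "identifier": "https://example.org/data/1"},
])
portal.harvest("thesis-repository", [
    {"title": "Modeling soil moisture",
     "description": "Doctoral dissertation",
     "identifier": "https://example.org/etd/7"},
])

hits = portal.search("soil")
links = [r.identifier for r in hits]  # the patron follows these to the native tools
```

A search returns links rather than content, which reflects the intent that the portal complement, and not replace, the native interfaces.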

Within the model, these are examples of potential collaborative projects:

  1. A patron accesses the unified portal using a web browser to search the DIR.
  2. After discovering digital materials, the patron may switch from the portal to the native interface and tools by following a link to them.
  3. Bridging high-level and low-level metadata will lead to interoperability of information systems; for example, Storage Resource Broker (SRB) could be used to marry DSpace and TeraGrid resources. Access control can be incorporated with the method of linkage and can be described in the metadata.
  4. Interoperating systems with rich, complete metadata can lead to the development of new tools for blending and repurposing data; for example, an electronic dissertation can be put together with the dataset that supports the researcher’s thesis.

Two early goals in developing the DIR will be to establish the base linkages and descriptions in a user interface (the unified portal) and to begin reconciling low-level and high-level metadata to foster interoperability.

Librarians can easily generate high-level metadata using Dublin Core, whose elements include the creator, title, description, rights, subject, and identifier (link).
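
To make this concrete, here is a minimal Python sketch of a high-level record restricted to the fifteen elements of the Dublin Core Metadata Element Set and serialized as XML. The wrapper element name `record`, the helper `build_record`, and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen core elements.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def build_record(fields):
    """Build an XML record from a dict of Dublin Core element values."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in fields:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return root


ET.register_namespace("dc", DC_NS)

record = build_record({
    "creator": "Doe, Jane",                      # sample value
    "title": "Soil moisture readings",           # sample value
    "subject": "Hydrology",                      # sample value
    "identifier": "https://example.org/data/1",  # the link
})
xml_text = ET.tostring(record, encoding="unicode")
```

Rejecting unknown element names keeps the high-level description uniform across participating systems, which is what makes federated searching practical.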

Low-level, administrative metadata is provided natively by the files themselves, including their timestamps, filenames, and format. Additional description comes at a slightly higher level from the filesystem and the application that provides the information service.
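
The low-level metadata described above can be read directly from a file. The sketch below uses only the Python standard library; the function name `low_level_metadata`, the dictionary keys, and the sample file are hypothetical choices for this example.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def low_level_metadata(path):
    """Collect the metadata a file carries natively."""
    info = os.stat(path)
    mime_type, _encoding = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "filename": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": modified.isoformat(),
        # Fall back to a generic type when the extension is not recognized.
        "format": mime_type or "application/octet-stream",
    }


# Demonstration on a throwaway file (the name and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "readings.txt")
    with open(sample, "wb") as handle:
        handle.write(b"depth,moisture\n")
    meta = low_level_metadata(sample)
```

None of these values requires human cataloging, which is why this level of description is inexpensive to collect but of limited use for discovery on its own.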

The second goal is to bridge the low-level metadata (which is useful to the native system and application) and the high-level metadata (which is useful for discovery and federated searching) with mid-level metadata and protocols that will enable participating systems to interoperate and share data with each other and the user. This will enable the user to combine data from two or more systems and create innovative tools for exporting and analyzing the data.
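
One simple form such a bridge could take is a crosswalk: a table that maps fields a native system already provides onto Dublin Core elements, with librarian-supplied values layered on top. The Python sketch below is an assumption about how this might look, not a description of the DIR design; the field names, the `CROSSWALK` table, and the function `to_dublin_core` are all hypothetical.

```python
# Hypothetical crosswalk: native (low-level) field -> Dublin Core element.
CROSSWALK = {
    "filename": "title",
    "format": "format",
    "modified": "date",
    "url": "identifier",
}


def to_dublin_core(native, curated=None):
    """Derive a high-level record from native metadata.

    Values supplied by a librarian (curated) take precedence over
    values derived automatically from the native system.
    """
    record = {
        dc_name: native[low_name]
        for low_name, dc_name in CROSSWALK.items()
        if low_name in native
    }
    record.update(curated or {})
    return record


native = {
    "filename": "readings.txt",
    "format": "text/plain",
    "modified": "2005-03-01T12:00:00+00:00",
    "url": "https://example.org/data/1",
    "size_bytes": 15,  # no Dublin Core equivalent in this table, so it is dropped
}
curated = {"title": "Soil moisture readings", "creator": "Doe, Jane"}

record = to_dublin_core(native, curated)
```

Two systems that publish their fields through the same crosswalk produce records that can be compared and combined, even though their native formats differ.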

By collecting, organizing, managing, and providing access to data and information, the Libraries can use the distributed institutional repository to help fulfill the goals of archiving and disseminating research for local, remote, current, and future communities.