LOC Workshop on Etexts, Library of Congress
- Author: Library of Congress
BARONAS next outlined the four categories of EIM standardization in which AIIM standards are being developed: transfer and retrieval, evaluation, optical disc and document scanning applications, and design and conversion of documents. She detailed several of the main projects of each: 1) in the category of image transfer and retrieval, a bi-level image transfer format, ANSI/AIIM MS53, a proposed standard that describes a file header for image transfer between unlike systems when the images are compressed using G3 and G4 compression; 2) in the category of image evaluation, the AIIM-proposed TR26 tutorial on image resolution (this technical report will treat the differences and similarities between classical or photographic and electronic imaging); 3) in the category of design and conversion, a proposed technical report called “Forms Design Optimization for EIM” (this report considers how general-purpose business forms can best be designed so that scanning is optimized; reprographic characteristics such as type, rules, background, tint, and color will likewise be treated in the technical report); 4) in the category of optical disc and document scanning applications, projects a) on planning platters and disk management, b) on generating an application profile for EIM when images are stored and distributed on CD-ROM, and c) on evaluating SCSI2 and how a common command set can be generated for SCSI2 so that document scanners are more easily integrated. (ANSI/AIIM MS53 will also apply to compressed images.)
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
BATTIN * The implications of standards for preservation * A major obstacle to successful cooperation * A hindrance to access in the digital environment * Standards a double-edged sword for those concerned with the preservation of the human record * Near-term prognosis for reliable archival standards * Preservation concerns for electronic media * Need for reconceptualizing our preservation principles * Standards in the real world and the politics of reproduction * Need to redefine the concept of archival and to begin to think in terms of life cycles * Cooperation and the La Guardia Eight * Concerns generated by discussions on the problems of preserving text and image * General principles to be adopted in a world without standards *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Patricia BATTIN, president, the Commission on Preservation and Access (CPA), addressed the implications of standards for preservation. She listed several areas where the library profession and the analog world of the printed book had made enormous contributions over the past hundred years—for example, in bibliographic formats, binding standards, and, most important, in determining what constitutes longevity or archival quality.
Although standards have lightened the preservation burden through the development of national and international collaborative programs, nevertheless, a pervasive mistrust of other people’s standards remains a major obstacle to successful cooperation, BATTIN said.
The zeal to achieve perfection, regardless of the cost, has hindered rather than facilitated access in some instances, and in the digital environment, where no real standards exist, has brought an ironically just reward.
BATTIN argued that standards are a double-edged sword for those concerned with the preservation of the human record, that is, the provision of access to recorded knowledge in a multitude of media as far into the future as possible. Standards are essential to facilitate interconnectivity and access, but, BATTIN said, as LYNCH pointed out yesterday, if set too soon they can hinder creativity, expansion of capability, and the broadening of access. The characteristics of standards for digital imagery differ radically from those for analog imagery. And the nature of digital technology implies continuing volatility and change. To reiterate, precipitous standard-setting can inhibit creativity, but delayed standard-setting results in chaos.
Since in BATTIN’s opinion the near-term prognosis for reliable archival standards, as defined by librarians in the analog world, is poor, two alternatives remain: standing pat with the old technology, or reconceptualizing.
Preservation concerns for electronic media fall into two general domains. The first is the continuing assurance of access to knowledge originally generated, stored, disseminated, and used in electronic form. This domain contains several subdivisions, including the closed, proprietary systems discussed the previous day, bundled information such as electronic journals and government agency records, and electronically produced or captured raw data. The second is the application of digital technologies to the reformatting of materials originally published on a deteriorating analog medium such as acid paper or videotape.
The preservation of electronic media requires a reconceptualizing of our preservation principles during a volatile, standardless transition which may last far longer than any of us envision today. BATTIN urged the necessity of shifting focus from assessing, measuring, and setting standards for the permanence of the medium to the concept of managing continuing access to information stored on a variety of media and requiring a variety of ever-changing hardware and software for access—a fundamental shift for the library profession.
BATTIN offered a primer on how to move forward with reasonable confidence in a world without standards. Her comments fell roughly into two sections: 1) standards in the real world and 2) the politics of reproduction.
In regard to real-world standards, BATTIN argued the need to redefine the concept of archive and to begin to think in terms of life cycles. In the past, the naive assumption that paper would last forever produced a cavalier attitude toward life cycles. The transient nature of the electronic media has compelled people to recognize and accept upfront the concept of life cycles in place of permanency.
Digital standards have to be developed and set in a cooperative context to ensure efficient exchange of information. Moreover, during this transition period, greater flexibility concerning how concepts such as backup copies and archival copies in the CXP are defined is necessary, or the opportunity to move forward will be lost.
In terms of cooperation, particularly in the university setting, BATTIN also argued the need to avoid going off in a hundred different directions. The CPA has catalyzed a small group of universities called the La Guardia Eight—because La Guardia Airport is where meetings take place—Harvard, Yale, Cornell, Princeton, Penn State, Tennessee, Stanford, and USC, to develop a digital preservation consortium to look at all these issues and develop de facto standards as we move along, instead of waiting for something that is officially blessed. Continuing to apply analog values and definitions of standards to the digital environment, BATTIN said, will effectively lead to forfeiture of the benefits of digital technology to research and scholarship.
Under the second rubric, the politics of reproduction, BATTIN reiterated an oft-made argument concerning the electronic library, namely, that it is more difficult to transform than to create, and nowhere is that belief expressed more dramatically than in the conversion of brittle books to new media. Preserving information published in electronic media involves making sure the information remains accessible and that digital information is not lost through reproduction. In the analog world of photocopies and microfilm, the issue of fidelity to the original becomes paramount, as do issues of “Whose fidelity?” and “Whose original?”
BATTIN elaborated these arguments with a few examples from a recent study conducted by the CPA on the problems of preserving text and image. Discussions with scholars, librarians, and curators in a variety of disciplines dependent on text and image generated a variety of concerns, for example: 1) Copy what is, not what the technology is capable of. This is very important for the history of ideas. Scholars wish to know what the author saw and worked from. And make available at the workstation the opportunity to erase all the defects and enhance the presentation. 2) The fidelity of reproduction—what is good enough, what can we afford, and the difference it makes—issues of subjective versus objective resolution. 3) The differences between primary and secondary users. Restricting the definition of primary user to the one in whose discipline the material has been published runs one headlong into the reality that these printed books have had a host of other users from a host of other disciplines, who not only were looking for very different things, but who also shared values very different from those of the primary user. 4) The relationship of the standard of reproduction to new capabilities of scholarship—the browsing standard versus an archival standard. How good must the archival standard be? Can a distinction be drawn between potential users in setting standards for reproduction? Archival storage, use copies, browsing copies—ought an attempt to set standards even be made? 5) Finally, costs. How much are we prepared to pay to capture absolute fidelity? What are the trade-offs between vastly enhanced access, degrees of fidelity, and costs?
These standards, BATTIN concluded, serve to complicate further the reproduction process, and add to the long list of technical standards that are necessary to ensure widespread access. Ways to articulate and analyze the costs that are attached to the different levels of standards must be found.
Given the chaos concerning standards, which promises to linger for the foreseeable future, BATTIN urged adoption of the following general principles:
* Strive to understand the changing information requirements of
scholarly disciplines as more and more technology is integrated into
the process of research and scholarly communication in order to meet
future scholarly needs, not to build for the past. Capture
deteriorating information at the highest affordable resolution, even
though the dissemination and display technologies will lag.
* Develop cooperative mechanisms to foster agreement on protocols
for document structure and other interchange mechanisms necessary
for widespread dissemination and use before official standards are
set.
* Accept that, in a transition period, de facto standards will have
to be developed.
* Capture information in a way that keeps all options open and
provides for total convertibility: OCR, scanning of microfilm,
producing microfilm from scanned documents, etc.
* Work closely with the generators of information and the builders
of networks and databases to ensure that continuing accessibility is
a primary concern from the beginning.
* Piggyback on standards under development for the broad market, and
avoid library-specific standards; work with the vendors, in order to
take advantage of that which is being standardized for the rest of
the world.
* Concentrate efforts on managing permanence in the digital world,
rather than perfecting the longevity of a particular medium.
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DISCUSSION * Additional comments on TIFF *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
During the brief discussion period that followed BATTIN’s presentation, BARONAS explained that TIFF was not developed in collaboration with or under the auspices of AIIM. TIFF is a company product, not a standard, is owned by two corporations, and is always changing. BARONAS also observed that ANSI/AIIM MS53, a bi-level image file transfer format that allows unlike systems to exchange images, is compatible with TIFF as well as with DEC’s architecture and IBM’s MODCA/IOCA.
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HOOTON * Several questions to be considered in discussing text conversion *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HOOTON introduced the final topic, text conversion, by noting that it is becoming an increasingly important part of the imaging business. Many people now realize that having more and more character data as part of their imaging system enhances it. Regarding the issue of OCR versus rekeying, HOOTON posed several questions: How does one get text into computer-readable form? Does one use automated processes? Does one attempt to eliminate the use of operators where possible? Standards for accuracy, he said, are extremely important: it makes a major difference in cost and time whether one sets as a standard 98.5 percent acceptance or 99.5 percent. He mentioned outsourcing as a possibility for converting text. Finally, what one does with the image to prepare it for the recognition process is also important, he said, because such preparation changes how recognition is viewed and also facilitates recognition itself.
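To give a rough sense of why one percentage point matters, the two accuracy figures HOOTON cited can be turned into expected error counts. The sketch below is illustrative only: it reads the figures as character-level accuracy, and the page size of 2,000 characters and the function name `expected_errors` are assumptions, not values or terms from the workshop.

```python
def expected_errors(accuracy_percent: float, total_chars: int) -> float:
    """Expected number of wrong characters at a given character accuracy."""
    return total_chars * (100.0 - accuracy_percent) / 100.0


# Hypothetical page size, chosen only for illustration.
CHARS_PER_PAGE = 2000

for accuracy in (98.5, 99.5):
    errors = expected_errors(accuracy, CHARS_PER_PAGE)
    print(f"{accuracy}% accuracy: about {errors:.0f} errors per page")

# Prints:
# 98.5% accuracy: about 30 errors per page
# 99.5% accuracy: about 10 errors per page
```

Under these assumptions, the looser standard leaves three times as many errors on each page. That is one way to see how the choice of threshold could drive the cost and time of correction.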
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
LESK * Roles of participants in CORE * Data flow * The scanning process * The image interface * Results of experiments involving