
LOC Workshop on Etexts. Author: Library of Congress



important than the hardware and might also cost more than the hardware, but it was likely to prove critical to the success or failure of one’s system. In addition to a stand-alone scanning workstation for image capture, then, text capture requires one or two editing stations networked to this scanning station to perform editing. Editing the text takes two or three times as long as capturing the images.

Finally, ZIDAR stressed the importance of buying an open system that allows for more than one vendor, complies with standards, and can be upgraded.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

WATERS * Yale University Library’s master plan to convert microfilm to digital imagery (POB) * The place of electronic tools in the library of the future * The uses of images and an image library * Primary input from preservation microfilm * Features distinguishing POB from CXP and key hypotheses guiding POB * Use of vendor selection process to facilitate organizational work * Criteria for selecting vendor * Finalists and results of process for Yale * Key factor distinguishing vendors * Components, design principles, and some estimated costs of POB * Role of preservation materials in developing imaging market * Factors affecting quality and cost * Factors affecting the usability of complex documents in image form *

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Donald WATERS, head of the Systems Office, Yale University Library, reported on the progress of a master plan for a project at Yale to convert microfilm to digital imagery, Project Open Book (POB). Stating that POB was in an advanced stage of planning, WATERS detailed, in particular, the process of selecting a vendor partner and several key issues under discussion as Yale prepares to move into the project itself. He commented first on the vision that serves as the context of POB and then described its purpose and scope.

WATERS sees the library of the future not necessarily as an electronic library but as a place that generates, preserves, and improves for its clients ready access to both intellectual and physical recorded knowledge. Electronic tools must find a place in the library in the context of this vision. Several roles for electronic tools include serving as: indirect sources of electronic knowledge or as “finding” aids (the on-line catalogues, the article-level indices, registers for documents and archives); direct sources of recorded knowledge; full-text images; and various kinds of compound sources of recorded knowledge (the so-called compound documents of Hypertext, mixed text and image, mixed-text image format, and multimedia).

POB is looking particularly at images and an image library, and at the uses to which images will be put (e.g., storage, printing, browsing, and use as input for other processes such as OCR as a step subsequent to image capture, creation of an image library, and possibly generation of microfilm).

While input will come from a variety of sources, POB is considering especially input from preservation microfilm. A possible outcome is that the film and paper which provide the input for the image library eventually may go off into remote storage, and that the image library may be the primary access tool.

The purpose and scope of POB focus on imaging. Though related to CXP, POB has two features which distinguish it: 1) scale: conversion of 10,000 volumes into digital image form; and 2) source: conversion from microfilm. Given these features, several key working hypotheses guide POB, including: 1) Since POB is using microfilm, it is not concerned with the image library as a preservation medium. 2) Digital imagery can improve access to recorded knowledge through printing and network distribution at a modest cost incremental to that of microfilm. 3) Capturing and storing documents in a digital image form is necessary to further improvements in access. (POB distinguishes between the imaging, digitizing process and OCR, which at this stage it does not plan to perform.)

Currently in its first or organizational phase, POB found that it could use a vendor selection process to facilitate a good deal of the organizational work (e.g., creating a project team and advisory board, confirming the validity of the plan, establishing the cost of the project and a budget, selecting the materials to convert, and then raising the necessary funds).

POB developed numerous selection criteria, including: a firm committed to image-document management, the ability to serve as systems integrator in a large-scale project over several years, interest in developing the requisite software as a standard rather than a custom product, and a willingness to invest substantial resources in the project itself.

Two vendors, DEC and Xerox, were selected as finalists in October 1991, and with the support of the Commission on Preservation and Access, each was commissioned to generate a detailed requirements analysis for the project and then to submit a formal proposal for the completion of the project, which included a budget and costs. The terms were that POB would pay the loser. The results for Yale of involving a vendor included: broad involvement of Yale staff across the board at a relatively low cost, which may have long-term significance in carrying out the project (twenty-five to thirty university people are engaged in POB); better understanding of the factors that affect corporate response to markets for imaging products; a competitive proposal; and a more sophisticated view of the imaging markets.

The most important factor that distinguished the vendors under consideration was their identification with the customer. The size and internal complexity of the company were also important factors. POB was looking at large companies that had substantial resources. In the end, the process generated for Yale two competitive proposals, with Xerox’s the clear winner. WATERS then described the components of the proposal, the design principles, and some of the costs estimated for the process.

Components are essentially four: a conversion subsystem, a network-accessible storage subsystem for 10,000 books (and POB expects 200 to 600 dpi storage), browsing stations distributed on the campus network, and network access to the image printers.
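The storage requirement implied by these components can be roughed out with simple arithmetic. The sketch below is illustrative only: the report gives the number of volumes and the 200 to 600 dpi range, but the pages per volume, page dimensions, and compression ratio used here are hypothetical placeholders, not figures from POB.

```python
def estimate_storage_bytes(volumes, pages_per_volume, width_in, height_in,
                           dpi, compression_ratio):
    """Rough size of a bitonal (1 bit per pixel) image library, in bytes.

    All arguments except `volumes` and `dpi` are assumptions supplied by
    the caller; the report does not state them.
    """
    pixels_per_page = (width_in * dpi) * (height_in * dpi)
    raw_bytes_per_page = pixels_per_page / 8  # 8 one-bit pixels per byte
    total_raw = volumes * pages_per_volume * raw_bytes_per_page
    return total_raw / compression_ratio


# Hypothetical inputs: 300 pages per volume, 6 x 9 inch pages,
# and a 10:1 compression ratio (an assumed value, not a measured one).
low = estimate_storage_bytes(10_000, 300, 6, 9, dpi=200, compression_ratio=10)
high = estimate_storage_bytes(10_000, 300, 6, 9, dpi=600, compression_ratio=10)

print(f"200 dpi: {low / 1e9:.1f} GB")
print(f"600 dpi: {high / 1e9:.1f} GB")
```

Whatever the assumed inputs, uncompressed size grows with the square of the resolution, so the 600 dpi estimate is nine times the 200 dpi estimate when the other inputs are held fixed.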

Among the design principles, POB wanted conversion at the highest possible resolution. Assuming TIFF files with Group 4 compression, TCP/IP, and an Ethernet network on campus, POB wanted a client-server approach with image documents distributed to the workstations and made accessible through native workstation interfaces such as Windows. POB also insisted on a phased approach to implementation: 1) a stand-alone, single-user, low-cost entry into the business with a workstation focused on conversion and allowing POB to explore user access; 2) movement into a higher-volume conversion with network-accessible storage and multiple access stations; and 3) a high-volume conversion, full-capacity storage, and multiple browsing stations distributed throughout the campus.

The costs proposed for startup assumed the existence of the Yale network and its two DocuTech image printers. Other startup costs are estimated at $1 million over the three phases. At the end of the project, the annual operating costs estimated primarily for the software and hardware proposed come to about $60,000, but these exclude costs for labor needed in the conversion process, network and printer usage, and facilities management.
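To put these estimates on a per-volume basis, the figures can be divided by the 10,000 volumes POB plans to convert. This division is an illustration added here, not a calculation reported by WATERS, and it inherits the exclusions he listed (labor, network and printer usage, and facilities management).

```python
STARTUP_COST_USD = 1_000_000      # estimated startup costs over the three phases
ANNUAL_OPERATING_USD = 60_000     # estimated annual software and hardware costs
VOLUMES = 10_000                  # planned scale of conversion


def per_volume(total_usd, volumes):
    """Spread a total cost evenly across the converted volumes."""
    return total_usd / volumes


startup_per_volume = per_volume(STARTUP_COST_USD, VOLUMES)
annual_per_volume = per_volume(ANNUAL_OPERATING_USD, VOLUMES)

print(f"Startup: ${startup_per_volume:.2f} per volume")
print(f"Operating: ${annual_per_volume:.2f} per volume per year")
```

On these numbers the startup estimate works out to $100 per volume and the operating estimate to $6 per volume per year, before any of the excluded costs are added.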

Finally, the selection process produced for Yale a more sophisticated view of the imaging markets: the management of complex documents in image form is not a preservation problem, not a library problem, but a general problem in a broad, general industry. Preservation materials are useful for developing that market because of the qualities of the material. For example, much of it is out of copyright. The resolution of key issues such as the quality of scanning and image browsing also will affect development of that market.

The technology is readily available but changing rapidly. In this context of rapid change, several factors to which POB intends to pay particular attention affect quality and cost, for example, the various levels of resolution that can be achieved. POB believes it can bring resolution up to 600 dpi, but an interpolation process from 400 to 600 dpi is more likely. The variation in quality of microfilm will prove to be a highly important factor. POB may reexamine the standards used to film in the first place by looking at this process as a follow-on to microfilming.
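The report does not say which interpolation method would be used to go from 400 to 600 dpi. As a minimal sketch of the idea, the following uses nearest-neighbor resampling, chosen here only because it is the simplest method to show; the function name and the tiny sample bitmap are hypothetical.

```python
def resample_nearest(bitmap, src_dpi, dst_dpi):
    """Resample a bitonal bitmap (list of rows of 0/1) by nearest neighbor.

    Each output pixel copies the source pixel whose position it maps back
    to, so no detail is added beyond what was captured at src_dpi.
    """
    src_h = len(bitmap)
    src_w = len(bitmap[0])
    dst_h = src_h * dst_dpi // src_dpi
    dst_w = src_w * dst_dpi // src_dpi
    return [
        [bitmap[y * src_dpi // dst_dpi][x * src_dpi // dst_dpi]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


# A 2 x 2 sample "scanned at 400 dpi" becomes 3 x 3 at 600 dpi.
sample = [[1, 0],
          [0, 1]]
upsampled = resample_nearest(sample, 400, 600)
for row in upsampled:
    print(row)
```

Because the output pixels are copies of existing ones, an interpolated 600 dpi image has more pixels than the 400 dpi scan but no additional captured detail, which is one reason to treat the two cases separately when assessing quality.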

Other important factors include: the techniques available to the operator for handling material, the ways of integrating quality control into the digitizing work flow, and a work flow that includes indexing and storage. POB’s requirement was to be able to deal with quality control at the point of scanning. Thus, thanks to Xerox, POB anticipates having a mechanism which will allow it not only to scan in batch form, but to review the material as it goes through the scanner and control quality from the outset.

The standards for measuring quality and costs depend greatly on the uses of the material, including subsequent OCR, storage, printing, and browsing. But especially at issue for POB is the facility for browsing. This facility, WATERS said, is perhaps the weakest aspect of imaging technology and the most in need of development.

A variety of factors affect the usability of complex documents in image form, among them: 1) the ability of the system to handle the full range of document types, not just monographs but serials, multi-part monographs, and manuscripts; 2) the location of the database of record for bibliographic information about the image document, which POB wants to enter once and in the most useful place, the on-line catalog; 3) a document identifier for referencing the bibliographic information in one place and the images in another; 4) the technique for making the basic internal structure of the document accessible to the reader; and finally, 5) the physical presentation on the CRT of those documents. POB is ready to complete this phase now. One last decision involves deciding which material to scan.
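Factors 2 through 4 above describe a simple linkage: bibliographic information entered once in the catalog, images held elsewhere, and a shared document identifier tying the two together, with some record of the document's internal structure. A minimal sketch of that arrangement follows; every name, identifier, and file name in it is hypothetical and is not drawn from POB's design.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDocument:
    """Images for one document, stored apart from its catalog record."""
    doc_id: str                      # shared identifier (hypothetical format)
    page_files: list                 # one image file name per page, in order
    structure: dict = field(default_factory=dict)  # section label -> page index


# Bibliographic data lives in one place (standing in for the on-line catalog).
catalog = {
    "hyp-0001": {"title": "Example monograph", "type": "monograph"},
}

# Page images live in another place, keyed by the same identifier.
image_library = {
    "hyp-0001": ImageDocument(
        doc_id="hyp-0001",
        page_files=["p0001.tif", "p0002.tif", "p0003.tif"],
        structure={"Title page": 0, "Chapter 1": 1},
    ),
}


def open_section(doc_id, section):
    """Return the catalog title and the first page image of a section."""
    record = catalog[doc_id]
    document = image_library[doc_id]
    return record["title"], document.page_files[document.structure[section]]


print(open_section("hyp-0001", "Chapter 1"))
```

Keeping the title only in the catalog and the pages only in the image library means neither store duplicates the other; the identifier is the single point of contact between them.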

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

DISCUSSION * TIFF files constitute de facto standard * NARA’s experience with image conversion software and text conversion * RFC 1314 * Considerable flux concerning available hardware and software solutions * NAL through-put rate during scanning * Window management questions *

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

In the question-and-answer period that followed WATERS’s presentation, the following points emerged:

 

* ZIDAR’s statement about using TIFF files as a standard meant de facto standard. This is what most people use and typically exchange with other groups, across platforms, or even occasionally across display software.

* HOLMES commented on the unsuccessful experience of NARA in attempting to run image-conversion software or to exchange between applications: What are supposedly TIFF files go into other software that is supposed to be able to accept TIFF but cannot recognize the format and cannot deal with it, and thus renders the exchange useless. Re text conversion, he noted the different recognition rates obtained by substituting the make and model of scanners in NARA’s recent test of an “intelligent” character-recognition product for a new company. In the selection of hardware and software, HOLMES argued, software no longer constitutes the overriding factor it did until about a year ago; rather it is perhaps important to look at both now.

* Danny Cohen and Alan Katz of the University of Southern California Information Sciences Institute began circulating as an Internet RFC (RFC 1314) about a month ago a standard for a TIFF interchange format for Internet distribution of monochrome bit-mapped images, which LYNCH said he believed would be used as a de facto standard.

* FLEISCHHAUER’s impression from hearing these reports and thinking about AM’s experience was that there is considerable flux concerning available hardware and software solutions. HOOTON agreed and commented at the same time on ZIDAR’s statement that the equipment employed affects the results produced. One cannot draw a complete conclusion by saying it is difficult or impossible to perform OCR from scanning microfilm, for example, with that device, that set of parameters, and system requirements, because numerous other people are accomplishing just that, using other components, perhaps. HOOTON opined that both the hardware and the software were highly important. Most of the problems discussed today have been solved in numerous different ways by other people. Though it is good
