LOC Workshop on Etexts, Library of Congress
- Author: Library of Congress
cognizant of various experiences, this is not to say that it will
always be thus.
* At NAL, the throughput of the scanning process for paper, page by
page, with OCR performed, ranges from 300 to 600 pages per day;
scanning without OCR is considerably faster, although how much faster
is not known. These figures are for scanning from bound books, which
is much slower.
* WATERS commented on window management questions: DEC proposed an
X-Windows solution, which was problematic for two reasons. One was
POB’s requirement to be able to manipulate images on the workstation
and bring them down to the workstation itself; the other was
network usage.
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
THOMA * Illustration of deficiencies in scanning and storage process * Image quality in this process * Different costs entailed by better image quality * Techniques for overcoming various deficiencies: fixed thresholding, dynamic thresholding, dithering, image merge * Page edge effects
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
George THOMA, chief, Communications Engineering Branch, National Library of Medicine (NLM), illustrated several of the deficiencies discussed by the previous speakers. He introduced the topic of special problems by noting the advantages of electronic imaging. For example, it is regenerable because it is a coded file, and real-time quality control is possible with electronic capture, whereas in photographic capture it is not.
One of the difficulties discussed in the scanning and storage process was image quality which, without belaboring the obvious, means different things for maps, medical X-rays, or broadcast television. In the case of documents, THOMA said, image quality comes down to legibility of the textual parts, and fidelity in the case of gray or color photo print-type material. Legibility depends chiefly on scan density, the standard in most cases being 300 dpi. Increasing the resolution with scanners that perform 600 or 1200 dpi, however, comes at a cost.
Better image quality entails at least four different kinds of costs: 1) equipment costs, because a CCD (i.e., charge-coupled device) with a greater number of elements costs more; 2) time costs that translate to the actual capture costs, because manual labor is involved (the time also depends on the fact that more data has to be moved around in the machine, in the scanning or network devices that perform the scanning as well as the storage); 3) media costs, because at high resolutions larger files have to be stored; and 4) transmission costs, because there is simply more data to be transmitted.
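To make the media and transmission costs concrete, the following minimal sketch computes the uncompressed size of a one-bit-per-pixel page image at the three resolutions mentioned above. The 8.5-by-11-inch page size and the function name `raw_page_bytes` are assumptions chosen for illustration; they do not come from THOMA's presentation.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of a scanned page (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: 8.5 x 11 inches, black-and-white (1 bit per pixel).
for dpi in (300, 600, 1200):
    size = raw_page_bytes(8.5, 11, dpi)
    print(f"{dpi:>5} dpi: {size / 1_000_000:.2f} MB uncompressed")
```

Doubling the scan density quadruples the number of pixels, so storage and transmission costs before compression grow with the square of the resolution.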
But while resolution takes care of the issue of legibility in image quality, other deficiencies have to do with contrast and with elements on the scanned page or image that need to be removed or clarified. Thus, THOMA proceeded to illustrate various deficiencies, how they are manifested, and several techniques to overcome them.
Fixed thresholding was the first technique described, suitable for black-and-white text, when the contrast does not vary over the page. One can have many different threshold levels in scanning devices. Thus, THOMA offered an example of extremely poor contrast, which resulted from the fact that the stock was a heavy red. This is the sort of image that when microfilmed fails to provide any legibility whatsoever. Fixed thresholding is the way to change the black-to-red contrast to the desired black-to-white contrast.
Other examples included material that had been browned or yellowed by age. This was also a case of contrast deficiency, and correction was done by fixed thresholding. A final example showed the same kind of deficiency, with slight but insignificant variability; fixed thresholding solves this problem as well. The microfilm equivalent is certainly legible, but it comes with dark areas. Though THOMA did not have a slide of the microfilm in this case, he did show the reproduced electronic image.
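Fixed thresholding as described above can be sketched in a few lines. This is an illustration only, not the scanner's actual implementation: the gray values (0 for black, 255 for white), the sample page, and the threshold levels are all hypothetical.

```python
def fixed_threshold(gray, threshold=128):
    """Map each gray value (0=black .. 255=white) to 0 (black) or 1 (white)."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical scan: dark ink (about 40) on a dark colored stock (about 110).
page = [
    [110, 112, 40, 111],
    [109, 38, 41, 113],
    [111, 110, 39, 112],
]

# A mid-scale threshold of 128 turns the whole page black; lowering the
# level to 75 places the cut between the ink and the stock.
print(fixed_threshold(page, 128))
print(fixed_threshold(page, 75))
```

The choice among "many different threshold levels" amounts to picking the one value that falls between the ink and the background everywhere on the page, which only works when contrast is uniform.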
When one has variable contrast over a page or the lighting over the page area varies, especially in the case where a bound volume has light shining on it, the image must be processed by a dynamic thresholding scheme. One scheme, dynamic averaging, allows the threshold level not to be fixed but to be recomputed for every pixel from the neighboring characteristics. The neighbors of a pixel determine where the threshold should be set for that pixel.
THOMA showed an example of a page that had been degraded in a variety of ways, including a burn mark, coffee stains, and a yellow marker. Application of a fixed-thresholding scheme, THOMA argued, might take care of several deficiencies on the page but not all of them. Performing the calculation for a dynamic threshold setting, however, removes most of the deficiencies so that at least the text is legible.
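One possible reading of the dynamic averaging scheme is a local-mean threshold: each pixel is compared with the average of its neighbors rather than with a single page-wide level. The sketch below assumes a square neighborhood and a small `offset` to keep flat paper areas white; both parameters, and the sample page, are hypothetical and not taken from the presentation.

```python
def dynamic_threshold(gray, radius=1, offset=10):
    """Binarize with a per-pixel threshold: the neighborhood mean minus offset."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            neighbors = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(neighbors) / len(neighbors)
            row.append(1 if gray[y][x] >= local_mean - offset else 0)
        out.append(row)
    return out


# Hypothetical page lit unevenly: paper fades from 230 (left) to 90 (right);
# ink strokes sit at columns 1 and 6, about 100 levels darker than the paper.
page = [[230, 110, 190, 170, 150, 130, 10, 90] for _ in range(3)]

# No single level works: the shadowed paper (90) is darker than the lit ink (110).
print([1 if v >= 128 else 0 for v in page[0]])  # shadowed paper turns black
print([1 if v >= 60 else 0 for v in page[0]])   # lit ink turns white
print(dynamic_threshold(page)[0])               # both strokes survive
```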
Another problem is representing a gray level with black-and-white pixels by a process known as dithering or electronic screening. But dithering does not provide good image quality for pure black-and-white textual material. THOMA illustrated this point with examples. Although its suitability for photoprint is the reason for electronic screening or dithering, it cannot be used for every compound image. In the document that was distributed by CXP, THOMA noticed that the dithered image of the IEEE test chart evinced some deterioration in the text. He presented an extreme example of deterioration in the text in which compound documents had to be set right by other techniques. The technique illustrated by the present example was an image merge, in which the page is scanned twice, once with a fixed threshold and once with the dithering matrix; the resulting images are merged to give the best results from each technique.
THOMA illustrated how dithering is also used in nonphotographic or nonprint materials with an example of a grayish page from a medical text, which was reproduced to show all of the gray that appeared in the original. Dithering provided a reproduction of all the gray in the original of another example from the same text.
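The following sketch illustrates dithering and the image merge together. It assumes ordered dithering with a 4x4 Bayer matrix, which is one common dithering matrix and not necessarily the one used in the examples shown, and it assumes that a region mask decides which scan contributes each pixel; the presentation does not say how the regions were chosen. All names and values are hypothetical.

```python
# 4x4 Bayer index matrix; each cell is scaled to a gray threshold below.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def fixed_threshold(gray, threshold=128):
    """One page-wide threshold: suited to text."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def ordered_dither(gray):
    """Position-dependent threshold from the tiled matrix: suited to gray areas."""
    return [
        [1 if px > (BAYER_4X4[y % 4][x % 4] + 0.5) * 16 else 0
         for x, px in enumerate(row)]
        for y, row in enumerate(gray)
    ]


def merge(text_bits, photo_bits, photo_mask):
    """Take dithered pixels where photo_mask is True, thresholded pixels elsewhere."""
    return [
        [p if m else t for t, p, m in zip(t_row, p_row, m_row)]
        for t_row, p_row, m_row in zip(text_bits, photo_bits, photo_mask)
    ]


# Hypothetical compound page: text on the left (ink 20, paper 230) and a
# flat mid-gray "photo" (128) on the right.
page = [[230, 20, 230, 230, 128, 128, 128, 128] for _ in range(4)]
photo_mask = [[x >= 4 for x in range(8)] for _ in range(4)]

thresholded = fixed_threshold(page)  # clean text, but the gray area goes flat white
dithered = ordered_dither(page)      # gray preserved, but speckles appear in the paper
merged = merge(thresholded, dithered, photo_mask)
for row in merged:
    print("".join(".#"[1 - px] for px in row))  # '#' = black, '.' = white
```

In the merged result the text region keeps the clean strokes of the thresholded scan, while the gray region keeps the half-black, half-white pattern that conveys mid-gray.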
THOMA finally illustrated the problem of bordering, or page-edge, effects. Books and bound volumes that are placed on a photocopy machine or a scanner produce page-edge effects that are undesirable for two reasons: 1) the aesthetics of the image; after all, if the image is to be preserved, one does not necessarily want to keep all of its deficiencies; 2) compression (with the bordering problem THOMA illustrated, the compression ratio deteriorated tremendously). One way to eliminate this more serious problem is to have the operator, at the point of scanning, window the part of the image that is desirable and automatically turn all of the pixels outside that window to white.
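The windowing step and its effect on compression can be sketched as follows. The synthetic page, the window coordinates, and the use of zlib as a stand-in compressor are all assumptions; the presentation does not name the compression scheme involved.

```python
import random
import zlib


def window_to_white(bits, top, left, bottom, right):
    """Keep pixels inside rows top..bottom-1, cols left..right-1; whiten the rest."""
    return [
        [px if top <= y < bottom and left <= x < right else 1
         for x, px in enumerate(row)]
        for y, row in enumerate(bits)
    ]


def compressed_size(bits):
    """Bytes after zlib compression, one byte per pixel (illustration only)."""
    raw = bytes(px for row in bits for px in row)
    return len(zlib.compress(raw, 9))


random.seed(0)
H, W, MARGIN = 64, 64, 8


def synthetic_pixel(y, x):
    inside = MARGIN <= y < H - MARGIN and MARGIN <= x < W - MARGIN
    if not inside:
        return random.randint(0, 1)  # noisy page-edge shadow
    return 0 if y % 8 == 0 else 1    # horizontal "text" strokes on white paper


page = [[synthetic_pixel(y, x) for x in range(W)] for y in range(H)]
cleaned = window_to_white(page, MARGIN, MARGIN, H - MARGIN, W - MARGIN)

print(compressed_size(page), "bytes with edge noise")
print(compressed_size(cleaned), "bytes after windowing")
```

The noisy border carries no useful information, yet it is the hardest part of the image to compress, which is consistent with the deterioration in compression ratio that THOMA described.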
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FLEISCHHAUER * AM’s experience with scanning bound materials * Dithering
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress, reported AM’s experience with scanning bound materials, which he likened to the problems involved in using photocopying machines. Very few devices in the industry offer book-edge scanning, let alone book cradles. The problem may be unsolvable, FLEISCHHAUER said, because a large enough market does not exist for a preservation-quality scanner. AM is using a Kurzweil scanner, which is a book-edge scanner now sold by Xerox.
Devoting the remainder of his brief presentation to dithering, FLEISCHHAUER related AM’s experience with a contractor who was using unsophisticated equipment and software to reduce moire patterns from printed halftones. AM took the same image and used the dithering algorithm that forms part of the same Kurzweil Xerox scanner; it disguised moire patterns much more effectively.
FLEISCHHAUER also observed that dithering produces a binary file which is useful for numerous purposes, for example, printing it on a laser printer without having to “re-halftone” it. But dithering tends to defeat efficient compression, because the very process that reduces moire patterns also tends to work against compression schemes. AM thought the difference in image quality was worth it.
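Why dithering works against compression can be seen by counting runs of same-colored pixels, a rough proxy for how well run-length-based coders perform. The sketch assumes a 4x4 Bayer ordered dither and a synthetic gray ramp; neither the Kurzweil scanner's dithering algorithm nor the compression scheme AM used is specified in FLEISCHHAUER's remarks.

```python
# 4x4 Bayer index matrix (assumed; the scanner's own matrix is not known).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def fixed_threshold(gray, threshold=128):
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def ordered_dither(gray):
    return [
        [1 if px > (BAYER_4X4[y % 4][x % 4] + 0.5) * 16 else 0
         for x, px in enumerate(row)]
        for y, row in enumerate(gray)
    ]


def count_runs(bits):
    """Total number of same-colored runs, summed over all rows."""
    total = 0
    for row in bits:
        total += 1 + sum(1 for a, b in zip(row, row[1:]) if a != b)
    return total


# Hypothetical 16x64 gray patch: a horizontal ramp from black to white.
ramp = [[x * 255 // 63 for x in range(64)] for _ in range(16)]

print("runs, fixed threshold:", count_runs(fixed_threshold(ramp)))
print("runs, dithered:       ", count_runs(ordered_dither(ramp)))
```

The thresholded ramp has two long runs per row, while the dithered version breaks each row into many short runs, which is the kind of fine alternation that run-length coding handles poorly.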
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DISCUSSION * Relative use as a criterion for POB’s selection of books to be converted into digital form
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
During the discussion period, WATERS noted that one of the criteria for selecting books among the 10,000 to be converted into digital image form would be how much relative use they would receive—a subject still requiring evaluation. The challenge will be to understand whether coherent bodies of material will increase usage or whether POB should seek material that is being used, scan that, and make it more accessible. POB might decide to digitize materials that are already heavily used, in order to make them more accessible and decrease wear on them. Another approach would be to provide a large body of intellectually coherent material that may be used more in digital form than it is currently used in microfilm. POB would seek material that was out of copyright.
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
BARONAS * Origin and scope of AIIM * Types of documents produced in AIIM’s standards program * Domain of AIIM’s standardization work * AIIM’s structure * TC 171 and MS23 * Electronic image management standards * Categories of EIM standardization where AIIM standards are being developed
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Jean BARONAS, senior manager, Department of Standards and Technology, Association for Information and Image Management (AIIM), described the not-for-profit association and the national and international programs for standardization in which AIIM is active.
Accredited for twenty-five years as the nation’s standards development organization for document image management, AIIM began life in a library community developing microfilm standards. Today the association maintains both its library and business-image management standardization activities—and has moved into electronic image-management standardization (EIM).
BARONAS defined the program’s scope. AIIM deals with: 1) the terminology of standards and of the technology it uses; 2) methods of measurement for the systems, as well as quality; 3) methodologies for users to evaluate and measure quality; 4) the features of apparatus used to manage and edit images; and 5) the procedures used to manage images.
BARONAS noted that three types of documents are produced in the AIIM standards program: the first two, accredited by the American National Standards Institute (ANSI), are standards and standard recommended practices. Recommended practices differ from standards in that they contain more tutorial information. The third type, the technical report, is not an ANSI standard. Because AIIM’s policies and procedures for developing standards are approved by ANSI, its standards are labeled ANSI/AIIM, followed by the number and title of the standard.
BARONAS then illustrated the domain of AIIM’s standardization work. For example, AIIM is the administrator of the U.S. Technical Advisory Group (TAG) to the International Standards Organization’s (ISO) technical committee, TC 171 Micrographics and Optical Memories for Document and Image Recording, Storage, and Use. AIIM officially works through ANSI in the international standardization process.
BARONAS described AIIM’s structure, including its board of directors, its standards board of twelve individuals active in the image-management industry, its strategic planning and legal admissibility task forces, and its National Standards Council, which is comprised of the members of a number of organizations who vote on every AIIM standard before it is published. BARONAS pointed out that AIIM’s liaisons deal with numerous other standards developers, including the optical disk community, office and publishing systems, image-codes-and-character set committees, and the National Information Standards Organization (NISO).
BARONAS illustrated the procedures of TC 171, which covers all aspects of image management. When AIIM’s national program has conceptualized a new project, it is usually submitted to the international level, so that the member countries of TC 171 can simultaneously work on the development of the standard or the technical report. BARONAS also illustrated a classic microfilm standard, MS23, which deals with numerous imaging concepts that apply to electronic imaging. Originally developed in the 1970s, revised in the 1980s, and revised again in 1991, this standard is scheduled for another revision. MS23 is an active standard whereby users may propose new density ranges and new methods of evaluating film images in the standard’s revision.
BARONAS detailed several electronic image-management standards, for instance, ANSI/AIIM MS44, a quality-control guideline for scanning 8.5” by 11” black-and-white office documents. This standard is used with the IEEE fax image—a continuous tone photographic