Overcoming Data Overload:
How one company reduced storage requirements by 90 percent
31 Oct, 2007 By: Howard Gross
When E-BizDocs Inc., one of the largest records management and document
imaging companies in upstate New York, started working with two New York State
agencies – the Education Department and Office of Mental Health – the company
was faced with an extreme challenge. The agencies combined had more than 400
million pages of legal and medical documents that needed to be digitally
archived. Complicating the matter even more was the fact that the agencies
required the documents to be archived in color at 300 dpi in order to preserve
unique identifying features, such as color gradations on diplomas, photos and
various signatures and handwritten notes on medical records.
The two agencies’ requirements quickly resulted in data overload. E-BizDocs
anticipated that it would consume at least 1 terabyte of storage every two
weeks as it progressed through the project. The Education Department, for
example, was scanning employee applications in four-color for its Office of
the Professions. Similarly, the Office of Mental Health was scanning medical
records in four-color. Each batch of documents processed, roughly equivalent
to a standard file box, resulted in 1 gigabyte of data.
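The growth rate quoted above can be turned into a rough projection. This is a minimal sketch that takes the article's figures (about 1 GB per batch, 1 TB every two weeks) at face value and assumes decimal units (1 TB = 1,000 GB); the constant and function names are illustrative.

```python
# Rough storage projection from the figures quoted in the article.
# Assumption: decimal units (1 TB = 1,000 GB); rates taken at face value.
GB_PER_TB = 1_000
GB_PER_BATCH = 1.0       # one batch is roughly one standard file box
TB_PER_TWO_WEEKS = 1.0   # anticipated consumption rate

def project_storage(weeks: int) -> tuple[float, float]:
    """Return (terabytes consumed, batches processed) after `weeks` weeks."""
    terabytes = TB_PER_TWO_WEEKS * weeks / 2
    batches = terabytes * GB_PER_TB / GB_PER_BATCH
    return terabytes, batches

tb, boxes = project_storage(52)
print(f"One year: {tb:.0f} TB across roughly {boxes:,.0f} file boxes")
```

At that rate, a year of uncompressed scanning works out to about 26 TB, or some 26,000 boxes' worth of batches.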
E-BizDocs needed to find a cost-effective way to minimize storage
requirements while ensuring that it was producing archive-quality digital
reproductions of the documents that could easily be e-mailed and accessed by
multiple people within the agencies for years to come.
The solution, E-BizDocs found, was PDF/A with mixed raster content (MRC)
compression from LuraTech Inc., a leading provider of open, ISO-compliant
JPEG2000 and PDF/A technology. Using LuraTech’s LuraDocument PDF Compressor
Server, E-BizDocs significantly reduced storage and network traffic
requirements by producing highly compressed PDF/A files from scanned color
documents. With the LuraTech PDF/A compression solution, the New York State
agencies reduced their storage requirements by 90 percent and improved
electronic transfer by shrinking file sizes from 8 megabits to 80 kilobits
per page.
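The two reductions quoted above measure different things: one refers to overall storage, the other to per-page file size in transit. A quick calculation, taking both at face value and assuming decimal prefixes (8 Mbit = 8,000 kbit), shows what each implies. The function name is illustrative.

```python
def reduction_percent(before: float, after: float) -> float:
    """Percentage reduction when a size goes from `before` to `after`."""
    return (1 - after / before) * 100

# Per-page transfer size quoted in the article: 8 megabits -> 80 kilobits.
# Assumption: decimal prefixes, so 8 Mbit = 8,000 kbit.
page_before_kbit = 8 * 1_000
page_after_kbit = 80
ratio = page_before_kbit / page_after_kbit
print(f"Per page: {ratio:.0f}x smaller "
      f"({reduction_percent(page_before_kbit, page_after_kbit):.0f}% reduction)")

# Overall storage: 90 percent less, applied to the 1 TB-every-two-weeks rate.
tb_after = 1.0 * (1 - 0.90)
print(f"Storage: about {tb_after * 1_000:.0f} GB every two weeks instead of 1 TB")
```

The per-page figures work out to a 100-fold (99 percent) reduction, larger than the 90 percent storage figure; the article does not say how either was measured, so the two need not match.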
The PDF/A format also mattered to the New York State Education Department
and Office of Mental Health. The agencies are required to keep many of these
documents for more than 40 years, so they need to be able to view the records
long into the future. PDF/A is an open, PDF-based file format designed for
long-term archiving and standardized by the International Organization for
Standardization (ISO) as ISO 19005. PDF/A gives users assurance that their
documents will maintain their appearance and readability regardless of the
applications and systems used to create them or the future availability of
viewing applications.
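PDF/A files declare the part and conformance level they claim in their XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties defined by ISO 19005. The sketch below is a quick heuristic for sorting files, not a validator: it only reads what a file claims, and checking real conformance requires a dedicated validator such as veraPDF. It assumes the XMP metadata is stored uncompressed in the file; if it is not, the function simply reports no match. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# XMP may write properties as attributes (pdfaid:part="1")
# or as elements (<pdfaid:part>1</pdfaid:part>); match both forms.
_PART = re.compile(rb'pdfaid:part(?:\s*=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:\s*=\s*["\']|>)\s*([A-Za-z])')

def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. '1B'), or None.

    Heuristic only: reads the identification declared in XMP metadata
    and does not check that the file actually conforms.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level

def claimed_pdfa_level_of_file(path: str) -> Optional[str]:
    """Convenience wrapper that reads a file from disk."""
    return claimed_pdfa_level(Path(path).read_bytes())

sample = (b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" '
          b'pdfaid:part="1" pdfaid:conformance="B"/>')
print(claimed_pdfa_level(sample))  # 1B
```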
How PDF/A with MRC Works
The LuraDocument PDF Compressor generates highly compressed PDF and PDF/A
files from color or black-and-white scanned documents using MRC compression
technology. This multi-layer segmentation and compression process is designed
to minimize the size of scanned documents while maintaining high image quality
and text legibility.
MRC is a multi-layered segmentation process: text and images are separated
into their own individual layers, and each layer is then optimally compressed
(see Figure 1). The underlying concept of the compression process is the
partitioning of the document into three distinct layers:
1. a bi-level image containing text;
2. a foreground image containing the color information of the text segments; and
3. a background (residual) image devoid of text components.
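The three-layer split can be illustrated with a toy segmenter. This is not LuraTech's algorithm, which the article does not describe; it is a minimal sketch assuming a simple rule: pixels darker than a luminance threshold (an assumed value of 128) form the bi-level text mask, the foreground keeps the original colors under the mask, and the background has its text pixels filled with the mean color of the remaining pixels. Images are plain lists of RGB tuples.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Mask = List[List[bool]]

WHITE: RGB = (255, 255, 255)

def luminance(p: RGB) -> float:
    """Approximate brightness using ITU-R BT.601 luma weights."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b

def segment(img: Image, threshold: float = 128.0) -> Tuple[Mask, Image, Image]:
    """Split an RGB image into (mask, foreground, background) layers.

    Toy rule (an assumption): any pixel darker than `threshold` is text.
    """
    mask = [[luminance(p) < threshold for p in row] for row in img]

    # Average color of the non-text pixels, used to fill holes left by text.
    paper = [p for row, mrow in zip(img, mask) for p, m in zip(row, mrow) if not m]
    if paper:
        fill = tuple(sum(p[c] for p in paper) // len(paper) for c in range(3))
    else:
        fill = WHITE

    # Foreground: text colors under the mask. Outside the mask its value is
    # ignored when decoding, so white is just an arbitrary filler here.
    foreground = [[p if m else WHITE for p, m in zip(row, mrow)]
                  for row, mrow in zip(img, mask)]
    # Background: the page with text pixels replaced by the fill color.
    background = [[fill if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(img, mask)]
    return mask, foreground, background

page = [[(250, 250, 240), (20, 20, 90)],
        [(250, 250, 240), (250, 250, 240)]]
mask, fg, bg = segment(page)
print(mask)  # [[False, True], [False, False]]

# Decoding picks the foreground where the mask is set, else the background.
rebuilt = [[f if m else b for f, b, m in zip(frow, brow, mrow)]
           for frow, brow, mrow in zip(fg, bg, mask)]
print(rebuilt == page)  # True
```

Real segmenters use far more robust text detection than a global threshold, but the layer structure is the same.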
Each layer is then compressed using a separate algorithm specifically adapted
to the corresponding type of data. The text is compressed losslessly using the
Fax G4 format, while the foreground and background are highly compressed using
the JPEG2000 format. MRC reduces full-color documents to roughly the size of a
black-and-white TIFF G4 file, while black-and-white scanned documents come out
approximately 50 percent smaller than with Fax G4.
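Per-layer coding can be sketched with stand-ins for the real codecs, since neither Fax G4 nor JPEG2000 is in Python's standard library. In this sketch the bi-level mask is packed into bits and compressed losslessly with `zlib` (standing in for G4), and each color layer is compressed lossily by 2x2 block averaging (standing in for JPEG2000). Decoding upsamples the color layers and selects the foreground color wherever the mask is set. The layer formats and function names are assumptions for illustration.

```python
import zlib
from typing import List, Tuple

RGB = Tuple[int, int, int]

def pack_mask(mask: List[List[bool]]) -> bytes:
    """Pack mask rows into bits, then compress losslessly (stand-in for G4)."""
    bits = bytearray()
    for row in mask:
        for i in range(0, len(row), 8):
            byte = 0
            for j, bit in enumerate(row[i:i + 8]):
                byte |= bit << (7 - j)
            bits.append(byte)
    return zlib.compress(bytes(bits))

def unpack_mask(data: bytes, width: int, height: int) -> List[List[bool]]:
    """Invert pack_mask exactly; the mask must survive without loss."""
    raw = zlib.decompress(data)
    stride = (width + 7) // 8
    return [[bool(raw[y * stride + x // 8] >> (7 - x % 8) & 1) for x in range(width)]
            for y in range(height)]

def downsample(img: List[List[RGB]]) -> List[List[RGB]]:
    """Lossy stand-in for JPEG2000: average each 2x2 block (even sizes assumed)."""
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[0]), 2):
            block = [img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]]
            row.append(tuple(sum(p[c] for p in block) // 4 for c in range(3)))
        out.append(row)
    return out

def reconstruct(mask, fg_small, bg_small):
    """Upsample color layers and take the foreground where the mask is set."""
    return [[fg_small[y // 2][x // 2] if m else bg_small[y // 2][x // 2]
             for x, m in enumerate(row)] for y, row in enumerate(mask)]

W, H = 4, 4
mask = [[x == y for x in range(W)] for y in range(H)]  # a diagonal "stroke"
ink, paper = (20, 20, 90), (250, 250, 240)
fg = [[ink] * W for _ in range(H)]    # uniform text color
bg = [[paper] * W for _ in range(H)]  # uniform paper color
original = [[ink if m else paper for m in row] for row in mask]

packed = pack_mask(mask)
decoded = reconstruct(unpack_mask(packed, W, H), downsample(fg), downsample(bg))
print(decoded == original)  # True
```

The sharp edges of the text come entirely from the losslessly coded mask, which is why the color layers can be compressed so aggressively; here the layers are uniform, so the lossy step happens to lose nothing.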
Deploying the PDF/A Scanning & Compression Solution
E-BizDocs evaluated a number of products before deciding that LuraTech’s PDF/A
with MRC compression solution was right for the job. Beyond the superior
compression performance, the reasons E-BizDocs selected the LuraDocument PDF
Compressor Server were threefold:
• Ease of use: The solution does not require a huge, technologically advanced
IT staff. LuraTech offers web-based licensing, so E-BizDocs was able to start
creating PDF/As within an hour of installing the software.
• Customer service: LuraTech offered effective and responsive customer
service, resolving E-BizDocs’ issues with a single call or e-mail. Additionally,
LuraTech offered opportunities for future enhancements based on the questions
and concerns E-BizDocs described.
• Total cost of ownership: Unlike other solutions available on the market
today, which require payment for each page processed, LuraTech charges a
one-time runtime license fee for unlimited PDF creation and an annual fee for
maintenance and support. This model simplifies administration for the user and
keeps costs fixed regardless of page volume. This was an important factor in
E-BizDocs’ decision because, as a service bureau, it needed to maintain a
certain cost structure for each project. The LuraDocument PDF Compressor Server
was the only product available that offered a fixed-price option.
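The trade-off between per-page pricing and a one-time license can be expressed as a break-even page volume. The article quotes no prices, so every figure below is hypothetical; this is a minimal sketch of the comparison, using `Decimal` to avoid floating-point rounding in money amounts.

```python
import math
from decimal import Decimal

def licensed_cost(years: int, license_fee: Decimal, annual_maintenance: Decimal) -> Decimal:
    """One-time license plus yearly maintenance; independent of page volume."""
    return license_fee + annual_maintenance * years

def per_page_cost(pages: int, fee_per_page: Decimal) -> Decimal:
    """Pay-per-page pricing grows linearly with volume."""
    return fee_per_page * pages

def break_even_pages(years: int, license_fee: Decimal, annual_maintenance: Decimal,
                     fee_per_page: Decimal) -> int:
    """Smallest page volume at which per-page fees reach the fixed cost."""
    return math.ceil(licensed_cost(years, license_fee, annual_maintenance) / fee_per_page)

# Hypothetical prices for illustration only; the article quotes none.
fixed = licensed_cost(3, Decimal("20000"), Decimal("4000"))
pages = break_even_pages(3, Decimal("20000"), Decimal("4000"), Decimal("0.01"))
print(fixed, pages)
```

Beyond the break-even volume every additional page is effectively free under the fixed license, which is what lets a service bureau quote a predictable price per project.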
By employing the right solution, E-BizDocs was able to cost-effectively scan
millions of pages of historical legal and medical documents for New York State’s
Education Department and Office of Mental Health, while meeting the agencies’
goals for long-term storage, which included easy access to the digital files and
the ability to search the text. The success achieved by using the LuraTech
solution for the Education Department earned E-BizDocs the honor of Best Digital
Archive for the Office of the Professions, an award presented in September 2007
as part of the Government Technology Conferences’ Best Solutions Showcase.
From this experience, E-BizDocs has learned that producing high-quality color
replicas of original documents as PDF/A files via MRC compression does not
have to be costly or require inordinate amounts of storage. With the right
solution, color documents can be scanned and compressed to a size comparable to
black-and-white PDFs. Moreover, PDF/A ensures long-term accessibility of the
digital document copies. By using this standard, E-BizDocs was able to alleviate
the agencies’ concerns about future technology changes and help them comply with
regulations that require them to maintain records for more than 40 years.
About the Author: Howard Gross is president and founder of E-BizDocs,
a privately held document management company located in Albany, N.Y. Mr. Gross has been
involved in the records management industry for over 20 years. His company has
been responsible for digitizing confidential work, as well as historic records,
ranging from Federal death penalty cases to court documents from the late 1700s.
Mr. Gross earned an MBA from Rensselaer Polytechnic Institute.
About E-BizDocs: E-BizDocs, Inc. is an Albany, N.Y.-based business
providing proven leadership in the emerging world of document management. E-BizDocs,
Inc. offers a customized approach to records management with a philosophy
based on the belief that there is no "one size fits all." The company is a
reseller of the industry’s best ECM solutions, as well as a paper scanning service provider
to numerous clients including the Unified Court System, State of New York
Mortgage Association, New York State Lottery, the Division of Parole, the New
York State Education Department and New York State Office of Mental Health.