Texas Lawyer TL E-alert / Litigation

The Bell Curve -- Document Indexing and Imaging

By Greg Krehel


Remember the "bell curve" from statistics class? The bell curve, so named because of its shape, illustrates the frequency distribution of many phenomena, for example, height. Measure a thousand people. For every person over 7', you'll have a mob between 5'6" and 5'10".

Let's apply the bell curve to the document collections produced during discovery. Out of every thousand cases, how many involve 1,000,000+ documents? 100,000+? 10,000? What does this distribution suggest regarding strategies for imaging and searching documents?

Giant Cases -- Special Tools Required

We're all familiar with cases in which millions of documents are produced during discovery. But we've also seen individuals over 7' tall. Both are outliers that occur infrequently. Out of every thousand cases, only a handful have 1,000,000 or more documents.

Cases with document collections of over 100,000 are also relatively rare. Do even a hundred cases out of every thousand involve this many documents? Widespread use of email has dramatically increased the volume of documents present in many cases, but it hasn't turned every case into a document monster.

Dealing with 1,000,000+ documents or even 100,000+ justifies a substantial investment in scanning and coding. This type of case also demands sophisticated software tools such as Concordance, iCONECT, IPRO, Litigator's Notebook, or Summation to assist with document indexing, image handling, and more.

So that's the story for the giant cases lurking out in one tail of the bell curve. But what about the cases that populate the rest of the curve? How many documents do these cases involve? What's an appropriate image handling and text searching solution for them?

Normal Cases -- Perfect For Adobe Acrobat

Cases with very small document collections fall at the other end of the curve. For every 1,000,000-document case, there's a case that involves a single Redweld of documents. These cases with only a single folder or box of documents are probably as rare as the ones with massive quantities of documents.

Which brings us to the approximately 70% of all cases that fall into the center area of the bell curve. My experience suggests these cases have between 1,000 and 50,000 documents. That's a small number relative to a gargantuan million-document case, but still a heap of paper, and more documents than any trial team can memorize in detail. It's certainly a document collection that should be imaged and available in a searchable form.

If your firm has one of the excellent products mentioned above, it can definitely be put to work on smaller matters as well. However, another wonderful option to consider on cases with small or mid-sized document collections is having documents scanned as PDF and using Adobe Acrobat.

There are numerous reasons Acrobat makes a great choice for a case with a normal-sized document population. The fact that the PDF format has become ubiquitous is a benefit in and of itself. You may already own and be comfortable with Acrobat, perhaps in connection with court-filing requirements. It's very likely that expert witnesses, other law firms, and even your clients are familiar with PDF files and have either a full Acrobat license or the free Adobe Reader, making it easy to share case documents.

Why has the PDF format become the de facto standard for electronic versions of paper documents? The primary reason is that a single PDF file can contain the images of all pages of the paper document as well as the associated document text, typically captured by optical character recognition (OCR) software.

If you're new to document imaging, you may be surprised to learn that, prior to the introduction of the PDF format, the standard way to create electronic versions of paper documents was to generate a series of single-page TIFF images and a separate OCR text file. Thus, scanning a 15-page document would yield a total of 16 separate electronic files -- 15 TIFFs and a text file.
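
For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the two approaches. The file-naming scheme (DOC001_p001.tif and so on) is purely hypothetical; scanning vendors use their own conventions.

```python
def tiff_era_files(doc_id: str, pages: int) -> list[str]:
    """Pre-PDF approach: one TIFF per page plus one separate OCR text file."""
    # Hypothetical naming scheme, for illustration only.
    images = [f"{doc_id}_p{n:03d}.tif" for n in range(1, pages + 1)]
    return images + [f"{doc_id}.txt"]


def pdf_files(doc_id: str, pages: int) -> list[str]:
    """PDF approach: a single file holds every page image and the OCR text."""
    return [f"{doc_id}.pdf"]


print(len(tiff_era_files("DOC001", 15)))  # 16 files for a 15-page document
print(len(pdf_files("DOC001", 15)))       # 1 file for the same document
```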

When scanning first became available, the Many Electronic Files = 1 Paper Document approach was as good as it got and certainly beat nothing at all. However, with the advent of PDF, which meant that 1 Electronic File = 1 Paper Document, it wasn't long before PDF ruled the roost.

The argument for PDF has become even stronger following Adobe's release of Acrobat 6. This important new version of Acrobat offers numerous enhancements, including cross-PDF text searching and improved document mark-up functionality. For example, you can search a folder containing any number of PDF files and instantly locate those containing any term or phrase.
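
To make the idea of folder-wide searching concrete, here is a rough Python sketch. It is not how Acrobat works internally; it only illustrates the concept. It assumes the third-party pypdf package is installed, and the function names pdf_text and search_folder are made up for this example.

```python
from pathlib import Path
from typing import Callable


def pdf_text(path: Path) -> str:
    """Return the text layer of a PDF (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the rest runs without it
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def search_folder(folder: str, phrase: str,
                  extract: Callable[[Path], str] = pdf_text) -> list[Path]:
    """Return the PDFs in `folder` whose text contains `phrase`, ignoring case."""
    needle = phrase.lower()
    hits = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if needle in extract(pdf).lower():
            hits.append(pdf)
    return hits
```

Calling search_folder with a folder path and a phrase returns the matching files. Only PDFs that actually contain document text can match, since an image-only PDF gives the search nothing to look through.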

Here's a final tip for any reader who's yet to experiment with document imaging: Using Acrobat is a great way to get comfortable using electronic documents without jumping into the deep end of the pool. Don't scan every case document until you're sure it's worth the effort. Instead, identify the 100 or so most critical documents and have them scanned as PDFs and put in a folder on your network from which they can be searched. You'll be able to evaluate the benefits of using electronic versions of case documents with a minimal investment of time and expense.

When you have documents produced during discovery imaged, be sure to tell the scanning vendor or the in-house support staff who do the scanning that you want the resulting PDFs to contain both images and text. If you're not clear about this requirement, you may get back PDFs that contain only images and not the associated text of the documents. PDFs that contain only images cannot be searched.
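
If someone on your support staff is comfortable with a little scripting, a quick spot check of a vendor's delivery is possible. The Python sketch below flags PDFs that appear to have no text. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are arbitrary choices for illustration, not a standard.

```python
from pathlib import Path
from typing import Callable


def pdf_text(path: Path) -> str:
    """Return the text layer of a PDF (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the rest runs without it
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def image_only_pdfs(folder: str,
                    extract: Callable[[Path], str] = pdf_text,
                    min_chars: int = 20) -> list[Path]:
    """Return PDFs whose extracted text is shorter than `min_chars`.

    Such files probably contain page images without the OCR text,
    so they would not turn up in any text search.
    """
    return [pdf for pdf in sorted(Path(folder).glob("*.pdf"))
            if len(extract(pdf).strip()) < min_chars]
```

Any file this returns is worth sending back to the vendor or running through OCR before you rely on searching it.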


If you only handle cases with a gazillion documents, Adobe Acrobat isn't the right answer for image-handling and text searching. However, for the vast majority of us, Acrobat is a fantastic solution for some or all cases. If you haven't put Acrobat to the test, you owe it to yourself to try it on an upcoming matter.

Copyright 2004 Greg Krehel. All rights reserved.

About the Author

Greg Krehel is CEO of CaseSoft. CaseSoft is the developer of the popular software tools CaseMap, TimeMap, DepPrep, and NoteMap. CaseMap makes it easy to organize and explore the facts, the cast of characters, and the issues in any case. TimeMap makes it a cinch to create chronology visuals for use during hearings and trials, client meetings, and brainstorming sessions. DepPrep helps prepare clients for depositions. NoteMap makes it easy to create, edit, and use outlines. In addition to his background in software development, Mr. Krehel has over 15 years of trial consulting experience. You can reach him via e-mail (gkrehel@casesoft.com) or telephone (904-273-5000).
