The Mueller Report, Obsolete Technology, And Inferior PDFs

CD & DVD optical drives - obsolete technology

The Justice Department hand-delivered Special Counsel Robert Mueller’s 448-page report to Congress on Thursday morning – on CD-ROMs. Isn’t that the cutest thing? House Judiciary Committee staff looked in the closets and found a computer with a working CD-ROM drive.

The report was then posted to the Department of Justice website as a 139 MB PDF file. Maybe you know – there are high-quality PDF files that live in nice homes, and there are low-class PDF files that are kind of trashy. The Mueller report PDF was one of the low-class ones.

A generous assessment is that the Mueller report was provided to the DOJ on paper. (The next part is even more unforgivable if the DOJ started with a high-quality PDF from Mueller.) The DOJ scanned the report. They used some program to apply the black redaction bars. Then they printed it out again, and then scanned it again to create a new PDF. Staffers burned it onto CDs for Congress.

The result is a technically awful PDF. The file is as much as 10 times bigger than it needed to be, the pages are full of visual artifacts from the scanner, and most importantly, it is just a collection of images that cannot be searched. It is not digitally signed for security and does not meet PDF/A, the archival standard for PDF files.

The PDF Association (which is interesting, because who knew there was a “PDF Association”?) concludes:

“If Mueller delivered a “born digital” PDF to Justice, that file was printed and scanned back into a set of low-quality images for release; a disservice to all future users of the document, and also a violation of Section 508 regulations [accessibility for users with disabilities].

“If Mueller delivered a paper document to Justice which was subsequently scanned, DOJ’s treatment of the document is more understandable, but still non-conforming with Section 508. . . .

“It’s interesting – and deeply unfortunate – that DOJ clearly used advanced redaction software but nonetheless chose to deliver a paper-age “images only” PDF. In so doing they:

  • “Dramatically increased the file’s size, probably by 8-10x.
  • “Permanently and substantially reduced the visual text and image quality of a document of historical interest
  • “Permanently reduced text searchability (assuming they received a searchable PDF from Mueller)
  • “Delivered a document that’s inherently inaccessible to users who require assistive technology (AT) in order to read, requiring substantial remediation efforts to recover any useful degree of accessibility, let alone full compliance with applicable regulations.”
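To put the size estimate in perspective, here is a back-of-the-envelope calculation. It uses only the figures mentioned above (139 MB, 448 pages, and the PDF Association's "probably 8-10x" estimate); the function names are made up for illustration, and the result is a rough guess at what a born-digital file might have weighed, not a measurement.

```python
# Figures from the article; the 8-10x factor is the PDF Association's estimate.
SCANNED_SIZE_MB = 139
PAGE_COUNT = 448


def size_per_page_kb(total_mb: float, pages: int) -> float:
    """Average size of one page in kilobytes, treating 1 MB as 1000 KB."""
    return total_mb * 1000 / pages


def estimated_native_size_mb(total_mb: float, bloat_factor: float) -> float:
    """Size the file might have had if it were not inflated by bloat_factor."""
    return total_mb / bloat_factor


if __name__ == "__main__":
    per_page = size_per_page_kb(SCANNED_SIZE_MB, PAGE_COUNT)
    print(f"Scanned file: about {per_page:.0f} KB per page")
    for factor in (8, 10):
        native = estimated_native_size_mb(SCANNED_SIZE_MB, factor)
        print(f"At {factor}x bloat, a native PDF would be about {native:.1f} MB")
```

In other words, if the estimate is right, the whole report could have fit in something like 14 to 17 MB.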

Obsolete technology

CD drives enter the hospice stage

Although CD-ROMs are effectively obsolete, apparently they’re still commonly used in the federal government for classified materials. A write-once CD cannot be altered after it is burned and cannot easily be compromised. The alternatives are posting files to a website (a password-protected Google Drive doc, for example), sending them by email, or putting them on a USB stick or external hard drive, but each one of those is potentially less secure. It takes hard work to secure a website used to distribute files when you’re trying both to limit access to a group of people and to prevent hacking. USB sticks are well known to be vulnerable to malware. Lawyers still frequently use CDs and DVDs for document productions.

Optical drives – CDs and DVDs – disappeared from Mac computers years ago. Almost no Windows laptops include CD drives any more; the drives take huge amounts of space inside the case that can be better used for a larger battery or more storage or improved graphics, or better yet, removed to allow the laptop to be thinner and lighter. The drives are cheap enough that they’re still included in full-size Windows desktop computers but they’re seldom used. If you happen to need a CD/DVD drive for a computer that doesn’t have one, you can get an external USB CD/DVD drive for $20-$30.

At one time the storage capacity of CD-ROMs felt spacious. I’m old and I remember when the idea of a single thin disc that could hold 650 MB of data was magical, but our inexhaustible thirst for storage space quickly began to make them feel cramped. Theoretically DVD-ROMs could hold 4.7 GB of data but there were hints of problems from the beginning, with confusing formats and frequent errors creating or reading discs. Later, fights over standards and falling prices on USB sticks and external drives meant that Blu-ray never became part of our computer experience. Now we store our files in the cloud and share them online. If you really need a USB stick, a 32 GB drive is $5.99 and can be grabbed from a bin by the cash register, but in this age of Dropbox and OneDrive, why would you?

Even the niche uses of optical discs are fading away. Blu-ray movies never really caught on (DRM and anti-piracy measures made watching movies on Blu-ray discs into a horrible experience) and the format is basically now deceased – it has become trivially easy to watch Netflix on a new TV, and a $35 Chromecast gives you access to many other streaming services on your TV. Gamers traditionally bought new games on disc for PlayStation and Xbox, but all the momentum now is toward streaming new games to the console online, to the point that last week Microsoft announced the Xbox One S All-Digital Edition, which is 100% identical to the Xbox One S but fifty dollars cheaper because it does not have a disc drive.

Happy trails, CD-ROMs! For a brief shiny period you were a trusted friend, but you no longer spark joy and Marie Kondo says it is time for you to go. Other than for delivery of the single most important document in recent US history, apparently.

Inferior PDFs

Low quality PDFs

When you create a PDF directly from Microsoft Word or Google Docs, the result is the best kind of PDF, an “all-digital PDF.” Imagine the PDF has layers. The top layer shows the words and images. The next layer has the precise text, all the words exactly like the original, indexed and ready to be searched. You can hold the left button of the mouse down and highlight individual words or sentences.

On the other hand, when you create a PDF by scanning pages or by taking a photo or screenshot, the PDF is a collection of pictures of the pages, like a series of photos. Without more work, that type of PDF has no knowledge of what’s on a page, no idea of whether there are words or pictures of dogs with silly expressions. As an aside, if your job requires you to closely evaluate PDFs of dogs with silly expressions, you might want to take another look at your life choices.

PDFs of scanned pages are not searchable and do not allow you to highlight words and sentences. If dragging with the left mouse button held down draws a rectangle instead of snapping to words and sentences, your PDF only has images and not the full text of the underlying document.
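If you would rather not test with the mouse, the same question can be asked of the file itself. The sketch below is a crude heuristic, not a real PDF parser: it assumes the file's resource dictionaries are stored uncompressed and simply looks for font entries versus image entries in the raw bytes. The function name `looks_image_only` is made up for illustration. Newer PDFs that pack their dictionaries into compressed object streams will fool it, and so will a scanned PDF that has already been run through OCR, so treat a result as a hint only.

```python
import re

# A page that draws real text needs at least one font resource.
FONT_RE = re.compile(rb"/Font\b")
# Scanned pages are stored as image XObjects.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF is just pictures of pages.

    Heuristic: images are present but no font resource appears anywhere
    in the uncompressed parts of the file.
    """
    has_fonts = FONT_RE.search(pdf_bytes) is not None
    has_images = IMAGE_RE.search(pdf_bytes) is not None
    return has_images and not has_fonts


if __name__ == "__main__":
    # Tiny hand-written fragments standing in for real files.
    scanned = b"<< /Type /XObject /Subtype /Image /Width 1700 /Height 2200 >>"
    digital = b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
    print("scanned fragment image-only?", looks_image_only(scanned))
    print("digital fragment image-only?", looks_image_only(digital))
```

For a file on disk, something like `looks_image_only(open("report.pdf", "rb").read())` would work, where `report.pdf` is a placeholder name. A proper PDF library would give a more reliable answer.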

Adobe Acrobat and other programs can perform OCR (“Optical Character Recognition”) on an image-based PDF and try to identify the words in the document. The OCR results are stored in a separate layer, much like the text layer in an all-digital PDF, but the results are likely to be less accurate, depending on the quality of the image, recognizability of the writing, and page layout quirks.
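One rough way to see what "less accurate" means is to compare OCR output against text you know to be correct. The sketch below uses Python's standard `difflib` module to produce a similarity score between 0 and 1. The sample sentence and its "OCR" version are invented for illustration, and this score is a general-purpose string similarity, not the character error rate that OCR tools formally report.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0-to-1 similarity score between known-good text and OCR text."""
    return SequenceMatcher(None, reference, ocr_output).ratio()


if __name__ == "__main__":
    # Invented example: typical OCR confusions of "o" with "0" and "l" with "1".
    reference = "The report was hand-delivered to Congress."
    noisy = "The rep0rt was hand-de1ivered to Congress."
    print(f"similarity: {ocr_similarity(reference, noisy):.3f}")
```

Two wrong characters in one sentence looks harmless, but it is enough to make a search for "report" miss that sentence entirely.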

The PDF of the Mueller report was a collection of images. Congress and news organizations quickly ran OCR on the PDF but they could not get ideal results from the DOJ source, which had relatively low quality scans plus underlining and redactions. Lots and lots and lots of redactions.

Redactions can be done securely by Acrobat Pro without degrading the remainder of an all-digital PDF. Although the DOJ used a redaction program, when they printed and re-scanned the Mueller report, they might as well have cut out pieces of black paper and taped them over the redactions. In the words of the PDF Association: “Instead of delivering “native” redactions, however, it’s obvious that DoJ printed and then scanned the document after it was redacted. We know this because on many pages a scanner artifact (the faint yellow line) crosses a redacted area. This deliberate and unnecessary act made the document substantially harder for anyone and everyone to use, forever.”

Are you a generous person who believes people are fundamentally good? Then the DOJ was taking precautions for understandable reasons to make absolutely sure that the redacted portions could not be read by technical snoopers. There have been many high-profile redaction failures in the last few years by everyone from the NSA to the New York Times. There was even a redaction failure during the Mueller investigation when blacked-out parts of a filing by Paul Manafort’s lawyers could be read due to a mistake either by Manafort’s lawyers or by DOJ staffers. So the crappy PDF of the Mueller report might be natural paranoia by staffers who have already been burned by a technical redaction mishap.

Are you a suspicious person who believes the release of the redacted Mueller report was one small step in an incredibly clumsy coverup by morons? If you believe that Bill Barr and the people carrying out his orders in the DOJ are low-life criminals who will stoop to anything to obstruct justice, no matter how petty and stupid, then releasing a low-quality PDF creates a tiny little roadblock to slow down the full understanding of the report. It only took ten minutes for everyone to run OCR and make the report searchable, but hey, ten minutes is ten minutes.

I honestly don’t know which one is more likely. Are they actively engaged in one last corrupt attempt to obstruct justice, or are they merely technical morons? Although we don’t have enough evidence to be sure, to my mind it’s obvious ██████████ █████ ████████ ███████ ██████. █████ █████ and the horse they rode in on.