Not only is the volume of information subject to e-Discovery exploding; so is the range of formats that information takes. Even as companies still struggle to pore through electronic libraries of text, e-Discovery of audio and video material is coming up fast.

Reliable statistics on audio and video are still hard to find (one IDC report from last year says the overall "digital universe" will grow by a factor of 300 from 2005 to 2020), but there is no question that the volume of audio and video files subject to e-Discovery is surging. "We've seen a dramatic spike," says Todd Horst, executive vice president with Consilio, an e-Discovery software vendor.

Several factors drive the increase. One is simply "the overall proliferation of conversations and the number and types of devices," Horst says. For example, a securities trader who conducts business on a personal mobile phone makes that conversation more difficult for the company to track. Another cause is increased regulatory attention to audio information; laws in both the United States and Britain either require or strongly encourage companies to save such audio files.

In 2009, the U.K.'s top financial regulator began requiring firms to record all relevant telephone conversations and electronic communications. Section 731 of the Dodd-Frank Act requires swaps dealers to "maintain daily trading records … and recorded communications, including electronic mail, instant messages, and recordings of telephone calls."

Those laws now mean that e-Discovery is no longer restricted to data that can be physically reproduced with a Control-P command. “The definition of electronic information is not limited to information that's susceptible to printing,” says Steven Teppler, a partner with the law firm of Kirk Pinkerton and co-chair of the American Bar Association's e-Discovery and Digital Evidence Committee.

That presents a number of challenges. For starters, audio files are far larger than a comparable amount of information stored as text. Chris O'Connor, director of e-Discovery services with Complete Discovery Source, says he worked on one case where 650,000 audio files contained the same amount of data as 2 million text records.

The nature of audio and video information slows down e-Discovery as well, O'Connor says. A reviewer can scan electronic documents for specific words or phrases; no such shortcut exists for audio and video. Instead, the reviewer must rewind or fast-forward through audio-visual files to locate a specific piece of information.

Together, these two facts mean that throwing more bodies at the problem, long a time-honored way to run up legal bills, no longer works. "The human approaches to solve the problem fall apart," says John Mancini, president of AIIM, an industry association for information professionals.

Emerging Technology


Mancini acknowledges that software for effectively searching audio and video files is still evolving, though it "is improving all the time." One promising area, he says, is phonetic search technology.

Consilio's approach, for example, incorporates technology from Nexidia, a provider of phonetic indexing and searching capabilities. Rather than transforming audio information into text, Consilio applies "phonetical search indexing," says Adam Pollitt, executive vice president with Consilio. That is, the technology indexes recorded audio using phoneme patterns. (A phoneme is the smallest phonetic unit in a language that can convey a distinction in meaning, such as the m in mat or the b in bat.) A reviewer can enter a text-based search string, such as "hot stock tip." The software breaks this phrase into phonemes and then searches the index for the matching phonemes or phoneme sequences.
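To make the idea concrete, here is a minimal sketch of phoneme-sequence search in Python. It is not Nexidia's or Consilio's implementation: it assumes the audio has already been turned into a time-stamped stream of phonemes, and the tiny pronunciation dictionary, the ARPAbet-style symbols, and the names `PRONUNCIATIONS`, `PhonemeHit`, `query_to_phonemes`, and `search` are all hypothetical. The query is broken into phonemes, and the stream is scanned for that sequence, optionally tolerating a few misheard phonemes.

```python
from dataclasses import dataclass

# Hypothetical mini pronunciation dictionary (ARPAbet-style symbols, stress omitted).
# A real system would use a full lexicon plus letter-to-sound rules for unknown words.
PRONUNCIATIONS = {
    "hot": ["HH", "AA", "T"],
    "stock": ["S", "T", "AA", "K"],
    "tip": ["T", "IH", "P"],
}


@dataclass
class PhonemeHit:
    phoneme: str
    start_ms: int  # assumed time offset of this phoneme within the recording


def query_to_phonemes(query: str) -> list[str]:
    """Break a text search string into one flat phoneme sequence."""
    phonemes: list[str] = []
    for word in query.lower().split():
        if word not in PRONUNCIATIONS:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def search(index: list[PhonemeHit], query: str, max_mismatches: int = 0) -> list[int]:
    """Return start times (ms) where the query's phoneme sequence occurs.

    Up to max_mismatches substituted phonemes are tolerated, as a crude stand-in
    for recognition errors. Inserted or dropped phonemes are not handled.
    """
    target = query_to_phonemes(query)
    n = len(target)
    matches: list[int] = []
    for i in range(len(index) - n + 1):
        window = index[i:i + n]
        mismatches = sum(1 for hit, p in zip(window, target) if hit.phoneme != p)
        if mismatches <= max_mismatches:
            matches.append(window[0].start_ms)
    return matches


if __name__ == "__main__":
    # Assumed phoneme stream for "I got a hot stock tip", one phoneme every 100 ms.
    stream = "AY G AA T AH HH AA T S T AA K T IH P".split()
    index = [PhonemeHit(p, i * 100) for i, p in enumerate(stream)]
    print(search(index, "hot stock tip"))  # [500]
```

Allowing a small number of mismatches lets the search still find "hot stock tip" when one vowel was misrecognized, at the cost of more false hits. Production phonetic engines typically score matches probabilistically rather than counting exact substitutions, so treat this only as an illustration of the indexing-and-matching idea.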

That's not to say the technical solutions available today are perfect. Reviewing multiple formats of data together—say, an audio file and a Word document—is difficult, says Allison Jane Walton, founder of Fortis Quay, an information consulting firm, and a board member with the Association of Certified e-Discovery Specialists. “This can create a disconnect between the other information collected for a search-and-review,” Walton says. “Context can be lost and the review becomes more manual.”

THE DIGITAL UNIVERSE IN 2020

Below is a summary of IDC's paper, “The Digital Universe in 2020.”

… at the midpoint of a longitudinal study starting with data collected in 2005 and extending to 2020, our analysis shows a continuously expanding, increasingly complex, and ever more interesting digital universe. This is IDC's sixth annual study of the digital universe, and it's chock-full of new findings:

From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020). From now until 2020, the digital universe will about double every two years.

The investment in spending on IT hardware, software, services, telecommunications, and staff that could be considered the “infrastructure” of the digital universe and telecommunications will grow by 40 percent between 2012 and 2020. As a result, the investment per gigabyte (GB) during that same period will drop from $2.00 to $0.20. Of course, investment in targeted areas like storage management, security, big data, and cloud computing will grow considerably faster.

Between 2012 and 2020, emerging markets' share of the expanding digital universe will grow from 36 percent to 62 percent.

A majority of the information in the digital universe, 68 percent in 2012, is created and consumed by consumers—watching digital TV, interacting with social media, sending camera phone images and videos between devices and around the Internet, and so on. Yet enterprises have liability or responsibility for nearly 80 percent of the information in the digital universe. They deal with issues of copyright, privacy, and compliance with regulations even when the data zipping through their networks and server farms is created and consumed by consumers.

Only a tiny fraction of the digital universe has been explored for analytic value. IDC estimates that by 2020, as much as 33 percent of the digital universe will contain information that might be valuable if analyzed.

By 2020, nearly 40 percent of the information in the digital universe will be “touched” by cloud computing providers—meaning that a byte will be stored or processed in a cloud somewhere in its journey from originator to disposal.

The proportion of data in the digital universe that requires protection is growing faster than the digital universe itself, from less than a third in 2010 to more than 40 percent in 2020.

The amount of information individuals create themselves—writing documents, taking pictures, downloading music, etc.—is far less than the amount of information being created about them in the digital universe.

Much of the digital universe is transient—phone calls that are not recorded, digital TV images that are watched (or “consumed”) that are not saved, packets temporarily stored in routers, digital surveillance images purged from memory when new images come in, and so on. Unused storage bits installed throughout the digital universe will grow by a factor of 8 between 2012 and 2020 but will still be less than a quarter of the total digital universe in 2020.

Source: IDC.

Older audio files can present other technical challenges. If a recording was made on an obsolete system, the company may need to acquire a codec (software that decodes the recording's format) to play it on modern computers, says Kevin Treuberg, director of forensic services with Complete Discovery Source.

And even with the best technology, some human review of audio files is still required in most cases. For instance, a firm may have several months' worth of phone calls and use phonetic searching to identify those that contain certain words. Those files then undergo human review. Even so, the overall review process is usually more efficient than it would be if the entire process were manual.

In some court proceedings, O'Connor says, the judge may require certified transcription of an audio file. Certified transcribers apply quality control processes that ensure as accurate a transcription as possible, he adds.

Prevention

Even in the audio-video age of e-Discovery, the old advice still applies: The best legal strategy is to stay out of court in the first place. So compliance departments can apply the new generation of e-Discovery software to current company operations, to identify and stop possible litigation risks before any lawsuit comes along.

For instance, a company may record its calls with customers (in compliance with applicable regulations) and then use phonetic searching to check both the quality of the employee's interaction and the presence of any words or phrases, such as "guaranteed return," that might be of concern.

The key, Walton says, is to develop the processes and systems to manage audio and video files effectively, period. “Increasingly, we are seeing video and audio as data types that need to be preserved, collected, reviewed, and produced in discovery,” Walton says.

As the volume of these data types grows, most organizations will need to make greater use of technology to supplement their manual efforts. The goal is "not necessarily putting in place a perfect process but a defensible one," Mancini says. "You want to address at least 80 percent of the problem in a way that's better than what you're currently doing now and is defensible," he adds. "Making things better is better than making things perfect."