Companies are creating oceans of data every day. Many are drowning in it. But the savviest companies see the future of compliance deep in those vast seas of ones and zeros.

Collectively, we create 2.5 quintillion bytes of data every single day, according to IBM. A quintillion is a one followed by 18 zeroes. That's so much that 9 of every 10 bytes accumulated on servers and storage media around the world were produced in the last two years alone. In 2011, we hoarded more than 1.8 zettabytes (1.8 trillion gigabytes) of data, according to research firm IDC, which estimates that the world's data volume is doubling every two years.

It's a happy coincidence, then, that amid this explosion of zettabytes, a new set of tools is emerging to harness mass volumes of diverse data at speeds once inconceivable. You see, the new buzzword “big data” isn't about the data, which is indeed big. Rather, big data is about what we can now do with it all. The possibilities are tantalizing in everything from retail marketing to geospatial imaging to fraud detection and risk management.

LinkedIn uses big data to power its “people you may know” functionality. Facebook uses it as a source for reporting and analytics, as well as for machine learning. Twitter uses it to store and process tweets and log files and to spot user trends. Orbitz uses it to understand user preferences. Chevron uses it to process seismic data from ships cruising for oil reserves. Zions Bancorp uses it to expand its fraud detection net. And so on.

“It's really about one thing: the ability to cost-effectively handle this growing volume and velocity of data,” says David Corrigan, director of strategy for IBM Information Management, a major big data player.

Yet it's still early days for big data. Even now, in mid-2012, articles on the topic often cite analyst Doug Laney's 2001 description of the three “Vs” of big data. Laney emphasized that the Velocity of data processing and the ability to handle the data's structural Variety (which makes it hard to fit into the tidy slots of a relational database) were as important as its Volume. It's a bit like mentioning World Wide Web inventor Tim Berners-Lee in current stories about the Internet.

So where's a governance, risk, and compliance specialist to begin? First understand that if you're not a big data expert, you're not alone. Joe Gottlieb, president and CEO of big data firm Sensage, placed big data at “about 1.5” on a scale of ten along the technology-adoption curve, or somewhere between “bleeding edge” and “leading edge.”

“Right now, it's not for the faint of heart,” Gottlieb says. “There's a lot of investment going into it, and a lot of players who will say, ‘We'll help you overcome the immaturity of Hadoop by wrapping ourselves around it.’”

Hadoop, the foundational technology of big data, is a way to use distributed processing to crank through massive data sets on clusters of computers. Its name derives not from a Hindu god but from a favorite stuffed elephant belonging to the infant son of Doug Cutting, the programmer who developed it at Yahoo starting in 2006 after reading papers about a Google project called MapReduce.
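For readers wondering what “distributed processing” looks like in practice, the MapReduce pattern that inspired Hadoop splits a job into a map step, which runs independently on each chunk of data, and a reduce step, which combines the partial results. The sketch below is a single-machine Python illustration of that pattern, counting words in a few hypothetical log lines. It does not use Hadoop or any of its APIs; the function names and sample data are made up for illustration, and the list of chunks merely stands in for data spread across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group the emitted values by key across all chunks."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently; on a real cluster these calls
    # would run in parallel on the machines that hold the data.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical log data split into two chunks, standing in for two nodes.
chunks = [
    ["login failed for alice", "login ok for bob"],
    ["login failed for alice", "payment ok for carol"],
]
counts = word_count(chunks)
print(counts["login"], counts["failed"], counts["alice"])  # prints: 3 2 2
```

Because the map step never needs to see more than one chunk at a time, adding machines lets the same job handle more data, which is the cost advantage the vendors quoted here describe.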

Hadoop and other open-source big data tools like MongoDB and Cassandra all have the ability to handle massive amounts of data—ranging from structured information like that populating tidy rows and columns of relational databases to unstructured data in the form of free text, images, and video—with relative ease and at low cost.

Big data provides advantages on several levels. Foremost, because big data tools are optimized for unstructured data, a company doesn't have to spend a lot of time forcing square-peg data into round-hole relational databases. Big data tools are also well suited to cloud computing and can take advantage of that technology's outsourceable, pay-as-you-go flexibility. Big data's core software is open source, although companies like IBM do license enhanced versions aimed at compensating for big data's technological immaturity. And the software is fairly intuitive for experts in traditional databases.


“If you're familiar with database technologies, you're in a good position to quickly learn some of these new data stores—whether it's Hadoop, MongoDB, or Cassandra,” says Brian Gentile, CEO of business-intelligence software firm Jaspersoft, which plugs into Hadoop and other big data sources.

There are plenty of drawbacks, too. Talent remains an issue. Gentile says that much of the technical expertise in big data still resides at vendors, something that will change as implementations build competence at client sites. The more important talent gap may not be technical at all, he adds; it may have more to do with collaboration and bringing varied disciplines together. For example, says Gentile, a group of mathematicians, computer scientists, and financial gurus in Ukraine has built a business on adding “a huge amount of domain knowledge about how financial markets work” to an open-source big data platform, then packaging and selling the results.

“They're making a fortune,” Gentile says.

For GRC professionals, the definitions of success are different, but the basic approach is the same, says Richard Anderson, Crowe Horwath Global Risk Consulting's director for the United Kingdom.

“When you begin to understand your value drivers, you can ask the disruptive questions: What's going to change about it? How can we see the signals?” he says. “The mega-opportunities and mega-threats are outside—not just internal. That's what we can do with big data—that's the strategic excitement.”

Among GRC types, big data could provide a vehicle for broader thinking, says risk and compliance lecturer, author, and consultant Michael Rasmussen.

“A lot of times now, it's, ‘I've checked my check boxes of what's required of me today to pass the scrutiny of the regulator and the auditor,’ and not thinking big-picture,” Rasmussen says. “IT is focused on information security issues, corporate compliance, anti-bribery and corruption, and Sarbanes-Oxley. Nobody's thinking about how we look at this more holistically.”

DATA DETAILS

The following chart from Computer Sciences Corp. provides statistics on unstructured data vs. structured data from 2009 to 2014.

Source: Computer Sciences Corp.

Corrigan says experimentation is a low-cost way to see how big data might yield big business value. The software is open source or available in free trial versions; resources like IBM's Big Data University can help get IT staff up to speed.

“It's proving that this technology can do something different,” Corrigan says. “Let's say I have a hypothesis that casting a wider net and capturing more data yields better insights. Are ten years of transactions better than two years? Well, let's prove it.”

Carl Lackstrom, vice president of risk management and internal audit for Irving, Texas-based healthcare information firm HMS, says his company is dipping its toes into the big data waters. The company has long dealt with huge amounts of data, he says, but in structured, industry-standard formats. Like most companies that have looked at big data, HMS is in the earliest stages of adoption. “I think at best we're taking some of the initial steps that would enable us to consider big data in the future,” he says.

He says big data could help enhance the company's customer offerings as well as provide value from a risk-management and audit perspective. “The more data you have in a system with the infrastructure that allows you to do more analysis with it, the more opportunities there are from a risk and audit perspective to understand what's happening with the company, to identify potential issues and control gaps, and to leverage the risks and controls we have,” Lackstrom says.

But first, he says, hard thinking is needed and there are more questions than answers: What data do we have? What data do we need? What kinds of metrics do you use in terms of evaluating returns on investment? How do you monitor these projects to make sure you're getting value out of them?

“For something like big data, a lot of the vendors out there—let's be frank—they're trying to sell their products and services and there's a lot of pie-in-the-sky thinking,” Lackstrom says. “If you haven't thought through what you're trying to get out of these projects, you could very well be pouring money down the drain.”