Ferreting out fraud is never easy. For years, compliance professionals have relied on the bravery of whistleblowers, the mis-steps of perpetrators, and a good amount of luck to uncover wrongdoing both inside and outside their companies.

That may all be changing with the advent of the Big Data era. These days, high-powered analytic tools can crunch through enormous quantities of structured and unstructured data, producing an exponentially greater set of comparisons within the data on any given afternoon than a human could in a month. As companies refine their tools and techniques, catching fraudsters may soon become more a matter of learning how to properly interrogate a computer program than of putting gumshoes on the case.

Like most Big Data applications, using analytics to help spot fraud is not an entirely new concept. For at least two decades, internal auditors have used tools like ACL and Caseware to manage very large quantities of data, typically structured or numerical data, says Allan Bachman, education manager for the Association of Certified Fraud Examiners and former director of internal audit for a private college. What's new is that such tools can now handle non-numerical data, like employee files or audio from hotline calls, vastly expanding their reach. “The ability to analyze text and other unstructured data has become a huge thing,” he says, with e-mails and social media activity yielding plenty of insights about what is going on behind the numbers.

Indeed, the old rules-based queries are easily evaded by smart thieves who know how to hide in numerical patterns.  “Fraud is about going where rules don't exist,” says Vincent Walden, a partner in Ernst & Young's fraud investigation and dispute services group. “That's where we move to modeling tools that integrate text mining.”  

Walden is currently working to help companies assess their risk of violating various global anti-bribery and corruption laws, including the U.S. Foreign Corrupt Practices Act and the U.K.'s Bribery Act.  Corruption is an area of hot interest for regulators; the Securities and Exchange Commission brought 20 Foreign Corrupt Practices Act cases in fiscal year 2011 and seems intent on keeping the law in the headlines, with a recent $60 million settlement with Pfizer on this issue.

Many of the bribery schemes in the cases on record lasted for years before they were detected, and some involved several executives. Walden says that's in part because “the analytics required for detecting bribery and corruption fraud schemes are fundamentally different from those [used to look for] traditional financial and accounting frauds,” even though both rely on similar sources of data, including the general ledger, accounts payable systems, and travel and expense tracking systems.

Detecting improper payments largely comes down to what misguided employees enter into the free-text field of a payment description in the accounts payable system, Walden says.  His analytics are designed to identify the most frequently occurring noun phrases in those fields, sorting them by frequency, and then dollar amount.  From that, he gets a sense of what's standard (like “invoice entry”) and what's out of the norm (like “volume contract acceleration,” or “special advance”), possibly indicating an illegal payment.
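The frequency-then-dollar-amount sort Walden describes can be illustrated with a minimal Python sketch. It is not his firm's tooling: the sample payments are invented, the function names are hypothetical, and the whole normalized description stands in for a "noun phrase," since real noun-phrase extraction would need a natural language processing library.

```python
from collections import defaultdict

# Hypothetical accounts payable rows: (free-text description, amount in dollars).
payments = [
    ("Invoice entry", 1200.00),
    ("invoice entry", 980.50),
    ("Invoice entry", 450.00),
    ("Special advance", 25000.00),
    ("Volume contract acceleration", 40000.00),
]

def summarize_descriptions(rows):
    """Return (phrase, count, total) tuples: most frequent first, then largest total."""
    stats = defaultdict(lambda: [0, 0.0])
    for description, amount in rows:
        # Assumption: the lowercased, whitespace-collapsed description is the "phrase".
        phrase = " ".join(description.lower().split())
        stats[phrase][0] += 1
        stats[phrase][1] += amount
    return sorted(
        ((phrase, count, total) for phrase, (count, total) in stats.items()),
        key=lambda item: (-item[1], -item[2]),
    )

def rare_phrases(summary, max_count=1):
    """Keep phrases that appear at most max_count times, i.e. the out-of-the-norm ones."""
    return [row for row in summary if row[1] <= max_count]

summary = summarize_descriptions(payments)
outliers = rare_phrases(summary)
```

Here `summary` lists the routine "invoice entry" first, while `outliers` holds the one-off descriptions a reviewer would want to read, largest dollar amount first.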

Sometimes the phrases will be specific (one scheme Walden found used the code word “black” to mark everything related to it), while other terms tend to be universal. “One of the more suspicious words is ‘special’; if a payment is ‘special,’ it definitely needs to be looked at,” he says.

What would have taken weeks or even months for investigators to accomplish in the past without Big Data tools can now be done in a matter of minutes. Scanning the results of such a query “generally takes about 30 minutes with a cup of coffee,” Walden says.

On the Offense

Plenty of companies, particularly those in high-stakes industries like banking, insurance, and healthcare, are using data tools to protect themselves against external fraud threats. “We look at a series of transactions coming through and apply analytical models and standard data mining techniques to determine whether or not a threat is likely present,” says Scott Burroughs, who manages the software portfolio for IBM's industry solutions, including fraud detection tools such as i2, SPSS, and Content Analytics.

One common way that money launderers and insurance fraudsters try to hide, for example, is by using slightly different variations of a name in several cases. Bill Smith may be a victim in one car crash, B.J. Smith may be a witness in another, and John Smith may have been a passenger in a car crash several months before, an IBM case study notes. Such nuances are unlikely to be spotted by the human eye, particularly if they occur over time and in different business units. With Big Data capabilities, though, “you can parse through huge amounts of data and reveal that this person is in the center of all these transactions even though they're not actually executing them,” says Burroughs. Armed with such results, insurance companies can decide whether to challenge the claim or ask the claimant for more details, actions that will often scare away less-sophisticated thieves in and of themselves.
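One way to picture that kind of linkage is the Python sketch below. It is an illustration only, not a description of IBM's products: the claim records are invented, and the matching rule (same surname plus a shared phone number) is an assumption chosen to keep the example short. Commercial entity-resolution tools weigh many more attributes.

```python
from collections import defaultdict

# Hypothetical claim records: (claim id, person name, role, phone number).
claims = [
    ("C-101", "Bill Smith", "victim", "555-0101"),
    ("C-102", "B.J. Smith", "witness", "555-0101"),
    ("C-103", "John Smith", "passenger", "555-0101"),
    ("C-104", "Ann Jones", "victim", "555-0199"),
]

def surname(name):
    """Last whitespace-separated token, lowercased, with trailing punctuation stripped."""
    return name.split()[-1].strip(".,").lower()

def link_people(records):
    """Group claim entries by the assumed identity key (surname, phone)."""
    groups = defaultdict(list)
    for claim_id, name, role, phone in records:
        groups[(surname(name), phone)].append((claim_id, name, role))
    return groups

def repeat_participants(records, min_claims=2):
    """Return identity keys that show up in at least min_claims distinct claims."""
    flagged = {}
    for key, entries in link_people(records).items():
        distinct_claims = {claim_id for claim_id, _, _ in entries}
        if len(distinct_claims) >= min_claims:
            flagged[key] = entries
    return flagged

flagged = repeat_participants(claims)
```

In this toy data the three Smith entries collapse into one identity appearing in three separate claims, which is the sort of result an insurer might use to decide whether to ask a claimant for more details.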

The tools can also help flag suspicious transactions for a closer look.  “A lot of what we're trying to do is help clients look at more transactions faster and pinpoint the big impact items,” Burroughs says.  Along those lines, predictive models aim to figure out which ones are most likely to yield results, sometimes incorporating investigators' free-text notes from previous, unresolved investigations to determine whether a new case matches any intangible aspects of older ones.


Experts stress that finding fraud in piles of data still relies heavily on human intelligence, in addition to turbo-charged Big Data tools.  “I've found fraud over four decades, and a lot of it is asking people the right questions, listening to the answers, comparing those to what you thought they would say, and then going to the data to see if it follows what they said,” says Don Sparks, vice president of industry relations for data analysis services and tools provider Audimation Services and former chief audit executive for a property and casualty insurance provider.

Sparks says much useful data lies under the radar. A search of printer logs, for example, helped him spot a scheme in which an employee illegally printed and sold over $200,000 worth of amusement park tickets that the company was supposed to be giving to charity. Matching up network activity and vacation schedules can uncover people who aren't doing what they say they're doing. Internal fraud investigators can also easily correlate employee records and vendor records to hunt for evidence of dummy companies, a common scheme in which an employee creates a fictional firm, such as a rug cleaning company, and then signs invoices to be paid to it. (In most cases, the two will share some identifier, likely a zip code or phone number.) “The tools are all there, the problem is that companies often don't know what to look for,” he says.
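The employee-to-vendor comparison Sparks mentions amounts to a simple join on shared identifiers. The Python sketch below shows the idea under stated assumptions: the records, field names, and function names are all hypothetical, and a real vendor master file would need far more cleanup before matching.

```python
# Hypothetical employee and vendor master records.
employees = [
    {"id": "E1", "name": "Pat Lee", "zip": "75001", "phone": "555-0142"},
    {"id": "E2", "name": "Sam Cruz", "zip": "75002", "phone": "555-0177"},
]
vendors = [
    {"id": "V1", "name": "Acme Rug Cleaning", "zip": "75080", "phone": "(555) 0142"},
    {"id": "V2", "name": "Metro Office Supply", "zip": "75099", "phone": "555-0110"},
]

def normalize_phone(phone):
    """Keep digits only so formatting differences do not hide a match."""
    return "".join(ch for ch in phone if ch.isdigit())

def shared_identifiers(employee_rows, vendor_rows):
    """Return (employee id, vendor id, shared fields) for every pair sharing a phone or zip."""
    matches = []
    for employee in employee_rows:
        for vendor in vendor_rows:
            shared = []
            if normalize_phone(employee["phone"]) == normalize_phone(vendor["phone"]):
                shared.append("phone")
            if employee["zip"] == vendor["zip"]:
                shared.append("zip")
            if shared:
                matches.append((employee["id"], vendor["id"], shared))
    return matches

matches = shared_identifiers(employees, vendors)
```

A shared zip code alone will flag many innocent pairs in a company town, so a match here is a lead to investigate rather than evidence of a dummy company.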

On the Outside Looking In

Then there's the problem of actually getting the data. The one entity that is least likely to use Big Data to find fraud is, ironically, the external auditor. “There are a number of tools in existence that are very helpful at identifying anomalies and more are being developed,” says William Titera, an audit partner with Ernst & Young. “However the number-one barrier to using these tools is the inability to get complete, accurate data in a timely manner.”

ABOUT THIS SERIES

Compliance Week's exclusive four-part series on “Big Data” is examining the growing volume of information that companies are capturing and the tools they are building to harness mass volumes of diverse data at speeds once inconceivable. We'll look at ways it can be used to improve risk management, audit, and compliance, and the compliance officer's role in this landmark business transformation.

Part 1: Unlocking the Potential of Information, July 17

Part 2: Starting Small, Scaling Up, July 31

Part 3: Big Data and fraud investigation, Aug. 14

Part 4: The logjam on execution, Aug. 28

That's largely the result of inconsistencies in how ERP systems maintain and export data to auditors, due to varying specifications from vendor to vendor and even company customizations within the same product. “This lack of standardization is a barrier to the use of sophisticated data analysis, which is critical to enhancing audit quality,” Titera says.

In response, the AICPA's Assurance Services Executive Committee, which Titera chairs, recently issued an exposure draft on audit data standards, in conjunction with all the major audit firms and the three largest ERP vendors, among others, with comments due Sept. 17.

None of those obstacles seems to be stopping government agencies, however, which have the power to commandeer just about any data they desire and are steadily accumulating the tools they need to crunch it. Read any recent SEC or Department of Justice complaint and you'll see a healthy dose of evidence coming from searches of e-mail and other computer activity. One particularly chilling case: The SEC was able to charge a Bristol-Myers Squibb finance executive with insider trading in part because he searched for ways to avoid such charges on his work computer.

In fact, if there's any incentive for companies to get more serious about searching their own data for fraud, it may be this: “We work on both sides of the fence,” says IBM's Burroughs, “and the tools are all the same on the law enforcement side as they are on the corporate side.”