Escalating enforcement of anti-corruption laws around the world is driving chief compliance and risk officers to get savvier about how they monitor their anti-corruption compliance programs. Enter data analytics.

Analyzing data to ferret out potential acts of bribery and corruption is not a new concept, but traditionally it has been limited by the archaic manual process of analyzing structured data, such as spreadsheets and database records. In an era of Big Data, however, most of the vast and deep oceans of information companies collect every day (from social media, mobile devices, e-mail, and more) is unstructured and lies beyond the reach of those manual methods.

“It’s not necessarily about Big Data; it’s about good data,” says Seth Rosensweig, a partner in the Advanced Risk and Compliance Analytics group at PwC. Part of the responsibility of chief compliance officers is to ensure that what they report to regulators is true and accurate, but that requires having true and accurate data to start with.

What’s more, compliance officers at multinational companies often receive large volumes of data and reports from the countries where the company operates, which they must sift through to decide what is relevant and needs to be reported, Rosensweig adds. They have to decide, “What is a risk? What could be a risk?”

That’s where data analytics-driven risk assessments are changing the face of risk and compliance. In simple terms, data analytics is the ability to harness the vast and deep oceans of data that companies amass, both internal and external, at speeds once inconceivable.

Relative to anti-bribery and anti-corruption monitoring, in particular, data analytics gives chief compliance and risk officers the ability to look across multiple sources of data in real time, making it easier to identify anomalies and suspicious activity that may indicate fraud and to isolate potential areas of risk.

Of course, most compliance departments don’t have endless budgets; the wise ones are piggy-backing off technology their companies already employ—like technology-assisted review (TAR) used by legal teams to facilitate document review as part of the discovery process in legal proceedings. “People are learning that the same techniques that we use with e-discovery translate and apply to compliance searches,” says Mark Noel, a managing director of professional services at Catalyst Repository Systems, an e-discovery and software services firm.

Using data analytics on its own is not enough to substantiate that misconduct has occurred. It will, however, point compliance, risk, and audit in the direction of where to focus their efforts.

Just as a miner prospects for gold, “data prospecting” can help compliance locate unexplored subsets of data they didn’t even know existed. “Once you find a few flakes of gold, you start digging and realize there is more,” says Jeremy Pickens, chief scientist at Catalyst.

“Once you know what it is you’re looking for, you mine that data to dig deeper to find as much information as you can,” Pickens says. “Both of those pieces are necessary for an overall compliance program, being able to both prospect for data and mine it.”

“A classic problem with searching a large volume of documents is that you don’t know what you don’t know,” Noel says. A vast collection of e-mails containing code words used between employees to evade an investigation, for example, could be missed by legal and compliance simply because they wouldn’t know to search for them, he says. “A machine, however, can see all of the data, all patterns and pockets of anomalies that human eyes may have overlooked,” he adds.

That’s where a tool like the Contextual Diversity system in Catalyst’s Insight Predict comes into play. “Many TAR systems concentrate exclusively on relevance feedback; that is, giving you the un-reviewed documents predicted to be the most relevant,” Noel explains. “But Insight Predict’s Contextual Diversity system also adds in some exploratory documents to help make sure you’ve looked into all the corners of your document collection, even the ones that you don’t know about.”

Consider the compliance challenge faced by a global medical device company, whose compliance staff across its Asian and European regions regularly receive reports of potential violations of the U.S. Foreign Corrupt Practices Act, the U.K. Bribery Act, and other anti-corruption laws. The company sought to streamline and standardize its process for investigating these tips so that its remote regional offices could easily collect relevant data—even across multiple languages—and analyze it for potential violations.

By using the search capabilities of Catalyst Insight, the company found the answer to its problem. Insight’s integrated processing software Fast Track enabled the company to easily submit multi-language files directly into Insight, whose multi-language search and analytics enabled the company to quickly assess the merits of a tip. Insight’s user controls also gave each region the ability to manage the process, while enabling the company to centrally monitor its compliance investigations worldwide.

Assessing the data requires having the right team of people at the table. That means assembling a team of IT experts who know what data is available, where it is located, and how it is stored, along with an investigative team of subject experts, including compliance, legal, and internal audit, to interpret the results.

“Compliance is also using analytics to weed out false positives,” Rosensweig says. In this way, companies, and financial services firms in particular, can home in more accurately on which risk areas to audit or monitor on a much more targeted basis.

Consider the compliance task of a global financial institution, for example, that has to check the names of individuals against those on a sanctions blacklist. Such a task can pose significant challenges for any multinational company with thousands of customers who may share the same name as those on the blacklist.


Mark Noel, managing director of professional services at Catalyst, explains the meaning of “contextual diversity.”
Contextual Diversity is an exploratory tool found only in the Insight Predict system that runs automatically as part of a technology-assisted review project. Many TAR systems concentrate exclusively on relevance feedback—that is, giving you the unreviewed documents predicted to be the most relevant. But Insight Predict’s Contextual Diversity system also adds in some exploratory documents to help make sure you’ve looked into all the corners of your document collection—even the ones that you don’t know about.
A classic problem with searching a large volume of documents is that you don’t know what you don’t know. If there are unexpected documents, concepts or terms in the collection, you could miss them simply because you don’t know to search for them. For example, a collection could contain emails among people who used code words in order to actively evade searching. Even though you can’t see all the details of all the documents, Insight Predict can. Predict can also see what you’ve already reviewed, and what you haven’t yet seen.
The Contextual Diversity system is constantly searching through the entire collection for the next biggest clump of similar but unseen stuff. From that unseen region, it picks the best example and puts it in front of you to review. The technical description of this is “explicit modeling of the unknown,” but that just means that the machine is actively making sure you get a look into those unexplored pockets of the collection.
This process is iterative, meaning that it’s re-computed every time Predict re-ranks your collection, which is often several times an hour during active review. So another way to think about Contextual Diversity could be “continuous active exploration.” As the review progresses and more documents are reviewed, it explores deeper into smaller and smaller pockets of different, unseen documents.
This kind of active exploration is much more efficient than random sampling for making sure you see all the topics in the collection, because topics are not all of the same size. Random sampling would oversample the large topics and miss many of the smaller ones. But active exploration of all the documents, constantly updated as you review more documents, lets you quickly and systematically explore smaller topics without wasting time reviewing redundant documents from the largest topics.
Source: Catalyst
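
The exploration idea described in the sidebar can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not Catalyst's implementation: documents are compared by simple word overlap (Jaccard similarity), the 0.3 similarity threshold is arbitrary, and the function name next_exploratory_doc and the sample documents are hypothetical. It picks, from the documents that resemble nothing reviewed so far, the one sitting in the largest remaining clump.

```python
def tokens(text):
    """Lowercased word set for a document."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def next_exploratory_doc(docs, reviewed_ids, threshold=0.3):
    """Return the id of an unreviewed document representing the largest
    clump of documents unlike anything reviewed so far, or None.

    Hypothetical helper: the similarity measure and threshold are
    illustrative assumptions, not the vendor's method.
    """
    toks = {doc_id: tokens(text) for doc_id, text in docs.items()}
    # "Unseen" documents: not reviewed and not similar to any reviewed one.
    unseen = [
        d for d in docs
        if d not in reviewed_ids
        and all(jaccard(toks[d], toks[r]) < threshold for r in reviewed_ids)
    ]
    best_id, best_size = None, 0
    for d in unseen:
        # Clump size: how many unseen documents resemble this one.
        size = sum(1 for o in unseen if jaccard(toks[d], toks[o]) >= threshold)
        if size > best_size:
            best_id, best_size = d, size
    return best_id


# Hypothetical sample collection with three topics of different sizes.
docs = {
    "a1": "quarterly sales forecast report",
    "a2": "quarterly sales forecast update",
    "a3": "quarterly sales forecast summary",
    "b1": "deliver the blue package friday",
    "b2": "deliver the blue package monday",
    "c1": "lunch menu cafeteria",
}
```

Calling the function again after each review walks from the largest topic down to the smallest, so a reviewer sees one example of each topic rather than several redundant documents from the largest one.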

Using data analytics, financial institutions can collect a broader variety of information—such as the individual’s nationality, the names and locations of family members, and whether they've traveled to, or received money from, sanctioned countries—to more easily identify those who are truly sanctioned individuals versus those who only share a name with them.
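
The logic of weeding out false positives with extra attributes can be sketched in a few lines. This is an illustration only: the country names are fictional placeholders, the field names are hypothetical, and the rule that a single corroborating signal triggers escalation is an assumption. A production screening system would use fuzzy name matching, richer data, and human review.

```python
from dataclasses import dataclass

# Hypothetical placeholder; a real list would come from the regulator.
SANCTIONED_COUNTRIES = {"Freedonia"}


@dataclass(frozen=True)
class ListEntry:
    name: str
    nationality: str


@dataclass(frozen=True)
class Customer:
    name: str
    nationality: str
    # Countries the customer traveled to or received money from.
    linked_countries: frozenset = frozenset()


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def screen(customer, blacklist):
    """Classify a customer as 'clear', 'likely false positive', or 'escalate'."""
    matches = [e for e in blacklist if normalize(e.name) == normalize(customer.name)]
    if not matches:
        return "clear"
    # Corroborating signals beyond the shared name (illustrative rule).
    same_nationality = any(e.nationality == customer.nationality for e in matches)
    sanctioned_link = bool(customer.linked_countries & SANCTIONED_COUNTRIES)
    if same_nationality or sanctioned_link:
        return "escalate"
    return "likely false positive"
```

A customer who shares a name with a listed person but has a different nationality and no ties to a sanctioned country comes back as a likely false positive, which is the case the extra data is meant to filter out.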

Prescription drug fraud

Some companies in the pharmaceutical sector are using data analytics to identify and prevent prescription drug fraud. At pharmacy benefit manager Express Scripts, for example, “the fight against prescription drug fraud and abuse involves a combination of tried-and-true detective work and state-of-the-art technology,” the company states on its website.

To achieve this, Express Scripts has a fraud, waste, and abuse team that uses “proprietary data analytics to uncover patterns of potential fraud or abuse, and scans for behavioral red flags to identify when someone is involved in wrongdoing.” Drawing on Express Scripts’ Health Decision Science platform, which combines behavioral sciences, clinical specialization, and actionable data, the team has identified 290 potential indicators of pharmacy fraud.

Examples of fraud indicators Express Scripts has identified include:

The number of doctors visited;

Distance traveled to the physician or pharmacy;

The geography and patient population;

The mix of drugs dispensed; and

The frequency of those prescriptions.

Through the analysis of such data, Express Scripts can more accurately identify fraudulent activity such as billing for drugs that were never dispensed, billing for incorrect quantities of drugs, incidents of overbilling, and more, the company states. “Collaborating with clients, government agencies, and law enforcement is a key component of the team’s work.”
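
To show how indicators like those listed above might be turned into automated checks, here is a minimal sketch covering two of them: the number of doctors visited and the distance traveled. It is not Express Scripts' proprietary method. The thresholds, field names, and the red_flags function are hypothetical, and a flag is only a prompt for further review, not evidence of fraud.

```python
from collections import defaultdict

# Hypothetical thresholds chosen purely for illustration.
MAX_DOCTORS = 3
MAX_DISTANCE_MILES = 100


def red_flags(claims):
    """Map each patient to the list of indicators their claims trip.

    Each claim is a dict with 'patient', 'doctor', and 'distance_miles'.
    Patients who trip no indicator are left out of the result.
    """
    doctors = defaultdict(set)
    farthest = defaultdict(float)
    for claim in claims:
        patient = claim["patient"]
        doctors[patient].add(claim["doctor"])
        farthest[patient] = max(farthest[patient], claim["distance_miles"])

    flags = {}
    for patient in doctors:
        found = []
        if len(doctors[patient]) > MAX_DOCTORS:
            found.append("many_doctors")
        if farthest[patient] > MAX_DISTANCE_MILES:
            found.append("long_distance")
        if found:
            flags[patient] = found
    return flags
```

Each indicator is a simple aggregate over claims data, which is why a large set of them can be evaluated across an entire claims history at once.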

Operational efficiencies

In addition to monitoring for fraud, other pharmaceutical companies are using data analytics not only to meet regulatory mandates but also to streamline operations.

Take the example of a global pharmaceutical company that entered into a corporate integrity agreement (CIA) with the government after its sales representatives were caught promoting the off-label use of the company’s blockbuster drugs to physicians. Under the terms of that CIA, the company was required to produce a monthly report that, in part, had to identify which of its sales representatives completed mandatory compliance training.

At the time, the CIA training team worked with a legacy human resources system to assign training to employees based on criteria such as country, job code, and job category. The HR information was manually entered into a learning management tool to select the “covered individuals” who needed training. This process was often repeated several times.

As a result, the company had a difficult time producing the reports required by the CIA, because the data it needed resided in seven different learning management systems, explains Ramon Chen, chief marketing officer at Reltio, a data management solutions provider. Also missing was a complete analytical picture of which individuals truly needed to take the intense four-hour online certification course to satisfy government requirements, he says.

That’s when the pharmaceutical company turned to a combination of data-driven applications and modern data management. The new system enabled the company to track and manage training assignments, as well as “covered person” status, by aligning the legal criteria required under the CIA and mapping those with the attributes stored for their employees.

By combining data from its main HR system and multiple training systems, and matching and consolidating records, the company could get an accurate handle on who actually needed to take the training, Chen says. With this insight, the company was able to cut the number of employees taking the training from 10,000 to 5,000, effectively refocusing 20,000 hours of employee time and freeing the compliance team to focus on other responsibilities, he says.
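
The consolidation step and the hours arithmetic can be made concrete with a short sketch. It is not Reltio's product or the company's actual system: the record layout, the matching on a shared employee ID, and the job-category rule for "covered" status are all assumptions for illustration. The four-hour course length and the 10,000 and 5,000 head counts come from the figures above.

```python
COURSE_HOURS = 4  # length of the certification course cited above


def consolidate(hr_records, training_systems):
    """Merge HR records with completions from several training systems.

    Records are matched on 'employee_id' (an assumed shared key).
    Training rows for IDs missing from HR are ignored.
    """
    merged = {r["employee_id"]: {**r, "completed": False} for r in hr_records}
    for system in training_systems:
        for row in system:
            employee = merged.get(row["employee_id"])
            if employee is not None and row["completed"]:
                employee["completed"] = True
    return merged


def outstanding_training(merged, covered_job_categories):
    """IDs of covered employees who have not yet completed training."""
    return sorted(
        emp_id
        for emp_id, emp in merged.items()
        if emp["job_category"] in covered_job_categories and not emp["completed"]
    )


def hours_refocused(trainees_before, trainees_after):
    """Employee hours freed when fewer people need the course."""
    return (trainees_before - trainees_after) * COURSE_HOURS
```

With the article's numbers, hours_refocused(10_000, 5_000) works out to 5,000 employees times four hours, or 20,000 hours, matching the figure Chen cites.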

In the compliance space, the use of data analytics is still evolving. But those compliance programs that are ahead of the game are already realizing its benefits, reducing bribery and corruption risks and achieving operational efficiencies.