Armed with big data and cheap computing power, corporations scour every quantitative source, from web traffic to highway traffic, for information that offers any kind of competitive edge.
But a new case study from Stanford Engineering suggests those numbers may not be as trustworthy as they appear. Thanks to human meddling, data analyses can be infected by hidden problems that lead to wasted time and labor, bad decisions and misguided strategies.
That’s the ominous picture that emerged from more than a year of closely observing and interviewing data analysts at one multinational high-tech company. The researchers found that, under pressure to deliver results, even experienced data analysts cut corners and made compromises on the rationale that flawed data is better than none at all.
Analysts told the researchers that they felt compelled to be “flexible,” to “keep it simple” and to deliver quick-and-dirty “directional” answers if they didn’t have the time or data for a complete analysis, said Ryan Stice-Lusvardi, a PhD candidate in management science and engineering, who conducted the interviews. Her research was overseen by Professor Pamela Hinds, who has long studied how technology affects work in large corporations. If the preliminary results were disappointing, Stice-Lusvardi said, the managers who had asked for the studies would sometimes push to change the analysis midstream, a serious violation of proper statistical practice.
The researchers offered one anecdote that exemplified the tensions inside the company between the data seekers and data analysts. When one in-house client told an analyst that a quick-and-dirty report had vindicated his strategy, the analyst retorted, “No, what the data is telling you is that the data is crap.”
How widespread is this problem? The researchers caution that observations at a single company can’t automatically be applied to others. Still, their findings are a cautionary tale for companies of all sorts. Data-driven companies should consider how results can be undermined when expert analysts take orders from non-experts who want quick answers, who feel pressure to support their decisions with data, and who don’t understand the statistical pitfalls.
“People could be making decisions that they think are evidence-based and well-rounded, when in fact they may not be,” said Hinds.
Stice-Lusvardi observed that even experienced analysts began to rationalize serious compromises. “What was disturbing was how often people who knew better, who had years of experience, were saying ‘do it anyway,’” she said. “It sounds ludicrous, but it makes sense in context. In an organization, you have to deliver.”
The data analysts, despite their specialized expertise, were effectively service providers who didn’t have the authority to push back. Their career prospects often depended on pleasing their clients, who typically didn’t have expertise in data analytics.
Stice-Lusvardi said that the pressure for speedy analysis was a big source of compromise. In many cases, the data analysts suspected that their “clients” in the company — marketing managers, for example — felt that they had the experience and intuition to proceed with their ideas but wanted to back themselves up with numbers.
Stice-Lusvardi said the analysts often made things worse by rationalizing their shortcuts and formalizing them into standard procedures. Inappropriate data and faulty practices can infect decisions made both by humans and by increasingly powerful artificial intelligence systems, she warned. In other words, the errors can compound over time and cause big real-world problems.
Part of the problem, Hinds and Stice-Lusvardi suspect, is that data analytics is still a young area of expertise with limited institutional authority. That makes it different from accounting, which has strict rules that can’t be easily overruled by non-experts who want to tell a particular story. The big risk is that the impact of flawed practices can snowball.
“It’s very concerning when those analyses get propagated and used elsewhere without an understanding of the data that were originally used or the compromises that were made,” Hinds said.