LEADERSHIP LIBRARY


Weapons of Math Destruction

Cathy O’Neil


IN BRIEF

O’Neil describes how big data algorithms, when applied at market scale without scrutiny for bias, can cause serious harm to the people they score.

Key Concepts


Anatomy of a WMD model

Opacity: “The first question: Even if the participant is aware of being modeled, or what the model is used for, is the model opaque, or even invisible?” (p. 28)

Scale: “The third question is whether a model has the capacity to grow exponentially. As a statistician would put it, can it scale? This might sound like the nerdy quibble of a mathematician. But scale is what turns WMDs from local nuisances into tsunami forces, ones that define and delimit our lives.” (p. 29)

Damage: What impact does the model have on its subjects? Does it work against their interests? (The full three-question test is sketched below.)
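O’Neil’s three questions amount to a simple checklist. The Python sketch below is my own illustration, not anything from the book; the class and field names are assumptions:

    # Illustrative checklist version of O'Neil's three-question WMD test.
    # The class and field names are my own framing, not the book's.
    from dataclasses import dataclass

    @dataclass
    class ModelAudit:
        opaque: bool     # Is the scoring hidden from, or unintelligible to, the people it judges?
        scalable: bool   # Does the model operate across a whole market or population?
        damaging: bool   # Does it work against the interests of its subjects?

        def is_wmd(self) -> bool:
            # A model "checks all the boxes" when it is opaque, large-scale, and harmful.
            return self.opaque and self.scalable and self.damaging

    # Example: a hiring personality test as O'Neil characterizes it.
    print(ModelAudit(opaque=True, scalable=True, damaging=True).is_wmd())  # True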

WMD example: hiring tests

“Such tests now are used on 60 to 70 percent of prospective workers in the United States, up from 30 to 40 percent about five years ago, estimates Josh Bersin of the consulting firm Deloitte.” (p. 108)

“Even putting aside the issues of fairness and legality, research suggests that personality tests are poor predictors of job performance. Frank Schmidt, a business professor at the University of Iowa, analyzed a century of workplace productivity data to measure the predictive value of various selection processes. Personality tests ranked low on the scale—they were only one-third as predictive as cognitive exams, and also far below reference checks.” (p. 108)

“And as you might expect, I consider personality tests in hiring departments to be WMDs. They check all the boxes. First, they are in widespread use and have enormous impact. The Kronos exam, with all of its flaws, is scaled across much of the hiring economy. Under the previous status quo, employers no doubt had biases. But those biases varied from company to company, which might have cracked open a door somewhere for people like Kyle Behm.” (p. 111)

“Finally, consider the feedback loop that the Kronos personality test engenders. Red-lighting people with certain mental health issues prevents them from having a normal job and leading a normal life, further isolating them. This is exactly what the Americans with Disabilities Act is supposed to prevent.” (p. 112)

WMD example: hiring models

“Naturally, many hiring models attempt to calculate the likelihood that each job candidate will stick around. Evolv, Inc., now a part of Cornerstone OnDemand, helped Xerox scout out prospects for its call centers, which employ more than forty thousand people. The churn model took into account some of the metrics you might expect, including the average time people stuck around on previous jobs. But they also found some intriguing correlations. People the system classified as “creative types” tended to stay longer at the job, while those who scored high on “inquisitiveness” were more likely to set their questioning minds toward other opportunities.” (p. 118)

“But the most problematic correlation had to do with geography. Job applicants who lived farther from the job were more likely to churn. This makes sense: long commutes are a pain. But Xerox managers noticed another correlation. Many of the people suffering those long commutes were coming from poor neighborhoods. So Xerox, to its credit, removed that highly correlated churn data from its model. The company sacrificed a bit of efficiency for fairness.” (p. 119)
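The Xerox fix, dropping a predictive feature because it proxies for poverty, trades a little model accuracy for fairness. A minimal sketch of that trade-off, using synthetic data and invented feature names (nothing here comes from the actual Evolv/Xerox model):

    # Compare a churn model with and without a proxy feature (commute distance).
    # All data, features, and coefficients are synthetic, for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    commute_km = rng.exponential(scale=10.0, size=n)    # correlates with poor neighborhoods
    tenure_years = rng.exponential(scale=2.0, size=n)   # time spent at previous jobs
    p_churn = 1 / (1 + np.exp(-(0.08 * commute_km - 0.5 * tenure_years)))
    churned = (rng.random(n) < p_churn).astype(int)

    X_full = np.column_stack([commute_km, tenure_years])
    X_fair = tenure_years.reshape(-1, 1)                # proxy feature removed

    for label, X in [("with commute distance", X_full), ("without it (fairer)", X_fair)]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, churned, random_state=0)
        acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
        print(f"{label}: test accuracy {acc:.3f}")

The fairer model typically scores slightly worse on accuracy, which is exactly the sacrifice the passage describes.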

WMD example: scheduling software

“Scheduling software can be seen as an extension of the just-in-time economy. But instead of lawn mower blades or cell phone screens showing up right on cue, it’s people, usually people who badly need money. And because they need money so desperately, the companies can bend their lives to the dictates of a mathematical model.” (p. 128)

Models can fail when they don’t account for how humans want to behave

“A few years ago, MIT researchers analyzed the behavior of call center employees for Bank of America to find out why some teams were more productive than others. They hung a so-called sociometric badge around each employee’s neck. The electronics in these badges tracked the employees’ location and also measured, every sixteen milliseconds, their tone of voice and gestures. It recorded when people were looking at each other and how much each person talked, listened, and interrupted. Four teams of call center employees—eighty people in total—wore these badges for six weeks.” (p. 131)

“These employees’ jobs were highly regimented. Talking was discouraged because workers were supposed to spend as many of their minutes as possible on the phone, solving customers’ problems. Coffee breaks were scheduled one by one.” (p. 132)

“The researchers found, to their surprise, that the fastest and most efficient call center team was also the most social. These employees pooh-poohed the rules and gabbed much more than the others. And when all of the employees were encouraged to socialize more, call center productivity soared.” (p. 132)

WMD example: e-scores

“Today we’re added up in every conceivable way as statisticians and mathematicians patch together a mishmash of data, from our zip codes and Internet surfing patterns to our recent purchases. Many of their pseudoscientific models attempt to predict our creditworthiness, giving each of us so-called e-scores. These numbers, which we rarely see, open doors for some of us, while slamming them in the face of others. Unlike the FICO scores they resemble, e-scores are arbitrary, unaccountable, unregulated, and often unfair—in short, they’re WMDs.” (p. 143)

“But consider the nasty feedback loop that e-scores create. There’s a very high chance that the e-scoring system will give the borrower from the rough section of East Oakland a low score. A lot of people default there. So the credit card offer popping up on her screen will be targeted to a riskier demographic. That means less available credit and higher interest rates for those who are already struggling.” (p. 144)

“Much of the predatory advertising we’ve been discussing, including the ads for payday loans and for-profit colleges, is generated through such e-scores. They’re stand-ins for credit scores. But since companies are legally prohibited from using credit scores for marketing purposes, they make do with this sloppy substitute.” (p. 144)
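The loop O’Neil describes (low score -> costlier credit -> more defaults -> lower score) can be made concrete with a toy simulation. Every number and update rule below is invented for illustration:

    # Toy e-score feedback loop: worse scores raise borrowing costs, higher
    # costs raise default risk, and defaults drag the score down further.
    score = 550.0                                        # hypothetical starting e-score
    for year in range(5):
        rate = 0.05 + max(0.0, 700 - score) * 0.0004     # worse score -> pricier credit
        p_default = min(0.9, 2.5 * rate)                 # pricier credit -> more defaults
        score = max(300.0, score - 300 * p_default)      # defaults lower the score
        print(f"year {year}: rate {rate:.1%}, default risk {p_default:.1%}, score {score:.0f}")

Run it and the score spirals toward its floor, a numeric version of “less available credit and higher interest rates for those who are already struggling.”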

Modeler’s oath

“Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads: 

  • “I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

  • “Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

  • “I will never sacrifice reality for elegance without explaining why I have done so.

  • “Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. 

  • “I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.” (pp. 205-206)

Quotables


“Whether or not a model works is also a matter of opinion. After all, a key component of every model, whether formal or informal, is its definition of success. This is an important point that we’ll return to as we explore the dark world of WMDs. In each case, we must ask not only who designed the model but also what that person or company is trying to accomplish.” (p. 21)

“In other words, the modelers for e-scores have to make do with trying to answer the question “How have people like you behaved in the past?” when ideally they would ask, “How have you behaved in the past?”” (p. 146)

“According to a survey by the Society for Human Resource Management, nearly half of America’s employers screen potential hires by looking at their credit reports. Some of them check the credit status of current employees as well, especially when they’re up for a promotion.” (p. 148)

“At the high end of the economy, human beings tend to make the important decisions, while relying on computers as useful tools. But in the mainstream and, especially, in the lower echelons of the economy, much of the work, as we’ve seen, is automated. When mistakes appear in a dossier—and they often do—even the best-designed algorithms will make the wrong decision. As data hounds have long said: garbage in, garbage out.” (p. 150)

“In other words, how you manage money can matter more than how you drive a car. In New York State, for example, a dip in a driver’s credit rating from “excellent” to merely “good” could jack up the annual cost of insurance by $255. And in Florida, adults with clean driving records and poor credit scores paid an average of $1,552 more than the same drivers with excellent credit and a drunk driving conviction.” (p. 164)

“As is often the case with WMDs, the very same models that inflict damage could be used to humanity’s benefit. Instead of targeting people in order to manipulate them, it could line them up for help.” (p. 197)

“Of course, the human analysts, whether the principal or administrators, should consider lots of data, including the students’ test scores. They should incorporate positive feedback loops. These are the angelic cousins of the pernicious feedback loops we’ve come to know so well. A positive loop simply provides information to the data scientist (or to the automatic system) so that the model can be improved.” (p. 209)
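A positive feedback loop of the kind O’Neil endorses might look like the following online-learning sketch, in which each batch of observed real-world outcomes flows back into the model (my construction, with synthetic data; not anything from the book):

    # Minimal positive feedback loop: observed outcomes are fed back so the
    # model's future predictions improve. Data and the "true rule" are synthetic.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(1)
    model = SGDClassifier(loss="log_loss", random_state=1)

    for batch in range(5):
        X = rng.normal(size=(200, 3))
        y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)  # hidden true outcomes
        if batch > 0:
            print(f"batch {batch}: accuracy before update {model.score(X, y):.2f}")
        model.partial_fit(X, y, classes=[0, 1])               # feed outcomes back in

Accuracy should climb from batch to batch: information about real outcomes improves the model, rather than the model’s own verdicts reinforcing themselves.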

Clients, please email us to request the full notes on this book.
