The Big Problem With Big Data

Cathy O’Neil (aka MathBabe) has posted another good piece on the drawbacks of using big-data-driven algos to make important choices. Via Bloomberg View:

One issue is that the algorithms tend to use linear models, so they assume that more is always better, and way more is way better. This can be fine when dealing with attributes such as education or experience. Something like Facebook activity, by contrast, could have a golden mean — a reasonable amount might suggest engagement in a community, while an abundance could indicate addiction.

More important, such algorithms will tend to discriminate against attributes that, though beyond people’s control, have historically been correlated with a lack of success. A marker of poverty or race, for example, can translate into a demerit, even if the person is eminently qualified — thus reinforcing the historical pattern that the algorithm finds in the data.
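To make the linear-model point in the quoted passage concrete, here is a minimal Python sketch. The feature (weekly hours of Facebook activity), the weight, and the "ideal" and "width" parameters are all hypothetical and chosen only for illustration. Neither O’Neil nor any particular vendor's scoring model is being reproduced here. The sketch compares a linear score, where every extra hour counts the same, with an inverted-U "golden mean" score that peaks at a moderate value.

```python
# Hypothetical illustration only: the feature, weights and parameters are made up.

def linear_score(hours: float, weight: float = 1.0) -> float:
    """Linear model: each extra hour adds the same amount, so more is always better."""
    return weight * hours


def golden_mean_score(hours: float, ideal: float = 5.0, width: float = 5.0) -> float:
    """Inverted-U model: the score peaks at `ideal` hours and falls off on both sides.

    `ideal` and `width` are assumed values, not taken from any real system.
    """
    return 1.0 - ((hours - ideal) / width) ** 2


if __name__ == "__main__":
    for h in [0, 5, 10, 40]:
        print(f"{h:>3} h/week  linear={linear_score(h):6.1f}  "
              f"golden_mean={golden_mean_score(h):6.2f}")
```

With these assumed parameters, the linear score keeps rising all the way to 40.0 at 40 hours a week, while the inverted-U score is highest (1.0) at 5 hours and turns sharply negative at 40. A purely linear model has no way to express "a reasonable amount is good, an abundance is a warning sign" unless someone deliberately adds a nonlinear term.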

This follows on an earlier post (Insurance and Big Data Are Incompatible) regarding the drawbacks of allowing health insurers to use big-data-fed algos to make coverage and premium-setting decisions.

These algos are touted as impartial arbiters, free from human bias and prejudice. They’re not. They draw conclusions from properties exhibited by large groups of people – their Facebook likes, zip codes, career choices – and apply them to individuals. In the aggregate, this might work. But for an individual, whether a person seeking insurance, an applicant seeking a job, an inmate seeking parole or a homebuyer seeking a mortgage, the outcome can be manifestly unfair and riddled with the very biases the systems were meant to eliminate in the first place.

via Bigger Data Isn’t Always Better Data – Bloomberg View