Algorithms hold a pivotal and particularly mysterious place in public discussions around data. We speak of Google’s and Facebook’s algorithms as wizards’ spells, cryptic things that we couldn’t possibly understand. Algorithmic bias is raised in almost every data discussion, in classrooms and congressional hearings, as if all of us have some kind of shared definition of what an algorithm is and just exactly how it might be biased.
Computers run by executing sets of instructions. An algorithm is such a set of instructions, in which a series of tasks is repeated until some particular condition is met. There are all kinds of algorithms, written for all kinds of purposes, but they are most commonly used for programming tasks like sorting and classification. These tasks are well suited to the algorithm’s do/until mentality: Sort these numbers until they are in ascending order. Classify these photographs until they fall neatly into categories. Sort these prisoners by risk of re-offense. Classify these job applicants as “hire” or “do not hire.”
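To make the do/until shape concrete, here is a minimal sketch in Python of the first example: sort these numbers until they are in ascending order. The function name and the sample numbers are invented for illustration, and the pass-and-swap approach (a plain bubble sort) is only one of many ways to sort, chosen because its loop reads almost exactly like the sentence.

```python
def sort_until_ascending(numbers):
    """Repeat passes of adjacent swaps until a full pass changes nothing."""
    items = list(numbers)  # work on a copy so the caller's list is untouched
    while True:
        swapped = False
        # The "do": walk the list and swap any neighbors that are out of order.
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        # The "until": a pass with no swaps means the list is ascending.
        if not swapped:
            return items


print(sort_until_ascending([5, 2, 9, 1]))  # prints [1, 2, 5, 9]
```

Swap the numbers for prisoners and the comparison for a risk score and the loop itself does not change; everything that matters moves into how the comparison is defined.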
A neural network is not itself an algorithm because, when activated, it runs only once. It has the “do” but not the “until.” Neural nets are, though, almost always paired with algorithms that train the network, improving its performance over millions or billions of generations. To do this, the algorithm uses a training set (a group of data for which the programmer knows how the neural network should behave), and at each generation of training the network gets a score for how well it’s doing. The algorithm trains and retrains the network, rolling down a gradient of success, until the network passes a threshold, after which training is finished and the network can be used for whatever classification task it was designed for.
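The pairing of a network with a training algorithm can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the “network” is a single artificial neuron rather than the millions found in real systems, the training set is eight invented numbers labeled 0 or 1, the score is an average error (lower is better), and the threshold, learning rate, and generation cap are arbitrary. The rolling-down-a-gradient step is ordinary gradient descent.

```python
import math

# Hypothetical training set: inputs paired with answers we already know.
TRAINING_SET = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
                (0.7, 1), (0.8, 1), (0.9, 1), (1.0, 1)]


def network(x, w, b):
    """The 'do' without the 'until': a one-neuron network that runs once."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def score(w, b):
    """Average cross-entropy error over the training set (lower is better)."""
    total = 0.0
    for x, y in TRAINING_SET:
        p = network(x, w, b)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(TRAINING_SET)


def train(threshold=0.2, learning_rate=1.0, max_generations=100_000):
    """Retrain the network each generation until its score passes the threshold."""
    w, b = 0.0, 0.0
    n = len(TRAINING_SET)
    for generation in range(max_generations):
        loss = score(w, b)
        if loss < threshold:  # the "until": good enough, stop training
            break
        # Slope of the error with respect to each adjustable number.
        grad_w = sum((network(x, w, b) - y) * x for x, y in TRAINING_SET) / n
        grad_b = sum((network(x, w, b) - y) for x, y in TRAINING_SET) / n
        # Take one step downhill along that slope.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, generation, loss


w, b, generations, final_loss = train()
print(f"stopped after {generations} generations with score {final_loss:.3f}")
print("network says 0.9 belongs to class", round(network(0.9, w, b)))
```

Note where the loop lives. The function `network` has no loop at all; the repetition, the scoring, and the stopping condition all belong to `train`, and whatever the training set says is correct is what the network learns to reproduce.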