Failing to reproduce Axelrod, 1980
Today I learned that a foundational result in game theory hasn’t been successfully reproduced, and not from lack of trying.
Effective Choice in the Prisoner’s Dilemma by Robert Axelrod (1980) describes the original iterated prisoner’s dilemma tournament. This was the paper that made the Tit-For-Tat strategy famous and laid the foundation for subsequent work on the evolution of cooperation. Veritasium has a great explainer.
The original computer code for the first tournament has been lost, but Vincent Knight and collaborators have recreated a version in Python. The resulting library implements the original 15 strategies and makes it easy to re-run a version of the first tournament.
>>> import axelrod as axl
>>> players = [s() for s in axl.axelrod_first_strategies]
>>> tournament = axl.Tournament(players, turns=200, repetitions=5)
>>> results = tournament.play()
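The ranking can then be read straight off the result set. A minimal sketch, assuming the ResultSet attributes ranked_names and scores behave as documented in the library:
>>> results.ranked_names  # strategy names, ordered from best to worst
>>> results.scores        # total score for each player in each repetition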
The results of this tournament are non-deterministic as some strategies have a random element to their behaviour. Axelrod re-ran the tournament 5 times to allow for this randomness.
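The library’s classifiers make it easy to see which players those are; a quick check, assuming the standard "stochastic" classifier key on player instances:
>>> [p.name for p in players if p.classifier["stochastic"]]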
However, when I re-ran the tournament, Tit-For-Tat didn’t win as expected. Instead, it consistently ranked 3rd or 4th on successive runs. First place went to a strategy labelled Stein and Rapoport. I used the same tournament parameters (5 games of 200 turns each) as the original paper, so I found this result surprising.
Here is the ranking of the strategies and their total scores from the Python tournament.
For comparison, here is the original ranking and average scores, extracted from Axelrod, 1980.
Grudger in the Python implementation is named FRIEDMAN in the original tournament.
Assuming I had made a mistake, I asked on GitHub what I was doing wrong. It turns out that it’s not just me.
Vincent Knight also failed to reproduce Axelrod’s first tournament and gives three possible reasons why Axelrod’s results are not reproducible:
- There are differences in the Python implementation of some strategies
- Axelrod reported incorrect results
- There are errors in the Python implementation of the framework
I think the failure to reproduce is probably caused by differences in the implementation of individual strategies. While I don’t think this undermines the original claims of Axelrod’s paper, it does show how contingent the iterated prisoner’s dilemma model is. Very small differences in implementation can cause big differences in the results and in how those results are interpreted. Axelrod noted this in his paper, pointing out how a small change to a strategy such as DOWNING could have made it the overall winner.
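One way to probe such differences is to pit two strategies against each other in a single match and inspect the moves turn by turn. A minimal sketch using axl.Match, with two strategies picked arbitrarily:
>>> match = axl.Match((axl.TitForTat(), axl.Grudger()), turns=200)
>>> interactions = match.play()  # list of (action, action) pairs, one per turn
>>> match.final_score()          # total score for each player
Stepping through the interactions like this is one way to spot where a Python strategy diverges from its written description.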
The consistent winner of my Python tournament is Stein and Rapoport. This strategy came 6th in Axelrod’s original tournament and was never given a catchy name. So, how does this strategy work and what makes it so successful? Here is the description from the original paper:
Sixth Place with 477.8 is a fifty-line program by William STEIN of the Mathematics Department, Texas Christian University, and Amnon RAPOPORT of the Department of Psychology, University of North Carolina. This rule plays tit for tat except that it cooperates on the first four moves, it defects on the last two moves, and every fifteen moves it checks to see if the opponent seems to be playing randomly. This check uses a chi-squared test of the other’s transition probabilities and also checks for alternating moves of CD and DC.
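Here is a rough sketch of that logic as I read it. This is not the library’s implementation: it simplifies the randomness check to a chi-squared test on the opponent’s cooperation and defection counts rather than on transition probabilities, omits the CD/DC alternation check, and all the names are mine.
from collections import Counter
from scipy.stats import chisquare

C, D = "C", "D"

class SteinAndRapoportSketch:
    """Tit for tat with a periodic check for a random opponent."""

    def __init__(self, turns=200, alpha=0.05):
        self.turns = turns                  # match length, known in advance in the first tournament
        self.alpha = alpha                  # significance level for the randomness check
        self.opponent_is_random = False

    def next_move(self, my_history, opponent_history):
        turn = len(my_history)
        if turn < 4:                        # cooperate on the first four moves
            return C
        if turn >= self.turns - 2:          # defect on the last two moves
            return D
        if turn % 15 == 0:                  # every fifteen moves, re-run the randomness check
            counts = Counter(opponent_history)
            # Simplified: are the opponent's C/D counts consistent with a fair coin?
            # (The original tests the opponent's transition probabilities.)
            _, p_value = chisquare([counts[C], counts[D]])
            self.opponent_is_random = p_value > self.alpha
        if self.opponent_is_random:         # punish an apparently random opponent
            return D
        return opponent_history[-1]         # otherwise play tit for tat
Note that the end-game defection only works because the match length (200 moves) was known in advance in the first tournament.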
One of Axelrod’s main observations from the first tournament was that “nice” strategies are more effective. A nice strategy never defects first, except possibly in the last few moves of a game. Stein and Rapoport is a nice strategy, as were the other strategies in the top 8 of the original tournament. Those same 8 strategies occupy the top 8 ranks in my Python tournament, so this observation holds.
Vincent Knight and colleagues did manage to reproduce Axelrod’s second tournament, reusing the original FORTRAN code for each of the competing strategies. Aside from some minor differences, the results hold and Tit-For-Tat remains the overall winner. Owen Campbell has a good talk on this project.