Imitation Game – Trondheim

The IMGAME team went to Trondheim, Norway in early November to conduct another Imitation Game on religiosity. The main aim was to find out how well non-Christian students were able to pretend to be active and practising Christians. This experiment was the second of its kind in Trondheim and was designed as a repeat study of an experiment conducted 12 months earlier, in November 2012.

During the 2012 study, we experienced a number of technical difficulties during Step1. This meant that, despite playing 24 games at Step1, only 12 Question Sets could be carried over into Step2. The remainder of the fieldwork appeared to go well, with 180 non-Christians taking the role of Pretender during Step2 and a further 40 Christians recruited as Judges and asked to distinguish between Non-Pretenders (active and practising Christians) and Pretenders (non-Christians). The Pass Rate obtained during the 2012 Trondheim Imitation Game was about 0.6.

This year, no technical problems were encountered and 21 of the 23 Question Sets generated at Step1 ‘survived’ the filtering process and were carried over to Step2. The other 2 Question Sets were discarded because they did not meet the minimum requirement of containing more than 3 questions. 211 non-Christian students then participated in Step2, each answering one Question Set in the role of the Pretender, i.e. they answered the questions set by Christian Judges as if they too were active and practising Christians. For Step4, 55 active and practising Christian students were recruited as Judges, with each Judge given about 8 different Dialogue Sets to examine and judge.

The provisional Pass Rate obtained during the 2013 Trondheim Imitation Game (2 Judges still need to return their verdicts) is 0.4. This means that, compared to 2012, non-Christian students were less able to pretend successfully to be active and practising Christians. It also means that, unlike in Helsinki, the goal of replicating the 2012 experiment by obtaining a similar result was not reached.

But what does this failure to replicate last year’s result mean in the context of Imitation Games in Trondheim or for Imitation Games as a new method? The short answer is that it is too early to tell…

The longer answer goes as follows: the Experimenter’s Regress, a well-known concept in Science and Technology Studies, describes the problem that arises when genuinely novel research is carried out. In essence, the argument is that, in order to know whether or not an experiment has worked, you already need to know what the correct outcome is. Given that the correct outcome is unknown without performing the experiment correctly, the experimenter’s regress leads to a potentially infinite loop of dependence between theory and evidence.

With regard to the Trondheim results, the experimenter’s regress applies because, in order to know which of the 2012 and 2013 attempts to measure the Pass Rate was (most) successful, one needs to know what the ‘real’ Pass Rate should be. But to know the real Pass Rate, you have to conduct a well-run Imitation Game. But to know whether the result of this Imitation Game is ‘correct’, you need to know the ‘real’ Pass Rate. And so on, ad infinitum…

Given this, what we can conclude so far is the following: First, we do not know whether the 2012 or the 2013 Pass Rates are ‘true’ reflections of reality or, indeed, whether both are correct because the underlying population parameter has itself changed in the intervening 12 months! Second, both results are a long way from the near 100 per cent Pass Rates we have observed in more religious countries, so the big picture remains consistent, even if the details are not entirely fixed. Third, we are gaining a better understanding of the difficulties of carrying out Imitation Game research and of reducing the sources of noise that tend to drive up the Pass Rate.

In the case of Trondheim, however, we can only deal with the data we have and see how analysing the data differently affects the results. For example, if we exclude from the analysis any Dialogue Set in which the weighted average of the Pass Rate for both sets of Judges is more than 100 per cent, then the results agree: the aggregate Pass Rate for 2012 drops to 0.4, while the 2013 figure remains unchanged. The rationale for doing this would be that, for the Pass Rate to be more than 100 per cent, the Pretenders would have to be, on average, more plausible than the Non-Pretender. This could be interpreted as prima facie evidence of a ‘noisy’ Question Set.

Of course, this decision is somewhat arbitrary. For example, if we exclude any Question Set in which EITHER set of Judges returned a Pass Rate of over 100 per cent, then some differences re-emerge: the aggregate Pass Rate remains at 0.39 for 2012, but the exclusion of two Question Sets from 2013 pushes the Pass Rate down to 0.34. Nevertheless, the difference (approx. 0.05) remains much smaller than that reported in the overall headline figures.
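The two exclusion rules can be sketched in a few lines of code. The data structure and the numbers below are illustrative assumptions, not the actual Trondheim data; only the filtering logic follows the rules described above.

```python
def aggregate_pass_rate(question_sets, rule):
    """Average the per-set Pass Rates after applying an exclusion rule.

    Each Question Set carries two Pass Rates, one per set of Judges,
    plus the number of Judges behind each rate (hypothetical field names).
    rule="weighted": drop sets whose *weighted average* Pass Rate exceeds 1.0
    rule="either":   drop sets where EITHER set of Judges' Pass Rate exceeds 1.0
    """
    kept = []
    for qs in question_sets:
        r1, r2 = qs["judges_a"], qs["judges_b"]
        w1, w2 = qs["n_judges_a"], qs["n_judges_b"]
        weighted = (r1 * w1 + r2 * w2) / (w1 + w2)
        if rule == "weighted" and weighted > 1.0:
            continue  # Pretenders more plausible than the Non-Pretender on average
        if rule == "either" and (r1 > 1.0 or r2 > 1.0):
            continue
        kept.append(weighted)
    # Guard against every set being excluded
    return sum(kept) / len(kept) if kept else float("nan")

# Made-up example: three Question Sets, the last two increasingly 'noisy'.
sets_2013 = [
    {"judges_a": 0.30, "judges_b": 0.50, "n_judges_a": 4, "n_judges_b": 4},
    {"judges_a": 1.20, "judges_b": 0.60, "n_judges_a": 4, "n_judges_b": 4},
    {"judges_a": 1.30, "judges_b": 0.90, "n_judges_a": 4, "n_judges_b": 4},
]

print(aggregate_pass_rate(sets_2013, rule="weighted"))
print(aggregate_pass_rate(sets_2013, rule="either"))
```

Note how the rules diverge on the second set: one set of Judges returned a rate above 100 per cent, so the ‘either’ rule excludes it, but its weighted average stays below 1.0, so the ‘weighted’ rule keeps it. This is exactly why the two retrospective filters can yield different aggregate figures.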

What this suggests is that, if we could find more reliable ways of running Step1, and hence prevent some of these low-quality Question Sets from being generated in the first place, we might be able to produce more reliable results without the need for retrospective filtering and stratification. Until then, however, it is impossible to know whether the discrepancy between the 2012 and the 2013 results really means that a repetition has failed, or just that the technical difficulties encountered in 2012 reduced the quality of play at Step1 and therefore artificially increased the Pass Rate. The fact that there was a higher proportion of Question Sets with Pass Rates in excess of 100 per cent in 2012 is certainly consistent with this hypothesis. In other words, even if neither result is perfect, it seems likely that last year’s experiment is the more imperfect of the two, and that an overall Pass Rate of 0.4 is closer to the ‘correct’ result.

