In 2019 we published ‘Exploring the impact of analysis software on task fMRI results’ in Human Brain Mapping, showing how the choice of software package used for analyzing task fMRI data can influence the final results of a study. We reanalyzed data from three published task fMRI studies (Schonberg et al., 2012, referred to as ‘ds000001’ in the article; Moran et al., 2012, ‘ds000109’; and Padmanabhan et al., 2011, ‘ds000120’), reproducing the main group-level contrast maps from each publication within the three main fMRI software packages: AFNI, FSL, and SPM. We then applied a variety of quantitative and qualitative comparison methods to assess the similarity of the statistical maps between the three software packages.
We have recently become aware that five out of the fourteen analysis results used in the article contained errors.
The first two of these erroneous results were our AFNI nonparametric reanalyses of the Schonberg et al., 2012 (ds000001) and Moran et al., 2012 (ds000109) datasets. In both cases, the wrong sub-bricks of the 4D subject-level results files had been specified in the group-level permutation test model, meaning that permutation tests were carried out on the subject-level statistic images rather than on the subject-level parameter estimate images, as was intended. A similar problem was found in our AFNI parametric analysis of the Padmanabhan et al., 2011 (ds000120) dataset, where again the subject-level statistic images were wrongly entered into the group-level mixed-effects analysis in place of the intended parameter estimate images. Corrections to these three sets of results have led to minor changes in the quantitative comparisons that were originally reported. Most notably, the within-software Dice coefficients comparing the thresholded parametric and nonparametric results for AFNI are now slightly lower; for example, the AFNI parametric/nonparametric Dice coefficient for ds000001 has decreased from 0.833 to 0.700, and the corresponding Dice coefficient for ds000109 has decreased from 0.899 to 0.819.
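For readers unfamiliar with the measure, the Dice coefficient between two thresholded maps is twice the number of voxels active in both, divided by the total number of active voxels across the two maps. The following is a minimal sketch on toy arrays, not the code used in the article: the function name `dice_coefficient`, the toy statistic values, and the threshold of 3.0 are all illustrative assumptions.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks of the same shape.

    Hypothetical helper for illustration only; it is not taken from the
    Software_Comparison repository.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Neither map has any suprathreshold voxels: overlap is undefined.
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / total


# Toy "statistic maps" standing in for two group-level results (made-up values).
stat_a = np.array([[0.0, 3.2, 4.1],
                   [2.9, 0.0, 5.0]])
stat_b = np.array([[0.0, 3.5, 0.0],
                   [3.3, 2.8, 4.4]])

threshold = 3.0  # assumed threshold, chosen only for this example
dice = dice_coefficient(stat_a > threshold, stat_b > threshold)
print(f"Dice coefficient: {dice:.3f}")
```

A value of 1 indicates identical thresholded maps and a value of 0 indicates no overlap at all, so the decreases reported above correspond to less agreement between the AFNI parametric and nonparametric results.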
The final two erroneous results were our FSL parametric and nonparametric reanalyses of the Moran et al., 2012 (ds000109) dataset. In both cases, the linear model contrast had been incorrectly specified; the FSL parametric and permutation results in the article were for the False Photo Question vs False Belief Question contrast rather than the intended False Photo Story vs False Belief Story contrast used in the original publication. Correcting these analyses has led to notable improvements in the ds000109 AFNI-FSL and FSL-SPM between-software comparisons, in the form of higher correlations between the unthresholded statistical maps and larger Dice coefficients for comparisons of the thresholded statistical maps. Specifically, for between-software comparisons, correlations now range from 0.429 to 0.896 (previously 0.429 to 0.747), and Dice coefficients range from 0 to 0.769 (previously 0 to 0.684).
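As a rough illustration of the correlation measure mentioned here, the sketch below computes a Pearson correlation between two unthresholded maps over a set of voxels. It is an assumption-laden toy example rather than the article's implementation: the helper `map_correlation`, the optional `mask` argument, and the toy values are hypothetical.

```python
import numpy as np


def map_correlation(map_a, map_b, mask=None):
    """Pearson correlation between two statistic maps of the same shape.

    Hypothetical helper for illustration only. If `mask` is given, only
    voxels where it is True contribute to the correlation.
    """
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must have the same shape")
    if mask is None:
        keep = np.ones(a.shape, dtype=bool)
    else:
        keep = np.asarray(mask, dtype=bool).ravel()
        if keep.shape != a.shape:
            raise ValueError("mask must have the same shape as the maps")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(a[keep], b[keep])[0, 1])


# Toy maps (made-up values): the second is an exact linear function of the first.
t_map_a = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
t_map_b = 2.0 * t_map_a + 1.0

r = map_correlation(t_map_a, t_map_b)
print(f"Correlation: {r:.3f}")
```

Because the correlation uses every voxel's value rather than a binarized map, it can reveal agreement or disagreement that the Dice coefficient, which depends on the chosen threshold, would miss.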
In response to finding these errors, we have reanalyzed the datasets, published an erratum in HBM (https://doi.org/10.1002/hbm.25302), and uploaded a new preprint to our OSF repository that integrates the corrected results: https://osf.io/7ze9w/. In addition, we have created a new release on the GitHub repository connected to this project that includes all the code for the corrected analyses (Release Software_Comparison_0.9.0, https://github.com/NISOx-BDI/Software_Comparison/releases/tag/0.9.0), and made new NeuroVault collections containing the corrected statistical maps (ds000001: https://neurovault.org/collections/8447/, ds000109: https://neurovault.org/collections/7782/, ds000120: https://neurovault.org/collections/8468/). We have also updated the Jupyter Notebooks originally provided with this work so that users can recreate the amended figures obtained with these new results.
We apologize for any inconvenience these errors have caused. The management of the model specifications across the three software packages and three studies used in this investigation required extensive scripting. While we take full responsibility for these mistakes, they emphasize the need for a software-independent approach to model specification, and in this regard, we would like to highlight the BIDS Extension Proposal 002 (BEP002): The BIDS Stats-Models Specification as one possible way forward. A standardized way to represent fMRI models across neuroimaging software would reduce the risk of analysis configuration errors and facilitate comparative and multiverse analyses like this one.
Alex Bowring, Camille Maumet, Thomas E. Nichols