APS Workshop on Involving Undergraduates in Replications

I had the pleasure of presenting with Geoff Cumming, John Grahe, and Fiona Fidler at this year’s APS meeting on the topic of involving students in replication projects (also, our discussant was Rebecca Saxe, who was terrific).

For my presentation I tried to collect together what I’ve learned from supervising student replication efforts. I especially tried to emphasize the benefits of using positive controls in psychology research to help make replication data (or any data for that matter) more interpretable.

In developing the talk it dawned on me that it would be useful to collect materials together to walk students through the process of developing a replication project. So I created a project page on the Open Science Framework where I’ve put together a bunch of resources for selecting projects, developing materials, including positive controls, etc. It’s all available on the OSF here: https://osf.io/jx2td/

APS Presentations

APS was in Chicago this year, so the replicators I have been supervising were out in full force.

Clinton Sanchez presented his replications of a study claiming that analytic thinking promotes religious disbelief [cite source=’doi’]10.1126/science.1215647[/cite]. His manuscript is having a rough time, but we’re hoping it will be out soon. Clinton is now in an MA program in Clinical Counseling at DePaul. Data from his project is here: https://osf.io/qc6rh/

Elle Lehmann presented a poster of her replications of studies showing that red enhances perceived attractiveness for men rating women [cite source=’doi’]10.1037/0022-3514.95.5.1150[/cite] and women rating men [cite source=’doi’]10.1037/a0019689[/cite]. Elle’s paper is in submission–she found little to no effect for either gender. She’s now working on a meta-analysis which has become quite a project, but really interesting. She has graduated and will be applying for a Fulbright in the fall. Data from her project is here: https://osf.io/j3fyq/

Last but not least, Eileen Moery presented a poster of her replications of a study which claimed that organic food makes you morally judgmental [cite source=’doi’]10.1177/1948550612447114[/cite]. Eileen’s studies were recently published [cite source=’doi’]10.1177/1948550616639649[/cite]. She found little to no effect of organic food exposure on moral judgments. She’s starting an MA program in clinical psych at IIT in the fall! Data from her project is here: https://osf.io/atkn7/

Photos came out a bit blurry (new phone, but crappy camera!).

Elle and me at her poster.
Clinton and Elle at his poster.

Pre-Order Now: Introduction to the New Statistics

Can a statistics textbook change the world? Maybe yes! At least that’s the aspiration behind Introduction to the New Statistics: Estimation, Open Science and Beyond, a new textbook by Geoff Cumming and yours truly [cite source=’isbn’]978-1138825529[/cite].

How can a statistics textbook change the world? By teaching the estimation approach to data analysis–one that emphasizes confidence intervals, replication, and meta-analysis. The estimation approach is a vast improvement over Null Hypothesis Significance Testing. Students find estimation easier to learn, and it supports better inference. Hopefully, this book will be an important step in the ongoing battle to abolish p values.
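For the curious, the core of the estimation approach fits in a few lines of code. Here’s a minimal JavaScript sketch that computes a point estimate and a 95% confidence interval for a sample mean. It uses the normal-approximation critical value of 1.96 rather than the exact t value, so treat it as illustrative–reasonable only for larger samples:

```javascript
// Point estimate and 95% CI for a sample mean.
// Uses z = 1.96 (normal approximation) instead of the exact t critical value.
function meanCI(data) {
  const n = data.length;
  const mean = data.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (divide by n - 1)
  const variance = data.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const se = Math.sqrt(variance / n); // standard error of the mean
  return { mean, lower: mean - 1.96 * se, upper: mean + 1.96 * se };
}
```

The interval tells you not just whether an effect might be zero, but how large or small it could plausibly be–which is the whole point of estimation over a bare p value.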

Switching to estimation is not enough, though. Students of research also need to learn the new Open Science practices that are evolving to enhance research rigor: pre-registration, Open Data, and Open Materials. So the textbook is also the first to teach these essential practices from the start.

To sum it up, Geoff and I believe that this is a unique book in a large sea of statistics texts–one which we hope will inspire the next generation of researchers who will be raised from the start on good statistical and methodological practices. Perhaps the replication crisis will fade into the past.

The book is now available for pre-order on Amazon. It should be published by August, 2016–which is running just a bit late for fall adoptions. If you’d like a desk copy, shoot me an email or leave a comment.

Rest easy — organic food probably does not make you into a jerk…

My student Eileen Moery and I have a new paper out today in Social Psychological and Personality Science. It’s a replication paper that I’m quite proud of [cite source=’doi’]10.1177/1948550616639649[/cite]. It represents some evolution in how I’m supervising replication projects.

The new paper replicates a study purporting to show that being exposed to images of organic food produces a strong decrease in prosocial behavior and a strong up-tick in being morally judgmental [cite source=’doi’]10.1177/1948550612447114[/cite]. This is a potentially fascinating phenomenon–something like ‘moral licensing’, the ironic effect of good behavior fostering subsequent bad behavior.

The original paper caught fire and the media covered these findings extensively. Rush Limbaugh even crowed about them as evidence of liberal hypocrisy. I noticed the media coverage, and this is how the original study made it onto my ‘possible replication’ list. Eileen found it there, read the paper, and developed a fantastic honors project to put the initial study to the test.

For her project, Eileen contacted the original author to obtain the original materials. She planned and executed a large pre-registered replication attempt. She included a positive control (Retrospective Gambler’s task) so that if the main study ‘failed’ we would have a way to check if it was somehow her fault. She also devised a nice memory manipulation check to be sure that participants were attending to the study materials. She conducted the study and found little to no impact of organic food exposure on moral reasoning and little to no impact on prosocial behavior. She did find the expected outcome on the positive control, though–so sorry, doubters, this was not an example of researcher incompetence.

One of the things I don’t like about the current replication craze is the obsessive emphasis on sample size (this paper is not helping: [cite source=’doi’]10.1177/0956797614567341[/cite]). Sure, it’s important to have good power to detect the effect of interest. But power is not the only reason a study can fail. And meta-analysis allows multiple low-power studies to be combined. So why be so darned focused on the informativeness of a single study? The key, it seems to me, is not to put all your eggs in one basket but rather to conduct a series of replications–trying different conditions, participant pools, etc. The pattern of effects across multiple smaller studies is, to my mind, far more informative than the effect found in a single but much larger study. I’m talking about you, verbal overshadowing [cite source=’doi’]10.1177/1745691614545653[/cite].
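To make that concrete: the arithmetic behind combining several smaller studies is just inverse-variance weighting. Here’s a minimal fixed-effect sketch in JavaScript–illustrative only (the input values in the test are made up, and a real analysis would use dedicated meta-analysis software and consider random-effects models):

```javascript
// Fixed-effect meta-analysis by inverse-variance weighting.
// Each study contributes { d, v }: an effect size and its variance.
function pooledEffect(studies) {
  let sumW = 0, sumWd = 0;
  for (const { d, v } of studies) {
    const w = 1 / v; // precise studies (small variance) get more weight
    sumW += w;
    sumWd += w * d;
  }
  const pooled = sumWd / sumW;
  const se = Math.sqrt(1 / sumW); // SE of the pooled estimate
  return { pooled, lower: pooled - 1.96 * se, upper: pooled + 1.96 * se };
}
```

Each added study shrinks the pooled standard error, so a series of modest studies can end up far more informative than any single one of them.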

Anyways, based on this philosophy, Eileen didn’t stop with one study. She conducted another, larger study using Mechanical Turk. There are lots of legitimate concerns about MTurk, so we used the quality controls developed in Meg Cusack’s project [cite source=’doi’]10.1371/journal.pone.0140806[/cite]–screening out participants who don’t speak English natively, who take far too long or too short a time to complete the study, etc. Despite all this care (and another successful positive control), Eileen still found that organic food produced essentially no change in moral judgments and prosocial behavior.

Still not finished, Eileen obtained permission to conduct her study at an organic food market in Oak Park. She and I spent two very hot Saturday mornings measuring moral judgments in those arriving at or leaving from the market. We reasoned that those leaving had just bought organic food and should feel much more smug than those merely arriving or passing by. Yes, there are some problems with this assumption–but again, it was the overall pattern across multiple studies we cared about. And the pattern was once again consistent but disappointing–only a very small difference in the expected direction.

Although Eileen and I were ready to call it quits at this point, our reviewers did not agree. They asked for one additional study with a regular participant pool. Eileen had graduated already, but I rolled up my sleeves and got it done. Fourth time, though, was not the charm–again there was little to no effect of organic food exposure.

With all that said and done, Eileen and I conducted a final meta-analysis integrating our results. The journal would not actually allow us to report on the field study (too different!?), but across the other three studies we found that organic food exposure has little to no effect on moral judgments (d = 0.06, 95% CI [−0.14, 0.26], N = 377) and prosocial behavior (d = 0.03, 95% CI [−0.17, 0.23], N = 377).

So–what’s our major contribution to science? Well, I suppose we have now dispelled what in retrospect is a somewhat silly notion: that organic food exposure could have a substantial impact on moral behavior. We are also contributing to the ongoing meta-science examining the reliability of our published research literature–and it gives me no joy to say that this work is painting a fairly bleak picture. Finally, I hope that we have now gained enough experience with replication work to (modestly) show the way a bit. I hope the practices that are now routine for my honors students (pre-registration, multiple studies, positive controls, careful quality controls, and synthesis through meta-analysis) will become routine in the rest of replication land. No, strike that–these are practices that should be routine in all of psychology. Holding my breath.

Oh – and one other important thing about this paper–it was published in the same journal that published the original study. I think that’s exactly as it should be (journals should have to eat their own dog food). Obviously, though, this is exceptionally rare. I think it was quite daring for the journal to publish this replication, and I hope the good behavior of its editors is a model for others and a sign that things really are changing for the better.

What’s the best way to teach methods and statistics?

No easy answer, but I’m co-author on a new paper that has some hints [cite source=’doi’]10.1177/0098628315573139[/cite]. Well, actually, it’s just a summary of some assessment data my department has been collecting to help evaluate an integrated and inquiry-based approach to teaching methods and stats. We provide lots of opportunities for apprenticeship, with students completing independent correlational and experimental projects across a 2-semester sequence. Currently, we’re using Nolan & Heinzen as a stats text, but I’m hoping that we’ll be using Cumming & Calin-Jageman by 2017.

Pliske, R. M., Caldwell, T. L., Calin-Jageman, R. J., & Taylor-Ritzler, T. (2015). Demonstrating the Effectiveness of an Integrated and Intensive Research Methods and Statistics Course Sequence. Teaching of Psychology. doi:10.1177/0098628315573139

New Publication on ERIN, Educational Resources in Neuroscience

Just after I started at Dominican in 2007 I had the good fortune to team up with Richard Olivo to help in the development of ERIN, an online curated database of educational resources for neuroscience. It was a long process, but ERIN has now been online for the past three years serving up great resources to faculty prepping their activities for neuroscience courses. You can take a spin for yourself at http://erin.sfn.org/.

To cap off our efforts with ERIN, Richard took the lead in writing up an article for JUNE, the Journal of Undergraduate Neuroscience Education. You can find it online here: [cite source=’pubmed’]26240519[/cite].

Qualtrics Tips – An HTML5 and JavaScript Word-Search Task

Update – all the code is now posted to GitHub:

This includes a sample Qualtrics template you can import to get you started. Here’s the sample Qualtrics survey where you can try the task for yourself:


Here’s a second experimental task I wrote for use with online social psychology experiments. This one is a word search. Again, the code is a series of kludges cobbled together from examples I could find with Google. But it works out pretty well as a task you can embed in a Qualtrics survey. You can define the grid and the word list as you’d like, and you can have Qualtrics pass parameters that specify which grid and word list to use for a specific participant. It looks nice, and I’ve had good success using this online.
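If you’re curious how the word-finding logic in a task like this works, here’s a minimal sketch (not the actual task code–grab that from GitHub). The grid is an array of equal-length strings, and a word matches if its letters line up along one of the eight compass directions:

```javascript
// Check whether the cells starting at (row, col) and stepping by (dr, dc)
// spell `word`. Comparison is case-insensitive.
function matchesAt(grid, word, row, col, dr, dc) {
  for (let i = 0; i < word.length; i++) {
    const r = row + i * dr, c = col + i * dc;
    if (r < 0 || r >= grid.length || c < 0 || c >= grid[0].length) return false;
    if (grid[r][c].toUpperCase() !== word[i].toUpperCase()) return false;
  }
  return true;
}

// Scan the whole grid for `word` in any of the eight directions.
// Returns the start cell and direction of the first match, or null.
function findWord(grid, word) {
  const dirs = [[0, 1], [0, -1], [1, 0], [-1, 0], [1, 1], [1, -1], [-1, 1], [-1, -1]];
  for (let r = 0; r < grid.length; r++)
    for (let c = 0; c < grid[0].length; c++)
      for (const [dr, dc] of dirs)
        if (matchesAt(grid, word, r, c, dr, dc)) return { row: r, col: c, dr, dc };
  return null;
}
```

In the real task the same idea runs in reverse: the cells the participant highlights are checked against the word list, rather than the computer solving the puzzle itself.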

This particular grid is a control sample I adapted from a book of word searches. It’s a really tough one due to the size of the grid. If you want to see what finding a word looks like without, you know, actually finding a word: “Tradition” starts at the 8th letter of the bottom row.

I have a couple of papers I’m working on that use this task. When either goes to press, I’ll post the reference here for citation. The first one is finally in press, [cite source=’doi’]10.1371/journal.pone.0140806[/cite] with three student co-authors:

Cusack, M., Vezenkova, N., Gottschalk, C., & Calin-Jageman, R. J. (2015). Direct and Conceptual Replications of Burgmer & Englich (2012): Power May Have Little to No Effect on Motor Performance. PLOS ONE, 10(11), e0140806. doi:10.1371/journal.pone.0140806, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140806

Email me if you’d like the source code, or scrape it directly from this page. The code is a hot mess, but it should be enough to get you started.

Qualtrics Tips – An HTML5 and JavaScript Mirror-Tracing Task


— I’ve now organized the code for this task and posted it to GitHub:

And you can try the demo out here:


In GitHub, you’ll find a directory with Qualtrics templates. These have been updated and tested; all seems to be working OK with them. Try importing these to Qualtrics, and then adapt from there.

Within the templates I posted some explanations about how the code works and what to change to adapt to your purposes. Hope it all makes sense. Basically:


Social relationships can confer power on some to control the fates of others. There is a large and growing body of social-psychology research examining the psychological effects of power, with findings documenting profound changes in cognition, motivation, moral reasoning, and more.

One intriguing finding is that having social power can actually improve motor skills. Specifically, Burgmer & Englich (2012) have shown that manipulating power substantially improves performance at mini-golf and darts [cite source=’doi’]10.1177/1948550612452014[/cite].

I’ve been working over the past two years to replicate this finding. While it was easy enough to replicate the mini-golf study with real, live participants, I wanted to test the study online, with a much larger and more diverse pool of participants. But how to measure motor skill online? My solution was to develop an online version of the classic mirror-tracing task, using HTML5 and JavaScript.

A paper describing the results is finally published [cite source=’doi’]10.1371/journal.pone.0140806[/cite] with three student co-authors:

Cusack, M., Vezenkova, N., Gottschalk, C., & Calin-Jageman, R. J. (2015). Direct and Conceptual Replications of Burgmer & Englich (2012): Power May Have Little to No Effect on Motor Performance. PLOS ONE, 10(11), e0140806. doi:10.1371/journal.pone.0140806, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140806

Below you can see a sample in action. The top box is the mirror–the bottom box is the drawing pad. Move the mouse into the green target in the drawing pad–this starts the trial. From this point on, your mouse will leave a mirrored trail in the mirror-box. Try to trace to the red target while staying in the line of the figure in the mirror. When you are within the line, the trail will be red–go outside the line and the trail is blue. You’ll see your score as a % of time within the line just below the drawing pad. The trial ends automatically when you reach the green target. Note that this WordPress theme uses a bunch of white-space—you may need to increase or decrease the zoom on your browser to get both boxes to show within your screen without scrolling.
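Under the hood, the two key pieces of logic are simple. Here’s a sketch (not the actual task code): mirroring a mouse position means flipping one axis–I’m assuming a vertical flip here–and the score is just the percentage of sampled mouse positions that fell inside the line (`withinLine` is a hypothetical flag recorded for each sample):

```javascript
// Map a mouse position in the drawing pad to its mirrored position in the
// mirror box: x is preserved, y is flipped across the box height.
function mirrorPoint(x, y, boxHeight) {
  return { x, y: boxHeight - y };
}

// Score a trial as the percentage of sampled mouse positions inside the line.
function scoreTrial(samples) {
  const inside = samples.filter(s => s.withinLine).length;
  return Math.round((100 * inside) / samples.length);
}
```

The real task also draws the trail to a canvas and colors it red or blue based on that same within-line test, but the scoring boils down to this ratio.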

I kludged this task together even though I’d never written JavaScript before. I relied heavily on examples found via Google. The final product works, but could really use a code review, both a) to clean up all the horrible on-the-fly decisions I made in getting this to work, and b) to credit the sources I used in taping this together. Still, there are some cool features to this task:

  • The final drawing is saved back to a server as an image, complete with score–this allows you to visually inspect performance to get a feel for how different participants are scoring.
  • The scoring is fully automated.
  • The script runs fine within a Qualtrics survey, and Qualtrics can read the scores back from the script.
  • As expected, I found a consistent negative correlation between performance and age–this indicates the validity of the task and provides a useful covariate for reducing within-subjects noise.
  • As expected, performance improves from the first to the second trial.
  • With this task, one can analyze overall performance and performance change across trials (though fatigue seems to set in within a few closely-spaced trials for most participants).
  • Difficulty can be varied by changing line thickness.
  • Any arbitrary line tracing can be uploaded–I matched all mine for number of turns and total line length.
  • It’s a fun task!

If you’d like to adopt this for your own studies, please shoot me an email and I’ll be happy to send you the code and some instructions for how to use it (or you could probably just rip it off directly from this page).  When (if) I get the paper using this task published, I’ll post the reference for citation.

Mirror Tracing Demo