
How one paleo-participant can change the outcome of a study

1/6/2015

Lewandowsky, Gignac, and Oberauer (2013) authored "The Role of Conspiracist Ideation and Worldviews in Predicting Rejection of Science" in PLOS ONE. (Paper here.)

This study has many of the same features as their Psychological Science scam. In that study, they falsely linked belief in the moon-landing hoax to climate skepticism when in fact only three participants out of 1145 held both of those beliefs, and over 90% of climate skeptics in their sample rejected the moon-landing hoax.

In the PLOS ONE study, we see the same broken conspiracy items, e.g. the New World Order item erroneously refers to the NWO as a group, the JFK item doesn't describe much of a conspiracy, the free market items are written from a leftist perspective, using proprietary leftist terminology. The validity of this study would be in doubt regardless of the results.

A much more serious problem, however, is that there is bad data in the sample. Most consequentially, there is a 32,757-year-old, a veritable paleo-participant. (Data here.)

There are also seven minors: a 5-year-old, two 14-year-olds, two 15-year-olds, and two others.

They were alerted to the presence of the minors and the paleo-participant over a year ago, and did nothing.

This would be a serious problem in any context. We cannot have minors or paleo-participants in our data, in the data we use for analyses, claims, and journal articles. It's even more serious given that the authors analyzed the age variable, and reported its effects. They state in their paper:

--- "Age turned out not to correlate with any of the indicator variables."

This is grossly false. It can only be made true if we include the fake data. If we remove the fake data, especially the 32,757-year-old, age correlates with most of their variables. It correlates with six of their nine conspiracy items, and with their "conspiracist ideation" combined index. It also correlates with views of vaccines – a major variable in their study. See the graph below.

See the full Plotly graph here. (By "fake ages", I mean that the 32,757 age is presumably fake, and I would assume the 5-year-old is not in fact a precocious 5-year-old who somehow got through the uSamp.com / Qualtrics participant pool that the authors used. As we get into the 14- and 15-year-olds in the sample, it's easier to imagine these might be true ages, and I think we become very concerned about the possibility of actual minors in the sample.)

(As noted in the graph, all the correlations between age and conspiracy theories were negative, perhaps contrary to common stereotypes.)

It's highly unusual to have out-of-range ages, especially five-digit ages, in survey data obtained electronically. Any of the online survey systems we use will validate the age field for us. That is, they won't accept an invalid age. No one should be able to say that they're 5 years old, or 32,757, and proceed to participate in an IRB-approved psychology study. The authors apparently used Qualtrics. I use it all the time. When building a survey, you customarily set the age validation right there on the side panel, like so:

What's even more concerning is that the authors reported the median age (43), and even the quartiles, in the paper. And as noted above, they analyzed age in relation to other variables. It's difficult to understand how any researcher would know the median age, quartiles, and correlations with other variables, but not encounter the mean age, which was 76. That mean would immediately set off alarms for any researcher using normal population samples. Any statistical software is going to show the mean as part of a standard set of descriptive statistics. (The mean after removing the fakes and minors is 43.)
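To illustrate why the median can look fine while the mean is badly off, here is a minimal Python sketch. It uses synthetic ages, not the study's data: the sample size of 1,020 and the flat 18-to-68 age distribution are assumptions chosen only so that the clean mean and median both equal 43.

```python
import statistics

# Synthetic ages, NOT the study's data: ages 18 through 68, each repeated
# 20 times, giving 1,020 hypothetical respondents with mean and median 43.
clean_ages = [age for age in range(18, 69) for _ in range(20)]

# Add the single out-of-range value reported in the post.
ages_with_outlier = clean_ages + [32757]

mean_clean = statistics.mean(clean_ages)
mean_with_outlier = statistics.mean(ages_with_outlier)
median_clean = statistics.median(clean_ages)
median_with_outlier = statistics.median(ages_with_outlier)

print(f"clean:        mean={mean_clean:.1f}, median={median_clean}")
print(f"with outlier: mean={mean_with_outlier:.1f}, median={median_with_outlier}")
```

In this sketch the median stays at 43 after the five-digit age is added, while the mean rises into the mid-70s. One extreme value is enough to produce that gap.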

It's also hard to imagine how they did not notice the 5-year-old, or the 32,757-year-old, which is the outlier responsible for the inflated mean age. Min and max values are given by default in most descriptive statistics outputs.
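A range check of this kind takes only a few lines. The sketch below is illustrative: the 18-to-120 bounds and the function name `out_of_range_ages` are assumptions, and the in-range ages in the sample list are hypothetical filler. Only the flagged values are the ages named in this post.

```python
# Plausible adult age range for the check. These bounds are an assumption
# for illustration, not values taken from the study or its IRB protocol.
MIN_AGE = 18
MAX_AGE = 120

def out_of_range_ages(ages, low=MIN_AGE, high=MAX_AGE):
    """Return the ages that fall outside the inclusive range [low, high]."""
    return [age for age in ages if age < low or age > high]

# The flagged values below are the ages named in the post; the in-range
# values (43, 29, 61) are hypothetical filler.
sample_ages = [43, 5, 29, 14, 14, 61, 15, 15, 32757]

print("min:", min(sample_ages), "max:", max(sample_ages))
print("out of range:", out_of_range_ages(sample_ages))
```

Either the min/max line or the flagged list would surface the 5-year-old and the 32,757-year-old before any analysis is run.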

That one data point – the paleo-participant – is almost single-handedly responsible for knocking out all the correlations between age and so many other variables. If you just remove the paleo-participant, leaving the minors in the data, age lights up as a correlate across the board. Further removing the kids will strengthen the correlations.

What concerns me the most is that these researchers were alerted that their data was bad on October 4, 2013 and did nothing about it. A commenter posted directly on Lewandowsky's webpage where he had announced the paper, a mere two days after the announcement:

"Additional problems exist as well. For example, one respondent claims an age of 32,757 years, and another claims an age of 5. Do you believe this data set should be used as is, despite these obvious problems?"

Almost a year later, on August 18, 2014, I posted a comment directly on the PLOS ONE page for their paper, and noted the bogus age data¹. They've known for a very long time that there is a 32,757-year-old in their data, along with a 5-year-old, two 14-year-olds, two 15-year-olds and two other minors, and they've known that they reported analyses on the age variable in their study. They did nothing.

I think it's safe to assume that they've known for quite some time that the above-mentioned claim is completely false: "Age turned out not to correlate with any of the indicator variables."

A 32,757-year-old will grossly inflate the mean and corrupt the deviation scores and SD – any trained researcher would know that this could severely impact any correlation analyses.
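The mechanism can be shown with a small Python sketch on synthetic data. This is not the study's data or the authors' analysis: the sample size, the slope of -0.03, the noise level, and the outlier's score are all assumptions chosen to give a moderate negative correlation between age and a hypothetical endorsement score.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic data, NOT the study's data: 1,020 hypothetical respondents whose
# endorsement score declines slightly with age, plus random noise.
ages = [age for age in range(18, 69) for _ in range(20)]
scores = [5.0 - 0.03 * age + rng.gauss(0, 1) for age in ages]

r_clean = pearson_r(ages, scores)

# Add one respondent with the out-of-range age and an average score.
ages_bad = ages + [32757]
scores_bad = scores + [sum(scores) / len(scores)]
r_with_outlier = pearson_r(ages_bad, scores_bad)

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_with_outlier:.3f}")
```

In this sketch the clean correlation is moderately negative, and adding the single five-digit age shrinks it to nearly zero. The outlier shifts the mean age, so every other respondent's deviation score becomes almost identical, and the outlier's own squared deviation dominates the age variance in the denominator.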

Some of their other effects seem to hold, but the coefficients are smaller controlling for age. However, I would not take any of their findings seriously given that:

Too many of their items are of very low psychometric quality. They're often vague, double-barrelled, and politically biased, e.g. "The free market system may be efficient for resource allocation but it is limited in its capacity to promote social justice." (R) Social justice is a term of art of the left – no one else uses that term. It denotes a contemporary left/liberal conception of justice, focused on issues like income equality, the welfare of minorities, etc. Conservatives, libertarians, moderates, et al. will have different conceptions of justice, focused on different concerns, and more importantly, they don't use that term and I would not be confident in their interpretations of it, or what their responses signify.

Their composite variables are somewhat arbitrary and don't survive factor analysis (e.g. there is an environmentalism factor tucked away in their "free market" items, and its relationship to some of the science variables is unflattering, and of course, unreported.)
They deleted hundreds of participants – a full 28% of their data – including anyone who did not answer every single question. No one is obligated to answer every question, nor will they necessarily have opinions on bizarre theories they've never heard of, or on lots of things. I'm completely unfamiliar with the practice of deleting large swaths of data.
They've known for more than a year that there were a 32,757-year-old and seven minors in their data, and did nothing. This suggests a somewhat broad lack of concern about data quality and truth. I think it would be overventuresome to be interested in the rest of their data, their claimed effects, and so forth. We have much higher-quality data on conspiracy beliefs from professionals at places like Gallup.

José L. Duarte
