This is interesting. Lewis and Curry estimate equilibrium climate sensitivity (ECS) at 1.64°C. That's remarkably low. If it turns out to be true – and it seems like we won't know the true value for ECS for another decade or longer – then AGW would have been largely overstated. The estimates constantly change, which is frustrating.
I can't evaluate their methods. I'm curious to read any critiques by other climate scientists.
ECS is the long-term change in temperature from a doubling of atmospheric CO2. It's arguably the most important fact or estimate in the AGW issue. Ultimately, this is what anthropogenic warming is about. I think the baseline is the pre-industrial level, for which I've seen different estimates. I think it was 280 ppm. Now we're around 400 ppm. A doubling of the baseline would be 560 ppm. I have no idea when we're supposed to hit that.
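For a rough sense of timing, here's a back-of-envelope sketch. The growth rate of about 2 ppm/year is my own assumption for illustration, not a measured projection:

```python
# Back-of-envelope estimate of when atmospheric CO2 might reach
# double the pre-industrial baseline. The annual growth rate is an
# assumed constant for illustration, not a real projection.

BASELINE_PPM = 280         # pre-industrial level cited above
CURRENT_PPM = 400          # approximate level today
GROWTH_PPM_PER_YEAR = 2.0  # assumed steady annual increase

doubling_ppm = 2 * BASELINE_PPM  # 560 ppm
years_to_doubling = (doubling_ppm - CURRENT_PPM) / GROWTH_PPM_PER_YEAR

print(f"Doubling level: {doubling_ppm} ppm")
print(f"Years until doubling at {GROWTH_PPM_PER_YEAR} ppm/yr: {years_to_doubling:.0f}")
# -> 80 years at that assumed rate
```

At that assumed rate we'd hit 560 ppm about 80 years out; accelerating emissions would shorten that considerably.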
One of the things that surprises me about AGW skeptics and lukewarmers is that they don't seem to worry about the future consequences to themselves if they're wrong. Most of them aren't elderly, so they'll likely be around for decades. If AGW turns out to be a serious problem, I expect skeptics to be pilloried. So it's ballsy to be a skeptic – I won't be surprised by violence against them in 2050 if warming becomes a problem.
It's also ballsy to be a climate scientist who offers low estimates of ECS. We haven't yet built a rational civilization when it comes to science and politics. An honest scientist who offers low estimates of ECS which turn out to be wrong will be savagely attacked. They'll be savagely attacked even before we discover whether they're right or wrong. So I think it's remarkably brave of them – being wrong on this issue is so risky, not just to themselves, but to the world. It would weigh heavily on me if I were a climate scientist. I'd have to quadruple-check the data, the analyses, and think very deeply about the validity of the methods and framework. I'd be terrified to offer estimates of ECS. It would feel like so much responsibility. At the high end, AGW could do serious harm to lots of people (well, they'd be able to see it coming decades in advance, so that might reduce the harm).
The fact that the estimates always change would give me pause, because it would imply that my estimates would change, or at least be replaced by others' future estimates using new and better methods, where the "future" could be next year. The epistemology or meta-epistemology of ECS looks very tough from the outside looking in – if the estimates are always changing, what does this mean about the methods used to generate them? Or the nature of our knowledge of climate circa 1990–2014? Thankfully, they've tended to go down, like from IPCC AR4 to AR5, or at least the low end of the range has gone down. It's odd. We don't keep revising Planck's constant or our estimate of the acceleration of gravity on earth, though admittedly these aren't the best analogies. It's just tough to know what to do with ECS estimates, because we know they always change. That's the one thing we know about them. This implies that we should expect them to change in the future, until we know something that tells us to stop expecting that.
This is great work:
"As expected, across these experiments, the researchers found that the more people smiled, the happier they reported being. But only some people. Surprisingly, for a section of the population, smiling actually reduced well-being. The more these people smiled, the less happy they were. This is like finding that there are some diners who, after consuming a four-course meal, feel less full!"
It makes perfect sense to me, and it's related to what I told The Chronicle of Higher Education a while back, that some people don't dispositionally approach happiness the same way. This finding is more at the implicit and affective level, but I wouldn't be surprised if it corresponded to a different broad disposition toward happiness. For example, these might be high-meaning people, less affectively driven.
It's also something I plan to add to my Media Tips page: Many main effects are misleading. Some main effects are false for the majority of the population, meaning that an effect of the form "X causes Y" will often only be true for a minority of people. That's not the case here, but we still see a significant population for whom a famous effect does not hold, and in fact is reversed. Significant correlations between X and Y do not require that X predict Y in the majority of cases, or that being high in X (assuming a continuous variable now, like a personality trait) means one is likely to be high in Y. A significant correlation does not mean that at all. It can be driven by 10% of the sample.
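A quick simulation makes the point concrete. The numbers here are invented purely for illustration: a statistically significant overall correlation can be produced entirely by a small subgroup, while X and Y are unrelated for everyone else.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# For 90% of the sample, X and Y are unrelated noise.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# For a 10% subgroup, Y tracks X perfectly.
sub = rng.choice(n, size=n // 10, replace=False)
y[sub] = x[sub]

# Overall Pearson correlation across the full sample.
r = np.corrcoef(x, y)[0, 1]

# t-statistic for testing r against zero; |t| > 1.96 is "significant".
t = r * np.sqrt((n - 2) / (1 - r**2))

print(f"overall r = {r:.3f}, t = {t:.1f}")  # r is small, yet t clears 1.96 easily
```

The overall r comes out around 0.1 – small, but highly "significant" with this sample size – even though the effect exists for only one person in ten.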
I've developed some new statistics, new coefficients, to help clarify what a correlation actually represents in many cases. These will be detailed in some upcoming journal articles, hopefully. It would apply nicely to the example above, and the pencil-in-the-mouth experiments.
A colleague perused the Cook rater forums and sent me this example:
"I got one last night, though, that I thought was funny - I rated it as "Explicit >50%", because it *was* attributing all the warming to human GHG emissions. It just claimed there wouldn't *be* much warming, asserting that the IPCC had grossly over-estimated climate sensitivity & feedbacks. I had to look it up to see who the authors were - Michaels & Knappenberger. I laughed out loud when I found that out. :-D"
Like I've said, we'll never lack for fraud examples here. They claimed in their paper that raters were blind to authors, which they'd have to be to conduct valid subjective ratings, but they weren't. They broke blindness all the time, even divulging the authors to each other in the forums, mocking them, e-mailing entire papers, etc. I'm not sure everyone knew they were supposed to be blind to authors (and journal), or that the paper would subsequently claim that they were, or the crucial importance of independent ratings. They didn't even know what interrater reliability was. And of course the substantive nature of the above rater's decision – and of the coding scheme that empowered him – hits on other invalidity issues.
I stopped going through the discussions a long time ago, because the initial evidence was clear and I'd rather just do my own research. I've repeatedly said "there's much more" and I meant it – I was confused by people who took me to be presenting an exhaustive set of evidence, or tried to recalculate the 97% based on removing the handful of psychology and survey papers I listed. I supplied a batch of evidence that I thought was more than sufficient and expected the journal or IOP to do the rest – the burden of validating the ratings lies entirely with the authors, not anyone else, but that's not possible at this point.
A broader point I'd like to briefly expand on: There are no results. I'm getting other e-mails about them claiming the "results don't change" from the fraud or something like that – the e-mailers understand that this is absurd, but I think my report was too long, and some people might not read the whole thing.
There aren't any results that could change, because there were never any results to begin with. There is no 97%. Or 99% or 82% or 10%. There's no percent, no numbers to evaluate. Normally we'd vacate the paper for fraud, which I assume will ultimately happen here, unless fraud is defined differently at ERL. Even absent fraud, there is no procedure known to science that would enable us to generate meaningful results here. Taking any number from their study is just roulette – the numbers won't be connected to any underlying reality.
The results of a scientific study only have meaning by reference to the methods by which they were obtained. Results cannot be divorced from methods. They falsely described their methods, on almost every substantive feature. This means we no longer have results to speak of – we don't know what we have, because we don't know how the results were obtained. We have a vague sense, and we know that raters were never blind or independent, and that such features could not even be enforced given their at-home procedure and bizarre online forum. We know that we could never do anything with a subjective rating study that employed raters who had a profound conflict of interest with respect to the results. This was a subjective human rater study – the results were entirely in the hands of the raters, like the one we see up top, who were political activists rating abstracts on a dimension that would serve their political aims. We'd have to ignore decades of scientific research on cognitive biases, motivated reasoning, implicit and non-conscious biases, and our basic grasp of conflict-of-interest in order to take this seriously. I think a lot of people would assume this was a prank, maybe like Sokal's – a test to see whether a scientific journal would publish a consensus study based on lay political activists divining the meaning of scientific abstracts for their cause. No one has ever done this.
There are no results here for several other reasons. If you run a casual search on "global warming" and just start counting papers, deciding whether they endorse AGW, even implicitly, that's not anything. It's not any kind of measure of a consensus, not by anyone's definition. Especially if you're including a bunch of social science, polls, mitigation papers, psychology, etc. (They said they excluded social science and polls. They're remarkably casual about the importance of the truthfulness of the methods section of a scientific paper. Scientists would not be nearly so casual.) And we can't calculate interrater reliability here because they broke independence, and their ratings are contaminated by other raters' feedback. A subjective rating study that can't generate interrater reliability estimates no longer has usable data. The lack of meaningful results here, the lack of validity, is multifaceted and impossible to resolve.
One reason this isn't a measure of consensus is that you're giving some people far more votes than others. Each paper is a vote by this method. There are many obvious biases and weights this will impose, even assuming a neutral or valid search. Old men will get a lot more votes than others, and there will be many other large issues that real researchers would think through and account for.
But the search is invalid too – it gives unacceptable results and was apparently never validated or tested. We could never go with a search that excludes everything published by Dick Lindzen since 1997. You just took away dozens of his votes, for unknown, arbitrary reasons. That discovery would stop researchers in their tracks – they would test and dig into the different results of different searches, and figure out what was happening, and what terms were needed to sweep up all the relevant papers. A study of consensus based on a literature search obviously depends critically on the validity and completeness of that search – if the search is bad, it's over. Knowing what we know about this study and the search, there's nothing we can do with the data. Humans have an anchoring bias, and there's a small epistemological dysfunction in some quarters of science where people think that because something is published, or because it was peer-reviewed, it must be somewhat valid or have some kind of legitimacy (social psychologists call this "social proof"). That's not a survivable claim, and it's important not to let those sorts of biases affect your judgment of published studies. There's nothing we can do with this study.
Someone would have to reconduct the study, using valid methods and avoiding fraud, to generate any numbers we could use. There's just no way to get meaningful numbers in this case. And a reconducted study would have to be a very different study -- it wouldn't actually be a repeat of this design. The set of papers would be very different, given a validated and rigorous search procedure. It would have none of the invalid stuff, would include a lot of climate science papers that they excluded. I hope that illustrates my central point -- there's no way to generate results from this study, because the underlying data is invalid in multiple critical ways. Taking a number from the Cook study -- any number -- is just rolling dice. The number won't have any connection to the reality the study was supposed to measure.
It's not clear that we can study a consensus by counting papers. We certainly can't just count papers without any weights, even if we have a valid search. Before we get there, we'd need some kind of theory of consensus, some reason to refer to literature as opposed to careful surveys of climate scientists, some specific hypotheses about what the literature gives us that surveys do not. There are a few possibilities there, but we'd need to think it through. A reasonable theory of literature-based consensus is probably not going to be temporally arbitrary – you won't weigh papers from 1994 as heavily as those from 2009. You won't count everyone's papers additively – you're not going to want to give Dick Lindzen 140 votes and some young buck 4 votes, based simply on paper count. You'll probably decide, based on theory, on a variable weighting system, where someone's 5th through 15th papers on attribution count for the most, but after 15 maybe you taper down the weights, limiting the impact of incumbency. You'd probably think about giving weight to heavily cited papers. You would not include most of the papers Cook et al included, like mitigation and impacts papers, for reasons detailed in the report (unless you could argue that impacts papers, say, carried epistemic information about human forcing. In that case, it would probably only be certain kinds of impacts research.)
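To make the weighting idea concrete, here's a hypothetical scheme of my own invention – the specific cutoffs and weights are made up, and a real study would need to derive them from theory:

```python
def paper_weight(k: int) -> float:
    """Hypothetical vote weight for an author's k-th relevant paper.

    Papers 1-4 count partially, 5-15 count fully, and weights taper
    after 15 to limit the impact of incumbency. All numbers here are
    illustrative placeholders, not derived from any actual theory.
    """
    if k < 1:
        raise ValueError("paper index starts at 1")
    if k <= 4:
        return 0.5
    if k <= 15:
        return 1.0
    return max(0.2, 1.0 - 0.1 * (k - 15))

def author_vote(num_papers: int) -> float:
    """Total weighted vote for an author with num_papers papers."""
    return sum(paper_weight(k) for k in range(1, num_papers + 1))

# Under raw paper-counting, a 140-paper veteran gets 35x the vote of a
# 4-paper newcomer; under this tapered scheme the ratio drops to ~20x.
print(author_vote(140), author_vote(4))
```

The point isn't these particular numbers – it's that any defensible literature-based consensus measure has to make weighting decisions like this explicitly, rather than letting raw paper counts decide.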
A paper-counting method is also extremely vulnerable to historical lag and anchoring effects. They went back to 1991, and there are an alarming number of papers from the early 1990s in their set. This sort of design will not detect recent advances that move the consensus in one direction or another. If we had a breakthrough in estimates of equilibrium climate sensitivity, something that changed the game, something that made scientists a lot more confident in the stability of the new estimates of ECS, ending all these years of constantly revised estimates, this breakthrough might manifest in three or seven papers over the last two years. If it were a significant downward or upward revision, it wouldn't be captured in the bulk of the literature back to 1991. The method is also vulnerable to Least Publishable Units practices and publication bias. The publication bias argument is popular with skeptics – it's an easy argument to make and a hard one to prove, which essentially leaves me with nothing much to say about it at this point.
There are lots of other issues you'd have to think about, but even with rigor and qualified, unbiased raters who are blind and independent, in controlled environments like a lab, and only including climate science or attribution research, this could still all be useless and undiagnostic. We'd want to think about the difference between measuring consensus and measuring reality – the latter is what a meta-analysis does, in a sense. It aggregates findings, with strict methodological inclusion criteria, in an attempt to come to a credible conclusion about true effects (and effect sizes). It doesn't measure consensus. People in most sciences generally don't talk about consensus much – look at the literature. For example, no one in my field is trying to get anybody to come to consensus or hold a bake sale or anything like that. Cook and company's claims about science as such being driven by consensus are strange – you'd probably have to do some serious empirical work to support such a sweeping positive empirical claim about so many fields. Consensus is unlikely to be a universally reliable, portable heuristic for reality across domains and arbitrary timepoints.
You'd have to really dig in, and have an account for whether consensus will be epistemically diagnostic or reliable for climate science in particular. That's a very hard question. You'd want to consider that dissent in general will cost a person much more than assent or conformity, and there's some reason to believe that dissent in climate science is extraordinarily costly. You'd have to think about how you could account for this in any estimates of consensus, what weighting might be appropriate, whether the dissent in climate science represents ever-present base rates of less competent researchers or contrarian personality traits, or whether it represents something else entirely. For one thing, you'd probably be curious about the dissenters, and might want to talk to them and learn more about their reasoning.
Some scientific truths won't be amenable to discovery by consensus-measurement, certainly not by the methods they used. Sometimes the only way to know something is to know it. Some problems in science might only be understood by 30 people in the world. Or 9. I don't know if that's true of any climate science problems, where the key issues are probably ECS and TCR (transient climate response). Some issues need very careful and advanced reasoning, real substantive engagement. Validity issues are like this. I've never dug into the controversy over climate models and their validity. That's a good example. A scientific method can be completely invalid, though this is probably less likely as time goes on. It's easier to do invalid science than valid science. I doubt the models are invalid, but if there is something fundamentally wrong with how climate models are used by climate scientists, and this leads to a systematic error in their estimates, then the consensus won't matter. If a field makes a fundamental, pervasive validity error, then it will just be wrong. This can definitely happen, though I think social science is much more vulnerable to this than climate science.
More broadly, I don't think people have a full account of all the ways we can be wrong. We can be wrong in our fundamental frameworks, in all sorts of ways that we're not even trained to think about. I doubt that climate science is wrong, but these are the kinds of issues that you'd have to think about before giving much weight to a consensus. Consensus is not necessarily important or interesting – you have to make it so. And you definitely have to develop a valid way of measuring it. There might be better tools than consensus-measurement for public assimilation of scientific realities. For example, well-structured debates might be more informative and valid than consensus. I think there's some emerging research there.
It's alarming that they were able to do this, and get published. The good news, such as it is, is that their false claims were in the area code of reality, as far as anyone knows. Valid studies find consensus in the 80s or maybe 90% (Bray and von Storch are good, as is the AMS report.) Some questions yield much lower estimates, but in general it's up there. 97% is inflated, and in this case the number had no real meaning anyway, just biased paper-counting from an invalid starting set. The inflated values might make a difference in policy choices, which is disturbing. We can't be so risky with science. It horrifies me when people shrug off such malpractice because it's only 10 or 15 percentage points off of reality. The danger of tolerating malpractice because the results are sort of close to accurate seems obvious, and the consequences of that kind of tolerance would likely extend well beyond climate consensus research.
This amazing study reminds me of Flowers for Algernon:
"...the researchers recorded the EEG of human participants while they were awake and instructed to classify spoken words as either animals or objects by pressing a button, using the right hand for animals and the left hand for objects. The procedure allowed Kouider and his colleagues to compute lateralized response preparations—a neural marker of response selection and preparation—by mapping each word category to a specific plan for movement in the brain. Once that process had become automatic, the researchers placed participants in a darkened room to recline comfortably with eyes closed and continue the word classification task as they drifted off to sleep.
"Once the participants were asleep, the testing continued but with an entirely new list of words to ensure that responses would require the extraction of word meaning rather than a simpler pairing between stimulus and response. The researchers' observations of brain activity showed that the participants continued to respond accurately, although more slowly, even as they lay completely motionless and unaware."
Here's an update on the Cook 97% scam. They apparently have no answers.
Cook and some of his crew had a chat event at Reddit. Last I checked, they never addressed any substantive issues in the report, even though various people asked them about it. They don't argue against the fraud reports, the validity issues, nothing. Instead, they tried to attack me, or issued vague filibusters that had no bearing on anything happening here. To my knowledge ERL has yet to issue a statement or offer an explanation for the paper's comprehensively and substantively false description of its methods.
To review, in their paper, they described their method as: "Abstracts were randomly distributed via a web-based system to raters with only the title and abstract visible. All other information such as author names and affiliations, journal and publishing date were hidden. Each abstract was categorized by two independent, anonymized raters."
All three substantive features of their method are false. Raters were not blind to authors (or any of the other info.) Raters were not independent. Raters were not anonymized.
They falsely described their methods. That is a very, very serious thing. There is no science without an accurate description of methods, and this paper, like all papers, was published on the assumption that they followed the methods they described.
Normally the way science works, that's the end. Nothing else needs to be done by anyone. There are no results to evaluate if they didn't follow their methods. Why? Because valid results critically depended on those methods, and when people don't follow their stated methods, we don't know what they did and thus can't rely on the results. Climate science, or its journals, can't be an exception to this basic norm and epistemic requirement of valid science. Why would they be an exception? (This has nothing to do with the truth of AGW or the reality of a consensus -- this is about a fraudulent and invalid study.)
This is not a pedantic issue. This was a subjective rating study where human raters read authored works and decided what they mean. Such a study could never be valid without blindness to the authors of the works they were rating, since knowing who the authors are is a multifaceted source of bias. Nor could it be valid without independent ratings – raters discussing their ratings in an online forum contaminates those ratings, exposing one rater's ratings to the views of other raters, and makes it completely impossible to calculate interrater agreement or reliability from that point forward. We have no valid numbers with respect to agreement – the crude percentages they strangely offered instead of proper interrater reliability coefficients don't mean anything given that the raters weren't independent. If we can't calculate interrater reliability, we don't have a study anymore.
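For readers unfamiliar with interrater reliability: a standard coefficient is Cohen's kappa, which corrects raw percent agreement for the agreement you'd expect by chance. Here's a minimal sketch – my own illustration with invented ratings, not anything from the Cook study – and note that the statistic is only meaningful at all when the two raters worked independently:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical ratings.

    Only meaningful if the raters worked independently -- once raters
    discuss items before rating, the chance-agreement model underlying
    kappa no longer applies.
    """
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)

    # Observed agreement: fraction of items where the raters match.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected chance agreement from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in set(ratings_a) | set(ratings_b)
    )

    return (p_observed - p_chance) / (1 - p_chance)

# Two hypothetical raters who agree on 3 of 4 abstracts:
a = ["endorse", "endorse", "neutral", "neutral"]
b = ["endorse", "neutral", "neutral", "neutral"]
print(cohens_kappa(a, b))  # 0.5 -- 75% raw agreement, but half of it is chance
```

This is why raw agreement percentages, which is what they reported, aren't a substitute for a proper reliability coefficient – and why neither figure means anything once raters have contaminated each other's ratings.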
Having humans read complex text and decide what it means, what position it's taking on some issue, is a very special kind of research because it enables researchers to create the data to a degree unparalleled in typical science. The data is the result of subjective human appraisals of text, which is extremely vulnerable to bias and many sources of error, and in some cases the task is not theoretically possible or coherent. Such research demands great care, and won't be valid if things like blindness and independence are not observed. That these people gave themselves a category of implicit endorsement of AGW only exacerbates this. That they excluded papers they interpreted as taking no position on the issue, which were the majority, and calculated a consensus excluding all the non-polar papers, takes this all further from a universe governed by natural laws. They also thought they could just count papers, which is surprising – that papers, not people, are the unit of consensus – after excluding everything they excluded, and given all the obvious weights and biases that would be applied by a paper-counting method (research interest, funding, maleness, age, connections, whiteness, English-speaking, least publishable unit practices, # of graduate students, position as a reviewer, position as an editor, to name a few...)
Moreover, in their online forum, the third author of the paper said: "We have already gone down the path of trying to reach a consensus through the discussions of particular cases. From the start we would never be able to claim that ratings were done by independent, unbiased, or random people anyhow."
Maybe we should quote their methods section again: "Each abstract was categorized by two independent, anonymized raters."
There appears to be no question that they knew, well before submitting the paper, that they had not implemented independent ratings, since as she mentioned, they were discussing particular papers in the forums the whole time. Yet, they still reported in their article that they used independent raters. What is this?
The percentages for rater agreement that they report in the paper are voided because of this -- those weren't independent ratings, so there's no way to measure agreement between them.
Nor can we assume that the fraud was limited to the forums, since the raters worked at home and could just google the titles of the papers to break blindness and see who wrote them. You can't perform this kind of study in such uncontrolled conditions – there's no way to credibly claim blindness here, which is a crucial feature. In any case, they freely revealed authors of papers to each other on the forum, sometimes with malicious mockery if it was someone they had already savaged on their weird website, like climate scientist Richard Lindzen. It's incredible – they exposed the authors of the articles. They did so repeatedly and without censure, lacked any apparent commitment to blindness, and were e-mailing papers to each other, so it would be their burden to show the fraud was limited, if for some reason we cared that it was limited. Since they were e-mailing each other, they violated their claim of anonymized raters – they used their real names in the forum, had each other's e-mails, knew each other, were political activists from the same partisan website.
That last fact also invalidates the study in advance – we can't have political activists rating scientific abstracts on their implications for their political cause. That's an obvious, profound conflict of interest, and empowers people to deliver exactly the results they desire. I want to stress that this is unprecedented. No one ever does this. It's too absurd and invalid on its face. Studies based on subjective human raters are a small fraction of all social science, and many researchers will never need such a design, but using subjective raters who desire a certain outcome and are in a position to deliver that outcome is simply not an option.
For the defenders of this study, I think the absurdity of the design would be obvious in any other domain – e.g. a group of Mormon activists reading scientific abstracts and deciding what they mean concerning the effects of gay marriage, much less falsely claiming blindness to authors, falsely claiming to have used independent ratings, etc. If ERL or IOP want to argue that it's fine to use subjective raters who have an explicit, known-in-advance conflict of interest with respect to the outcome of the study, and thus their ratings, and who we can see in the forums gleefully anticipating their results in advance, bragging to each other that they've hit 100 abstracts without a single "rejection" of AGW, further biasing and contaminating others' ratings, posting articles and op-eds smearing "deniers" while they rated abstracts on the issue, and the leader of this rodeo telling the raters that this study was especially needed after another study found an unacceptably low consensus, again, while they were conducting ratings -- well, I'd really like to see that argument. It would break new theoretical ground. I'd normally say they need to talk to some experts in the subjective rating methodological literature, but in this case, you really don't.
(I'm not aware of any study that has ever used laypeople to read and interpret scientific abstracts. This method is remarkable, since it seems implausible that laypeople would consistently understand specialized scientific abstracts. As scientists, we won't understand every abstract from our own fields, much less other fields. Some of the explicit violations of the claim of independent ratings in the forum were cases where raters didn't understand what an abstract was saying, which is easily predictable. Then of course others offer their interpretations, and the rater's subsequent rating is essentially someone else's rating. There is some research on bias and interrater reliability where scientists or doctors rated abstracts in their fields, as for a conference, as well as research on reviewer agreement in peer-reviewed journal submissions, which is historically quite low. Those researchers did not contemplate the idea of laypeople or non-experts rating scientific abstracts. You'd need a lot of training, I think, and it's unclear what the point of such a study would be, for reasons I detail in my report.)
Relatedly, on the reddit forum all I saw were bizarre ad hominem attacks. One of the raters said I had no climate papers. That's goofy. You'd need deep climate science expertise to rate climate science abstracts, but you don't need it to point that out. The Cook paper isn't a climate paper – it's not about cloud feedbacks or aerosols. It's methodologically a social science paper, a subjective rating study, which is well within my area code. Calling out the fraud and invalid methods has no more to do with climate science than their paper does.
For example, when someone says "Abstracts were randomly distributed via a web-based system to raters with only the title and abstract visible. All other information such as author names and affiliations, journal and publishing date were hidden. Each abstract was categorized by two independent, anonymized raters."
...you don't need to know anything about climate to read this, read the forums where they disclose the authors of papers and discuss their ratings with each other, refer back to the above text from their Methods section, and observe that they falsely described their methods. It might help to know what blindness means and why it matters for subjective ratings, but that's not hard. Moreover, while I think technical ad hominem is valid in some very restricted cases (like a probabilistic inference that non-experts are unlikely to be able to understand specialized scientific material, such that the burden is on them to show they can), it's never valid to respond to fraud reports by saying "He doesn't have any climate papers." You'd want to respond to the report, to answer it substantively. They did something very serious, and they need to answer for it.
Others said my report needs to be published in a peer-reviewed journal in order to matter. Fraud is not normally debated in journals. I don't want to publish this in a journal (nor do I want any "climate papers".) It might be an example in a future publication, but why kill trees to point out fraud? We can actually see and know reality with our mortal eyes and mortal brains. And since the Cook paper made it through peer review, it's not clear that peer review should carry a lot of epistemic weight for the rational knower, at least in some journals or on some issues. My points are going to be true, false, valid, invalid, or some mix – whether they're published in a journal isn't going to tell us that. You can just read the arguments, consider the evidence, and decide – you don't need other people thinking for you. I could be a welder from Wyoming who just got out for good behavior – it would be remarkable for such a person to blow the whistle on a journal article, maybe worthy of a Lifetime movie, but it wouldn't change a damn thing. It wouldn't change anything about the Cook study, or the validity of my points. Nothing about me alters those realities. Journal publication might not even be a better-than-chance heuristic in such cases. This "consensus" epistemology is taken too far – people forget that consensus is only a heuristic, of variable and context-dependent reliability, not a window into reality.
In any case, I reported it to ERL and IOP, and they should be able to handle it without my needing to write it up for a journal. The authors need to answer some very simple questions. There need to be answers. Without some sort of miracle redemption, some way of the fraud not being fraud, it should be retracted. I don't know if this is a wild and crazy idea to some people, but science would mean nothing if we can falsely describe our methods (and use methods guaranteed to produce a desired outcome.)
Cook responded to questioners by saying "The common thread in all criticisms of our consensus paper is that the 97.1% consensus that we measured from abstracts is biased or inaccurate in some way. Every one of these criticisms fails to address the fact that the authors of those climate papers independently provided 97.2% consensus. This is clear evidence that attacks on our paper are not made in good faith."
This is bizarre. They may be counting on people not reading my report. I explicitly address this issue in the report, so his claim is false. Here are a few reasons why we don't care about the author self-ratings anymore:
1) We learned that Cook included psychology, social science, public survey, and engineering papers in their "consensus". This after they explicitly said in their paper that social science and surveys of people's views were not included as climate papers. Chalk up another false claim. But, since they included a bunch of invalid papers, this means their author survey included the authors of those papers. This in turn means we can no longer speak of the authors' self-ratings percentages – those figures have no meaning anymore, given that we don't know how many were from psychologists, sociologists, pollsters, and engineers.
2) I pointed out in the report that their counting method is invalid because they count mitigation and impacts papers that have no obvious disconfirming counterparts. For example, if an engineering paper counts as endorsement because it mentions climate on its way to discussing an engineering project, how would an engineering paper count as rejection? By not mentioning climate? If a paper about TV coverage of climate news counts as endorsement (in contradiction of their stated criteria), what sort of study of TV coverage would count as rejection? One that doesn't mention climate? An analysis of Taco Bell commercials? Where's the opportunity for disconfirmation? There's no natural rejection counterpart to such categorization (it won't matter if you find a mitigation paper that they counted as rejection -- this is about the systematic bias, and the endorsements will be far greater than the rejections here as a result.) This all means we won't care about the authors' self-ratings, because of this systematic selection bias in the method (anything that biases the selection of articles biases the subsequent pool of authors rating those articles.)
I also pointed out that we can't validly measure consensus by excluding the vast majority of actual climate science papers that do not take polar positions of endorsement or rejection, which is what they did. Consensus cannot exclude neutrality. We can't assume that neutrality represents a consensus, as they do. And we probably can't count papers to begin with. This is all in the report.
3) Pointing at squirrels is never good when a study has been rebuked for fraud or invalid methods, both of which are the case here. You cannot redeem the malpractice in the first part of the study -- all their false statements about their methods, the invalidity and meaninglessness of their results -- by talking about a completely different part of the study. That's pure evasion. They need to answer for what they did.
4) There are serious questions about their literature search, such as how it excluded everything Dick Lindzen has done since 1997. There are no results without figuring out what's going on with that search, how it excluded all the modern work of a seminal lukewarm climate scientist. You can't just run a lit search based on a couple of terms and then declare that you've got a valid, representative sample of studies. No way. Science can't be so haphazard – you have to try. There's work involved. New methods need to be validated. We can't begin to talk about percentages and numbers without first establishing that our data is valid, that this literature search is valid (and dealing with all the other issues, the first of which is the fraud.) The search issue will interact with point 2 above. Validating the search will require careful thinking, testing, etc. Some of the methodological meta-analysis literature will have guidelines. How to do a valid search for this purpose is a nontrivial issue – nothing that happens after the search matters if the search isn't valid. You have to figure out if there are selection effects, what you're including, what you're excluding, especially with respect to your hypotheses, what happened to Lindzen's papers, and so forth -- you don't just run a search and start rating papers. This is science, not numerology.
(The equivalent would be if some people rented a couple of hot-air balloons for a few days, took pictures of some clouds, wrote it up for a journal submission on cloud cover in North America, and we simply accepted their method without question. Science can't be that sloppy.)
(And only 14% of authors responded to their survey, and they got about two votes each, which highlights the oddity of simply counting papers. I've already pointed out in my report many of the biases and potential weights that will be applied by a simple paper-counting method.)
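The asymmetry described in point 2 can be made concrete with a toy simulation. All the counts below are invented for illustration, not drawn from the study, but they show mechanically how a one-way inclusion rule inflates a consensus percentage regardless of what the papers actually taking a position say:

```python
# Toy illustration (all numbers hypothetical): suppose any mitigation,
# impacts, or engineering paper that merely mentions climate is counted as
# an "endorsement", while a "rejection" would require explicit dissent that
# such papers almost never contain. The mention-only papers can then flow
# into only one bin, skewing the percentage.
position_papers = ["endorse"] * 80 + ["reject"] * 20  # papers stating a position
incidental_papers = ["endorse"] * 300                 # mention-only papers, one bin

sample = position_papers + incidental_papers
consensus = sample.count("endorse") / len(sample)
print(f"{consensus:.1%}")  # 95.0%, versus 80% among the position papers alone
```

Raising the incidental count to 3,000 pushes the figure above 99% with the same 80/20 split underneath, which is the sense in which such a counting rule can guarantee the outcome.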
Cook's statement is also bizarre because of his conclusion. Saying that people who contest the validity of their ratings never address the purportedly similar results of the authors' self-ratings, and that this "is clear evidence that attacks on our paper are not made in good faith" is so strange. Evidence that people are not acting in good faith? Because they don't address some other part of a study? That's such a strange model of human psychology, and of how science works. They seem to have a remarkable immune system or strategy of not engaging substantive criticism, characterizing it as not in good faith, "denial", misinformation, etc. If genuine, it's a remarkable worldview.
Another Cook tactic was to talk about other studies finding similar results. I was dumbfounded. When someone is referring to your false claims about your methods, and your invalid results, the subject under discussion is your study. Nothing else matters. There could be a thousand 97% papers. It wouldn't matter. That there is a consensus does not matter. The issue is the Cook study, its fraudulent claims, its invalid results. Fraud is fraud. We don't redeem fraud by talking about other people's studies. Some of those studies are almost as bad and unusable, like the one-pager, but that's beside the point. The true, meaningful consensus could be 99.997% – that won't matter here. (We also don't need this study to establish a consensus, just as we don't need a particular hockey stick paper to establish anthropogenic warming. We might consider that the case for the consensus likely weakens if this study is included.)
It's incredibly disturbing to repeatedly see this kind of response from people who were published in a scientific journal. Not only did these people have no idea what they were doing, they betray no sign that they're aware of the basic norms of science, the importance of faithfully describing one's methods, why fraud is a serious thing, what it means to say that results are invalid, or how the existence of other people's studies won't intersect with or override the false claims made about one's own study. If their responses are being validated or accepted by ERL or IOP, we've got a much bigger problem. People might be justified in ignoring climate science if climate journals are fine with what Cook did. We wouldn't have any way of knowing that other research was valid, since climate papers would be much harder for us to evaluate. If ERL is fine with fraud, we wouldn't know what to do with any future ERL articles, or with the field as a whole. That's not anywhere we want to be. Some of the confusion or delay might be due to ERL's unfamiliarity with these methods, the need for blindness, independence, interrater reliability, probably a lack of awareness that they falsely described their methods. But those can't be issues anymore – if people don't get it, don't understand why a subjective rater study necessitates that the ratings be blind and independent, don't understand why this study destroyed our ability to calculate interrater reliability, etc., they can just consult some relevant experts in social science.
Let me tell you something else. Some people are saying this paper won't be retracted because the journal is biased, or because it would look bad for Dr. Kammen (the editor), or because he works with the White House, that science is rigged. We need to cut the crap. I don't care who is who or how much power they have. This was fraud. Falsely describing one's substantive methods is fraud. This paper is invalid in ten different ways, but fraud is fraud. They have not answered for it. They apparently have no answers. If the relevant decisionmakers are thinking that they can just ignore this, or issue some PR blather, they need to think long and hard about what the hell we want science to be. They need to think long and hard about the long-term consequences of failing to retract fraud when the evidence of said fraud is so publicly accessible and straightforward. They might think about how far we have to throttle down our brains in order to believe that this study is remotely valid. They might think about what it will do to the reputation of climate science not only that this study was published in a climate science journal, but that it wasn't retracted when it was revealed to be fraudulent and multiply invalid. There are real consequences here, long-term impacts. We can't be this bad at science, this corrupt. We need to stand for things that aren't in a party platform. These people made a complete mockery of the institutions and safeguards that we take for granted. They torched them.
I think it's worth considering what could happen in the long-term if this nonsense were tolerated, if we couldn't get people to act against fraud if the fraud were politically convenient, in combination with all the conventional fraud and malpractice. Cultures and civilizations evolve in countless ways. The meanings and usages of words can change. There's no a priori hard constraint on this process. Science doesn't have to mean something like rational inquiry, or the systematic, reproducible validation of empirical claims or hypotheses, or any of our other definitions (in capsule form.) The word science could end up meaning something very different. For example, in the worst case, it could end up being used only ironically, as a term for scams. As it stands, we probably use the term too broadly -- science is very diverse, and the issues discussed above will be alien to many scientists. (The idea of gathering some political activists to "rate" scientific abstracts is far removed from most scientists' methods, and the type of fraud we see here is very different from prototypical fraud in cellular biology, for example.)
I don't mean that what we've classically known as science would cease to exist -- that would likely require a large asteroid strike. I mean that science could plausibly evolve into a label for a corrupt and privileged subculture whose divinations are no more tied to reality than a day-old religion. In such a case, actual science as we mean it today would be called something else, perhaps interrogo, ρωτήστε, or bluefresh, and the professionals who practiced it would meet high standards -- the standards people expect of scientists today. I'm not saying this is likely -- I don't quite think it is. But I think it's plausible, and I'd caution you against underestimating the degree of change and evolution that human culture and language can experience, even in our lifetimes. I don't mean to make a trivial point about changes in terminology and phonemes. I mean to illustrate that continuing to generate garbage and fraud under the banner of science will despoil that banner. Circling wagons to protect such fraud and garbage is tautologically incompatible with our prior commitments to scientific integrity. In any case, nothing in life is assured -- not prestige, not funding, not an audience. I don't like being pragmatic about this, since I think idealism should dominate here. We like to say we stand on the shoulders of giants -- we might want to think about whether anyone will be able to stand on ours.
Hi all. I'm working on a side project, and I ask for your help.
Have you been harmed by others' judgments or perceptions of you as a climate skeptic? Please send me your stories.
Climate skeptic is broad and vague, but what matters here are others' perceptions of you as questioning human-caused warming, or questioning or doubting its future severity, and so forth.
By harmed I mean things like discrimination at work or in business, or damage to personal relationships of all kinds, due to others' perceptions of you on this topic. For example, has it ever come up in a job interview or promotion? (Seems like it wouldn't come up, but there's a lot of variance in what happens in interviews, and sometimes people will just have casual conversations about whatever, and political or social issues can come up.)
Are any of you stereotyped or marginalized as cranks, conspiracy nuts, "deniers", etc. in your workplaces? Have people ascribed any other views to you based on your being a climate skeptic?
Now I wonder about something more specific. Have any of the Lewandowsky scams come up yet in real life? For example, has anyone assumed that you believed the moon landings were a hoax? Have people assumed that you don't think that HIV causes AIDS, or that smoking causes lung cancer? Have any of these specific issues popped up?
Moreover, did people who know that you're pro-free market or an economic conservative (for Europeans, I mean an economic liberal) think that you dispute that HIV causes AIDS?
The vector here is from the journal inexplicably publishing the Lewandowsky scam paper --> the various media outlets that covered it --> people read the articles and carry the false associations in their minds.
For example, there were articles in Mother Jones, the New York Times, and countless other places, that repeated the false claims made by Psychological Science and Lewandowsky. I'm wondering to what extent it's out there in the streets, so to speak. There are 7 billion people on earth, and I think any false psychology findings ascribing beliefs to a large group of people and reported in the mainstream media will have an impact on some number of people.
Here are some examples of what I mean:
You go out on a first date, or several dates, then the other person learns about a view you hold and they end it. This happens all the time on anything. They learn that you're an atheist and it's over (happened to me once.) Or that you're a Baptist. They learn that you're a liberal/Democrat. Or that you're a conservative/Republican. These are dealbreakers for extremely partisan people where a large part of their moral identities and self-concept is tied up in their political affiliation.
You're at work and you say something about your views on human-caused warming, maybe minimize its long-term impact. It has a chilling effect, and coworkers see you differently from that point on, think you're a nut, don't invite you to happy hour, etc. I'm guessing this is rare. In such a case, the coworkers would likely have to be very political and caught up in the "denier" narrative. I'm even more interested in cases where the Lewandowsky scams popped up, where someone said something like "Do you think the moon landings were a hoax?" or whispered to others that you dispute that HIV causes AIDS, after reading some junk article by Chris Mooney passing on the false findings.
And again, this doesn't have to be about climate skepticism necessarily. The false associations about AIDS and smoking were linked to pro-free market views, not climate skepticism, but I think all this would likely be jumbled together in the minds of people who read a hit piece in the media.
If you have any experiences you think are relevant here, please send me your stories. Thank you!
4.5 million hospital records were hacked, apparently by a notorious Chinese hacking group. I've been concerned about the security of medical records for years – the issue has been obscured by the huge business breaches like Target, Neiman Marcus, and now Home Depot (Russian hacking teams, usually based in Odessa, Ukraine.)
There's a basic structural problem right now with IT security. There aren't enough security experts to go around, and businesses and institutions often have no clue what they're doing. Securing computer records and networks, especially those that have any link to the internet, requires a lot of expertise and constant up-to-date awareness of threats and types of attacks. Think about all the doctors' offices and hospitals you go to. The small practices in particular are not going to have a clue about IT security – yet they're going to have a lot of sensitive information about you. Psychologists, psychiatrists, drug rehab facilities, etc – they have very sensitive records, but I'd not expect them to do a good job of securing them. If they've got PCs with web access, which they will, and those PCs are on a network with PCs that have your records, there's your vector. (And it will often be the same PCs...)
I think it's best to assume our records are compromised. It's much easier to breach a medical practice than to breach Target. I think it's probably wise to extrapolate from all the breaches you hear about and assume there are a lot more. Security experts say this all the time.
This means you might want to think about what you disclose when you complete all the forms these various places require. It might be wise to just tell a doctor or other professional something you think might be important about your medical history instead of putting it down on a form (though it still might end up in your file because they take notes and so forth – ultimately our medical records are going to have to be populated with medical information.) And when you're visiting unrelated practitioners, you don't need to disclose everything about your medical history, even though the forms ask for it – for example, if you're seeing a physical therapist for a bad knee, maybe you don't need to document your colon cancer or drinking habits, even though their intake forms ask about that kind of stuff (the forms are often boilerplate – they're not necessarily designed for the needs of the practice that uses them.)
José L. Duarte
Social Psychology, Scientific Validity, and Research Methods.