Here is a tiny preview of upcoming methodological work relevant to recent events:
The Frog Jump You have two survey variables: Variable 1: AB Variable 2: CD AB: A and B are opposites sharing a 4-point scale (2 pts for A; 2 pts for B) CD: C and D are opposites sharing a 4-point scale (2 pts for C; 2 pts for D) What do I mean by opposites? We have lots of Opposite Scales. Canonical forms are: Disagreement -- Agreement Dislike -- Like Unhappy -- Happy They can vary in how stark the flip is, but in general one side is expressing the substantive opposite of the other side, and has profound implications for how we describe things like correlations – or whether it's even appropriate for a scientist to use correlations on such measures. In our example, it's a stark flip, a virtually dichotomous four-point scale: Absolutely Disagree, Disagree, Agree, and Absolutely Agree. There's not even mild or moderate disagreement or agreement, nor is there a neutral or ambivalent midpoint, nor is there a separate unscored No Opinion option (which I strongly advocate.) Descriptives: 99% of the sample is at the D side of CD. - 99% of people at A are at D (A is large majority of sample) - 95% of people at B are at D (B is small fraction of sample) Still, for some reason you decide to run a Pearson correlation between AB and CD. Negative correlation, nice mystical, meaningless p-value; sacrifice a goat to the Gods. Write it up and say "B predicts C." (Instead of saying AB is negatively correlated with CD, which will be hard because A and B are opposites, and C and D are opposites. Choose to talk only about B and C, even though no one is at C.) (And don't tell anyone that there's no one at C, or that 95% of people at B are at D.) Part of the dark magic here is using variance within D – remember D includes two points of the scale (Absolutely Agree and Agree) – to power a negative correlation between AB and CD (there won't always variance on one side that will power a correlation – it just happened to go down that way here), then frog jump it to C when you write it up. 
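The scenario above can be simulated in a few lines. This is a stylized sketch with hypothetical numbers (the 90/10 A/B split, the seed, and the way within-D responses track AB are all my own constructions, chosen only to match the descriptives in the text): nearly everyone sits on the D side, yet variance *within* D still powers a negative Pearson correlation between AB and CD.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# AB: values 1-2 are the A side (large majority), 3-4 the B side.
ab = np.where(rng.random(n) < 0.90,
              rng.integers(1, 3, n),    # A side: 1 or 2
              rng.integers(3, 5, n))    # B side: 3 or 4

# CD: values 1-2 are the C side, 3-4 the D side. About 99% land at D;
# for most of the sample the only variance is *within* D (3 vs. 4),
# with A-side respondents skewing toward 4 ("Absolutely Agree").
at_d = rng.random(n) < 0.99
cd = np.where(at_d,
              np.where(ab <= 2, 4, 3),  # within-D variance tied to AB side
              rng.integers(1, 3, n))    # the rare C-side respondents

r = np.corrcoef(ab, cd)[0, 1]
print(f"Pearson r = {r:.2f}")                 # negative, powered by variance within D
print(f"Share at D: {np.mean(cd >= 3):.0%}")  # essentially everyone; almost no one at C
```

The correlation comes out negative even though almost no one is at C – writing it up as "B predicts C" would be exactly the frog jump described above.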
Even though everyone's at D, the unscrupulous researcher refers only to C, which is the substantive opposite of D.

There are several root causes here. One is that disagreement and agreement scales are being wrongly treated as continuous variables. This is why we can't use a single letter for each variable in the example. There are really two variables in each variable: one is disagreement, and one is agreement, which are substantive opposites (and there was no midpoint here). This will be less bad if you've got variance across both sides, not a 99%-on-one-side situation. But if all your participants are on one side of a substantively dichotomous scale, and you treat that scale as continuous, run a correlation on it, and then frog-jump to the other side of the variable (where no one is) when you write it up, that's dark, dark business. It's wildly irresponsible, and the severity of the misconduct goes up as the amount of harm people could suffer from being falsely linked to that empty side of the opposite variable goes up. In this case, C was disagreement with "HIV causes AIDS." (D was agreement, where 99% were.)

Running linear correlations on substantively dichotomous variables opens up the opportunity to frog-jump from one side to the other when researchers write up the results. It conflates direction (+ or - correlation) with destination (C when everybody's at D), and creates enormous opportunities for bias and fraud. A researcher could use a correlation to proclaim the opposite of the truth – in this case "B predicts C" – when in fact almost all B are at D, not C. The word "predict" is a common way to describe correlations and regressions, but using it as it was used in this case is false ("B predicts C"). If you wanted to find out whether B in fact predicts C, a linear correlation between AB and CD will not give you the answer. Completely different analyses are needed, which will require real scientists.
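One simple alternative the paragraph above points toward is to dichotomize the sides and cross-tabulate, so the write-up has to confront where people actually are. A minimal sketch, using hypothetical counts that match the descriptives in the text (A as 90% of a sample of 1,000; 99% of A and 95% of B at D):

```python
import numpy as np

# Hypothetical counts matching the descriptives above.
#                   C    D
table = np.array([[  9, 891],   # A respondents
                  [  5,  95]])  # B respondents

# Row percentages: where do A and B respondents actually sit?
row_pct = table / table.sum(axis=1, keepdims=True)
print(row_pct)
# [[0.01 0.99]
#  [0.05 0.95]]
```

The table makes the frog jump impossible: both rows are overwhelmingly at D, so no correlation sign can smuggle B over to C.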
This might be underexposed: linear correlation is not an inherently valid methodological decision. An r statistic with a desirable p-value is not inherently meaningful. Nothing is. This is especially true when the variables are opposite variables. We can't use variance on one side of an opposite variable to say anything about the other side. People who would frog-jump on something like "HIV causes AIDS," and falsely link millions of innocent people to denial of that fact, should turn in their lab coats. Editors who allow this should edit no more. Journals with the word "science" in their titles that publish this incredibly harmful, wildly unethical malpractice should vigorously reform and reconnect with their calling.
Let's say you want to find out what climate skeptics think and why they think it. What's a better method?
1. Find a registry on the blogosphere where climate skeptics have laid out their views in some detail, how they came to those views, and their educational and professional backgrounds. Read these entries and report their content in some organized fashion in a journal article.

2. Post a survey on environmentalist, anti-skeptic blogs, open to anyone in the world of any age, including fakes. Ask participants if the moon landings were a hoax and if HIV causes AIDS, along with climate questions. Notice that about 2% of self-identified climate skeptics in your survey endorse the moon-landing hoax. Report that climate skepticism springs from belief in the moon-landing hoax.

1 or 2 – you be the judge.

When the story of Method 2 is told, in magazines, books, television specials, and so forth, wow. I want to see the looks on people's faces. It's going to be fun. The University of Western Australia and Bristol might want to get ahead of this thing before they end up under it. Eric Eich's role in all of this is fascinating, as is APS's. It's actually sparked some scientific curiosity, in myself and others, about the cognitive processes involved – the strange combination of the exquisiteness of the human eye and its willful non-use. Blind people should be offended by this. All these good eyes going to waste.

In any case, regarding Method 1, here it is. Nice work by applied mathematics Professor Paul Matthews of Nottingham, a gentleman and a scholar. Such men were once the rule in the academy. My hat to you, sir.

For years I've thought it's ridiculous that women are smaller than men. There's no reason why we should just accept this.
Embodied cognition is an endlessly interesting area of research. I don't know what's been done on the condition of chronic smallness with respect to other beings, mates, authorities, and so forth. The fact that women are smaller than men is a profound state of nature that could easily shape hundreds of downstream consequences, and can certainly be a factor in the cloak of silence that is all too familiar to too many women. We evolved this way for a variety of reasons, and I don't remember all the factors. I haven't thought about sexual dimorphism in a long time either. Evolution is a bitch, though.

It staggers me, takes my breath away, to think about how we evolved, how humans emerged from all of this. It staggers me to think of how a conceptual consciousness emerged. That is not a normal thing – a never-before thing. It staggers me to think of the evolution of human language, which is tightly bound up with our conceptual consciousness. Evolution sprouted beings who appear to have free will at important levels of analysis, beings who do art and airplanes and archaeology. Out of the blood and death and relentless math of fitness value, the punctuated equilibria, genetic bottlenecks and asteroid reboots, we got rappers, Frederick Douglass, the Magna Carta, telescopes, Mighty Mouse, and I Know What You Did Last Summer. And I can see evolution as granular, ruthless, arbitrary, and charmless. Our adaptations are often orthogonal to our current values and aims, or even incompatible with them.

Separately, I think it's perfectly reasonable for someone to not understand evolution, to find it unintuitive, and to be unmoved by it. I think screaming at people who aren't charmed by evolution is a fascinating strategy, and an unimpressive scholarly pastime. Evolutionary psychologists like to point out that we're still evolving. True dat. But this observation misses an important fact.
We are clearly going to take the reins on a much shorter timescale than it will take for any ongoing adaptive processes to yield noteworthy changes in who we are. It's 2015. We know we're going to improve ourselves in the future. It will probably start with disease prevention. China is already looking into boosting average IQ. I think we know far too little about IQ, intelligence, human cognitive processes that bear on both intelligence and aptitude, the facets of intelligence, the trade-offs, or the meaning of life to be mucking around with genetic engineering aimed at producing more talented engineers or better clay for tiger moms anytime soon. It's too early. Gattaca was a warning. But we know it's inevitable that we will improve ourselves – "improve" by our own standards, which are the only ones that matter. Within a hundred years, we will surely see some movement there, and probably within fifty. (I think Kurzweil overestimates the speed of progress.)

At some point we'll be able to sort out this whole chronic smallness situation, this fundamental fact that shapes the way women and men interact. Our size difference shapes the nature and relation of men and women in more ways than I will ever be able to identify. It's not going to be simple. I might be missing something. Perhaps we would lose something important, something more important than the benefits of equalization. I doubt it. It's ridiculous that women are smaller than men.

Marquette's decision to strip Professor John McAdams of tenure is a disgrace and may influence my long-term plans with respect to the intellectually craven environment that is American academia.
A key pivot point for this story is that a student in a philosophy class was concerned about the tolerance for dissent in that class on the issue of gay marriage. He spoke with the instructor, a graduate TA. She ultimately told him that "homophobic" views would not be tolerated, and added the obligatory pre-emptive censure of "racist" and "sexist" views. (Professor McAdams criticized the teacher's handling of this episode in a blog post, and also criticized prior hyper-PC behavior by the Assistant Dean and the Philosophy Department Chair, which is what I believe actually did him in.) The student had not expressed homophobic views and seemed articulate. His only specific comment seemed to be the idea that there is empirical evidence of bad outcomes for children of gay couples. I was struck by the teacher's robotic injunctions against the recurrent and seemingly empty phoneme strings: racism, sexism, and homophobia. We can't have any of that stuff, cause it's bad, m'kay?

I'd like to make this clear. I want to hear racist, sexist, and homophobic ideas. I think you should want to hear them too. I mean it, and I mean it at two levels of analysis.

Level 1: I most definitely want to hear ideas that academic leftists label as racist, sexist, and homophobic. I want this because experience tells us that the ideologies and theories academic leftists apply in tagging discourse with those labels will be unsatisfying to many scholars on counts of rigor, merit, and framing assumptions. Similarly, we know that the vast majority of the earth's population will not consider those discourses and ideas to be racist, sexist, or homophobic, which might suggest fertile ground. So we most definitely need to hear, discuss, test, and debate those ideas, given that their prohibition by the current ideological spasms of American academia should be of no interest to us. We should brush away this unscholarly screaming without even slowing down.

Level 2 breaks into near-isomorphic variants.
For my purposes here I'm basically aggregating the three variants.

Level 2a: I want to hear ideas that I would consider racist, sexist, or homophobic.
Level 2b: I want to hear ideas that most people who are not American academics would consider racist, sexist, or homophobic.
Level 2c: I want to hear ideas that are in fact racist, sexist, or homophobic.

Why? First, I have no reason to suppose – to decide in advance – that nothing can be learned, no insights gained, by hearing and perhaps engaging or interrogating robustly racist, sexist, and homophobic ideas. I'm surprised people would just assume that nothing can be gained. This seems unlikely to me, and I would place the burden on those people to prove that nothing can be gained. The null here, or the default intellectual/scholar mode, is the mode of engagement, exposure, listening, considering, weighing, arguing, etc.

Second, I have no reason to suppose – to decide in advance – that there will be no merit or wisdom in canons of racist, sexist, and homophobic thought. That would be strange to me; it seems unlikely given what we know about the scope and texture of human discourse, about history, about human psychology, about the marginalization of ideas and peoples, and so forth. I wouldn't be shocked to find a diamond in the rough, a speck of gold in the prospector's pan. I imagine these wisps or chunks of truth or wisdom will be orthogonal to the broader racist, sexist, or homophobic systems to which they belong, but I don't want to pin too much on that assumption. I want to go in clean.

Third, I freely embrace a notion that would perhaps not be controversial in brighter eras. I think it's possible that I'm wrong. I think it's possible that I'm wrong about lots of things. I think it's possible that I'm wrong in the beliefs of which I'm most certain. It surprises me that today's scholars don't seem to imagine a universe where they are substantially wrong.
This also means you might be wrong – any of us might be wrong. I might be wrong, in some sense, about racism, or some subset of it, or some other dimension of it that I can't foresee. What would that look like? Well, I'm never going to be a racist in the sense of malevolence or hate toward humans because of their race. It's not realistic that I could change that much. It's like imagining I was raised by a different set of parents – I wouldn't be me anymore.

There are levels of analysis with respect to racism, sexism, and homophobia that are philosophical. That's where philosophers go to work. Then there are levels of analysis that are empirical – things we can measure, things we can go find out and come back with the answer. That's where scientists go to work. There may be things racists, sexists, and homophobes believe that are simply true as an empirical matter. For example, there might be stable innate differences in IQ between various racial-ethnic groups. Here I tend to bring the same cautious attitude I invoke above – I think it's nuts for anyone to be settling on the answer to that issue right now, especially racists. It's way too early. Give it another fifty years at least. There are a host of complex cultural and environmental issues that may be in play, including ones we don't know about. There are many known unknowns and unknown unknowns here, and I really don't like jumping the gun. There's no reason to assume that any question we have can be readily answered by some dude in a lab coat in the era in which we live. That's just not going to work out. So while I don't think this can be settled in the near future, I am open to the idea that there are innate differences and that we will know this in 2060 or 2090. These differences may even be unfavorable to my group – Mexicans, Native Americans, whatever we'd call the brown or the genetic substrate in my case.
I do not assume that these sorts of differences matter, or will matter, or that people have to care about them, or that it is rational or ethical to employ heuristics based on them in day-to-day life. Also, I don't assume there will be such differences. I have no idea. Reality is a very complicated place. They might be faint. Who knows – but what to do with that kind of reality is a philosophical question. We also have to remember variance and how it works, and get people trained up not to focus only on mean differences.

It's the same with the outcomes for children of gay couples. I doubt there's much of an effect there, nor do I assume anyone has to care about such an effect, but I wouldn't want to shut down that conversation. The student in this case was wrong in saying such children "do a lot worse in life." I've not seen large effects, not the last time I checked. The teacher was wildly incompetent in responding to the student's arguments, shifting the issue to single people having kids or adopting or something. That was so lazy and invalid. It's also not a fruitful path, because we know that the number of children born out of wedlock has exploded, and we know that there are indeed very real consequences for those children. We know a lot about that.

Social Science Qualifier: From experience, I know people don't come in with the assumptions about probabilistic truths that social scientists take for granted. When I say "those children," I am speaking of statistical effects that rest on mean differences and differences in variance detected by inferential statistical methods. Call it averages. I am not saying anything about you. I'm not saying anything about your family, your background, your parent(s), or your friends. Any given single-parent household might be the best, most loving environment possible in our civilization. Any given single-parent household might send kid after kid to Harvard. There is plenty of room for variance.
It's the aggregate reality that is at issue here, the net effect. You can assume that it's a bigger problem for lower-income contexts than for Murphy Brown affluent-professional-single-mother situations. There do appear to be father-specific benefits and all sorts of interactions, but this is always about aggregates, not any one context.

What we do with statistics about children of gay couples or children born out of wedlock is a whole different journey than the empirical journey we just shared. This is philosopher-hat business. You get to decide whether or how to use these empirical findings, how to situate them in a broader context or a political platform. Let many flowers bloom. We need many voices. I for one am not going to stop supporting gay marriage if it turns out that children of gay parents have 4-point-lower average SAT scores than children of straight Catholics. See what I mean? An effect doesn't imply a political position. I'm not a utilitarian – certainly not a knee-jerk, rationalistic, drowning-in-data utilitarian. (I do think we have a real problem with children born out of wedlock. The conservatives were simply right in their intuitions about how that would impact children and family life. You should give them props on that one.)

Back to Marquette, the bastards. I'm worried about scholarship in our time. Do people have no sense of history? Do they have no sense of the grand sweep and our place in it? Do they have no sense of the enormous range and complexity of fruitful inquiry and scholarship? Are they really so small that their ideology reduces to empty phoneme strings and incantations about racism and all the other baddies? Do they not understand that some or all of their ideological tenets can be disputed by capable, rational, and benevolent thinkers? Do they really not consider the possibility that they may have gotten something wrong? What the hell are these people doing in a university?
Do they not realize that modern American academia is a culture, and that their culture is going to yield a different set of experiences and insights than those of people who come from other cultures, or even twenty miles west? Why would their culture be better than all the others, holding a complete package of The Truth over the Iowa farmers, the Montana ranchers, the Brooklyn ballers, the Phoenix suburbanites, or the church choir? I hate cowardice, and these are cowards. Cowardice will have some implications for the quality of scholarship and how much longer we can keep the lights on. What bastards. Give Professor McAdams his damn chair back.

I was struck by this quote from a Forbes piece on the secondhand smoke research.
"There's no such thing as borderline statistical significance. It's either significant or it's not."

It's attributed to a journalist named Christopher Snowdon (I don't know who that is). It's false, and I think it's important for us to convey a clearer message to the public about what statistical significance is. tl;dr: It's a business decision – and by the way, how many fingers do you have? (Thumbs included...)

Significance is not a binary or discrete property of a scientific finding. Our convention in social science, and I think in lots of biomedical fields, is the .05 threshold. I'll return to this. The statistical significance of an effect is the likelihood of drawing a random sample with characteristics at least as extreme as those measured in our sample if the null hypothesis is true. Note that this is not the same thing as saying the likelihood of our research hypothesis being true is 1 – p, or 95% or greater given our standard .05 threshold. Significance is often mis-explained as the inverse likelihood of our hypothesis being true. That's not what it means. And there are other assumptions, particularly regarding normal distributions, that will impact the meaning of all of this.

By measured characteristics, I mean a sample that looks like our sample in the study. So if we conduct a longitudinal study with a large sample of women and track them on variables like lung cancer and passive smoking, we end up with X% having lung cancer, Y% having lived in a home with a smoker, Z% who have lung cancer and lived with a smoker, and variance on other variables like length of time a person has lived with a smoker, age, race, lifestyle, etc. The null hypothesis is that passive smoking does not cause lung cancer in nonsmokers – that there is no relationship between these variables. (That would be one of several hypotheses in the actual study referenced by Forbes, because they also tracked smokers.)
So significance here means the probability of drawing a random sample from the population with group differences at least as large as those we see in our sample, assuming there is in fact no link between passive smoking and lung cancer rates (that the null hypothesis is true). We can see a few things here. Given typical sample sizes, if 2.00% of women who lived with a smoker get lung cancer and 2.00% of women who never lived with a smoker get lung cancer, there won't be a significant effect. The core reason is that this is exactly what we'd expect to see if the null hypothesis is true. If there's no actual link between these variables in the population, it's likely that we'd draw a random sample that looked like ours – a sample with no differences between the groups. This likelihood goes up as the sample size goes up. In this kind of scenario, your p-value might be something like 0.70 or 0.80. The particular value doesn't matter – what matters is that it's well above our threshold of 0.05 (and more importantly, that there is no difference between groups, no effect).

If we do see differences between groups in our sample, the p-value will be lower, because if the null hypothesis is true, we wouldn't expect to see such differences. How low that p-value goes will depend on the size of the difference between the groups (the effect size) and the sample size. As the effect size goes up, the p-value goes down, because it becomes less and less likely that we'd see such differences in a random sample if the null hypothesis is true. As the sample size goes up, the p-value goes down, because a larger sample reduces the likelihood of random sampling error. It's like flipping a coin over and over: if you flip it only three times, you could easily get three straight tails, but as you keep flipping you'll converge on a more or less even split of heads and tails.
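The sample-size point can be made concrete. In this sketch (assuming scipy is available; the 3%-vs-2% rates and the group sizes are hypothetical, not from the actual study), the observed effect is held fixed while the sample size grows, and the p-value from a chi-square test of the 2x2 table falls accordingly:

```python
from scipy import stats

# Hypothetical fixed effect: 3% vs. 2% lung cancer rates in the two groups.
p1, p2 = 0.03, 0.02

pvals = []
for n in (500, 5000, 50000):   # participants per group
    # Idealized counts at each sample size; the effect size never changes.
    table = [[p1 * n, (1 - p1) * n],
             [p2 * n, (1 - p2) * n]]
    chi2, p, dof, expected = stats.chi2_contingency(table)
    pvals.append(p)
    print(f"n per group = {n:6d}: p = {p:.4f}")
```

The same one-percentage-point difference is nowhere near significant at n = 500 per group, but comfortably below .05 at the larger sample sizes – direction and size of the effect are unchanged; only our confidence that it isn't sampling error changes.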
As I said, our threshold is 0.05, meaning a 5% or less chance that we would draw a sample like ours if the null hypothesis were true. Why .05? At the margins, it's arbitrary. What's the point of a threshold? The point is to have some standard that reduces Type 1 error – detecting an effect that is not real. At the same time, we want to be able to talk about effects and to report findings that are likely to be real. Scientists could have settled on .10 or .04 or any of a number of values. Like I said, at the margins it's arbitrary.

The specific choice of .05 is partly due to the fact that you probably have ten fingers. You might remember I asked you to count them. A lot of our choices of thresholds and rules of thumb are driven by the fact that we use a base-ten number system. Five is half of ten, so we tend to settle on values that are multiples of five or ten. There was almost no chance we would've chosen .04 or .06. Those numbers don't satisfy us the way fives and tens do. (If humans had eight fingers instead of ten, we might very well have chosen .04.)

As you can infer from the above, significance is a continuum. We could have a p = 0.08 situation, and that effect could easily be a true effect. In fact, an effect with p = 0.30 could be a true effect. But especially in that case, when we're getting up to a 30% chance of drawing a sample like ours under the null, we don't want to report it as significant. Whereas effects with p-values of 0.06 or 0.09 are often reported, and should be. We report them as something like "this was significant at p = 0.06" or "this was marginally significant at p = 0.09." Note that we're still using the word significant. We can use that word given any p-value, as long as we include the p-value. That's why Snowdon is wrong. The choice of 0.05 is a business decision that achieves a good tradeoff between our levels of Type 1 error and Type 2 error (failing to detect a true effect). But there's nothing natural or inherently meaningful about p = 0.05.
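The claim that a p = 0.08 effect "could easily be a true effect" is easy to demonstrate by simulation. In this sketch (assuming scipy; the effect size, group size, and seed are my own hypothetical choices), a real effect exists in every simulated study, yet a large share of those studies land above the .05 line:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 0.3      # hypothetical true effect (standardized mean difference)
n = 50       # participants per group

# Run 2,000 simulated two-group studies of the same true effect and
# collect the t-test p-value from each one.
pvals = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n),
                    rng.normal(d,   1.0, n)).pvalue
    for _ in range(2000)
])

print(f"p > .05 in {np.mean(pvals > 0.05):.0%} of samples of a true effect")
```

With a modest true effect and modest samples, most studies miss the .05 threshold entirely – a p of 0.06 or 0.30 is fully compatible with a real underlying effect, which is exactly why significance is a continuum and a reporting convention, not a fact of nature.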
It's not a value derived from nature, like Planck's constant. It wasn't discovered. There's no "significance" in nature. Like I said, it's a business decision.

I wanted to clarify something based on a comment by a supporter on the discrimination post.
Here's my recent Comment on Verheggen et al. in Environmental Science & Technology.

Some of my mentors advised me to take down the post from July 22, 2014 about discrimination because it would hurt me on the job market – that hiring committees would discriminate against me (again) for my perceived political views, or would discriminate against me for documenting past discrimination.
Hi all – I'm looking for volunteers to do some light editing on my longer reports / posts, particularly those on the Cook and Lewandowsky scams.
I'd like to migrate that content to the new site, but with better structuring. If anyone is interested in adding a Table of Contents to the longer posts, along with corresponding section headers, and perhaps making general edits, I'd deeply appreciate the help. I keep telling myself that I'm going to do this work, but I'm in the middle of dissertation work, and I know I won't get to it soon enough. I used to do this for other people, and I remember some of you inquired about this months ago.

In general, I think I'd benefit from editing on an ongoing basis. So far, my girlfriend has erratically served in that role (typically after I post an essay), but if anyone is interested in editing my essays, or writing their own, I'm quite open to it. The new site will run on WordPress, and I can easily add editors/users. There will also be room for guest columnists on the new site, so let me know if you'd like to write on a topic. I'm just thinking broadly right now – I'm open to all sorts of ideas. But for now, what I need are volunteers for the existing canon. Please e-mail me if you'd like to help. Thank you!

Yes, I know I promised a new website yesterday. (Sorry, Lucia – I deleted that post for no good reason.) I've delayed the launch – it wasn't quite right yet, and I need to focus on dissertation and other work for a couple of days.
José L. Duarte – Social Psychology, Scientific Validity, and Research Methods.