There are several alternative psychometric measurement models that can operationalize a latent construct

These manuals provide an evolving checklist of possible indicators of drug abuse and/or drug dependence, some subset of which will trigger a distinct categorical diagnosis. Because these diagnoses are seen as consequential for clinicians, clients, treatment facilities, third-party payers, and the development of addiction science, the years preceding each revision always see a lively and vigorous debate among experts about which indicators of substance abuse and dependence – e.g., withdrawal, tolerance, cravings, legal problems – do or don’t belong in the checklist. To an outside observer, the process can appear chaotic, and as political as it is scientific. But somehow, the resulting checklist seems to have a noteworthy psychometric property. Using popular psychometric methods, it has been argued that the DSM diagnostic criteria for substance dependence or a substance use disorder form a unidimensional scale – implying that they tap a single, coherent latent construct: “substance abuse,” “substance dependence,” or, in the newest iteration, the combined construct “substance use disorder” [e.g., ]. But there is something odd about this. If the DSM criteria do form a unidimensional construct, then there should be little reason to spend years debating which specific items to include in it. Under the measurement model that characterizes most psychometric analyses of DSM data, these indicators should be roughly interchangeable, in the same way that different items on an attitude scale, vocabulary test, or personality trait inventory tap different manifestations of the same underlying construct. The corollary is that if the criteria that get debated – withdrawal, tolerance, craving, and the like – are indeed conceptually and empirically distinct, then the evidence for the unidimensionality of the DSM criteria is puzzling or even troubling, rather than reassuring.

This essay does not contend that the DSM diagnostic criteria are foolish or meaningless, or that adopting them was a serious mistake by some criterion of harm to patients. Rather, I argue that there is confusion about the underlying structure of the DSM substance-related diagnostic criteria, and that greater clarity might promote the development of better science, better practice, and better inputs to management and policy making. These analytic issues deserve attention in the coming decade, in anticipation of the eventual next iteration, DSM-6.

What would it mean for a list of such criteria to constitute a unidimensional latent construct? The answer depends on which measurement model one adopts, and the alternative models quite literally imply different metaphysical assumptions (ontologically, what construct exists, and epistemologically, how do we identify it?) as well as different mathematical definitions. The discussion that follows gets slightly technical and requires a few simple equations, but to keep things simple I assume there is only one latent construct and that the terms in the model have unit weights. Traditional factor-analytic models are usually specified mathematically as a set of structural equations of the form $X_i = F + e_i$, where $X_i$ is the $i$-th observed or “manifest” variable, $F$ is the underlying latent construct thought to cause each $X_i$ to take on its observed values, and $e_i$ is that variable's error term [e.g., ]. Importantly, the $e_i$ terms reflect any idiosyncratic variance associated with the observed variables but not caused by the underlying latent construct of interest. This has an important implication: if two observed variables share a common latent factor, it is assumed that they share nothing systematic other than that factor. They are “conditionally independent” unless the default assumption of uncorrelated error terms is explicitly overridden. Any model with these features is now commonly referred to as a “reflective model.”
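To make the conditional-independence property concrete, here is a minimal simulation sketch, assuming the unit-weight model $X_i = F + e_i$ with $F$ and every $e_i$ drawn as independent standard normal variables. The normality, the sample size, and the function names are illustrative assumptions, not part of any published analysis. Because $F$ is known inside a simulation, we can check directly that two indicators correlate only through $F$: their raw correlation should sit near 0.5 (shared variance 1 over total variance 2), while whatever remains after subtracting $F$ should be essentially uncorrelated.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_reflective(n_people, n_items, seed=0):
    """Reflective model with unit weights: X_i = F + e_i for every item i.
    F and each e_i are independent standard normal draws (an assumption)."""
    rng = random.Random(seed)
    F = [rng.gauss(0, 1) for _ in range(n_people)]
    X = [[f + rng.gauss(0, 1) for f in F] for _ in range(n_items)]
    return F, X

F, X = simulate_reflective(n_people=20000, n_items=2)

# Marginal correlation: X_1 and X_2 correlate because both contain F.
# In theory cov = var(F) = 1 and var(X_i) = 2, so r = 1/2.
r_marginal = pearson(X[0], X[1])

# Conditional correlation: with a unit loading, X_i - F is exactly e_i,
# and the e's are independent, so the residuals should be uncorrelated.
resid1 = [x - f for x, f in zip(X[0], F)]
resid2 = [x - f for x, f in zip(X[1], F)]
r_conditional = pearson(resid1, resid2)

print(round(r_marginal, 2), round(r_conditional, 2))
```

In real data $F$ is unobserved, so this direct check is only possible in a simulation; in practice, conditional independence is an assumption built into the reflective model rather than something observed.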
The reflective model is a method of constructing unidimensional composite scales and justifying their interpretation as such.

The most common theoretical justification for this interpretation is the domain sampling assumption: the observed variables we retain as indicators of the latent construct are essentially interchangeable exemplars sampled arbitrarily from a much larger domain of possible expressions of the construct. “The model of domain sampling conceives of a trait as being a group of behaviors all of which have some property in common. . . . If the sample [of indicators] we draw from domain is representative, then its statistical characteristics are the same as those of the total domain” [, pp. 211–212]. Specifically, in expectation any sufficiently large random sample of indicators from the domain should yield the same average value and the same correlations among indicators. This notion of sampling is of course hypothetical, not literal, and that creates an important conceptual twist: “Instead of specifying a population of some set of entities and then drawing a sample randomly from it, . . . we have a sample in hand that in turn implies a population . . . having the same characteristics as the sample” [, p. 214]. Most of the published psychometric analyses of DSM criteria that I have examined adopt the reflective model of factor analysis without explicit justification. But this creates an unacknowledged conceptual puzzle: according to that model, any differences between two criteria – say, withdrawal symptoms vs. interference with important activities – are simply part of the model's error structure rather than part of the construct itself or its composite score. In other words, the distinctive features of each criterion, which form the basis for expert debates about DSM construction, are irrelevant to the model. Under the domain sampling assumption, there should be relatively little to argue about; we could inductively generate large sets of candidate criteria and simply cull the ones that don’t “load” on the common factor.
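The domain sampling claim, that any random sample of indicators should reproduce the same correlations, can be illustrated with a small sketch. It assumes a hypothetical domain of 100 indicators, each generated under the unit-weight reflective model with standard normal $F$ and errors; the domain size, checklist length, and helper names are all illustrative assumptions. Five different "checklists" of six indicators are drawn at random, and each should show roughly the same average inter-item correlation (about 0.5), which is exactly why, under this model, the identity of any particular criterion hardly matters.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_domain(n_people, domain_size, seed=1):
    """Hypothetical domain of indicators, each generated as X_i = F + e_i."""
    rng = random.Random(seed)
    F = [rng.gauss(0, 1) for _ in range(n_people)]
    return [[f + rng.gauss(0, 1) for f in F] for _ in range(domain_size)]

def mean_inter_item_r(items):
    """Average Pearson correlation over all pairs of indicators."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

domain = simulate_domain(n_people=2000, domain_size=100)
sampler = random.Random(2)
# Five different checklists of 6 indicators, each drawn at random from the domain.
checklist_means = [mean_inter_item_r(sampler.sample(domain, 6)) for _ in range(5)]
print([round(m, 2) for m in checklist_means])
```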
I very much doubt this is how most DSM experts view the diagnostic criterion list, yet this is how the analyses treat it. There is a less familiar alternative way of specifying a latent factor model – the “formative model” [Figure 1B; see Ref. ]. This model is superficially similar – it consists of the same observed indicator variables, plus one or more latent factors, and an error term.

But the assumptions are quite different. In a formative model, the latent factor does not cause the observed variables; rather, they cause – or more accurately, “constitute” – the latent factor. Mathematically, the model would be represented by an equation of the form $F = \sum_i X_i + e$, but $F$ is now the dependent variable, and there is a single error term for the factor rather than separate error terms for each observed variable. That means that anything distinctive or idiosyncratic that distinguishes two observed variables – say, withdrawal symptoms vs. interference with important activities – is part of the construct and its measurement. As a result, formative models are not assumed to be “unidimensional,” and indeed, some heterogeneity among the criteria is seen as desirable. Formative models are not inductive, at least not in the sense that a latent construct emerges from the observed variance of a reflective model. Rather, formative models are a form of “measurement by fiat”: the analyst, or some other authority, decrees that certain observable criteria will collectively constitute what the latent construct actually means. An example is professional accreditation. A formative model seems to better capture the way many psychiatrists actually debate the DSM criteria, and it also better characterizes the actual decision process – organizational fiat – that determines which criteria are included vs. excluded. But a growing number of simulation studies show that when data actually have a formative structure, fitting them with reflective models can lead to significantly biased and misleading estimates of model fit and factor scores [e.g., ]. Whether this helps to explain the puzzle I noted earlier – high unidimensionality despite what are surely conceptually distinct DSM criteria – would probably require focused re-analysis of major DSM data sets.¹
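A minimal sketch of the formative direction, under stated assumptions: four hypothetical indicators are drawn independently of one another (a formative model places no requirement that they correlate), and the construct is built as their unit-weighted sum plus a single error term. Normality, the number of indicators, and the function names are illustrative choices only. The point is that the indicators can be essentially uncorrelated with each other while each is, by construction, part of the construct.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_formative(n_people, n_items, seed=3):
    """Formative model with unit weights: F = sum_i X_i + e.
    Indicators are independent standard normal draws (an assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_items)]
    e = [rng.gauss(0, 1) for _ in range(n_people)]
    # zip(*X) gives one tuple of indicator values per person.
    F = [sum(person) + err for person, err in zip(zip(*X), e)]
    return F, X

F, X = simulate_formative(n_people=20000, n_items=4)
r_between_items = pearson(X[0], X[1])  # near 0: no common cause
r_item_with_F = pearson(X[0], F)       # theory: 1 / sqrt(5), about 0.45
print(round(r_between_items, 2), round(r_item_with_F, 2))
```

With four independent unit-variance indicators and one unit-variance error, var(F) = 5 and cov(X_1, F) = 1, so each indicator correlates about 0.45 with the construct even though the indicators share no common factor among themselves.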
There is, however, an alternative psychometric model that might produce unidimensionality despite conceptually distinct measures – a Guttman scale [or a stochastic variant, the Mokken scale; see Ref. ]. As suggested by Figure 1C, the variables in a Guttman scale have a cumulative structure. An example might be a diagnosis of AIDS: anyone who has AIDS is infected with HIV, and anyone who is infected with HIV was exposed to HIV at some earlier date, when they were still HIV-negative. Thus, if we determine that someone is HIV-positive, we can conclude that they were exposed to HIV, but we cannot conclude that they have AIDS or will necessarily develop AIDS in the future.
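The cumulative structure of the AIDS example can be written as a simple check. In this sketch, which is illustrative and not drawn from any cited study, items are ordered from most common to least common, and a response pattern is consistent with a perfect Guttman scale only if it never has a "yes" after a "no". In a perfect Guttman scale the total score alone therefore fixes the whole pattern. The item labels and function names are hypothetical.

```python
# Items ordered from most common ("easiest") to least common ("hardest").
ITEMS = ["exposed to HIV", "HIV-positive", "has AIDS"]

def is_cumulative(pattern):
    """True if the 0/1 pattern (ordered like ITEMS) never has a 1 after a 0,
    i.e. endorsing a harder item implies endorsing every easier one."""
    seen_zero = False
    for x in pattern:
        if x == 0:
            seen_zero = True
        elif seen_zero:
            return False
    return True

def implied_pattern(score, n_items=len(ITEMS)):
    """In a perfect Guttman scale, the total score determines the pattern."""
    return [1] * score + [0] * (n_items - score)

print(is_cumulative([1, 1, 0]))  # exposed and HIV-positive, no AIDS: consistent
print(is_cumulative([0, 1, 0]))  # HIV-positive without exposure: violates the order
print(implied_pattern(2))
```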

Like formative models, Guttman scales can emerge by fiat: with rare exceptions, we decree that those with a Ph.D. must have a bachelor's degree, and that those with a bachelor's degree must have completed high school. Alternatively, Guttman-scaled phenomena can emerge through a chain of causal processes that occur in a consistent order. If the DSM criteria formed a clear Guttman scale, this might provide a tidy resolution to the puzzle noted at the outset: psychiatrists argue over the distinct features of DSM criteria, and yet claim that the DSM provides a unidimensional diagnosis of substance use disorder. But the empirical literature is not encouraging. I have been able to locate only two studies that test whether the DSM substance-use criteria form a Guttman scale. Kosten et al. computed Guttman scale scores using DSM-III-R criteria for each of seven substance classes for 83 psychiatric patients. Carroll et al. followed the same procedures using DSM-IV criteria for six substance classes for 521 people drawn from a variety of clinical and general population sources. Across four substance classes, the Guttman reproducibility coefficients averaged 0.89 for the DSM-III-R study and 0.80 for the DSM-IV study. Common benchmarks for this coefficient are 0.85 or 0.90; diagnoses met the lower standard in both studies for alcohol, cocaine, and the opiates, but not for sedatives, stimulants, or marijuana. More troublingly, if we limit the focus to four criteria that are roughly the same in both versions of the DSM – withdrawal, tolerance, “giving up activities,” and “use despite problems” – their relative rankings within a given substance category are inconsistent across the two studies, with correlations ranging from −0.57 to 0.11. Granted, some differences are expected due to differences in year and sample, but it is difficult to see anything like a coherent Guttman measurement model either within or across substance categories.
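The reproducibility coefficient discussed above is, in its usual form, CR = 1 − (number of errors) / (respondents × items). The sketch below counts errors in the Goodenough-Edwards style: items are ranked by how many respondents endorse them, and each respondent's pattern is compared cell by cell with the ideal cumulative pattern implied by his or her total score. That counting rule is an assumption; other rules exist, and this is not a reproduction of the procedures used by Kosten et al. or Carroll et al. The example data are made up.

```python
def reproducibility(data):
    """Coefficient of reproducibility, CR = 1 - errors / (respondents * items).

    data: list of 0/1 response lists, one per respondent, same item order.
    Errors are counted Goodenough-Edwards style (an assumption): deviations
    from the ideal cumulative pattern implied by each total score.
    """
    n_items = len(data[0])
    popularity = [sum(row[j] for row in data) for j in range(n_items)]
    # Most-endorsed ("easiest") item first; ties keep their original order.
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for row in data:
        ordered = [row[j] for j in order]
        score = sum(ordered)
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(a != b for a, b in zip(ordered, ideal))
    return 1 - errors / (len(data) * n_items)

# Hypothetical responses of six people to four criteria.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],  # endorses only the two rarest criteria
]
perfect = example[:5]
print(reproducibility(perfect))            # perfectly cumulative data
print(round(reproducibility(example), 3))  # one non-cumulative respondent
```

The first five patterns are perfectly cumulative and give CR = 1.0; the single out-of-order respondent contributes 4 errors out of 24 cells, pulling CR down to about 0.83, which already falls below the lower 0.85 benchmark.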
Another source of evidence comes from comparing the prevalence of each criterion, by substance, across different studies. If the criteria come close to forming a Guttman scale, then different studies should find a similar ordering, with the prevalence of some criteria consistently higher than that of others. I compared prevalence estimates from three samples. The average correlations of the criterion ranks across studies were only 0.54, 0.32, and 0.25 for cannabis, opiates, and cocaine, respectively.

One reason items can have Guttman scale properties is that they form a simple causal chain, but the evidence against a Guttman scale interpretation, reviewed above, also casts doubt on any simple causal model. Figure 1E shows a causal model that is more complex than a simple causal chain. Even a cursory examination of the DSM substance-use criteria suggests that they might have this kind of complex internal causal structure. First, many of the criteria require the clinician or the patient to make causal attributions: e.g., “Recurrent use resulting in a failure to fulfill important role obligations” or “continued use despite recurrent social problems associated with use”. Second, many of the criteria are likely to have causal linkages to each other. For example, tolerance implies that the user will seek larger doses, which might well increase the time spent obtaining the drug. Withdrawal symptoms and craving have long been implicated in income-generating crime, needle sharing, prostitution, and other forms of physically hazardous and socially dysfunctional behavior [e.g., ]. Third, in addition to any psychopharmacological mechanisms, most of the criteria are causally influenced by the social, cultural, economic, and legal context in which substance use takes place [see Ref. ].
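The cross-study comparison of criterion prevalence rankings described in this section amounts to computing a rank correlation for each pair of studies and averaging. Here is a minimal sketch using Spearman's rank correlation (Pearson correlation on ranks, with tied values sharing their average rank); the choice of Spearman and all function names are assumptions, since the text does not state which rank statistic was used. The prevalence figures are entirely hypothetical and are not taken from any of the studies discussed.

```python
from itertools import combinations

def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (undefined if either list has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

def mean_pairwise_spearman(studies):
    """Average Spearman correlation over every pair of studies."""
    pairs = list(combinations(studies, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)

# HYPOTHETICAL prevalence rates for four criteria, in the order:
# withdrawal, tolerance, giving up activities, use despite problems.
study_a = [0.40, 0.55, 0.30, 0.60]
study_b = [0.50, 0.45, 0.35, 0.65]
study_c = [0.20, 0.50, 0.45, 0.40]
print(round(mean_pairwise_spearman([study_a, study_b, study_c]), 2))
```

With these made-up numbers the pairwise correlations are 0.8, 0.0, and −0.6, so the average is only about 0.07 even though two of the three studies agree closely; a near-Guttman structure would instead push every pairwise value toward 1.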
A striking illustration comes from clinical trials of heroin maintenance in Europe: when registered addicts are allowed easy access to high-quality heroin, their criminality drops, their health improves, and they are increasingly likely to hold a job.