Per Bech: Clinical Psychometrics

Hector Warnes’ Commentary


This is a 202-page book with 10 chapters, a glossary, 33 appendices, 192 references and an index. Each chapter is devoid of mathematical formulas and has a strong clinical orientation, an amazing clarity and a historical perspective, with numerous figures and tables which are highly enlightening and erudite. On the first page, the author cites Karl Jaspers, Aubrey Lewis and Max Hamilton, and dedicates the book to Ole Rafaelsen and Erling Dein. Professor Per Bech expresses his gratitude to Peter Allerup, Professor of Theoretical Psychometrics at Aarhus University; to Lone Lindberg, his research coordinator; to Ove Aaskoven, his statistical assistant; and to several others.

Per Bech received his medical degree from the University of Copenhagen in 1969. In 1972, he received a gold medal from Aarhus University for his thesis on cannabis and psychometric tests that included time experience, reaction time and simulated car driving. His doctoral thesis on the validity of rating scales in depression and mania was completed in 1981 at the University of Copenhagen. From 1992 to 2008, he was Professor of Psychiatry at Odense University and, since 2008, he has held the position of Professor of Clinical Psychometrics at the University of Copenhagen. He is also chief psychiatrist and director of research at the Mental Health Centre North Zealand in Hillerød.

In the preface, the author outlines the Wundt and Kraepelin-inspired Pharmacopsychometric Triangle which consists of three parts: A) changes observed in the clinical effects of a drug administered to patients, with each psychometric scale designed to test a particular cluster of symptoms; B) adverse or side-effects; and C) patients' reported quality of life.

The author separates clinical psychometrics into two eras. The first covers the period from 1879 to 1945, starting with Wilhelm Wundt, who founded psychometrics in 1879, along with his two pupils, Kraepelin and Spearman. While Kraepelin attended Wundt’s lectures and laboratory sessions, an American doctor, psychologist and philosopher was also in attendance: William James, who did not practice medicine in spite of having an MD degree from Harvard Medical School (1872-1879). James wrote his doctoral thesis on astereognosia and taught comparative anatomy and physiology at the school for many years.

The modern era of clinical psychometrics was launched with Eysenck, Hamilton and Pichot after 1945, but Professor Per Bech also wanted to acknowledge the contributions of Francis Galton, who founded a London psychometric laboratory in 1884, along with two of his disciples, Pearson and Fisher. Bech also recognized Rasch, Siegel and Mokken, who were responsible for the development of psychometric analyses.

According to Bech, Kant was well aware that behind the phenomenon of pure reason there was another hidden reality and established the division of ‘das Ding für uns’ versus ‘das Ding an sich’, or appearance versus the hidden reality. The first led to phenomenology or psychopathology because it is based on events or symptoms as we perceive them in context, time and space when measuring them (quantity, quality, relation and modality). Professor Bech insists that the more experience the psychiatrist has, the better he is able to distinguish traits, states, symptoms and gestures. Eventually, we should discover ‘the unknown’ underlying ‘Ding an sich’, which for Professor Bech points to biological factors.

Wittgenstein and Quine were considered by Bech as neo-Kantians insofar as they proposed the quantification of endophenotypes in order to sort out the hidden reality. Citing the exhaustive neuroimaging studies carried out by Nancy Andreasen, the author concludes that, according to her findings, schizophrenia affects many different regions of the brain that cannot be visualized, i.e., das ‘Ding an sich’.

Figure 1.3 in Chapter 1 shows us the earliest symptom check list (sorting cards) devised by Kraepelin. Use of these cards led him to conclude that there are symptom clusters (‘shared phenomenology’) which persist over time and that in 80% of patients the clusters were different for dementia praecox and manic-depressive illness. Kraepelin even tested the drugs available at the time (morphine, barbital and chloral hydrate) and found that the results were extremely poor in these two major psychoses.

Later in the chapter, Bech devotes a section to Eysenck, a prominent psychologist at the Maudsley, who was inspired by Jung’s typology (extroversion-introversion) and Freud’s soul-searching studies of neuroticism.  He introduced a Neuroticism Scale (Fig. 1.4) and an Extraversion Scale (Fig. 1.5) which appear to bear similarities to Spielberger’s trait anxiety scale (related to personality traits).

Personality traits are behaviors that remain consistent across situations, and the trait model is to be differentiated from other personality models, such as the psychoanalytic model. Psychodynamic formulations or interpretations show hardly any inter-rater reliability between psychoanalysts. Further, any psychodynamic formulation of a case, when compared with psychopathological measurements, does not measure up to tests of reliability. Nor should we overlook situationism, or context (as in post-traumatic stress disorders); interactionism (a circular set of interactions between two people in which each invariably influences the response of the other); or, in general, the position of the observer, his theoretical biases, his experiences and, not to be dismissed, his protean inclinations.

We must keep in mind that psychometrics is not only the use of rating scales but also involves testing the theory behind their findings, their consensual validity and reliability, and factor analytic studies. In fact, psychometric scales were used by Fechner. He was able to quantify stimuli and the degree of the psychological reaction to them, including words, symbolic stimuli and even subliminal stimuli. We can see that Jung’s ‘word association test’ was influenced by Fechner’s psychophysics. With time, there was a shift from the subject’s introspective observation of his internal states to the more behavioristic stimulus-overt response paradigm.

Hamilton was prominent in psychopharmacology in the boom time of the 1950s and became instrumental in the development of rating scales that are still in use today. He also conducted research following scientific methods: placebo control, random assignment of patients, double-blind trials and so on. Hamilton, influenced by Eysenck’s and Spearman’s factor analysis (a factor is one of the bases for structuring the experimental design), was able to differentiate between somatic and psychic anxiety symptoms.

Pierre Pichot studied psychometrics (in the faculty founded by Alfred Binet) at the Sorbonne immediately after getting his MD degree in 1947 and worked under Professor Jean Delay. Pichot tested Overall and Gorham’s Brief Psychiatric Rating Scale (BPRS) and pointed out that 18 out of 60 symptoms were sensitive to change during chlorpromazine therapy in psychotic patients and imipramine therapy in depressive patients. In the BPRS there were three subscales: one for mania, one for depression and the other for schizophrenia.

Professor Bech emphasized the point that classical psychometrics in psychiatry has mainly been influenced by Kraepelin, Hamilton and Pichot, three outstanding psychiatrists. He further noted that in using the Rorschach test, the coefficient of reliability, or kappa coefficient, is around 0.50, yet to be clinically meaningful it must be around 0.80.
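To make the kappa figures above more concrete, here is a minimal sketch of how Cohen's kappa for two raters could be computed. The function name `cohen_kappa`, the two hypothetical raters and their eight invented ratings are illustrative assumptions only; they do not come from Bech's book or from any Rorschach data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is total")
    return (observed - expected) / (1.0 - expected)


# Invented ratings of eight patients by two hypothetical raters.
rater_a = ["dep", "dep", "dep", "dep", "none", "none", "none", "none"]
rater_b = ["dep", "dep", "dep", "none", "dep", "none", "none", "none"]
kappa = cohen_kappa(rater_a, rater_b)
print(kappa)  # 0.5
```

In this invented example the raters agree on six of eight patients (75%), but since half of that agreement would be expected by chance, kappa is only 0.50, the level described above as falling short of clinical meaningfulness.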

Georg Rasch, a Danish Professor of Statistics and Mathematics, wrote a thesis entitled On Matrix Calculus and its application in Differential Equations. The psychometric model developed by Rasch and inspired by his studies at Fisher’s London Institute became the basis of modern psychometrics. On page 35, Bech, based on Rasch’s postulates, outlined the invariant structure of the six depression symptoms: lowered mood, loss of interest and tiredness followed by anxious mood, guilt feelings and psychomotor retardation. On page 37, Bech, citing Rasch, writes: “If we want to know something about a quantity, then we have to observe something that depends on that quantity, something that changes if the quantity varies materially. In that case we have a sufficient statistic.” It must be pointed out that other studies have shown cross-cultural differences in the prevalence of these symptoms. In some cultures somatic symptoms predominate, in others guilt feelings, and in others suicidal behavior.
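The notion of a sufficient statistic in the Rasch quotation can be illustrated with a small sketch. It assumes the standard dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person's severity (theta) and the item's difficulty. The three item difficulties and the function names are hypothetical and are not taken from Bech's scales. The sketch checks that, once the total score is known, the probability of a particular response pattern no longer depends on theta, which is the sense in which the total score carries all the information about the person.

```python
import math
from itertools import product


def rasch_p(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pattern_prob(pattern, theta, difficulties):
    """Probability of a full 0/1 response pattern, items taken as independent."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1.0 - p
    return prob


def conditional_prob(pattern, theta, difficulties):
    """Probability of the pattern given its total score."""
    score = sum(pattern)
    same_score = sum(
        pattern_prob(other, theta, difficulties)
        for other in product((0, 1), repeat=len(difficulties))
        if sum(other) == score
    )
    return pattern_prob(pattern, theta, difficulties) / same_score


# Hypothetical difficulties for three items (not estimates from any real scale).
difficulties = [-1.0, 0.0, 1.5]
mild = conditional_prob((1, 1, 0), -1.0, difficulties)
severe = conditional_prob((1, 1, 0), 2.0, difficulties)
print(abs(mild - severe) < 1e-9)  # True: theta cancels once the score is fixed
```

A mildly and a severely affected person who both score 2 are equally likely to have produced that score through the pattern (1, 1, 0); only the item difficulties determine which pattern is more probable.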

In Chapter 7, Professor Bech offers us an insightful view on Hans Selye’s stress experiments (biological stress models that predict illness behavior), particularly ratings of patients’ work environment, e.g., being listened to, search for meaning, achievements, relevant information, social support, recognition, degree of demands and conflicts (Fig. 7.2). Professor Bech also elaborates on Beck’s cognitive model of depression, a condition which is indeed creeping into our society and goes mostly unreported. The model describes three time orientations, which are considered to be endophenotypes: a negative view of the future (hopelessness), a negative view of the past (guilt feelings and/or worthlessness) and a negative view of the present (helplessness).

Heinz Lehmann describes the necessary or invariant core of depression, a description which to date is unsurpassed and which is also pointed out by Rasch (cited above): 1) reduction of interest (apathy); 2) reduction of capacity to enjoy (anhedonia); and 3) reduction of energy (asthenia). These vital core symptoms are to be set apart from sufficient factors (hopelessness, guilt, somatization, etc.) (p. 801). It goes without saying that this triad should be present in the absence of organicity. I understand that the division of organic and functional has been questioned.

For Lehmann, “the ideal rating scale should be constructed on the basis of both clinical experience and statistical analysis” and, most important, “it must be validated - that is proved that the scale really measures what it claims to measure - and its reliability, both between different raters (interrater) and at different points in time (test-retest) must be demonstrated” (p. 806). I would add that, regarding the points in time, we should not be satisfied with weekly assessments but should also pursue long-term assessment of validity. Lehmann points out that a quantitative measurement of the severity of a disorder, the identification of special patterns or clusters of symptoms and, finally, an attempt to isolate personality characteristics for the prediction of risk and treatment response are critical. The latter has developed further in the last decade.

Professor Bech could have written a more extensive glossary for didactic purposes. The word “clinimetrics” was introduced by Alvan R. Feinstein. High clinical validity (face validity) means that a scale’s questions correspond with the depression symptoms of the DSM-IV.

The Appendices are outstanding, indeed. They add considerable information about the minute and continuous research on Hamilton’s scales and their analogues: the Montgomery-Åsberg Depression Rating Scale (MADRS); the Bech-Rafaelsen Melancholia Scale (MES); the Major Depression Inventory (MDI), accompanied by a critical statement of the items missing from this inventory; the Bech-Rafaelsen Mania Scale (MAS); the Brief Psychiatric Rating Scale (BPRS); the Newcastle Diagnostic Depression Scale; the modified PRISE questionnaire for side effects of antidepressants; and etiological considerations in major depression through use of the Clinical Interview for Depression and Related Syndromes (CIDRS, F-1 to F-16) (from pages 170 to 175).

I shall try to complement this review with comments not addressed by Professor Bech but cited from the Encyclopedia of Psychology written by Eysenck, Arnold and Meili. On page 958 of the Encyclopedia, the authors write: “A strict definition of scaling must be based on measurement theory”; in other words, “Measurement consists in transforming an empirical system onto a numerically relational system.” Another way of expressing this is to establish a one-to-one mapping of one relational system to another for the purpose of establishing its validity (in turn, its validity is drawn from a priori axioms). There are one-dimensional and multi-dimensional methods of scaling and, to my mind, a critical component is the difference between stimulus-centered scaling and reaction-centered scaling, which covers both judging individuals and their judgement (p. 959).

Professor Bech, like most psychiatrists, has set himself apart from factor analysis, mathematics, matrix algebra and statistics, without denying their scientific basis or their long-term usefulness. From a rather pragmatic point of view, a clinical assessment of a patient during an hour would tell us phenomenologically far more than a 10-point scale which takes about 10 minutes to complete. We are, at times, complacent when the result of treatment confirms our presumptive diagnoses, which should not be taken for granted, because there are many variables at play in the outcome of treatment; and when the contrary occurs, the word iatrogenesis is rarely mentioned.

The fifth edition of the DSM, whose task force was headed by David Kupfer, was launched in 2013, after some 20 years of stagnation and the realization that, with few exceptions, the genetic, molecular, metabolic and cellular bases of mental illnesses were largely unknown, unlike the impressive advances in other medical fields such as cardiology. Kupfer’s aim was to change from a categorical to a dimensional spectrum of mental disorders. An important breakthrough was the research conducted by Jordan Smoller. His team, who studied the genomes of 33,000 patients diagnosed with five different mental disorders, was able to isolate four chromosomal loci associated with all five: autism, attention deficit disorder with or without hyperactivity, bipolar disorder, depression and schizophrenia.

Measuring symptoms and signs is not an easy undertaking, unless they are correlated with biometrics and validated illnesses. In measuring a psychopathological cluster of symptoms, one has to evaluate the patient’s ability to communicate with the doctor and with himself (insight versus self-deception), as well as his trust versus mistrust. Otherwise some scales would be inaccurate in their intended measurements.

I must congratulate Professor Bech for a highly readable publication. It is well researched, with multi-dimensional and integrative perspectives, and shows him to be a great clinician, academician and researcher.


Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. The Lancet 2013; 381 (9875): 1371-9.

Eysenck HJ, Arnold W, Meili R. Encyclopedia of Psychology. New York: Continuum Press; 1982.

Kupfer DJ, Regier DA. Neuroscience, clinical evidence and the future of psychiatric classification in DSM-5. Am J Psychiatry 2011; 168 (7): 672-4.

Lehmann HE. Affective disorders: Clinical features. In: Kaplan HI, Sadock BJ, eds. Comprehensive Textbook of Psychiatry. Volume 4. Fourth edition. Baltimore: Williams and Wilkins; 1985.


Hector Warnes

November 3, 2016