What is the ESI?


Effect Size

What is the ESI?

Interpreting Effect Sizes

Effect Sizes and NNTs


Effect Size


In health research, effect size has become a commonly used expression of the effectiveness of a treatment, intervention, or program. In general terms, “effect size” means the strength of the relationship between two variables, as when calculating an odds ratio or correlation coefficient. However, the term also has a more specific meaning: the standardized mean difference, or SMD. With this specific meaning in mind, the term effect size applies when working with continuous variables (e.g., weight, depression symptom scores, pain severity, cognitive test performance, blood pressure). In its simplest form, effect size is calculated by taking the difference between two means and dividing it by their pooled standard deviation.


General equation:

Effect size = (mean1 – mean2) / SDpooled
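As a minimal sketch of this calculation in Python (the pain-severity scores below are made-up illustrative data, not from any study):

```python
from statistics import mean, stdev

# Hypothetical pain-severity scores (0-10 scale) for two treatment groups.
group1 = [3.1, 4.2, 2.8, 3.9, 3.5, 4.0]
group2 = [4.8, 5.1, 4.3, 5.6, 4.9, 5.2]

n1, n2 = len(group1), len(group2)
s1, s2 = stdev(group1), stdev(group2)

# Pooled standard deviation, weighting each group's variance by its
# degrees of freedom (the form used for Cohen's d).
sd_pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5

# Effect size = difference in means / pooled SD
effect_size = (mean(group2) - mean(group1)) / sd_pooled
```

Hedges’ g applies a small-sample correction factor to this same quantity, which is why it is slightly more conservative.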


Effect size is not a measure of statistical significance; rather, it provides a standardized assessment of the magnitude of the difference between two groups. For treatment studies, it indicates the magnitude of the treatment’s effect. Unlike statistical significance, it is not particularly sensitive to sample size. It can therefore help reveal statistically significant findings that are essentially clinically meaningless, as well as potentially clinically important findings in which the role of chance cannot be excluded.


While there are several ways to calculate effect size, Cohen’s d and the slightly more conservative Hedges’ g are the most commonly used. The resulting value is, in essence, a signal-to-noise ratio: the higher the number, the greater the signal relative to the noise. However, putting the calculated value into context requires due consideration of other factors and an appreciation of its assumptions and limitations (see below).




What is the ESI?


Gardner’s Effect Size Illustrator (ESI) is an intuitive, clinically informative, interactive online tool that supports the clinician, researcher, educator, and policy maker in judging the apparent importance of the effect size observed between two treatment groups, using a graphical approach. Effect sizes are calculated on the assumption that the distributions of responses around the means are normal. This implies that the mean and median are the same, or at least very similar. The Effect Size Illustrator is based on the same assumption and illustrates not only the means of the groups being compared but also the distributions around each mean.


This assumption allows for a useful interpretation of the results when calculating an effect size. The Effect Size Illustrator shows the percentile ranking of those in one group relative to the average person in the other group. For example, a calculated effect size of 0.5 (when the pooled SD is twice the size of the difference in means) indicates that the average person in one group (the 50th percentile) is at the level of the 69th percentile of the other group. If this were based on a study evaluating two treatment options for pain, the average person in group 1 would have less pain than 69% of the people in the other group. If the effect size were 0.47, the average person in group 1 would be doing better than 68% of those in group 2 in terms of pain severity. This percentile comparison, especially when simultaneously illustrated, provides a clearer indication of how the two groups compare. A table is provided that relates the calculated effect size to the estimated percentile comparisons.
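Under the normality assumption, this percentile comparison is simply the standard normal cumulative distribution function evaluated at the effect size. A minimal sketch:

```python
from statistics import NormalDist

def percentile_vs_other_group(effect_size):
    """Percentile of the average person in the better-performing group
    relative to the other group's distribution, assuming normally
    distributed outcomes with a common SD (the ESI's working assumption)."""
    return NormalDist().cdf(effect_size) * 100

# An effect size of 0.5 places the average person of one group at about
# the 69th percentile of the other group; 0.47 gives about the 68th.
p_050 = percentile_vs_other_group(0.5)
p_047 = percentile_vs_other_group(0.47)
```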


Regarding research on health interventions, the ESI is a unique resource that can help clinicians and health policy makers use evidence from research studies in making patient-specific decisions and health policy decisions that affect our community. This online tool allows users to input data from continuous outcome measures taken directly from the results of a clinical study (e.g., a randomized controlled trial or meta-analysis) to create illustrations that provide specific visualizations and interpretations of effect size calculations.




Interpreting Effect Sizes


It is important to address the myths that surround the interpretation of effect size values. Professor Jacob Cohen reluctantly suggested thresholds of 0.2, 0.5, and 0.8 as indicators of small, medium, and large effects. However, in his statistical text he warns:


“The terms ‘small’, ‘medium’, and ‘large’ are relative . . . to each other . . . the definitions are arbitrary . . . these proposed conventions were set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible. … The values chosen had no more reliable a basis than my own intuition.”


In: Cohen J. Statistical power analysis for the behavioral sciences. Second edition. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988. p. 532.


Despite Professor Cohen’s warnings and pleadings not to use them, these thresholds are being perpetuated and applied uncritically as if they provide meaningful distinctions. The Effect Size Illustrator allows for a more precise interpretation of the effect size expression and in doing so obviates the need for arbitrary thresholds.


It is also important to keep in mind that fully understanding the meaning of any calculated effect size goes beyond the number calculated or any picture illustrating the two groups’ means and variances. It requires consideration of the baseline severity, the amount of change observed within groups, the difference between means, the measurement used, the study’s methods, and related clinical and research findings.


The following questions provide a guide to understanding the implications of any calculated and illustrated effect size:

What is the effect size?

The larger the number the greater the difference between the two groups being compared. It is important to keep in mind that larger effect sizes can be a result of important differences in means between groups OR small variances in individual responses within groups.

Were change scores or final scores used to calculate effect size?

Calculations can be very different depending on whether the effect size is based on changes in score from baseline (“change scores”) or on scores observed at the end of the study (“final scores”). An impressive-looking effect size from change scores can be quickly nullified, in terms of its importance, when the final scores are considered.


                           Baseline A1c   Change in A1c   Final A1c
Antidiabetic drug + diet       8.5            −0.5           8.0
Diet only                      8.5            −0.3           8.2

(Pooled SD of change scores = 0.2; pooled SD of final scores = 0.6.)

The change score effect size is (0.5 − 0.3)/0.2 = 1.0 and the final score effect size is (8.2 − 8.0)/0.6 = 0.33. Drug treatment confers a clear advantage in terms of change from baseline. However, the amount of change is small when the final scores are compared.
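The arithmetic in this A1c example can be reproduced directly; the means and the pooled SDs of 0.2 (change scores) and 0.6 (final scores) are taken from the example above.

```python
def effect_size(mean1, mean2, sd_pooled):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    return (mean1 - mean2) / sd_pooled

# Change-score comparison: reductions of 0.5 vs 0.3 points, pooled SD 0.2.
change_es = effect_size(0.5, 0.3, 0.2)   # 1.0

# Final-score comparison: final A1c of 8.2 vs 8.0, pooled SD 0.6.
final_es = effect_size(8.2, 8.0, 0.6)    # about 0.33
```

The same study data yield an effect size three times larger when change scores are used, which is why both views are needed.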

What is the difference in means between the two groups?

To more accurately understand the clinical meaning of the calculated effect size it is important to look at the difference between the groups’ mean scores and consider its clinical meaning. Is the difference in means an important difference or is it relatively trivial? What is the minimal difference that would be considered to be important or worthwhile?

What is the measured variance?

It should be noted that effect size calculations are sensitive to the standard deviation calculated. A large effect size may result from a small standard deviation as opposed to an impressive mean difference between groups. A homogeneous set of responses (small standard deviation) may not represent the effect of the intervention when used under normal clinical circumstances. The sample studied may not be representative of the clinical population. A larger variance might indicate either the presence of subgroups who respond well and less well to the treatment or that treatment response is highly unpredictable and varies from positive to negative among individuals.
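This sensitivity to the standard deviation can be seen by holding the mean difference fixed and varying the spread; the numbers below are illustrative assumptions, not from any study.

```python
# The same 2-point mean difference on a hypothetical symptom scale,
# under two different amounts of within-group variability.
mean_difference = 2.0

d_small_sd = mean_difference / 1.0   # homogeneous responses  -> d = 2.0
d_large_sd = mean_difference / 4.0   # heterogeneous responses -> d = 0.5
```

An identical clinical difference thus reads as a “large” effect in one sample and a “medium” one in another, purely because of the variance.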

What was measured?

Not all outcome measures are created equal. An effect size of 0.9 for a measure of pain is likely more clinically important than the same effect size for cortisol levels. The former is a patient-oriented effect that matters, whereas cortisol levels are non-specific biomarkers that may or may not have any relevance to the individual’s current or future well-being. It is also important to consider other characteristics of the measure, including its validity and reliability. Is it a measure of what it says it measures? Is it sensitive to changes in the individual’s status? Does it reliably reflect the individual’s status, and does it do so similarly when repeated?

If the measure is a biomarker (or surrogate outcome), is this difference associated with clinically important differences? How strong or weak is the link between the biomarker and meaningful clinical outcomes?

What else could have affected the results?

The findings of comparative studies can be a result of chance, differences caused by the intervention under study, confounding variables, and a multitude of possible biases. Better study designs markedly reduce the latter two concerns and are able to report the role of chance (e.g., p-value). However, the effect size calculated still has a rather narrow interpretation. It applies to the people who were studied (or others who would be considered “not different in important ways from those studied”), for the duration that they were studied, and under the circumstances of their treatment (e.g., adherence to the treatment protocol, the frequency or dose of the intervention, the training of the provider of the intervention, the purity of the intervention, and other considerations such as history of other treatments and concurrent interventions).

What have other studies found?

Of course, the effect size estimated from a single comparative study or a single meta-analysis does not exist in isolation. It adds to a pre-existing and always expanding knowledge base that includes related research and theories as well as clinical and patient experiences. The meaning of the effect size at hand should be considered in the context of other findings.


Effect Sizes and NNTs


The translation of research findings into clinical practice requires that clinicians, educators, and policy makers be competent in understanding pivotal research. Significant progress has been made in helping clinicians understand the clinical importance of results based on dichotomous outcomes (e.g., calculating the number needed to treat (NNT) for one person to benefit). Intuitive online illustrators are also available to translate study results based on dichotomous variables (e.g., responder/non-responder) in support of informed decision making (e.g., Chris Cates’ Visual Rx). Similar tools, such as the Effect Size Illustrator, have long been needed to support the clinical interpretation of findings based on continuous variables, for example symptom scales (e.g., pain, depression), severity of impairment or disability, quality of life, cognitive test performance, laboratory indices, and exercise tolerance, all of which serve as the basis of effect size calculations.

