On Medicine

Surgical Checklists Save Lives — but Once in a While, They Don’t. Why?

Photo illustration by Cristiana Couceiro; source photograph from Getty Images.

Late last year, I witnessed an extraordinary surgical procedure at the Cleveland Clinic in Ohio. The patient was a middle-aged man who was born with a leaky valve at the root of his aorta, the wide-bore blood vessel that arcs out of the human heart and carries blood to the upper and lower reaches of the body. That faulty valve had been replaced several years ago but wasn’t working properly and was leaking again. To fix the valve, the cardiac surgeon intended to remove the old tissue, resecting the ring-shaped wall of the aorta around it. He would then build a new vessel wall, crafted from the heart-lining of a cow, and stitch a new valve into that freshly built ring of aorta. It was the most exquisite form of human tailoring that I had ever seen.

The surgical suite ran with unobstructed, preternatural smoothness. Minutes before the incision was made, the charge nurse called a “time out.” The patient’s identity was confirmed by the name tag on his wrist. The surgeon reviewed the anatomy, while the nurses — six in all — took their positions around the bed and identified themselves by name. A large steel tray, with needles, sponges, gauze and scalpels, was placed in front of the head nurse. Each time a scalpel or sponge was removed from the tray, as I recall, the nurse checked off a box on a list; when it was returned, the box was checked off again. The old tray was not exchanged for a new one, I noted, until every item had been ticked off twice. It was a simple, effective method to stave off a devastating but avoidable human error: leaving a needle or sponge inside a patient’s body.

In 2007, the surgeon and writer Atul Gawande began a study to determine whether a 19-item “checklist” might reduce human errors during surgery. The items on the list included many of the checks that I had seen in action in the operating room: the verification of a patient’s name and the surgical site before incision; documentation of any previous allergic reactions; confirmation that blood and fluids would be at hand if needed; and, of course, a protocol to account for every needle and tool before and after a surgical procedure. Gawande’s team applied this checklist to eight sites in eight cities across the globe, including hospitals in India, Canada, Tanzania and the United States, and measured the rate of death and complications before and after implementation.

The results were startling: The mortality rate fell to 0.8 percent from 1.5 percent, and surgical complications declined to 7 percent from 11 percent. In the decade that ensued, surgical checklists proved effective in diverse settings. In South Carolina, in hospitals that implemented checklists, the 30-day mortality for certain surgical procedures fell to 2.8 percent from 3.4 percent. In the Netherlands, the deployment of checklists along a patient’s full surgical journey — from admission to discharge — caused a striking decrease in complications and mortality.

In 2014, Gawande’s team (now an organization called Ariadne Labs in Boston) started perhaps their most ambitious study on the impact of checklists. This time, the team focused its attention on the practice of childbirth in India: Could adherence to a checklist containing “essential birth practices” reduce the rates of infant and maternal mortality? The Ariadne team identified 60 matching pairs of facilities in cities like Lucknow and Agra in the state of Uttar Pradesh. A 28-item checklist was created: Was a clean towel provided at birth? Were there sterile scissors at hand? Did the birth attendant remember to wash his or her hands? Was the infant’s temperature measured after delivery? An intensive eight-month peer-coaching program to implement this checklist was used in half the paired hospitals. The study enrolled nearly 160,000 pregnant women. The chances of mother or infant dying, or of severe maternal complications, were measured. Despite the team’s strenuous attempts to implement this checklist, there was no discernible impact: The rate of adverse outcomes in the experimental group was identical to the rate in the control group — around 15 percent. About 7,400 babies were stillborn or died within the first week of life.


What happened? How could an idea that worked so effectively in so many situations fail to work in this one? The most likely answer is the simplest: Human behavior changed, but it didn’t change enough. Coached attendants washed their hands 35 percent of the time, while attendants in the uncoached control group did so only 0.6 percent of the time. Coached birth attendants measured a newborn’s temperature 43 percent of the time, compared with 0.1 percent in the control group. Yet these differences in behavior weren’t ample enough to have an impact on maternal or fetal morbidity and mortality. (By contrast, in the original study on the eight hospitals, compliance with items on the checklist was typically 60 to 80 percent to begin with, and rose to nearly 100 percent after deployment.) The “childbirth checklist” may have been a perfectly effective intervention; it was the implementation of the list that failed in India.

It’s also possible that a host of “unknown unknowns” caused the failure: some fundamental feature of checklists, or their deployment, had been lost in translation from Boston to Agra. In medicine, we often expect robust interventions, typically proved by randomized trials, to keep working when moved from one context to the next. But not everything applies everywhere; context can be crucial. Every intervention cannot be tested in every context — that strategy would bust the bank — and so we use our best judgment to extend the data from one study to another. But we often fail in our judgment, and constant vigilance is needed even after a study has demonstrated benefit.

And the childbirth trial, notably, was not just a replica of earlier studies. A previous trial in Namibia — smaller and nonrandomized — worked: Adherence to essential birth practices increased to 95 percent from 68 percent, and infant mortality dropped. But features unique to labor and delivery in obstetric hospitals in India may have made checklists ineffective. What if, rather than an absence of knowledge, there was the perception of too much local knowledge: What if birth attendants did not bother using the checklists because they thought that they already knew what to do? Were there other habitual practices in Uttar Pradesh that made “checklisting” ineffective? And how do we learn to account for such local effects when shifting a medical intervention from one context to another?

Perhaps the most important insight from the “checklist in childbirth” study is the extent to which human behavior remains an uncharted frontier for medicine. In recent times, the imagination of experimental medicine has been dominated by mechanisms to alter human physiology. But these new drugs and treatments won’t work if we don’t simultaneously target human behavior: Our latest cancer immunotherapies or the newest cardiac drugs will be rendered useless if patients don’t turn up for their infusions on time, if nurses administer them to the wrong patients or if doctors fail to note allergic complications. A high level of a particular form of cholesterol in the blood has turned out to be a powerful biomarker for cardiac risk, enabling doctors to prescribe Lipitor, say, to patients who carry this marker and decrease their chance of a heart attack. But there’s no biomarker to identify patients who actually comply with taking their pills on time.

In particularly risky circumstances, such as the treatment of TB or H.I.V. (in which the risk of drug resistance increases if medicines are taken off schedule), doctors have resorted to programs in which patients swallow their medicines in front of a physician’s eyes — a strategy called Directly Observed Therapy. These “DOT” programs work at first, but patients often become noncompliant in the real world once the direct observation ceases. And so, in time, do nurses, hospital staff and birth attendants: In the childbirth trial in India, 35 percent of the attendants started off washing their hands during the first months of the study, while coaching and supervision were still active. By 12 months, when the coaching had ceased, that proportion had dropped to 12 percent.

We might describe this situation as a “behavioral relapse,” akin to the physiological relapse of cancer or of an immunological illness. Unlike cancer, though, behavioral relapse has no measure: no marker, no biopsy, no powerful predictive test; it remains undetectable by most methods. As much as we need experimental tools to survey human physiology, doctors need experimental tools to understand, survey and change medicine’s least familiar frontier: human behavior.

Siddhartha Mukherjee is the author of “The Emperor of All Maladies: A Biography of Cancer” and, more recently, “The Gene: An Intimate History.”


A version of this article appears in print on Page 14 of the Sunday Magazine.
