Get acquainted with four of the abstract submissions for EUROSPINE 2021 in Vienna, which have been selected as Best of Show and will be presented in the Best of Show and Award papers session.
Gain in HRQL after ASD Surgery is maintained between 2 and 5 years’ Follow-Up. Results from A Prospective Multicentre Observational Cohort Study
Presentation by Ferran Pellisé, Alba Vila-Casademunt, Maria Capdevila-Bayo, Susana Núñez-Pereira, Aleix Ruiz-De Villa, Sleiman Haddad, Javier Pizones, Frank Kleinstück, Manuel Ramírez-Valencia, Ibrahim Obeid, Ahmet Alanay, Anne Mannion, ESSG European Spine Study Group
Introduction: Despite the increasing number of surgeries performed for Adult Spinal Deformity (ASD), there is a lack of data with more than five years of follow-up. Most studies on ASD have no follow-up beyond two years after surgery, and those that do are analyses of small, single-centre cohorts or have considerably low follow-up rates. Greater emphasis on value-based outcomes and more long-term follow-up studies are needed. The main objective of our study was to investigate the durability of ASD surgical outcomes. The secondary objectives were to evaluate predictors of loss to follow-up at five years and to search for predictors of HRQOL at five years postop.
Methods: We included all surgical patients enrolled in an international (5 sites) ASD database with more than 5 years of follow-up (operated before March 2015) and with at least 2-year follow-up data. We analysed the long-term (5 years postop) ASD surgical treatment outcomes (radiographic correction and HRQOL) and rates of adverse events.
First of all, predictors of loss to follow-up at 5 years were evaluated to rule out selection bias during subsequent analyses. To do so, different logistic regression models controlling for confounding variables were constructed.
After analysing selection bias at 5 years of follow-up, we assessed 2- and 5-year surgical outcomes: adverse events (complications and reinterventions), HRQL (ODIv2, SF-36v2.0™, SRS-22R) and standing radiographic (coronal and sagittal) parameters.
Complications and reinterventions at 2 and 5 years of follow-up were descriptively compared. Rates of complications were analysed before 2 years and between 2- and 5-years postop dividing them by their impact (minor and major) and by their category (mechanical, medical, infectious and neurological). Rates of reinterventions were also analysed dividing them by the adverse event leading to the reintervention.
HRQOL (ODIv2, SF-36v2.0™, SRS-22R) and radiographic outcomes were compared between preop and 2 years, preop and 5 years, and 2 years and 5 years. HRQOL was also evaluated by analysing the proportion of patients reaching published PASS (Patient Acceptable Symptom State) and MCID (Minimal Clinically Important Difference) using multivariable linear regression, controlling for confounding factors.
We also evaluated predictors (preoperative demographic variables, HRQOL, radiographic characteristics and surgical correction), for HRQOL at 5 years postop using multivariable linear regression, controlling for confounding variables.
Results: There were 1237 surgical patients in the database at the time of the analysis. Of these, 361 [77.8% women; mean (SD) age 52.1 (19.17) y; mean 8.9 fused levels; 16.6% 3CO; 36.3% pelvic fixation; 94.6% posterior only] were operated before March 2015 and were eligible for the present study.
Of the eligible patients, 316 (87.5%) had follow-up data at 2 years and 258 (71.5%) at 5 years postop. Lack of 5-year follow-up was related to site (p<0.05) but not to baseline patient characteristics (demographic, radiographic) or 2-year follow-up outcomes (HRQL, major complications, reinterventions, radiographic parameters).
There was no change (p>0.05) in coronal alignment, lumbar lordosis, PI-LL mismatch or SVA from six weeks postop to 5 years postop. A significant increase in T2-T12 kyphosis (43.4º vs 50.6º, p=0.02), PT (18.1º vs 21.7º, p=0.02) and global tilt (18.6º vs 24.4º, p=0.03) was observed between 6 weeks and 5 years postop. The incidence of major complications (24.9% vs 10.5%, p<0.001) and reinterventions (18.8% vs 12.2%, p<0.0018) was greater during the first 2 years than between 2 and 5 years. Mean HRQL scores, proportion of patients reaching MCID and PASS, and satisfaction with treatment were similar at 2 years and 5 years postop.
Worse baseline HRQL scores and sagittal alignment (Global Tilt, PI-LL mismatch) were associated (p<0.05) with a greater gain in 5 years HRQL while postoperative major complications and reinterventions were associated with a lesser gain (p<0.05).
Conclusions: This study represents the largest prospective multicentre surgical cohort of ASD patients with more than 5 years of follow-up reported in the literature to date. It provides strong evidence that surgery for ASD is associated with durable outcomes that do not deteriorate over time. Even so, major complications and reinterventions occurred in 10% of the patients between 2 and 5 years, suggesting that adult deformity treatment is not definitively resolved at the 2-year benchmark. The 5-year follow-up rate was not related to patient characteristics, surgical outcomes (HRQOL, deformity correction) or complications at 2 years. The extent of the gain in HRQL at 5-year follow-up depends on baseline HRQL and sagittal alignment, as well as the occurrence of major complications and unplanned reinterventions. The observed increase in thoracic kyphosis and PT throughout follow-up should be further analysed in future studies.
Is VBT safe and effective for ≥60º main thoracic curves? A matched cohort analysis
Presented by Yilgor C, Miyanji F, Parent S, Blakemore L, Neal K, Hoernschemeyer D, Lonner B, Samdani A, Newton P, Alanay A and Harms Non-Fusion Study Group
Background: Vertebral body tethering (VBT) has emerged as a promising non-fusion surgical technique that offers curve stabilisation, preservation of functional motion and continued spinal and chest cage growth. Although experimental and clinical evidence has been growing for the last two decades, there is still a paucity of information on outcomes of VBT surgery. Patient preference and surgeon curiosity create a tendency towards applying VBT to bigger curves (i.e. curve magnitudes over 60º). However, the efficacy, let alone the safety, of this procedure has not been assessed for such magnitudes.
Purpose: The aim of the study was to compare clinical and radiographic results, as well as complication and reoperation rates, of VBT for large curves with those of VBT for smaller curves and of Posterior Spinal Fusion (PSF) for ≥60º Main Thoracic (MT) curves.
Methods: Prospectively collected fusion data and retrospectively collected non-fusion data were evaluated. A multicenter database was queried for JIS and AIS patients having 60º-80º baseline main thoracic curves who had ≥2 years follow-up after VBT surgery. Patients were 1-to-1 matched with 40º-60º MT curve VBT patients using UIV and LIV locations, Sanders Skeletal Maturity Stages and Risser Scores; and 1-to-1 matched with 60º-80º MT curve PSF patients using curve types, exact Cobb angles and Risser Scores.
Perioperative data, curve flexibility, pulmonary, mechanical (broken tether and proximal junctional failures) and growth-related complications (adding-on, overcorrection and crankshafting), and reoperations were compared using Chi-squared, Mann-Whitney U and one-way ANOVA. Radiographic success was defined as having ≤35º MT curve at final follow-up.
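The success criterion above is a simple threshold rule applied to the final main thoracic Cobb angle. A minimal Python sketch of that rule follows; the function names and the example angles are hypothetical illustrations, not study data, and only the 35º cut-off comes from the abstract.

```python
from typing import Sequence

# From the abstract: success means a main thoracic (MT) curve of <= 35 degrees
# at final follow-up.
SUCCESS_THRESHOLD_DEG = 35.0


def is_radiographic_success(final_mt_cobb_deg: float) -> bool:
    """Return True if the final MT Cobb angle meets the success criterion."""
    return final_mt_cobb_deg <= SUCCESS_THRESHOLD_DEG


def success_rate(final_mt_cobbs_deg: Sequence[float]) -> float:
    """Proportion of patients in a group meeting the success criterion."""
    if not final_mt_cobbs_deg:
        raise ValueError("at least one measurement is required")
    successes = sum(is_radiographic_success(c) for c in final_mt_cobbs_deg)
    return successes / len(final_mt_cobbs_deg)


# Hypothetical final Cobb angles for four patients (illustration only).
hypothetical_group = [22.0, 35.0, 41.0, 30.0]
print(f"{success_rate(hypothetical_group):.0%}")  # prints "75%"
```

Note that the threshold is inclusive, so a curve of exactly 35º counts as a success under this reading of "≤35º".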
Results: 108 patients (93F, 15M) were included in VBT40-60, VBT60-80 and PSF60-80 groups (n=36 for each). On average, the cohort displayed significant growth potential (mean age 12.4±1.6y, median Sanders 3, median Risser 0 and 63.5% TRC open). Curve flexibility was similar among groups.
Additional interventions were thoracoplasty in 3% of the VBT40-60 group; annulotomy and thoracoplasty in 6% of the VBT60-80 group; and anterior release in 19%, thoracoplasty in 27.8% and posterior column osteotomies in 52.7% of the PSF60-80 group. PSF surgeries lasted longer (306 minutes compared to 235 minutes for VBT40-60 and 248 minutes for VBT60-80) and resulted in higher estimated surgical blood loss (917 ml compared to 163 ml for VBT40-60 and 159 ml for VBT60-80). Length of hospital stay was similar among groups.
Pulmonary complications were similar among groups. Two out of three complications in VBT40-60 group were resolved with medical treatment, while the other necessitated an intervention. Both complications in VBT60-80 group were resolved with medical treatment, while both complications in PSF group necessitated an intervention.
Growth-related complications were similar among groups. One of three complications in VBT40-60 group necessitated a tether removal. Three out of five complications in VBT60-80 group necessitated a tether release or removal, while two were converted to fusion. One patient who had crankshafting and adding-on in the PSF group underwent a revision posterior spinal fusion.
Mechanical complications and reoperations were more frequent after VBT, but their rates were similar between the VBT groups. Most of these complications were broken tethers, some of which were revised with another VBT procedure, while others were converted to fusion. The one mechanical complication observed in the PSF group was a proximal junctional kyphosis. There was one dural tear in each of the VBT40-60 and PSF groups, while no patient in the VBT60-80 group experienced a dural tear.
The percentages of patients having a ≤35º MT curve at last follow-up were 36%, 67% and 94% for the VBT60-80, VBT40-60 and PSF60-80 groups, respectively. Fusion was avoided in 86% of VBT40-60 and 92% of VBT60-80 patients (Figure 1).
Conclusion: Radiographic success rate (≤35º MT curve at last follow-up) of VBT for 60º-80º MT curves (36%) was lower compared to both VBT for 40º-60º (67%) and PSF for 60º-80º (94%).
For patients with 60º-80º preoperative main thoracic curves, VBT resulted in higher mechanical complication and reoperation rates compared to PSF. However, 92% of the patients avoided fusion at a mean of 3 years after surgery, a goal for those who prefer VBT.
Pulmonary and growth-related complications were similar, while mechanical complications were higher in both VBT groups compared to PSF. Future work is required to understand if and when 60º-80º curves should undergo VBT.
Artificial Intelligence can accurately and reliably detect traumatic thoracolumbar fractures on sagittal radiographs
Presented by Enrico Gallazzi, Guillermo Sanchez Rosenberg, Fabio Galbusera, Peter Varga, Boyko Gueorguiev, Mauro Alini, Giuseppe Rosario Schirò, Pietro Domenico Giorgi, Andrea Cina
The explosion in the availability of labelled data, namely ‘big data’, and the exponential increase in the computing performance of CPUs and GPUs have brought about the era of artificial intelligence (AI). In general terms, AI ‘learns’ from input data and becomes able to generate an output based on some features of the inputs themselves: the most notable example of this model is image classification. Deep learning (DL) is a branch of AI based on neural networks, such as Convolutional Neural Networks (CNN). In particular, we used supervised neural networks, where the input and the ground truths are known. These methods have been reported to perform as well as or even better than humans in image classification. Thus, in the medical field, the area that has benefited most from AI in recent years is radiology.
Thoracolumbar (TL) fractures are the most common traumatic fractures of the spine, with an incidence of 32-64 per 100,000 per year. The severity of traumatic TL fractures varies greatly, ranging from simple fracture without structural impairment to complete dislocation of the spine. Usually, the first diagnostic step in the ER is sagittal and anteroposterior radiography. However, these radiographs are not very reliable, with a worldwide reported false-negative rate ranging from 25 to 30%. In this context, developing and implementing a diagnostic aid tool specifically trained for TL fracture identification in clinical practice could improve diagnostic performance and reduce the rate of missed vertebral fractures in the ER.
The first aim of this study was to develop an AI-based DL algorithm capable of accurately detecting vertebral fractures in sagittal radiographs of the TL spine. To achieve this aim, we used a CNN-based supervised learning approach. Through a database search, we identified 151 patients with a final diagnosis of traumatic TL fracture, for whom sagittal X-rays, CT scans and MRI were available. For AI training purposes, the ‘ground truth’ was set by three expert spine surgeons, who evaluated the plain sagittal radiograph together with second-level imaging, namely CT and/or MRI, and annotated each single-vertebra image as ‘fracture’ or ‘non-fracture’. Overall, 630 single-vertebra images were annotated, 302 as ‘fracture’ and 328 as ‘non-fracture’. The image dataset was split into a training set (N = 578) and a test set (N = 52). Two different DL architectures (VGG16 and ResNet18) were compared to identify the better performer.
The distribution of fractures and AO Spine types in our series was comparable to that reported in the literature, with 48% of fractures found from T12 to L2, and AO Spine types A4 (32%), A3 (26%) and A1 (25%) most frequently reported. Both architectures proved accurate in identifying fractures, with accuracies of 88% for ResNet18 and 84% for VGG16. Sensitivity was 89% with both architectures, but ResNet18 showed higher specificity (88%) than VGG16 (79%). The AUC was 0.88 with ResNet18 and 0.86 with VGG16.
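For readers less familiar with the metrics quoted above, the following Python sketch shows how accuracy, sensitivity and specificity are derived from the four cells of a binary confusion matrix. The counts are hypothetical: they were chosen only so that they sum to the reported test-set size of 52, since the abstract does not report the actual confusion matrix, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix ('fracture' is the positive class)."""
    tp: int  # fractured vertebrae classified as 'fracture'
    fn: int  # fractured vertebrae classified as 'non-fracture'
    tn: int  # intact vertebrae classified as 'non-fracture'
    fp: int  # intact vertebrae classified as 'fracture'


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true fractures that the classifier detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of intact vertebrae that the classifier correctly clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all images that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts for a 52-image test set (NOT taken from the study).
hypothetical = ConfusionCounts(tp=24, fn=3, tn=22, fp=3)
print(f"sensitivity={sensitivity(hypothetical):.3f}")
print(f"specificity={specificity(hypothetical):.3f}")
print(f"accuracy={accuracy(hypothetical):.3f}")
```

In a missed-fracture setting, sensitivity is the metric most directly tied to the false-negative rate discussed earlier, since the false-negative rate equals one minus sensitivity.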
The secondary aim was to gain a deeper understanding of the model’s interpretation of the ‘fracture zone’ through a heatmap representation. The heatmaps were used to visualise which part of the image was the most important for model classification decision. Thus, “warmer” zones indicated the presence of a fracture (Figure 1). The heatmaps were generated after model training using the computed optimal parameters.
In summary, the AI model presented in this study can accurately identify thoracolumbar vertebral fractures in sagittal radiographs. Specifically, a version based on the ResNet18 architecture showed better performance than that reported for expert surgeons and radiologists. Moreover, the heatmaps gave a deeper understanding of the model’s reasoning in the classification, making them suitable for use in clinical practice. In fact, in two cases the model and the heatmaps called the surgeons’ ground-truth classification into question and, after the CT and MRI scans had been evaluated, the two cases were reassigned to the opposite class.
Based on these results, a future clinical application of this AI model to minimise diagnostic errors in fracture detection on sagittal radiographs of the TL vertebrae in an emergency setting seems plausible.
Development of a cross-walk for the bidirectional mapping of two commonly used condition-specific patient-reported outcome measures, the Oswestry Disability Index (ODI) and the Core Outcome Measures Index (COMI)
Presentation by Anne F Mannion, Achim Elfering, Frank Kleinstück, Markus Loibl, Tamás F Fekete, Ferran Pellise, Alba Vila-Casademunt, Francine Mariaux, Sarah Richner-Wunderlin, Javier Pizones, Francisco S Perez-Grueso, Ibrahim Obeid, Ahmet Alanay, Adam Pearson, Jon Lurie, François Porchet, Dezsö Jeszenszky, Daniel Haschtmann. ESSG and Spine Tango Registry.
Have you ever tried to review the effectiveness of a particular spine intervention, only to find out that all the studies used different outcome measures, making it difficult to synthesise their findings? Or been forced to decline collaboration in a multicentre study because your routinely-used outcome measures were not compatible? Or tried to compare the post-surgery improvements in your own patients with those reported in the literature, only to be thwarted by the use of different outcome tools?
The aim of this study was to facilitate such comparisons in relation to the use of two commonly used patient-reported outcome instruments in patients with spinal disorders — the Oswestry Disability Index (ODI) and the Core Outcome Measures Index (COMI) — by creating a “cross-walk” derived from the data of thousands of patients. “Cross-walking” is a method of mapping scores on different instruments that measure similar domains. The ODI and the COMI don’t purport to measure exactly the same thing and so a “1-to-1” agreement between them would not be expected: the ODI is a disability questionnaire and the COMI is a multidimensional instrument covering not only pain and disability but also quality of life, symptom-specific well-being, etc. Nonetheless, both are condition-specific instruments that quantify the impact of the spinal problem on aspects of life important to the patient. We therefore hoped to generate a cross-walk that would allow us to (at the very least) interpret scores and change-scores of one in relation to the other for groups of patients, and maybe even for the individual patient, with a slightly greater margin of error.
Our study involved a secondary analysis of data from conservative and surgical patients with spinal disorders, collected during their involvement in one of two multicentre observational studies or an international spine surgical registry, in which they had been required to complete both an ODI and a COMI at baseline and 1-year follow-up (FU). We included the data from a total of 3324 patients with a mean (SD) age of 57 (17) years, 60.3% of whom were female. Cross-walking requires that changes in outcomes from two measures in the same individuals be correlated and similarly responsive to change (Morris et al 2015). Hence, correlation coefficients were determined for the relationship between the two instruments’ baseline scores, their FU scores and their change-scores (from baseline to 1y FU), and regression equations predicting ODI from COMI and COMI from ODI were produced. Cohen’s kappa (κ) for agreement was also calculated, with respect to achievement of the minimal clinically important change (MCIC) score on each instrument, using cut-offs of a 12.8-point change for the ODI (Copay et al 2008) and a 2.2-point change for the COMI (Mannion et al 2009). This was done using the actual change-scores recorded with the given instrument as well as the change-scores predicted on the basis of the alternative instrument. It was hypothesised that baseline, FU and change-scores for the two instruments would be at least moderately correlated (r >0.5) and have moderately similar responsiveness (κ >0.4 for agreement in % reaching MCIC).
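The bidirectional mapping described above can be sketched with ordinary least squares fitted once in each direction. The Python example below is a minimal illustration under stated assumptions: the paired scores are invented for demonstration (they are not study data), the helper names `fit_line` and `predict` are hypothetical, and the abstract does not report the study's actual regression coefficients.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(value: float, slope: float, intercept: float) -> float:
    """Map a score on one instrument to the predicted score on the other."""
    return slope * value + intercept


# Hypothetical paired scores (COMI on a 0-10 scale, ODI on a 0-100 scale).
comi = [2.0, 4.5, 6.0, 7.5, 9.0]
odi = [12.0, 30.0, 38.0, 52.0, 70.0]

# One regression per direction, as in the abstract's description.
odi_from_comi = fit_line(comi, odi)
comi_from_odi = fit_line(odi, comi)

print(predict(5.0, *odi_from_comi))   # predicted ODI for a COMI of 5.0
print(predict(40.0, *comi_from_odi))  # predicted COMI for an ODI of 40.0
```

Two separate fits are used because, unless the scores are perfectly correlated, the regression of COMI on ODI is not the algebraic inverse of the regression of ODI on COMI; the product of the two slopes equals the squared correlation coefficient.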
The data showed that all pairs of measures were significantly (p<0.001) positively correlated: at baseline, the correlation coefficient was 0.73; at 1yr FU, 0.84; and for the change-scores, 0.73 (Fig 1, for COMI predicting ODI). Overall, 53.9% of patients achieved MCIC based on COMI change-scores and 52.4% based on ODI change-scores; on an individual basis, there was 78% agreement between them for achievement or not of the respective MCIC, with κ = 0.56. Simple algorithms are available to convert the scores (baseline, follow-up, or change-scores, depending on the need) for one instrument into those on the other, based on the corresponding regression equations.
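Cohen’s kappa corrects the observed agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The short Python sketch below applies this formula to the rounded figures quoted above, assuming that the 78% refers to the overall proportion of patients classified concordantly by the two instruments; the function name is illustrative. Because the inputs are rounded, the result is only an approximate check.

```python
def cohen_kappa_from_rates(observed_agreement: float,
                           rate_a: float,
                           rate_b: float) -> float:
    """Cohen's kappa for two binary ratings.

    observed_agreement: proportion of cases where both ratings agree (p_o)
    rate_a, rate_b: proportion of 'positive' cases under each rating
    """
    # Chance agreement: both positive by chance plus both negative by chance.
    expected = rate_a * rate_b + (1.0 - rate_a) * (1.0 - rate_b)
    return (observed_agreement - expected) / (1.0 - expected)


# Rounded figures from the abstract: 78% agreement, 53.9% reaching MCIC on
# the COMI and 52.4% on the ODI.
kappa = cohen_kappa_from_rates(0.78, 0.539, 0.524)
print(round(kappa, 2))  # prints 0.56
```

With roughly half of the patients reaching MCIC on each instrument, chance agreement is about 50%, which is why 78% raw agreement corresponds to a kappa of only about 0.56.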
The ODI is a long-established instrument, existing in dozens of languages, and is the most commonly used outcome tool in spine research worldwide. However, since it strictly only measures pain-related disability, many research groups choose to employ additional tools to capture other outcomes such as pain intensity, quality of life, etc. The COMI was developed as a brief, multidimensional instrument to obviate this need for multiple questionnaires by covering all core domains with just one question each. It has been adopted for use in EUROSPINE’s Spine Tango Registry and validated in 16 languages, with additional language versions rapidly emerging, suggesting that its popularity is growing. Many institutions prefer one outcome instrument over another and have a history of data collection with their chosen instrument; the ability to convert scores between the two scales and share historical data via the developed cross-walk should open up more centres/registries for collaboration and facilitate the pooling of data in meta-analyses. The increased statistical power obtained from the merging of studies that have used either ODI or COMI should serve to increase our knowledge and evidence base. Our future analyses will examine whether there are differences in the conversion algorithms for subgroups related to pathology, severity, treatment group, gender, language, age group, yellow-flag status, etc.
References
- Copay AG, Glassman SD, Subach BR, Berven S, Schuler TC, Carreon LY (2008) Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry Disability Index, Medical Outcomes Study questionnaire Short Form 36, and pain scales. Spine J 8:968-974
- Mannion AF, Porchet F, Kleinstuck FS, Lattig F, Jeszenszky D, Bartanusz V, Dvorak J, Grob D (2009) The quality of spine surgery from the patient’s perspective: Part 2. Minimal clinically important difference for improvement and deterioration as measured with the Core Outcome Measures Index. Eur Spine J 18:374-379
- Morris T, Hee SW, Stallard N, Underwood M, Patel S (2015) Can we convert between outcome measures of disability for chronic low back pain? Spine (Phila Pa 1976) 40:734-739
For more information about EUROSPINE 2021 and to book your place, visit: https://www.eurospinemeeting.org/