Statistical Shape Modeling of US Images to Predict Hip Dysplasia Development in Infants
Abstract
Background
The current widely applied Graf classification used on US images for developmental dysplasia of the hip in infants does not enable prediction of the development and outcome of well-centered stable dysplastic hips (Graf type II).
Purpose
To use statistical shape modeling on US images to identify acetabular shape characteristics of Graf type II hips, which enable prediction of the development of Graf type II hips, and to identify which hips benefit from Pavlik harness treatment.
Materials and Methods
In this secondary analysis of a prospective multicenter randomized trial on treatment of 104 infants aged 3–4 months with Graf type IIb or IIc hip dysplasia conducted between 2009 and 2015, a statistical shape model was developed on baseline US images. With multivariable logistic regression adjusted for infant sex and treatment (Pavlik harness treatment vs active observation), shape modes were correlated with the outcomes of persistent hip dysplasia on US images (α angle <60°) after 12-week follow-up and residual hip dysplasia on pelvic radiographs (Tönnis classification: acetabular index greater than 2 standard deviations) around 1 year of age. An interaction term (treatment with mode) was used to investigate if this result depended on treatment.
Results
Baseline US images were available in 97 infants (mean age, 3.37 months ± 0.43 [standard deviation]; 89 [92%] girls; 90 cases of Graf type IIb hip dysplasia; 52 cases treated with Pavlik harness). Shape modes 2 and 3 of the statistical shape modeling were associated with persistent hip dysplasia on US images (odds ratio [OR] = 0.43; P = .007 and OR = 2.39; P = .02, respectively). Mode 2 was also associated with residual hip dysplasia on pelvic radiographs (OR = 0.09; P = .002). The interaction term remained significant after multivariable analysis, indicating that Pavlik harness treatment was beneficial in patients with negative mode 2 values (OR = 12.46; P = .01).
Conclusion
Statistical shape modeling of US images of infants with Graf type II dysplastic hips predicted which hips developed to normal or remained dysplastic and identified hips that benefited from Pavlik harness treatment.
© RSNA, 2022
Summary
Statistical shape modeling of US images of well-centered stable dysplastic hips in infants predicted which hips developed to normal or remained dysplastic and indicated which hips benefited from Pavlik harness treatment.
Key Results
■ In this secondary analysis of a randomized trial, statistical shape modeling of US images of 97 well-centered stable dysplastic hips predicted which hips developed to normal or remained dysplastic.
■ After 12 weeks, modes 2 (odds ratio [OR] = 0.43, P = .007) and 3 (OR = 2.39, P = .02) were associated with dysplasia on US images irrespective of Pavlik harness treatment.
■ Around 1 year of age, mode 2 was associated with dysplasia on radiographs (OR = 0.09, P = .002), but this was counteracted by Pavlik harness treatment (OR = 12.46, P = .01).
Introduction
Developmental dysplasia of the hip (DDH) is a common problem in newborns, with an incidence between 1.5 and 20 per 1000 births (1–3). The diagnostic classification depends on various factors, including timing and type of screening, and regional differences. The spectrum of DDH ranges from a stable hip with minor acetabular dysplasia to severe acetabular dysplasia with dislocation of the femoral head. The US classification of Graf considers the bony roof angle (α angle) and the position of the labrum and femoral head and is one of the most used classifications for diagnostic purposes in DDH (4). The α angle describes the acetabular dysplasia by the degree of acetabular inclination. A value of 60° or more is considered normal.
There is general consensus that a dislocated hip (Graf types III and IV: an eccentric hip with α angle <43°) is an indication for immediate treatment. However, there is no consensus on how and when to treat well-centered stable infant hips with acetabular dysplasia (Graf type II: α angle of 43°–59°) (5–8). None of the randomized trials in patients with stable DDH that compared bracing with active observation identified a benefit of bracing (6,7,9). Nevertheless, many well-centered stable hips are currently treated with an abduction device. A large percentage of these hips might therefore be overtreated, because more than 80% would have developed to normal without treatment (6–11).
Furthermore, the Graf method is reported to have high variability and low agreement in all reported hip dysplasia metrics (12). US image quality and the anatomic appearance of the hip can be affected by probe position; in particular, hips with α angles in the range of 43°–59° may be incorrectly classified as dysplastic. This is probably one of the reasons that the Cochrane review considered all studies that investigated the effectiveness of the different screening programs to be underpowered (13).
Statistical shape modeling (SSM) is a method that can be used to quantify the shape of the acetabulum on US images (14–16). Previous literature suggests that SSM can differentiate shape differences due to orientation from the true anatomic differences (17). Quantification of the shape of the acetabulum with SSM on US images is likely to render more data related to the subsequent development of the hip compared with angle measurements alone. In this study, our first aim was to use SSM to identify acetabular shape characteristics of Graf type II hips, which enable prediction of the development of Graf type II hips, and to identify which hips benefited from Pavlik harness treatment. Our second aim was to identify artifacts due to probe positioning.
Materials and Methods
Patients
For this secondary analysis, eligible infant hips were retrieved from a prospective multicenter randomized trial in which the results of Pavlik harness treatment versus active observation in well-centered stable hips with hip dysplasia on US images were compared. This previously published trial was approved by the medical ethics committee (08/084) at University Medical Center Utrecht, the Netherlands. The full study design and outcomes were published in 2020 (6). According to the medical ethics committee, the design of the current study fell within the original approval.
Between 2009 and 2015, five participating centers throughout the Netherlands included 104 infants aged 3–4 months with stable hips at clinical examination and Graf type IIb or IIc on US images. They were randomly assigned to either Pavlik harness treatment or active observation. To evaluate the effect of treatment, infants were followed up with US and pelvic radiographs. Assessment was based on intention-to-treat analysis. Infants were excluded from this study if they did not have an available baseline US study (n = 7) (Fig 1).

Figure 1: Flowchart shows infant selection criteria.
US and Pelvic Radiographs
US and radiographic source files were retrieved from each participating center. US examinations were performed at inclusion and at 6- and 12-week follow-up according to the Graf method, which is the most applied method in the Netherlands (18). US was performed with a linear probe and without contrast material in all centers. Anteroposterior radiographs of the pelvis were obtained at the ages of approximately 10 and 24 months as part of standard care in the Netherlands, with variations in timing between the centers.
Outcome Definitions
Measurements were retrieved from the previously published trial. The outcomes were measured by one senior pediatric radiologist with more than 30 years of experience who was blinded to the study intervention. Our primary outcome was persistent hip dysplasia on US images, which was defined as an α angle of less than 60° at 12-week follow-up. With the Graf eligibility criteria, the best US images for measurement were selected (18). Our secondary outcome was residual hip dysplasia on pelvic radiographs, which was defined according to the classification system by Tönnis and Brunken (19), with an acetabular index (AI) greater than 2 standard deviations considered relevant dysplasia. To allow for a potential treatment effect to occur, only the first follow-up radiograph after the age of 6 months was used.
Statistical Shape Modeling
BoneFinder, version 1.3.0 (developed by Claudia Lindner, Tim Cootes, and members of the Centre for Imaging Sciences at The University of Manchester) was used to develop an SSM of the bony roof of the acetabulum and the vertical cortex of the ilium on hip US images at baseline (20,21). With SSM, a global representation of the shape is attained with multiple data points rather than a reduction of the shape to a few data points of mere geometric measurements. This outline method with predefined anatomic landmark points allows for quantification of more subtle shape aspects of the acetabulum. An SSM is generated based on the landmark points that quantify the mean shape of the object. With principal component analysis, all shape variation is captured in specific modes. Each mode is statistically independent and depicts a distinct shape variation from the mean shape. The shape modes produced by the SSM are ranked in order of the proportion of shape variation explained in the image set. The first shape mode explains the highest proportion. Every individual object in the image set has a designated value (standard deviation) that describes the deviation from the mean acetabular shape within that mode of shape variation.
Reproducible landmarks, representing key anatomic locations, were manually selected (ie, start of the acetabular slope, lower limb of the ilium) and connected by landmarks with equal spacing in between. A set of 13 landmarks, which were manually positioned on each individual hip US image, defined the final acetabular shape model. This was done by one PhD candidate (J.M.B.) with 3 years of experience in image processing who was not blinded to treatment or outcomes. For each patient, multiple US images were available; the best image at baseline also according to Graf eligibility criteria was chosen for the landmark annotation. Even though comparable images were selected, the acetabular shape was influenced not only by true anatomic variations but also by variations in probe position. The first five acetabular shape modes from the SSM explained 95% of the total variation in shape of the 97 included hips, which were retained in the analyses.
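The modeling steps described above (landmark sets, a mean shape, principal component analysis, and mode values expressed in standard deviations) can be illustrated with a minimal sketch. This is not the BoneFinder implementation: the landmark coordinates are synthetic, the Procrustes alignment step and the function names `align_shapes` and `fit_shape_model` are assumptions made for illustration, and only the 13-landmark layout, the 97 shapes, and the 95% variance threshold are taken from the text.

```python
import numpy as np

def align_shapes(shapes, n_iter=5):
    """Generalized Procrustes alignment (assumed preprocessing step):
    removes translation, scale, and orientation differences."""
    shapes = shapes - shapes.mean(axis=1, keepdims=True)            # center each shape
    shapes = shapes / np.linalg.norm(shapes, axis=(1, 2), keepdims=True)  # unit size
    mean = shapes[0].copy()
    for _ in range(n_iter):
        for i, s in enumerate(shapes):
            # Orthogonal Procrustes: best orthogonal transform of s onto the mean
            u, _, vt = np.linalg.svd(s.T @ mean)
            shapes[i] = s @ (u @ vt)
        mean = shapes.mean(axis=0)
        mean /= np.linalg.norm(mean)
    return shapes

def fit_shape_model(shapes, var_target=0.95):
    """PCA on flattened landmark coordinates; keeps the first modes that
    together explain at least `var_target` of the shape variation."""
    n = shapes.shape[0]
    X = shapes.reshape(n, -1)
    mean_shape = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean_shape, full_matrices=False)
    eigvals = s**2 / (n - 1)                     # variance captured by each mode
    explained = eigvals / eigvals.sum()
    k = int(np.searchsorted(np.cumsum(explained), var_target) + 1)
    # Mode values per shape, in standard deviations from the mean shape
    scores = (X - mean_shape) @ vt[:k].T / np.sqrt(eigvals[:k])
    return mean_shape, vt[:k], explained[:k], scores

# Synthetic stand-in for 97 hips x 13 landmarks x (x, y); NOT study data.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 13)
base = np.column_stack([t, 0.3 * t**2])
shapes = base + 0.02 * rng.standard_normal((97, 13, 2))

aligned = align_shapes(shapes)
mean_shape, modes, explained, scores = fit_shape_model(aligned)
print(f"{modes.shape[0]} modes explain {explained.sum():.0%} of shape variation")
```

Because the synthetic noise is unstructured, far more modes are needed here to reach 95% than the five reported for the real images, where shape variation is concentrated in a few anatomic and positional patterns.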
To assess intraobserver reliability, a reader (J.M.B.) annotated 30 randomly chosen images a second time 9 months later. Per-mode intraclass correlation coefficients (ICCs) were calculated for one rater (J.M.B.) based on a two-way mixed-effects model. An ICC less than 0.50 indicated poor reliability; an ICC of 0.50–0.74, moderate reliability; an ICC of 0.75–0.90, good reliability; and an ICC greater than 0.90, excellent reliability according to current literature (22). The ICC for intraobserver variation in mode 1 was excellent (ICC = 0.93), and in modes 2 and 4 it was good (ICC = 0.82 and 0.79, respectively) (Table 1). However, in modes 3 and 5, it was moderate (ICC = 0.65 and 0.51, respectively).
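The reliability calculation can be sketched as follows. The text specifies a two-way mixed-effects model for one rater but not whether consistency or absolute agreement was used; the sketch assumes the single-measurement consistency form, often written ICC(3,1). The function names `icc_consistency` and `reliability_label` are hypothetical, and the thresholds are those quoted above.

```python
import numpy as np

def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, single measurement, consistency.
    `ratings` has one row per image and one column per annotation session."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()   # between-image variation
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()   # between-session variation
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols  # residual variation
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

def reliability_label(icc):
    """Interpretation bands quoted in the text (reference 22)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"

# Toy example: mode values for four images annotated in two sessions.
session_values = [[-1.2, -1.0], [0.1, 0.3], [0.8, 0.6], [1.9, 2.0]]
print(reliability_label(icc_consistency(session_values)))
```

In practice the function would be applied once per shape mode, with the 30 re-annotated images as rows and the two annotation sessions as columns.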
All patients in this study were also included in the previous study (6). However, there was no relevant overlap with this study, which analyzed baseline US images to predict outcomes. Data generated or analyzed during this study are available from the corresponding author, by request.
Statistical Analyses
In case of bilateral stable hip dysplasia, one hip was randomly selected with a number generator to prevent data dependency within individual infants. Differences in baseline characteristics between treatment arms were compared with the independent samples t test. In cases with sufficient data, categorical variables were compared with the χ2 test; otherwise, the Fisher exact test was used. Logistic regression was used to analyze the association between each shape mode and the outcomes of persistent hip dysplasia (α angle <60°) and residual hip dysplasia (AI > 2 standard deviations). Multivariable analysis was performed with all shape modes and was adjusted for infant sex and treatment (Pavlik harness or active observation). An interaction term between shape modes and treatment was added to the logistic regression model to analyze whether the shape modes could be used to predict the effect of Pavlik harness treatment. Interaction terms that were not significant were removed from the model. Odds ratios (ORs), 95% CIs, and P values were calculated. An association was considered significant at P < .05. All statistical analyses were performed in SPSS, version 24.0 (IBM), by a PhD student (J.M.B.) and an epidemiologist and orthopedic surgeon in training (W.P.G.) with 3 and 6 years of experience, respectively, in image processing.
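The structure of a logistic regression with a mode-by-treatment interaction term can be sketched as below. The analyses in this study were run in SPSS; this sketch instead fits the model with a hand-written Newton-Raphson routine on simulated data, so the coefficients, the sample size of 5000, the 0/1 coding of the covariates, and the function name `fit_logistic` are all illustrative assumptions and none of the numbers are study results.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        w = p * (1.0 - p)                          # observation weights
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data (NOT study data): one shape mode, treatment, and sex.
rng = np.random.default_rng(0)
n = 5000
mode2 = rng.standard_normal(n)          # mode value in standard deviations
harness = rng.integers(0, 2, n)         # assumed coding: 1 = Pavlik harness
female = rng.integers(0, 2, n)          # assumed coding: 1 = girl

# Assumed true coefficients: intercept, mode2, harness, female, mode2 x harness
true_beta = np.array([-1.0, -0.8, 0.2, 0.1, 0.8])
design = np.column_stack([mode2, harness, female, mode2 * harness])
logit = true_beta[0] + design @ true_beta[1:]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

beta = fit_logistic(design, y)
odds_ratios = np.exp(beta[1:])          # OR per unit change in each covariate
or_mode2_observed = np.exp(beta[1])             # mode 2 effect without harness
or_mode2_harness = np.exp(beta[1] + beta[4])    # mode 2 effect with harness
```

With this coding, the odds ratio of the interaction term multiplies the mode odds ratio in the treated arm, which is why an interaction odds ratio above 1 can offset a mode odds ratio below 1.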
Results
Patient Characteristics
Of the 104 eligible infants, seven did not have an available baseline US study and thus were excluded (Fig 1). Hence, 97 infants (89 [92%] girls and eight [8%] boys) were included. The mean age at inclusion and first US was 3.37 months ± 0.43. Fifty-two infants (54%; 48 [92%] girls) underwent treatment with a Pavlik harness, and 45 infants (46%; 41 [91%] girls) were actively observed. Bilateral stable hip dysplasia was present in nine (9%) of the included infants, in which a random hip was selected.
We found no evidence of differences between the treatment arms regarding baseline characteristics, which are presented in Table 2. The mean α angle at 12-week US follow-up was 60.1° (range, 41.0°–71.0°) in all 97 hips. In 87 (90%) of the included hips, the first anteroposterior radiograph at more than 6 months of age was retrieved, around 1 year of age (mean, 11.7 months; range, 6–24 months). The mean AI was 24.9° (range, 14.0°–39.0°). We found no evidence of differences between the treatment arms regarding α angle at 12-week follow-up (Pavlik harness treatment, 60.4° ± 4.5; active observation, 59.7° ± 6.0; P = .52) or AI around 1 year of age (Pavlik harness treatment, 25.0° ± 4.4; active observation, 24.9° ± 4.0; P = .96) (Table 3).
Modes at Baseline Associated with Persistent or Residual Hip Dysplasia
Figure 2 shows the locations of the 13 landmarks placed on the US images of each hip, defining the SSM. Mode 1 explained the highest proportion of variation in shape (65%) and was not associated with persistent or residual hip dysplasia at follow-up (α angle <60°: OR = 1.21; 95% CI: 0.71, 2.08; P = .35; AI > 2 standard deviations: OR = 1.15; 95% CI: 0.61, 2.18; P = .67) (Table 4). Negative values of mode 2 and positive values of mode 3 were associated with persistent hip dysplasia on US images at 12-week follow-up (mode 2: OR = 0.43; 95% CI: 0.23, 0.79; P = .007; mode 3: OR = 2.39; 95% CI: 1.12, 5.07; P = .02). Negative values of mode 2 were also associated with residual hip dysplasia on pelvic radiographs at around 1 year of age (mode 2: OR = 0.09; 95% CI: 0.02, 0.43; P = .002). Conversely, positive values of mode 2 and negative values of mode 3 were associated with an α angle of 60° or more at follow-up, and positive values of mode 2 were also associated with an AI within 2 standard deviations at follow-up. The interaction term between mode 2 and the treatment group remained significant after multivariable analysis, counteracting the association between mode 2 and residual hip dysplasia and indicating a potential beneficial treatment effect of Pavlik harness in hips with a negative mode 2 value (mode 2 interaction term: OR = 12.46; 95% CI: 2.00, 77.71; P = .01).
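As a worked illustration of how the interaction term modifies the mode 2 association with residual dysplasia, the model can be written out under two assumptions that are not stated in the text: that the treatment indicator T was coded 1 for Pavlik harness and 0 for active observation, and that the interaction was entered as the product of mode 2 (m_2, in standard deviations) and T. The remaining covariates are abbreviated with dots.

```latex
\operatorname{logit} P(\text{residual dysplasia})
  = \beta_0 + \beta_2 m_2 + \beta_T T + \beta_{2T}\,(m_2 \times T) + \dots

\mathrm{OR}_{m_2 \mid T=0} = e^{\beta_2} = 0.09, \qquad
\mathrm{OR}_{m_2 \mid T=1} = e^{\beta_2 + \beta_{2T}} = 0.09 \times 12.46 \approx 1.12
```

Under that assumed coding, the mode 2 odds ratio in the harness arm is close to 1, which is consistent with the statement that treatment counteracted the association. The product is a point estimate only; no confidence interval for it is reported in the text.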

Figure 2: Hip US image in a 3-month-old girl at baseline. No contrast material was used. The presented hip is outlined with the acetabular shape model, consisting of 13 points.
In mode 1, negative values depicted a relatively short ilium and long acetabulum compared with the mean shape (Fig 3). For positive values, this ratio was the opposite. If the US probe is rotated incorrectly, the image may depict an oblique plane, in which the ratio of ilium-to-acetabulum length will be larger or smaller compared with the optimal plane, depending on rotational direction. Mode 1 seemed to represent this feature. Negative mode 2 values were the typical flat bony rim in which the ilium transitions into the bony roof of the acetabulum, also described by Graf et al (4). Positive values of mode 3 were characterized by a sharpening of this transition zone compared with the mean shape in our study sample. Positive values of mode 2 and negative values of mode 3 were characterized by a well-developed slope of the bony rim. Individual hip US examples of mode 1 are provided in Figure 4, mode 2 in Figure 5, and mode 3 in Figure 6.

Figure 3: All five modes retained in the statistical shape modeling, which explained 95% of shape variation in all 97 infant hips. Every mode has the same mean shape. For clarity, the modes are depicted with 2.5 standard deviations on each side from the mean shape. The vertical line highlights the same point representing the start of the slope of the bony acetabular roof in each mode. In mode 1, the probe orientation was largely captured and did not associate with persistent or residual hip dysplasia. Boxes indicate the most important shape differences, namely negative mode 2 and positive mode 3, which were significantly associated with persistent hip dysplasia on US images (α angle <60°). Negative values of mode 2 were also associated with residual hip dysplasia on pelvic radiographs around 1 year of age (acetabular index > 2 standard deviations).

Figure 4: Examples of hip US images illustrating mode 1; no contrast material was used. The presented hips were (A) −1.58 (girl), (B) +0.11 (girl), and (C) +1.80 (boy) standard deviations from the mean shape.

Figure 5: Examples of hip US images illustrating mode 2; no contrast material was used. The presented hips were (A) −1.83 (girl) and (B) +2.12 (girl) standard deviations from the mean shape.

Figure 6: Examples of hip US images illustrating mode 3; no contrast material was used. The presented hips were (A) −1.90 (girl) and (B) +1.49 (girl) standard deviations from the mean shape.
Discussion
We developed statistical shape modeling (SSM) for hip US images to predict the development of stable dysplastic hips and to assess the correlation with treatment, using data from a randomized clinical trial on Pavlik harness treatment of Graf type II hips by Pollet et al (6). The effects of probe positioning were also evaluated. The SSM discriminated acetabular shape differences due to probe orientation (mode 1) from true anatomic differences (modes 2 and 3). Negative values of mode 2 and positive values of mode 3 were associated with persistent hip dysplasia (α angle <60°) at 12-week follow-up (mode 2: odds ratio [OR] = 0.43; 95% CI: 0.23, 0.79; P = .007; mode 3: OR = 2.39; 95% CI: 1.12, 5.07; P = .02). Negative values of mode 2 were also associated with residual hip dysplasia (acetabular index [AI] > 2 standard deviations) around 1 year of age (mode 2: OR = 0.09; 95% CI: 0.02, 0.43; P = .002). In the second analysis, the interaction term between mode 2 and treatment (Pavlik harness vs active observation) was significant (mode 2 interaction term: OR = 12.46; 95% CI: 2.00, 77.71; P = .01), indicating that hips with negative mode 2 values benefited from Pavlik harness treatment. Characteristics of negative mode 2 values were the typical flat bony rim in which the ilium transitions into the bony roof of the acetabulum. Positive values of mode 3 were characterized by a sharpening of this transition zone compared with the mean shape in our study sample.
Current diagnostic parameters for stable DDH only poorly correlate with outcome and cannot be used to identify hips that will have persistent or residual hip dysplasia with or without treatment (6–11). In the context that around 80% of Graf II hips will develop to normal without treatment, there is a need to extract more data from US images to identify hips that are at risk for persistent or residual hip dysplasia at follow-up (8). Increased accuracy of diagnosis will decrease overtreatment of hips with a DDH Graf type II diagnosis that actually fall in the spectrum of normal hips. Quantification of the shape of the acetabular bony rim with SSM rather than the α angle can be used to more accurately predict which stable hips will deteriorate, probably because more image data are used in the analysis of the US image.
The generally poor correlation of the Graf method with outcome is likely at least partly associated with the high variability and low agreement in all reported hip dysplasia metrics, with no improvement in the last 30 years (12). In our study, relatively good intraobserver reliability was found, suggesting SSM could provide a more reliable classification system. Developments in the field of SSM include fully automatic shape model matching, which may further reduce interobserver variability (23).
Our study had limitations. First, the time of follow-up at which the radiologic outcome was measured varied slightly. Second, the output of the SSM depends on the quality of the input images, as is the case for the currently used diagnostic outcomes by Graf et al (24). Because our study used US images from a multicenter randomized clinical trial, those images most likely represented the normal variation in US image quality and projection. Third, in our SSM, we only modeled the shape of the ilium and acetabulum because the acetabulum is the primary dysplastic bone for the diagnosis of DDH, according to the Graf method. The different subtypes in the classification of stable hip dysplasia (which is the population in this study) are primarily based on acetabular measurements (18). We also did not yet include data on acetabular cartilage, which might play an important role in DDH and might further improve the performance of this SSM. Fourth, the intraobserver reliability in the modes was tested by only one reader who was not blinded to outcomes, with a risk of overestimation due to image memory. By incorporating a large time span of 9 months between the ratings, we tried to account for this possible bias. Fifth, because of our limited sample size, cross validation was deemed inappropriate to validate the results. Finally, a mainly Dutch sample of infant hips was included, which might lead to selection bias.
In conclusion, statistical shape modeling (SSM) identified acetabular shape characteristics on US images of Graf type II hips that predict acetabular development and show a correlation with treatment. The SSM was also able to differentiate acetabular shape differences due to probe positioning from true anatomic variations. SSM of US images might improve the diagnosis and treatment of developmental dysplasia of the hip in infants. An acetabular shape mode might be used to derive a quantitative parameter as a more accurate prognostic marker and prevent current overtreatment, which could have important medical and socioeconomic benefits. Also, SSM in software might help the examiner identify the ideal projection plane in a series of US images in real time. Further studies are needed to validate the predictive value of the identified shape modes.
Acknowledgment
We acknowledge F.J.A. Beek, MD, PhD, for his contribution to the radiographic outcome measurements.
Author Contributions
Author contributions: Guarantors of integrity of entire study, V.P., R.J.B.S.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, J.M.B., W.P.G., V.P., R.J.B.S.; clinical studies, J.M.B., W.P.G., V.P., R.J.B.S.; statistical analysis, J.M.B., W.P.G., H.H.W.; and manuscript editing, all authors
References
- 1. ; Canadian Task Force on Preventive Health Care. Preventive health care, 2001 update: screening and management of developmental dysplasia of the hip in newborns. CMAJ 2001;164(12):1669–1677.
- 2. . Developmental dysplasia of the hip: a new approach to incidence. Pediatrics 1999;103(1):93–99.
- 3. . Relative Risk and Incidence for Developmental Dysplasia of the Hip. J Pediatr 2017;181:202–207.
- 4. . The diagnosis of congenital hip-joint dislocation by the ultrasonic Combound treatment. Arch Orthop Trauma Surg 1980;97(2):117–133.
- 5. . Diagnostic and treatment preferences for developmental dysplasia of the hip: a survey of EPOS and POSNA members. J Child Orthop 2018;12(3):236–244.
- 6. . Abduction treatment in stable hip dysplasia does not alter the acetabular growth: results of a randomized clinical trial. Sci Rep 2020;10(1):9647.
- 7. . Immediate treatment versus sonographic surveillance for mild hip dysplasia in newborns. Pediatrics 2010;125(1):e9–e16.
- 8. . The natural history of abnormal ultrasound findings in hips of infants under six months of age. J Child Orthop 2018;12(4):302–307.
- 9. . Does early treatment by abduction splintage improve the development of dysplastic but stable neonatal hips? J Pediatr Orthop 2000;20(3):302–305.
- 10. . Early Diagnosis and Treatment of Congenital Dislocation of the Hip. Proc R Soc Med 1963;56(9):804–806.
- 11. . Treatment Patterns and Outcomes of Stable Hips in Infants With Ultrasonic Dysplasia. J Am Acad Orthop Surg 2019;27(2):68–74.
- 12. . A Systematic Review and Meta-analysis on the Reproducibility of Ultrasound-based Metrics for Assessing Developmental Dysplasia of the Hip. J Pediatr Orthop 2018;38(6):e305–e311.
- 13. . Cochrane Review: Screening programmes for developmental dysplasia of the hip in newborn infants. Evid Based Child Health 2013;8(1):11–54.
- 14. . Validation of statistical shape modelling to predict hip osteoarthritis in females: data from two prospective cohort studies (Cohort Hip and Cohort Knee and Chingford). Rheumatology (Oxford) 2015;54(11):2033–2041.
- 15. . Is patellofemoral pain a precursor to osteoarthritis?: Patellofemoral osteoarthritis and patellofemoral pain patients share aberrant patellar shape compared with healthy controls. Bone Joint Res 2018;7(9):541–547.
- 16. . Predicting OA progression to total hip replacement: can we do better than risk factors alone using active shape modelling as an imaging biomarker? Rheumatology (Oxford) 2012;51(3):562–570.
- 17. . Short-term variability of proximal femur shape in anteroposterior pelvic radiographs. Paper presented at: Conference on Medical Image Understanding and Analysis (MIUA); 2011; London, England.
- 18. . Essentials of Infant Hip Sonography: According to Graf. Sonocenter Stolzalpe. 2014.
- 19. . Differentiation of normal and pathological acetabular roof angle in the diagnosis of hip dysplasia. Evaluation of 2294 acetabular roof angles of hip joints in children [in German]. Arch Orthop Unfallchir 1968;64(3):197–228.
- 20. . Active Shape Models—Their Training and Application. Comput Vis Image Underst 1995;61(1):38–59.
- 21. . Fully automatic segmentation of the proximal femur using random forest regression voting. IEEE Trans Med Imaging 2013;32(8):1462–1472.
- 22. . A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15(2):155–163. [Published correction appears in J Chiropr Med 2017;16(4):346.]
- 23. . Development of a fully automatic shape model matching (FASMM) system to derive statistical shape models from radiographs: application to the accurate capture and global representation of proximal femur shape. Osteoarthritis Cartilage 2013;21(10):1537–1544.
- 24. . Hip sonography update. Quality-management, catastrophes—tips and tricks. Med Ultrason 2013;15(4):299–303.
Article History
Received: Apr 30 2021
Revision requested: July 6 2021
Revision received: Oct 28 2021
Accepted: Nov 16 2021
Published online: Jan 25 2022
Published in print: May 2022