Summary

Physical exercises help reduce pain and disability in individuals suffering from non-acute low back pain (> 3-4 weeks), but these effects are relatively limited. To increase the effectiveness of this type of intervention, we must determine the types of patients for whom each type of exercise is most likely to be effective (patient/intervention matching) and for what reasons (underlying mechanisms). This study focussed on lumbar stabilization exercises, an active exercise modality that is gaining in credibility and popularity. The study had three specific objectives: (1) to initiate development of clinical prediction rules (CPRs) for treatment outcomes (success or failure) in order to detect, during the clinical examination, those patients most likely to respond well, or not at all, to these exercises; (2) to study the mechanisms (of neuromuscular and psychological origin) activated by these exercises by using more specific measures that describe the treatment effects; (3) to assess the medium-term (8 weeks) test-retest reliability of neuromuscular measures in healthy subjects.

A total cohort of 130 patients with low back pain no longer in the acute phase (four weeks post-injury) is needed to develop CPRs. However, in this pilot study we evaluated 48 patients to obtain preliminary evidence that would justify continued recruitment (80 additional patients) in order to derive the CPRs. The exercise program was carried out over eight weeks (two sessions per week) in physiotherapy clinics with no co-intervention. The main outcome measures [pain; perceived disability (Oswestry Scale)], as well as several questionnaire-based measures (psychological measures [PSY] associated with pain and treatment adherence), were collected at the beginning of the exercise program (T0), at weeks 4 (T4) and 8 (T8 – end), and at six months post-treatment.
The other measures obtainable in a clinical setting, and therefore likely to be retained for developing CPRs (objective 1), were the physical tests carried out during the clinical physiotherapy examination (PHT measures). These were performed at T0 and T8, and included joint instability tests (n = 4), flexibility tests (n = 6), motor control deficit tests (n = 8), physical performance tests (n = 4) and muscular endurance tests (n = 3). Six laboratory tests were also performed at T0 and T8, in a sub-sample of 32 patients, to study the mechanisms of neuromuscular origin (neuromuscular measures [NRM]; objective 2): (1) thickness and activation of the deep muscles of the trunk through ultrasound imaging, (2) lumbar proprioception, (3) postural balance of the trunk in seated position on an unstable chair, (4) lumbar rigidity, (5) anticipatory postural adjustments (APAs), and (6) trunk coordination. To evaluate test-retest reliability (objective 3), these tests were also performed twice, at the same time interval (eight weeks), in a sample of 30 healthy subjects.

Analyses and results pertaining to objective 1 (development of CPRs): The preliminary derivation of the CPRs yielded sufficiently compelling results, at both the statistical and theoretical levels, to warrant completing recruitment of the patients needed to obtain more robust CPRs with tighter confidence intervals. The CPR for treatment success retained two variables from the physical examination, giving an overall accuracy of 81% and the following predictive statistics: sensitivity: 94%; specificity: 50%; positive likelihood ratio (+LR): 1.9; and negative likelihood ratio (-LR): 0.13.
These two tests are (1) a test in which the patient, in standing position, holds a light load close to his or her body at shoulder height and then stretches his or her arms horizontally forward as far as possible, the distance reached being measured, and (2) a provocation test in which the therapist performs an abduction and lateral rotation movement of the patient’s hip, with the patient in supine position. The CPR for treatment failure included lumbar curvature and sex, giving an overall accuracy of 81% and the following predictive statistics: sensitivity: 50%; specificity: 93%; +LR: 7.8; and -LR: 0.53. The psychological variables were examined in a second set of analyses, but did not contribute to the development of a CPR for treatment success. However, pain catastrophizing was added to the CPR for treatment failure, increasing its overall accuracy to 88% and producing the following predictive statistics: sensitivity: 83%; specificity: 90%; +LR: 8.6; and -LR: 0.18. The CPR for treatment success showed excellent sensitivity (94%) while the two CPRs for treatment failure showed excellent specificity (≥ 90%), representing a desirable combination for this type of intervention. The predictors retained for the two CPRs (success and failure), while different from the predictors established in an earlier study by another group (Hicks et al., 2005), appear to be congruent with the theory underlying this exercise program. On the other hand, the confidence intervals associated with the CPR statistics were very wide, as would be expected with this small patient sample. Adding more patients would allow these variables to be confirmed or new ones to be identified, and would allow these confidence intervals to be tightened.
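The likelihood ratios reported above follow from sensitivity and specificity through the standard definitions +LR = sensitivity / (1 - specificity) and -LR = (1 - sensitivity) / specificity. The following is a minimal sketch of that arithmetic, not the analysis code used in the study. The function names `likelihood_ratios` and `post_test_probability` are illustrative, and the pre-test probability of 0.50 is a hypothetical value chosen only to show how a likelihood ratio shifts a probability.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) for sensitivity and specificity given as fractions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded sensitivity and specificity values as reported in this summary.
reported = {
    "success CPR": (0.94, 0.50),
    "failure CPR": (0.50, 0.93),
    "failure CPR with pain catastrophizing": (0.83, 0.90),
}

for name, (sens, spec) in reported.items():
    plr, nlr = likelihood_ratios(sens, spec)
    print(f"{name}: +LR = {plr:.2f}, -LR = {nlr:.2f}")

# Hypothetical pre-test probability of failure (an assumption, not a study result),
# combined with the reported +LR of 8.6 for the extended failure CPR.
print(f"post-test probability: {post_test_probability(0.50, 8.6):.2f}")
```

Ratios recomputed this way can differ slightly from the reported values, presumably because the reported ratios were derived from unrounded counts while the inputs here are the rounded percentages.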
Analyses and results pertaining to objective 2 (study of mechanisms): The analyses of variance (ANOVAs) and correlational analyses brought to light statistically significant effects for certain variables in each category of measure (PHT, PSY, NRM), as well as a number of trends in the results (0.05 < P < 0.10), again suggesting that the recruitment of additional patients would allow several hypotheses to be verified. In fact, there were too few patients to test, with sufficient statistical power, for a SUB-GROUP (success versus failure) × TIME (T0 versus T8) interaction, particularly for the NRM measures, which were collected in only a sub-sample (n = 32) of the patients (n = 48). The study of the PSY measures indicated that it may not be possible to predict at-home adherence to the exercise program with the measures taken at T0, i.e., without taking time into account, which might explain why only one PSY variable was retained in the CPR for treatment failure and none in the CPR for treatment success. Of the NRM measures, those obtained via ultrasound imaging (thickness and activation of deep muscles) and postural balance measures in seated position were the most responsive to changes in disability and pain.

Analyses and results pertaining to objective 3 (reliability of neuromuscular measures): Overall, the results obtained support the use of some of these measures in studying neuromuscular functions during a rehabilitation program. The reliability results were acceptable for some measures, with correlation coefficients higher than 0.75, and fair for others. Few measures showed poor reliability. For (nearly) every test, it was therefore possible to retain a subset of more reliable measures. Two NRM tests (lumbar proprioception and postural balance) showed signs that task learning had occurred between the two measurement sessions.
This learning was substantial enough to cast doubt on the proprioception test. Fortunately, this was not the case for the postural balance test, in which much less (if any) learning took place. It should be noted that this is the first time in this field that such an exhaustive reliability study has been conducted, covering not only a comprehensive battery of NRM tests (n = 6) but also a time interval corresponding to the duration of a rehabilitation program (eight weeks). These measures will be useful for evaluating the effects of various types of interventions.

In conclusion, based on the results obtained in this ambitious pilot study, we recommend recruitment of the 80 patients needed for the final derivation of the CPRs and the study of the mechanisms underlying this lumbar stabilization exercise program.