The Influence of a Short Training Program on the Clinical Examination of Dental Restorations
SUMMARY
Purpose: To investigate how a simple restoration evaluation training program affected restoration replacement decision making by a group of 16 dentists. Method: The clinical examination of 66 dental restorations in nine female patients was carried out by two groups of dentists: one having previously received training in restoration assessment. The results of these assessments were compared to a gold standard for restoration integrity determined by two experienced clinicians applying US Public Health Service criteria. All evaluations were completed under controlled clinical conditions with standard equipment and lighting. The results of the clinical examinations between the trained (test) group and the untrained (control) group were compared to each other and the gold standard. Results: The trained group scheduled fewer restorations for replacement (6.00±3.01 and 9.71±3.15; p=0.034), in a shorter time (27.86±3.45 mins and 36.71±3.74 mins; p=0.003) and showed greater agreement with the study's gold standard for restoration replacement (0.85±0.27 and 0.79±0.06; p=0.002). Conclusion: Within the limits of this study, examiner training can significantly improve the reliability of restoration replacement decision making by dentists.
INTRODUCTION
The clinical assessment of dental restorations is a daily event for many general dental practitioners, and studies have shown that over half of all dental restorations placed are replacements.1-4 Throughout the world, the costs associated with restoration replacement are significant.5,6 In the United Kingdom alone many thousands of restorations are replaced each year, placing an enormous burden on National Health Service resources.7 It is probable that unnecessary replacement of “sound” restorations occurs, as it is known that variation exists among dentists when deciding whether or not a restoration should be replaced8-20; such variation has implications for those who fund treatment and for the patient and has the potential to increase tooth morbidity.
Calibration of examiners is routinely used in clinical trials and epidemiological surveys to improve examiner consistency and reliability when assessing restorations and dental caries.21-23 The US Public Health Service (USPHS) criteria24 can be used to evaluate restoration aesthetics, marginal adaptation and discoloration, anatomical form, and recurrent caries, and these factors are known to be key areas used by dentists when determining restoration integrity. The criteria are widely used in clinical trials and continue to be used by researchers, but they do not appear to have been used much in clinical dental practice.25-30 It has been reported that a simple training program using USPHS criteria improves agreement between examiners in a simulated clinical environment but increases the time taken to evaluate the restorations.5
The present study was designed to investigate whether a USPHS training program affected restoration replacement decision making during a clinical examination of dental restorations. The null hypothesis was that the training program would have no effect on restoration replacement decision making by the group of dentists.
MATERIALS AND METHODS
This project was carried out with full ethical approval from the Local Health Board (Bro Taf), with 14 dentists taking part. The recruitment of the dentists has been reported previously.5 In brief, a group of general dental practitioners and academics associated with the University Hospital of Wales were asked to evaluate 112 restorations in extracted teeth in a phantom head. Following the initial assessment, half the dentists were randomly selected to undertake a USPHS-based training program and were then asked to reassess the restorations under the same conditions. The second phase of the project, reported here, evaluated the untrained (control) and trained (test) groups' assessments of restorations in human subjects.
For this study, a number of employees of the University Dental Hospital of Wales were asked to take part in a clinical trial assessing dental restorations and their replacement; inclusion criteria were having a number of plastic dental restorations, no removable prostheses, no dental pain or orofacial discomfort, and not actively undergoing dental treatment with their dentist. After consent and screening of the volunteers (n=20) by the principal researcher, nine female patients between 19 and 54 years of age were recruited to the study. This resulted in a pool of 66 restorations in 61 teeth (15 bicuspids and 46 molars; Figure 1). The status of the restorations (the gold standard) was then determined by two experienced assessors using USPHS criteria. All the restorations were initially evaluated independently, and, where disagreement between the assessors was noted, a consensus was reached as to the integrity of the restoration.



Citation: Operative Dentistry 36, 2; 10.2341/10-202-C
Two groups of seven dentists, one untrained (control) and one trained (test) in the use of USPHS, then evaluated the restorations under standardized, controlled clinical conditions. Each dentist was provided with a standard operating light (KaVo Dental GmbH, Biberach, Germany), a size 4 front surface plain dental mirror (Dentsply Ash Instruments, Weybridge, UK), a triple syringe, and a number 9 probe (Dentsply Ash Instruments); dentists used operating loupes if this was a normal and consistent part of their diagnostic practice. The dentists were asked to assess each restoration and determine whether they would replace it; they were told to assume that the patient was fit and well, with no dental pain or discomfort. They assessed each restoration individually, with a scribe relaying the order in which the restorations were to be evaluated. The participants were given as much time as they needed to complete their deliberations; the time taken to complete the examinations was noted, and the results were compared to the gold standard.
Two weeks later, approximately half the restorations were reexamined by all the dentists (under the same standardized clinical conditions as before) and the results recorded.
In addition to participating in the project, the trained (test) group of dentists were asked to complete an evaluation questionnaire exploring their attitudes and experiences of the restoration evaluation criteria. This questionnaire did not lend itself to comparison with the untrained group; however, their views on the effects of merely taking part in a research project were sought (by interview) in order to ascertain whether it had affected their clinical practice.
RESULTS
All nine patients included in the study were female and they ranged from 19 to 54 years of age (mean 30.6 years). There were 61 restored teeth that housed a total of 66 restorations; 46 were molars and 15 bicuspids (no patient presented with any anterior restorations). Of the molars, five had two restorations, and of the 66 restorations included in the study, 36 were amalgams, and 30 were formed from a resin-based material.
The gold standard assessors initially disagreed about seven restorations; after consensus, five of the restorations were deemed to require replacement (three amalgams and two formed from resin-based filling materials).
The differences between the mean results for test and control groups are given in Table 1 and show a number of statistically significant differences. The test group scheduled fewer restorations for replacement (p=0.034), the range being from 1 to 10 with a mean of 6; the range for the untrained group was from 3 to 12 with a mean of nearly 10. The trained group also took significantly less time in their deliberations (p=0.003) and showed greater agreement with the gold standard (p=0.002). In addition to the five restorations identified for replacement by the gold standard assessments, there were 38 other restorations identified as needing replacement by at least one of the examiners (9 by the trained group alone, 17 by the untrained group alone, and 12 jointly), with total agreement for not replacing a restoration found for 23 of the 66 restorations. Table 2 summarizes the agreements and disagreements as percentages for the trained and untrained dentists when compared against the restorations that were deemed as requiring replacement by the gold standard.
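The restoration counts reported above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the variable names are not from the study, and the figures are those stated in the text (66 restorations in total, five gold standard replacements, and 9, 17, and 12 additional restorations flagged by the trained group alone, the untrained group alone, and both groups, respectively).

```python
# Counts taken from the Results text; variable names are illustrative only.
total_restorations = 66
gold_standard_replace = 5   # deemed to require replacement by the gold standard
trained_only = 9            # flagged by the trained group alone
untrained_only = 17         # flagged by the untrained group alone
flagged_by_both = 12        # flagged by examiners in both groups

# Restorations flagged by at least one examiner, beyond the gold standard five.
other_flagged = trained_only + untrained_only + flagged_by_both

# Restorations that neither the gold standard nor any examiner marked for replacement.
unanimous_no_replace = total_restorations - gold_standard_replace - other_flagged

# Percentage of restorations the gold standard marked for replacement.
gold_standard_pct = round(100 * gold_standard_replace / total_restorations, 1)

print(other_flagged, unanimous_no_replace, gold_standard_pct)  # 38 23 7.6
```

The tally leaves 23 restorations on which every assessor agreed that no replacement was needed.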


Table 3 illustrates the test group's response to the evaluation questionnaire and suggests that the dentists found the USPHS criteria both useful and straightforward to apply. In addition, the respondents were invited to record free comments, and these are detailed in Table 4.


DISCUSSION
This research involved the evaluation of restorations in a group of patients that were, arguably, not truly representative of the general public, as the sample was solely female and had an “active” caries experience lower than the national average.31 Nearly half the restorations involved in this research were resin based (30 of the 66), reflecting an increasing use of resin-based restorative materials in general practice.32 It was also noted that none of the patients presented with anterior restorations. Recall of assessors' decisions was a concern of the researchers because of the small number of restorations designated for replacement (5 of 66). The selection process produced an incidence of restoration replacement of approximately 7.6% in the patient cohort, which is considerably less than the 25% suggested as desirable for studies looking at decisions relating to assessment of clinical techniques.33 However, when selecting volunteers and restorations for the study, the gold standard assessors chose restorations covering the full range of USPHS assessments, with care being taken to avoid selecting unusual restorations that could be easily recalled. The concern relating to recall proved to be unfounded during the research.
The gold standard for restoration replacement in this phase of the study was reached by consensus between the gold standard assessors; the excellent agreement shown for the previous simulated clinical phase5 and the relatively minor disagreement noted between the clinical evaluations (7 of 66 restorations) suggested no need to repeat the intra- and interexaminer evaluations for the patient data.
The results highlight a number of statistically significant differences between the trained (test) group and untrained (control) group (Table 1). The trained (test) group scheduled fewer restorations for replacement than the untrained (control) group (6 compared to nearly 10; p=0.034), suggesting that the use of the evaluation criteria made the trained dentists less likely to replace a restoration overall and more likely to do so only when it met clearly identifiable criteria for replacement. This finding could be explained by the training program giving the assessors a written description of failure to follow, so that replacement was suggested only if the restoration fit the description. This is also reflected in the convergence toward the gold standard, as highlighted by a more favorable score for interexaminer agreement with the gold standard (0.85 compared to 0.79; p=0.002). This convergence was also noted in the simulated clinical phase of the project.5
Examination times were significantly shorter in the trained (test) group (28 compared to 37 minutes for the full examination; p=0.003), the inference being that clearer thought processes and descriptions of failure led to an internalization of the processes required and hence a swifter examination, rather than to a subjective decision; again, this is reflected in the convergence toward the gold standard.
There was no significant difference in intraexaminer agreement between the two groups, which was very high for both (0.88 and 0.92); that is, neither group was significantly more consistent in its decision making when compared to itself. It is, however, believed that intraexaminer agreement, while desirable, is not necessarily the best judge of clinical acumen or reliability, since practitioners can consistently agree with themselves that a restoration requires replacement or that a tooth requires restoration, but they can also be consistently wrong. It is believed that agreement with the “gold standard” is a better marker of clinical validity in decision making; in this research, the trained (test) assessors showed the higher agreement (0.85 compared to 0.79 in the untrained group).
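The distinction between self-consistency and agreement with a gold standard can be illustrated with a small worked example. The Python sketch below rests on two assumptions that do not come from this study: it uses Cohen's kappa as the agreement statistic (the text does not name the statistic behind the reported values), and the replacement decisions are invented purely for illustration. It shows a hypothetical examiner who repeats the same decisions at both visits, and so agrees perfectly with himself or herself, yet agrees only moderately with the gold standard.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary decisions (1 = replace)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n   # raw proportion of agreement
    p_a = sum(a) / n                                   # proportion of "replace" in a
    p_b = sum(b) / n                                   # proportion of "replace" in b
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)       # agreement expected by chance
    if expected == 1:                                  # both raters gave one identical constant answer
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical data (not from the study): ten restorations, 1 = replace, 0 = keep.
gold_standard = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
first_visit   = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # over-prescribes replacement
second_visit  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # identical decisions two weeks later

intra_examiner = cohen_kappa(first_visit, second_visit)
versus_gold = cohen_kappa(first_visit, gold_standard)

print(round(intra_examiner, 2), round(versus_gold, 2))
```

In this invented case the examiner is perfectly consistent but still schedules three sound restorations for replacement, which is why agreement with the gold standard is the more informative measure.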
The study suggests that training can result in significant improvement in assessment performance and agrees with previous research.34-39 The increased agreement of the trained assessors with the gold standard is an important finding suggesting a successful although short training program. Weaver and Saeger (1984)40 list specific elements for training programs, all of which were included in the current project: an active practical session rating samples, low levels of pretraining examiner agreement, and clearly defined, well-worded assessment criteria.
It is notable that the improvement in examination time was not observed in the simulated clinical phase of this research,5 suggesting that there may have been a period of consolidation and full appreciation of the USPHS criteria.
On completion of the clinical phase of the study, all patients and their associated restorations were reexamined by the principal researcher (RM). This served two purposes: it confirmed that the restorations had not been damaged by the repeated examinations carried out, and it was used to inform the patients about restorations requiring replacement. When indicated, an offer to replace the restoration was made; if this was declined, a letter was forwarded to the patient's general dental practitioner. While this repeat examination procedure has not been previously reported, it is believed to be justified, as Ekstrand et al. (1987)41 showed that probing of occlusal surfaces can produce “irreversible traumatic defects” to teeth. We consider it good clinical and ethical practice to ensure that volunteers in clinical trials like this are not harmed by taking part in the research.
For a new clinical procedure to be accepted, it needs to be safe, effective, and advantageous. It should also be easy to integrate into practice and be acceptable to patients and the user. While this research showed potential in the use of USPHS, the views of the study participants were also sought through the medium of a printed questionnaire. A printed questionnaire was used for ease of distribution, collection, analysis, and interpretation. However, it is accepted that even when anonymity is ensured, there is always a degree of unreliability in drawing conclusions from survey-type analyses, particularly with small surveys like this one, because respondents often answer in a way that they feel they should.42 It is also acknowledged that other data-gathering sources, such as a focus group discussion, could have been used to determine what the volunteers thought of the research as a whole and of its conduct.43-47
The evaluation survey revealed a number of interesting findings. Among the trained (test) group, at least three-quarters rated the evaluation criteria as easy or very easy to use and felt that the criteria were useful and had a place in everyday clinical practice. They also believed that their own consistency in restoration evaluation had improved, with five of the eight in the test group believing that participation in the research had changed their everyday decision making about restoration replacement. Such findings suggest that the USPHS criteria are acceptable and easy to apply in the short term. The results from the questionnaire confirmed the authors' belief that the criteria were easy to apply even though some, such as marginal discoloration and anatomical form, appeared to cause the assessors some difficulty, it being noted in the simulated clinical study that the trainees' deliberations in these fields were made with less certainty.5
The views of the untrained (control) group were sought to see if they felt that their involvement in the project had altered the way they currently viewed restoration replacement. In order to ascertain this, the untrained (control) group was interviewed individually, with half (four of the eight participants) relaying that simply taking part in the study had indeed affected how they viewed restoration replacement. It was also noted that, when interviewed, a number of the untrained (control) group had felt aggrieved and disadvantaged for not being selected to receive training, three suggesting that they would like to undertake the training program offered.
In addition to the responses noted in the evaluation questionnaire, a number of free comments were generated that raised some important points. It was recognized that training in restoration replacement, or indeed calibration of dentists, is rare (unless one is participating in a trial). This observation is not unique to this research.48 There was an inherent willingness for people to take part in the research, and the willingness of people to participate in calibration programs should perhaps not be underestimated, although different operators take on board new tools with differing degrees of enthusiasm. This finding may have been a direct result of using practitioners who were based at the hospital and colleagues who were eager to participate and help out, despite no financial incentive being promised. It is, however, something that should be explored.
The impact that the USPHS guidelines have had in dental research should not be underestimated, and it has been suggested that “few if any methodological studies . . . have been cited more often and had greater scientific impact.”49 As an assessment tool, the USPHS compares favorably with (if not better than) simpler evaluation systems, such as that used by Lotzkar et al. (1971),50 which looked at four areas (adaptation, contour, contact, and occlusion), and better than more complicated evaluation tools, such as that proposed by Hammons and Jemison (1967),51 which evaluated 10 areas (anatomical carving, marginal ridge relation, contact, contour, marginal integrity, condensation, occlusion, tissue integrity, postoperative lavage, and surface smoothness) that then had to be scored as excellent, acceptable, or unacceptable. However, following this research, the fundamental question of how best to use the evaluation criteria in the clinical environment still needs to be answered. There is no work outside a clinical trial environment to indicate how well the USPHS performs over the life of a restoration in routine practice or how well it can be combined with other diagnostic techniques (eg, radiographs). There is also no evidence to indicate that a restoration scored as “failed” by USPHS criteria would actually progress to failure if it were monitored rather than replaced.
Despite the above, this research has shown that a short training program can decrease examination times, ensure convergence toward a gold standard for projected restoration replacement, and affect the number of restorations projected for replacement. The USPHS criteria could be used as a tool to train dentists and undergraduates in restoration assessment, and in the absence of real, identifiable, recordable, and justifiable reasons for replacement (eg, pain), a restoration should not be replaced. The adjunctive use of radiographs or other diagnostic aids with the USPHS needs to be considered where there is diagnostic uncertainty, and while Poorterman et al. (1999)52 and Hintze and Wenzel (1994)53 believe that radiographs in the assessment of restorations have a limited clinical benefit in populations with low caries experience, there is research to show the contrary.54 It has also been shown that radiographs can have a detrimental effect in the diagnosis of caries and lead to overtreatment of carious lesions in the inexperienced.55
It is believed that this research provides significant evidence that the use of the USPHS as a research tool in primary dental care merits consideration despite the challenges that research in general dental practice presents.56
CONCLUSIONS
Despite the limitations presented in this clinical study, it is concluded that the use of standard criteria (USPHS) delivered through a basic training program can significantly influence restoration replacement rates among general dental practitioners, significantly reduce examination times, and provide convergence to a defined standard. The effects of these findings in the long term should be determined along with alternative methods of delivering training to larger groups of assessors.

Figure 1. Type and distribution of selected restorations in patients used for the clinical phase.
Contributor Notes
Robert McAndrew, BDS, MScD, PhD, University of Cardiff, Applied Clinical Research and Public Health, School of Dentistry, Heath Park, Cardiff CF14 4XY, United Kingdom
Barbara Chadwick, BDS, MScD, PhD, University of Cardiff, Applied Clinical Research and Public Health, School of Dentistry, Heath Park, Cardiff CF14 4XY, United Kingdom
Elizabeth T. Treasure, BDS, PhD, University of Cardiff, Applied Clinical Research and Public Health, School of Dentistry, Heath Park, Cardiff CF14 4XY, United Kingdom