High Stakes Testing – Doing it Right

Approximately 15 years ago we first started discussing with pharmaceutical companies the idea of using testing to validate training programs and measure sales representatives’ knowledge acquisition. At that time we encountered essentially three types of reactions: (1) a small number of companies “got it”; (2) a larger number of companies felt that testing was a “nice to have, not a need to have,” or “something that we might do some day but not now”; and (3) a surprising number of companies told us that testing sales representatives was not “part of their corporate culture.”

Of course, this situation has changed dramatically over the past 15 years. Today, rare is the pharmaceutical company that does not test its sales force on a regular basis. In fact, many pharmaceutical companies, for a variety of reasons (perceived competitive advantage, compliance issues, pressure from regulatory organizations), have moved in the opposite direction and are now using testing in its high stakes form: as an important element in career decisions (promotion and dismissal). If you are using high stakes testing or are considering its use, you must be careful, systematic and knowledgeable about basic testing theory – otherwise you are opening your company up to potential legal jeopardy.

The key questions you must consider in high stakes testing are:

  • Are your tests fair, valid, and reliable?
  • Are your test questions written to well-formed learning objectives?
  • Have you written the appropriate number of test questions to cover these learning objectives?
  • Are the test questions written at the proper level of Bloom’s Taxonomy?
  • Are the test questions properly constructed?
  • Have you used a defensible methodology for setting a passing score?
  • Have you done a post-exam item analysis?
  • What sort of policy of remediation and consequences have you put in place?
  • Has this policy been communicated to the test takers?
  • Have you consulted with your in-house legal team?
  • Are your test results auditable?

Let’s look at each of these, briefly, in turn:

Validity, Reliability and Fairness
Fairness is not generally a contentious issue in corporate knowledge-based testing as long as all employees are exposed to the same training programs, have the same learning resources available to them and are expected to perform at the same level of competency. It is worth noting that fairness does become an important issue in skills evaluations, because human raters (as opposed to computers) are doing the scoring.

There are many types of validity; the one most relevant to this discussion is content validity. Content validity is assured by writing well-formed questions to properly constructed learning objectives. It is important to mention that validity is not a quantitative measure; it does not return a numeric result.

Reliability refers to consistency of results over time, over multiple test forms, across items and among evaluators (for performance-based tests). For statistical reasons that are beyond the scope of this paper, the reliability of mastery (criterion-referenced) tests tends to be low relative to that of norm-referenced tests because the scores tend to bunch up at one end of the curve. From a purely statistical perspective, test reliability is maximized when the average test score is 62.5%, which is generally not an acceptable average to most of our clients.
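The article does not say where the 62.5% figure comes from. One common rule of thumb that reproduces it places the ideal average score halfway between the chance score and a perfect score. The sketch below assumes four-option multiple-choice questions, which is our assumption and not something stated above; the function name is hypothetical.

```python
def midpoint_difficulty(num_options: int) -> float:
    """Return the score halfway between guessing and a perfect score.

    Assumption (not from the article): every question is multiple choice
    with `num_options` equally attractive answer options, so a test taker
    who guesses blindly scores 1 / num_options on average.
    """
    if num_options < 2:
        raise ValueError("a question needs at least two answer options")
    chance_score = 1.0 / num_options
    return (chance_score + 1.0) / 2.0


# Four options: chance is 25%, so the midpoint is 62.5%.
print(f"{midpoint_difficulty(4):.1%}")
```

Under the same rule of thumb, true/false questions would give a midpoint of 75%, which illustrates why the figure depends on the question format.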

Learning Objectives
All training materials must have well-formed learning objectives, and test questions must be written to these objectives. We are often asked how many questions should be written to each objective. The theoretical answer is: as many as are needed to thoroughly test the objective. In practice, for most learning objectives, this means three to five questions.

Bloom’s Taxonomy
Most testing in the pharmaceutical industry is done at the Knowledge and Comprehension level, with some testing done at the Application level. Not much is done at the three highest levels of Bloom. This is not inherently bad if you are truly testing just knowledge acquisition. If, however, you want to see if your sales representatives can apply their knowledge then write more questions at the Application level. Why aren’t more questions written at the Application level? It’s hard to write good Application questions!

Question Construction
The rules for writing questions are not difficult, but it is surprising how many question writers have never been exposed to them. Space does not permit listing all of them here. Contact the author if you would like a list of the rules.

Passing Score
In our work with many pharmaceutical companies this is the area of test validity most commonly violated. It is our experience that most companies set passing scores arbitrarily by one of three methods:

  • The Higher Authority Method: “Our Vice President said it should be 90.”
  • The Committee Method: “What do you think it should be? I don’t know, 90 seems about right, is that OK with everyone?”
  • The Received Wisdom Method: “I don’t know how or when it got set but it’s always been 90.”

There are legally defensible ways to set cut scores, but none of the above passes muster. Legally defensible methods for setting a passing score fall into two categories: Data-Driven and Conjectural. The most commonly used is the Conjectural method known as the Angoff method. In the Angoff method a team of three to five subject matter experts independently assesses each test item and estimates the percentage of minimally competent test takers who would be expected to answer the item correctly. The estimates are averaged across experts for each item, and the item averages are then averaged across the test to obtain the passing score.
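The Angoff arithmetic described above can be sketched in a few lines. This is a minimal illustration, not a complete standard-setting procedure: the function name and the ratings are invented for the example, and real studies usually add discussion rounds and checks on rater agreement.

```python
def angoff_cut_score(ratings):
    """Compute an Angoff passing score from expert estimates.

    ratings[r][i] is expert r's estimate (0-100) of the percentage of
    minimally competent test takers who would answer item i correctly.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one expert and one item")
    num_items = len(ratings[0])
    if any(len(row) != num_items for row in ratings):
        raise ValueError("every expert must rate every item")

    # Average the experts' estimates for each item...
    item_means = [
        sum(row[i] for row in ratings) / len(ratings)
        for i in range(num_items)
    ]
    # ...then average the item means to get the passing score.
    return sum(item_means) / num_items


# Hypothetical data: three experts rating a four-item test.
example_ratings = [
    [90.0, 80.0, 70.0, 60.0],   # expert A
    [80.0, 80.0, 60.0, 60.0],   # expert B
    [100.0, 80.0, 80.0, 60.0],  # expert C
]
print(f"Passing score: {angoff_cut_score(example_ratings):.1f}%")
```

Keeping the per-item averages as an intermediate step is useful in practice, because items on which the experts disagree widely are worth a second look before the score is finalized.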

Item Analysis
Since most tests these days are given with online testing systems, it is relatively easy to do a post-exam item analysis. At a minimum, for each item, you should compute a point-biserial correlation (how strongly success on the item tracks the overall test score) and a choice distribution (how often each answer option was selected). This lets you weed out poorly written items (in spite of your best efforts, they will sneak in there) and detect areas where students need remediation.
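As a rough sketch of what these two statistics involve, the example below computes them with the Python standard library only. The function names and the response data are hypothetical. The point-biserial is computed here as the ordinary Pearson correlation between a 0/1 item score and the total test score; many testing systems use a variant that first removes the item from the total.

```python
from collections import Counter
from math import sqrt


def point_biserial(item_correct, total_scores):
    """Correlate a 0/1 item score with each test taker's total score.

    Returns None when the correlation is undefined, for example when
    everyone answered the item correctly.
    """
    n = len(item_correct)
    if n == 0 or n != len(total_scores):
        raise ValueError("inputs must be non-empty and the same length")
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_correct, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def choice_distribution(responses, options="ABCD"):
    """Return the fraction of test takers who chose each answer option."""
    if not responses:
        raise ValueError("no responses to analyze")
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) / len(responses) for opt in options}


# Hypothetical data for one item answered by five test takers.
item_correct = [1, 1, 1, 0, 0]      # 1 = answered this item correctly
total_scores = [9, 8, 7, 4, 2]      # total test score per test taker
responses = ["B", "B", "B", "A", "C"]  # option chosen (B is the key)

print(point_biserial(item_correct, total_scores))
print(choice_distribution(responses))
```

A value near zero or below zero suggests that stronger test takers are not more likely to get the item right, which often points to a flawed item. A distractor that nobody chooses, or one chosen more often than the keyed answer, is also worth reviewing.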

Remediation and Consequences
If you are going to use test results as an element of promotion and dismissal decisions you must have a clearly thought out policy of remediation and escalating consequences for failure. At each failure you need to demonstrate that you have given the test taker a fair chance at remediation prior to his/her taking another test. This policy must be communicated to the test takers prior to the initiation of the testing program.

Legal Advice
As you can imagine, in our litigious society, high stakes testing can lead to legal jeopardy. Prior to instituting a high stakes testing program, consult with your company’s HR personnel and lawyers for company policy and guidance. (Important disclaimer: The author of this article is NOT a lawyer.)

Auditable Results
If your company finds itself in a legal dispute, the authenticity of your records may be challenged. Be certain that your testing environment has legally defensible electronic records and signatures (that is, that it adheres to 21 CFR Part 11, the FDA regulation on electronic records and electronic signatures).

The focus of this article has been on objective, cognitive tests. In sales training, skills testing is also a critical element of employee evaluation. Although they are outside the scope of this article, there are also methodologies for ensuring the validity of these types of measures. Be certain that you use a scoring rubric based on Behaviorally Anchored Rating Scales (BARS), and ensure rater consistency, a key element of fairness, through training, practice and statistical methods.

It is impossible in an article of this length to do more than scratch the surface of testing theory. For questions or more detailed information, you can contact Steven Just at sjust@pedagogue.com.
