
Contents
- Study guidelines for case studies Olivetti and Canon
- Case studies
- Test items
- Segers, M.S.R., An alternative for Assessing Problem-solving Skills: the Overall Test.
Educational Evaluation, 23(4), 373-398.
2. Maastricht Skills Test, Faculty of Medicine
- Examples of criteria lists
3. Maastricht Progress Test, Faculty of Medicine
- Extracts from progress test September 1990
4. The Thesis Supervision Experiment
- Extract from Redistributing power in the classroom: the missing link in
Problem-based learning. (see reference nr.7)
5. Reference list to the assessment procedures used in Maastricht
6. Recommended reading
- Van der Vleuten, C. P. M., Scherpbier, A.J.J.A., Wijnen, W.H.F.W., & Snellen,
H.A.M. (1996). Flexibility in learning: a case report on problem-based learning.
International Higher Education(2), 17-24.
- Van der Vleuten, C. P. M. (1996). The assessment of professional competence:
developments, research and practical implications. Advances in Health Sciences
Education, 1(1), 41-67.
UNIVERSITY OF LIMBURG
Faculty of Economics and Business Administration
International Business Studies
OVERALL TEST I
Information and Study materials
Academic year 1995-1996
1. INTRODUCTION
The first Overall Test (OAT) will take place on January 26, 1996 (between 9:00
and 12:00). Two weeks are available to you to prepare for this test, during which
time no tutorial groups are scheduled (please refer to the Study Guide, pp. 18 and
84). This reader tells you about the objectives, form and procedures of the
test. It also includes the test articles and some study guidelines, to assist
you in your preparation.
2. THE OVERALL TEST IN THE FIRST YEAR
The Faculty's main objective is to train people who are able to recognize, to define
and analyze, and to find ways to solve problems related to business economics
and administration. Our aim is to offer an integrated curriculum on business
studies. Four testing procedures are used to fit this objective: knowledge
tests, practical exams, writing assignments and overall tests.
The OAT is designed to measure a student's ability to apply concepts, theories,
formulas, and problem-solving skills, and to assess the student's understanding of
the relationship between the various disciplines. In other words, the OAT tries
to answer questions such as: Is the student able to place the author's view
within business/economic theory? Is the student able to interpret a diagram
that is part of a case? Is the student able to solve a problem by arguing from
different perspectives? The OAT covers the entire learning contents of the two
blocks preceding it, including the contents of skills training, but excluding
the language training.
The main testing objective of the knowledge test is to measure students'
knowledge (e.g. the definition of the product life cycle), whereas the
overall test is focused on the application of knowledge by analyzing the
problem (concept, model etc.) from different perspectives.
Overall tests are part of the first-year examination process, together with the
knowledge tests, exams for Quantitative Methods and writing assignments. For
information on the determination of the pass/fail grade, the minimum result for
passing an OAT and compensation rules, please refer to the exam regulations.
It is important that you gain a clear understanding of the learning contents of the
previous blocks. After the eight-week block period, you are given two
weeks without group meetings to prepare for the OAT. During these two weeks you
are expected to study the enclosed articles. Study guideline: try to grasp the
contents of the articles and relate them to the knowledge and skills you
acquired during the last two blocks. This means the knowledge acquired in the
disciplines of marketing, organisation, microeconomics, international economic
relations and economics of the public sector, as well as the knowledge and skills
acquired during the various skills training sessions.
Enclosed in this reader you find the study materials for the OAT. They consist of a
number of articles from journals and/or books. You are expected to study them
using the learning contents of the past blocks. On the OAT, questions will refer
to these articles. Some articles are accompanied by study guidelines, which
indicate the topics to be dealt with in particular, so you can concentrate on
them.
The
OAT consists of two types of questions: open questions, marked 'O' (also called
essay questions) the answer to which you must formulate yourself; and
true/?/false questions, marked 'C' (similar to the questions on the knowledge
test at the end of each block).
Questions typically refer to the articles. Articles may be used in different
ways. For example:
· the article forms the context within which questions concerning theory will be
asked. In this case the article is similar to an 'extensive stem' (as in the
knowledge test).
· questions on the article itself are asked. The emphasis is on the abilities to
comprehend, to interpret and to place the article in relation to current
literature. In this case the article will be more extensive and complex.
The
test is an 'open book' test: you are allowed to bring with you and use all
literature and other materials, ranging from textbooks to notes, from
dictionaries to calculators. You may use a textbook as a 'reminder', in order to
look up a term or model. However, you will not have enough time available to
study the textbook in detail during the test.
You
are not allowed to share study materials with other students during test
administration. Be
sure to bring the enclosed articles with you.
Questions
requiring the interpretation of an article often refer to certain passages.
Obviously, it would be difficult to answer such questions without the
article.
Since the true/?/false questions have to be marked on a computer form, you are
further requested to bring an HB pencil with you.
Shortly
after the test administration the answers to the true/?/false questions and the
model answers to the open questions will be published on the notice
boards.
Make-up Exam (re-take)
The Make-up Exam is scheduled for March 26, 1996. Students receiving a failing grade
for the first exam are expected to take part. Prior registration is required for
all students.
Objections concerning the nature
Objections concerning the nature of true/?/false questions or answer keys as well as
objections to open questions and model answers must be filed WITHIN 5 WORKING
DAYS after the test date. Late objections will not be handled.
Objections
concerning grading of open questions
Graded
exams will be available for inspection on March 08, 1996 between 9:00 - 12:00.
Each student is allowed to look into his or her graded exam for 15 minutes.
However, PRIOR REGISTRATION is required, so individual exams can be pulled.
Registration can take place Monday March 04 till Wednesday March 06 at the
Education Desk, room 0008. Well-founded objections to the grading
of open questions must be filed WITHIN 5 WORKING DAYS after the inspection
date. Late objections will not be handled.
The OAT coordinator deals with students' written objections. His or her written
responses to these objections are kept on file at the Education Office, room
3064. They will be available for review during regular office hours.
Form requirements
The form requirements for filing objections are:
· objections must be typed, and submitted in duplicate;
· a separate form must be used for each objection;
· objections concerning content have to be argued on the basis of literature;
· the top-left hand corner of each objection must indicate
* student name and address
* student ID number
* study (and graduate) programme
* name of test.
Objections that do not meet these requirements will not be processed.
Objections must be addressed to the OAT coordinator, and can be submitted to the
Education Office (next to the Education Desk or next to the secretariat, room
3072).
The
final marks will be published Friday, February 23, 1996.
3. LITERATURE
· Case study Nestlé S.A.
· Case study Procter and Gamble Europe.
· Paul J.H. Schoemaker, "Scenario Planning: a Tool for Strategic Thinking",
Sloan Management Review, Winter 1995, pp. 25-39.
· Case study Olivetti
· Case study Canon
4. STUDY GUIDELINE
· Case studies Nestlé S.A. and Procter and Gamble Europe.
These
case studies deal with several of the topics treated in block 1.1 "Introduction
to Organization and Marketing". More specifically, they illustrate concepts such
as corporate and business-level strategy, organizational structure, marketing
and organizational control.
It is important that you read the cases very carefully. You should have a thorough
understanding of the situations and problems faced by Nestlé and Procter and
Gamble. This implies that if you do not understand certain concepts or words,
you should search for additional literature in order to find adequate
explanations.
The questions are designed to assess whether you can apply the knowledge you
acquired during block 1.1 in new problem situations. This has two implications.
First, we assume that you have sufficient, ready-to-use knowledge of the literature
associated with block 1.1. If this is not the case we advise you to review the
literature. Second, you should study the cases with the literature of block 1.1
in mind. That is, try to link the situations and problems described in the cases
with the relevant literature.
· Paul J.H. Schoemaker, "Scenario Planning: a Tool for Strategic Thinking", Sloan
Management Review, Winter 1995, pp. 25-39.
This
article will be used as a background for the questions regarding the QM
subjects. In order to allow yourself an optimal preparation, take enough time to
study, analyse, go through and check two specific parts of this
article.
The first part that deserves additional attention is the description of the two
applications of scenario planning (pp. 30-36). On these pages Schoemaker relates
his method of 'scenario planning' with the (matrix) and correlation (matrix) as
to be found in W&W? And next, what exactly is the relation between a
correlation matrix (such as the one in table 3) and the scenario profiles (given
in figure 1)? To phrase the last question in other words: if we give you an
arbitrary correlation matrix, could you derive the corresponding
scenario profiles?
The second part of the article that deserves extra study time is 'Table 5, p. 37'
of the article, describing in full detail the outcomes of several test
experiments, together with a rather briefly worded comment on these outcomes by
Schoemaker. The comment itself doesn't clarify the experiments much, but
with the contents of table 5, we have enough information to 'reconstruct' the
experiments. Reconstruction here means finding out the essential
characteristics of the tests: what is the type of test (test on a single mean,
test on a proportion, test on the difference in means, test on the difference in
proportions, ...)? What is the null hypothesis to be tested? What is the
distribution of the test statistic? Is the test one-sided or two-sided? Some of
those characteristics can only be discovered by calculating back, starting from
Schoemaker's outcomes as found in table 5. Make these calculations as part
of your preparation at home; otherwise, you will be short of time during
the test!
· Case studies Olivetti and Canon
The cases - Olivetti and Canon - discuss the global corporate policies of two major
companies. After reading the cases very carefully, you should be able to apply
the theoretical issues that are dealt with in block 1.2 "Introduction to
International Business" to the real-life business situations described in the
cases. In order to evaluate the strategies of the two companies in today’s
global marketplace you should, among other things, understand important concepts
such as internationalisation, strategic alliances, product life cycle theory,
central coordination and local adaptation. Furthermore, you should be aware of the
political and economic situations that affect the behaviour of (multinational)
corporations. Read the cases at home and try to focus on those parts of the case
studies that are relevant to block 1.2.
For the Olivetti case, pp. 254-261 have been omitted because they are not relevant
to your study.
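The back-calculation suggested above for Table 5 of the Schoemaker article can be rehearsed on made-up numbers. The Python sketch below is only an illustration: it assumes a test on the difference in two proportions with a pooled standard error, the counts are hypothetical and are not taken from the article, and the function names are our own. It computes the z statistic together with the one-sided and two-sided p-values, so that you can check which of the two matches a reported outcome.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def p_values(z):
    """Return (one-sided upper-tail p-value, two-sided p-value) for a z statistic."""
    upper_tail = 1.0 - normal_cdf(z)
    two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return upper_tail, two_sided


# Hypothetical counts (NOT from the article): 30 of 50 versus 20 of 50.
z = two_proportion_z(30, 50, 20, 50)
one_sided, two_sided = p_values(z)
print(f"z = {z:.2f}, one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

If a published p-value equals the one-sided figure rather than the two-sided one, that suggests the test was one-sided. The same comparison can be repeated for the other candidate test types.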
Global corporate policies
Case 7.1 Olivetti
Copyright © 1993 by the International Institute for
Management Development (IMD), Lausanne, Switzerland. Not to be used or
reproduced without written permission directly from IMD.
This case was prepared by Research Associate Joyce Miller, under the supervision
of Professors George Taucher and Dominique Turpin, as a basis for class
discussion rather than to illustrate either effective or ineffective handling of
a business situation.
In late 1986,
Elserino Piol, Executive Vice President Strategies and Development in the
Olivetti Group, one of the world's foremost information technology companies and
the second largest indigenous personal computer manufacturer in Europe, was
concerned about the company's photocopier business. Their Agliè plant located
near Olivetti's headquarters in Ivrea was producing about 20,000 units annually,
most of which were sold in Italy. This operation was expected to be an important
component for Olivetti in creating the 'integrated office', where several pieces
of standalone equipment would be linked up in a multi-functional, automated
system.
But the window of opportunity was
closing. With the fast pace of development in the telecommunications technology
that provided the networks and links between formerly disparate pieces, several
new contenders were poised to enter the office-of-the-future market. A few
months earlier, Mr. Piol had travelled to Tokyo to meet with senior management
in Canon Inc., a major Japanese copier manufacturer, to sound out the
possibilities for co-operation. At this point in time, Mr. Piol wondered whether
it might make sense to form a basic technology alliance with a leader in the
copier field.
Ing. C. Olivetti & Co., SpA
Ing. C. Olivetti & Co., SpA was the parent company of the Olivetti Group, whose
product line included distributed data processing and office automation equipment,
typewriters, calculators, cash registers and photocopiers (Exhibit 7.1).
In 1986, the Olivetti Group
obtained a net income of L565.5 billion on sales of L7,317 billion, up 12.3%
from the previous year[1].
At this time, Olivetti had manufacturing activities in 27 plants in seven
countries.
| Market sector | 1986 | 1985 |
|---|---|---|
| Distributed data processing and office automation | | |
| Electronic professional typewriters, videotyping systems | 14.0 | 13.2 |
| Personal computers | 28.5 | 29.5 |
| Minicomputers and terminals | 28.0 | 32.2 |
| Printers | 7.0 | 7.2 |
| Telecommunications equipment | 2.8 | 2.7 |
| Total | 80.3 | 84.8 |
| Office products | | |
| Portable and office manual and electric typewriters | 8.2 | 5.9 |
| Calculators, cash registers | 6.7 | 5.7 |
| Copiers | 3.6 | 2.3 |
| Office furniture | 1.2 | 1.3 |
| Total | 19.7 | 15.2 |
| Overall total | 100.0 | 100.0 |
Source: Annual Report.
Exhibit 7.1 Olivetti Group revenue breakdown, by market sector in 1986 and 1985 (%)
Founded in 1908 and headquartered in the foothills of the Italian Alps, just over the
border from Switzerland, Olivetti was known for many years as the family-owned
company that turned out elegantly designed typewriters. By the mid-1960s, Olivetti
was the sixth largest industrial organization in Italy, and 80% of its revenues
from the sale of manual and electronic typewriters, calculators, accounting
machines and office furniture were generated outside Italy.
In the following decade, as a result of its ambitious growth strategy, Olivetti became
seriously undercapitalized, and it appeared that the company would either go
bankrupt or fall into the hands of the Italian government. In April 1978, a
dynamic leader from outside the family was brought in to turn the company
around. Carlo de Benedetti, an Italian industrialist who had previously spent
several months as Managing Director of Fiat, took over as Vice Chairman and CEO.
De Benedetti invested over $17 million of his own personal fortune in the
company (and thereby became the majority shareholder) and launched a programme
to revitalize Olivetti.
Today, the ink-jet technology
used in bubble jet printers is a much better approach, offering a standard of
reproduction that was once thought impossible to achieve. Many companies,
including Olivetti, are working to further develop ink-jet technology.
Currently, Canon is using a lightweight printer head to spray ink through
nozzles that are one-third the diameter of a human hair. The biggest obstacle to
increasing the speed is finding a way to dry the ink fast enough. There is no
solution yet, but many are working on it.
Olivetti’s ink-jet area is
expected to develop into a growing and profitable business over the next
decade. Until this point, we've developed a technology that is similar to
Hewlett-Packard's technology to make bubble jet printers, and we've gained a
strong position with our dry-ink-jet non-impact printing calculator. Our bubble
jet printers are very sophisticated electromechanical printers, and we're now
the largest producer of printers in Europe. For Olivetti, this was a natural
transition from the typewriter. We have a research lab of 70 people in Ivrea
working on ink-jet physics, chemistry and application, and in addition, we have
about 60 people in an R&D group in Yverdon, Switzerland, looking at how
this technology could be implemented in new products.
Olivetti had put together a group of
close to 70 engineers in Agliè who were involved in designing low-end copiers.
These machines were fully developed by this group, and there was ongoing R&D
concentrated on photoconductors and toners.
By late 1986, the Agliè operation was turning out about 20,000 units annually.
However, several assembly line problems were occurring, and the source of these
difficulties could often be traced back to external parts. The high reject rate
was resulting in additional costs for Olivetti as well as its suppliers. Mr.
Demonte remarked:
We're losing money in the
copier business. But closing up the operation entirely would certainly lead to
additional expenses. We have a large infrastructure built up to support this
business. We have a strong market position in Italy, and we can't just pull out
of that. There would also be a question of what to do with the dealer channel
and after-sales service organization. It isn't part of Olivetti's culture to
just switch off something like this. Moreover, there are strong employment laws
in Italy.
We have tried several times to enter a
partnership in the copier business. Sometimes, the companies we contacted wanted
to buy our operation outright. At one point, we approached one of our Japanese
OEM suppliers, but they didn't want to be in a joint venture with an industrial
operation. We always asked for R&D, management and production to be put into
such a venture, and the Japanese counter-proposal was always to have the
management and R&D in the venture and then subcontract out the production to
a Japanese company. They were concerned about the quality of the end product as
well as the level of production know-how.
Exploring the possibilities for cooperation
In late 1986, Elserino Piol, Executive Vice President, Strategies and Development,
travelled to Tokyo and approached senior Canon management with the idea of
cooperating in some way in the copier business. Mr. Piol was intrigued by
Canon's replaceable cartridge technology, which was introduced in 1982 in the
world's first personal copiers, and he believed that great potential benefits
for both parties could be derived if the two companies could work together. Mr.
Piol elaborated:
I strongly felt that we
could mutually benefit from this kind of cooperation. In our initial meetings, I
found Canon's top management to be quite open and willing to talk about
cooperating with a foreign company. Before going to Tokyo, I had also initiated
discussions with another large copier manufacturer that was not Japanese but had
a large European presence. Olivetti needs a partner to share R&D with, one
whom we could acquire technology from and would give us access to an additional
market in the copier business.
Olivetti was one of the
first firms with a strategy to acquire technology not strictly by in-house
development but also through joint ventures, alliances, venture capital
companies, and so on. At present, we have close to 200 joint ventures in
operation (Exhibit 7.4).
Olivetti has a lot of
experience with this kind of arrangement.
In 1986, Canon was the dominant player in Europe, placing an average of 17,000
units each month, which represented a 22.7% share of the European copier market.
For years, Canon had used an OEM strategy in Europe, while all other sales were
handled by its Amsterdam-based regional headquarters, Canon Europe. This
arrangement had enabled Canon to concentrate on cementing its position in the
highly competitive domestic market. Over time, the larger of Canon's European
sales subsidiaries that were subsequently put in place began to operate more
independently. As of late 1986, Canon had only a small position in southern
Europe and believed that it would be expensive and time-consuming to develop
its own distribution channels there.
Filippo Demonte, who as head of the Office Products Group was directly responsible
for Olivetti's copier business, remarked:
For whatever
arrangement we might enter into, it is important that Olivetti be the majority
shareholder. Any venture has to be 100% under Olivetti management so that we can
guarantee to the government and to the company that we would not be selling
Italian technology to a foreign company. In these things, it is important not
only to show but also to be. Moreover, succeeding in Italy is more likely if you
are a successful Italian company than if you are a successful foreign company: the
same principle that exists everywhere. It is important to the policy makers, the
opinion makers, the unions and other national bodies. Having majority ownership
would also ensure that we could participate in Italian government,
inter-government programmes, and Europe-wide programmes.
Mr. Piol believed that much could be learned from being in a partnership with a
company like Canon, particularly with regard to production process, supplier
relationships and basic copier technology. He explained:
If we were to put together some kind of joint
venture with Canon - and I'm not sure just what that would look like in terms of
ownership, structure, and the kind of assets and staff each partner would put
into it - there could be some significant benefits on both
sides.
This could be an opportunity for Canon to
strengthen its presence in Europe, and we could learn about Japanese techniques.
The Japanese have more exacting goals for quality and better control over
development time. Right now, we're working with an inventory level of 45 days,
and in Canon, it's five. We've used value engineering techniques many times in
the past to improve this level, but not with the same success as the Japanese.
They apply these techniques in a strict and methodical way, with a determination
not to stop until good results - results which may seem impossible to obtain -
are achieved. On the other side, it's hard to know how strongly Canon would ask
us to adopt the Japanese way.
In the early stages of such a
venture, I imagine that we would manufacture an Olivetti machine, which would be
received by the Canon and Olivetti sales organizations, as well as their dealers
in Europe. Over time, we would license the basic technology from Canon Japan and
refine it for European needs. Perhaps we would also buy the photographic drums
and mirror mechanisms from Canon factories in Japan and/or France, and Canon
would presumably make a profit on these sales. One of the essential negotiating
issues would be to determine the kind of R&D that would be done in Agliè and
its scale, as well as whether we could eventually compete with other Canon
design centres.
Mr. Demonte added:
It would be interesting
to have Canon as a partner because then we would have a parent that is both a
shareholder and a customer. When we're speaking with the shareholder, we'll be
talking about profit and loss, net equity, and so on. When we're speaking with
the customer, we'll be talking about the level of logistical and quality
improvement. As well, we'll be trying to anticipate what the customer wants,
which should help us with the product design specifications and in the
production level we attain.
On the one hand, we
would be an Olivetti company. On the other side, we would become part of Canon's
copier machine division and part of the Canon family of copiers. One of the
inherent challenges with any venture where two partners are involved is to
manage the identity question. There will always be some people on both sides who
will have difficulty making the distinction. Big companies are not made in such
a way as to understand that they don't own a whole
organization.
This case was prepared
by Research Associate Joyce Miller, under the supervision of Professors George
Taucher and Dominique Turpin, as a basis for class discussion rather than to
illustrate either effective or ineffective handling of a business
situation.
We are grateful for the
assistance of Professor Gene Gregory in the preparation of this case
study.
From its humble beginnings in a small workshop in Tokyo's Roppongi district, Canon
Inc. had become, by 1986, one of the world's leading manufacturers of cameras,
business machines, and precision optical equipment. In the following year, Canon
would celebrate its 50th anniversary, and President Ryuzaburo Kaku planned to
use the occasion both to review the company's past achievements and carefully
plan for the future. Mr. Kaku's aim was to make Canon into a premier global
corporation.
Well before the yen
entered the steepest arc of its upward curve, Canon had seen the necessity of
moving manufacturing into its markets, of putting production close to the place
of consumption. The new phase of 'internationalization' was initially prompted
by the trade imbalance (and trade friction) between Japan and the chief
countries where Canon sells ... Canon has advanced quite briskly towards
becoming truly global - and the intention is to take the global process further
by establishing R&D centres in its markets as its national companies develop
into free-standing businesses within the global
corporation.
The imperatives of
global rationalization - especially in copier operations - require Canon
ownership and finely tuned management of R&D, production and marketing. As
with all strategic alliances, the fine line between compelling necessity and
expediency is not always readily apparent.
In the
mid-1980s, Olivetti, a long-time player in the office equipment market with
particular strength in Italy, was looking for a way to bolster its presence in
the European copier market. Canon, at the time, was eager to expand its market
share in Italy and to strengthen its European manufacturing
base.
CANON INC.
A young company
by Japanese standards, Canon traced its history back to November 1933 when a
small group of camera enthusiasts led by Mr. Goro Yoshida founded Precision
Optical Research Instruments Laboratory in Roppongi, then a suburb of Tokyo, to
conduct research into quality compact cameras. Two years later, the Hansa Canon,
Japan's first 35 mm focal plane shutter camera, remarkably resembling the
German-made Leica, was introduced in Tokyo. In 1937, Precision Optical Industry
Co., Ltd was established to manufacture the Hansa Canon, with Mr. Saburo Uchida
as its first Executive Managing Director. When Mr. Uchida was drafted for
service in the army in the late 1930s, Dr. Takeshi Mitarai, a practising
physician who had invested in the new company and become its auditor, took over
the company's management and became its president in September 1942.
During the war, Precision Optical was forced to abandon 35 mm camera production to
become a supplier to the Japanese military. In this capacity, the company
developed an indirect X-ray camera for mass-screening to detect tuberculosis
infection. In 1944, the company diversified into binocular production with the
acquisition of Yamato Kogaku Seisakusho. After rapid reconversion to camera
production at the war's end, the company changed its name to Canon Camera
Co., Ltd in 1947. Over the next two decades, the company grew into the world's
leading camera manufacturer.
Canon's international operations began modestly in 1951 with the appointment of
Hong Kong-based Jardine Matheson as its sole worldwide agent. Responding to the
growing US market for quality cameras, Canon established its first overseas
branch office in New York in 1955, and two years later formed Canon Europa in
Geneva as an exclusive distributor in Europe.
Vertical
integration and product diversification
Early
in the decade, Canon began the dual processes of vertical integration and
product diversification that accounted for much of its strength in the domestic
and world markets. Subsidiaries were established to produce micromotors and
metal parts, and a supplier of precision components was acquired. Then, in 1956,
the first major expansion of the product line was made with the addition of
personal cine-cameras.
An overly ambitious diversification strategy led to Canon's first and, thus far,
only major product failure. Introduced in 1958, the Synchroreader, designed to
record voice messages on paper for educational use, proved to be
technologically far ahead of its time. Within a year of its introduction, the
product had to be withdrawn from the market, leaving the company with a division
staffed with electronic engineers who could not be dismissed simply because
management had made a serious strategic error in product planning and
marketing.
Determined to
transform adversity into advantage, Canon harnessed the skills of these people
to make a major move into business machines with microfilm equipment for banking
use in 1959 and, in a major new departure, with the development of the Canola
130 electronic calculator introduced to the market in May 1964. Success in the
calculator market set the stage for venturing into the copier market in 1968,
with a 'New Process' plain paper system that challenged and eventually broke the
tight hold of Xerox.
Competitive pressures intensify
In 1974, Canon
found itself in serious trouble. Malfunctioning calculators, with faulty light
emitting diode displays, had to be recalled in large numbers, a mishap that
could not have come at a more inauspicious time. Ferocious competition, led by
Casio and Sharp, had driven prices to the ground, forcing many calculator makers
to withdraw from the market. Those that remained were operating at the margin,
with little or no profit. At the same time, the growth of camera sales slowed as
markets became increasingly saturated. Exports of cameras and other products
decreased under the pressure of a higher yen, and production costs were rising
as a result of higher petroleum prices. In the first half of 1975, Canon was
forced to suspend dividends for the first time in its history, an experience
still regarded in the company with some horror almost 20 years later. The
combination of forces battering the company exposed the company's structural and
managerial weaknesses. Ryuzaburo Kaku, then in charge of Finance,
recalled:
Canon's technical
strength - demonstrated in a stream of pioneering that began with Japan's first
35 mm precision cameras - had not been backed by a coherent management strategy.
Marketing was weak. Competitors were copying (our) products before (we) could
fully exploit (our) sales potential. Canon was like a ship that constantly
changed course and got nowhere ... Components were being manufactured in too many
scattered locations ... As in many old Japanese companies, our people were so
afraid of making mistakes that they did nothing. We've had to teach them not to
fear being creative - or even failing.
Introduction
of the premier company plan
Mr.
Takeo Maeda, the new president who had assumed office just before the gale of
misfortune swept over the company, responded with a 6-year premier company plan.
Launched in 1976, the plan called for a restructuring, and internationalization
of the company, and the introduction of new efficient production systems, to
avoid the pressures of yen appreciation, protectionism and energy shortages in
the future. The objectives of the plan were clear and ambitious. Canon was to
become a leading corporation in Japan within three years, and a world leader in
the subsequent three years. The new plan began by reducing operations,
curtailing costs and undertaking efforts to strengthen camera, calculator and
copier sales. An operating profit rate of 15%, with no debt, became the
principal tenets of financial management. Sales were targeted to increase 15%
annually - considered to be a reasonable growth rate - with the goal of
substantially increasing market share in all product lines. All this was to be
achieved through more rapid and higher quality product development, improved
production, and total marketing management.
A new matrix organization linked the
three major product divisions - camera, business machines and optical products -
with functional committees for new product development, production and
marketing. The Canon Development System (CDS) was established to improve the
efficiency of R&D, shortening the time to market for new products. The task
of the Canon Production System (CPS) was to resolve quality problems, eliminate
waste, and activate employees within the new rationalized organizational
structure. The objective of the Canon Marketing System (CMS) was to relate the
company's products and services to customer satisfaction in all of Canon's
worldwide markets. Pushing responsibility down the line, the three product
divisions were to operate as autonomous vertical profit centres. Division chiefs were appointed and
delegated the authority to act fairly independently.
The
new plan was only just put into action when Mr. Maeda suddenly passed away. Mr.
Kaku, who as Managing Director had been largely responsible for shaping
the new direction of the company, was elevated to the presidency and charged
with the task of completing the reforms underway.
From
the outset, Canon had been endowed with a strong corporate sense of purpose.
Self-motivation, self-awareness and self-management were the three pillars on
which the company had been created. Mr Kaku continued to give these
philosophical principles primary importance, adapting and embellishing the
company purposes for the task ahead (Exhibit 7.5). In his
words:
(When I took over,
Canon was) 'sluggish' and 'full of bureaucratic attitudes' which drained the
organisation of its ability to respond to changes in the operating environment ...
(My basic philosophy was) to build a company which further upholds human
rights and dignity, while striving to develop better technology and products
through innovation.
Our corporate philosophy
- To be a global corporation providing kyosei, 'living and working together for the common good', in all countries where we operate

Our mission
- To make a positive contribution through continued growth and reinvestment in the world's communities

Our objectives
- To be a responsible global citizen
- To have unique and quality products
- To build an ideal company for continuing prosperity

Our business development goals
- To combine our traditional hardware strength with software systems development
- To create information systems and networks which integrate hardware, software and services
- To operate on a global scale

Our values
- Respect cultural differences
- Encourage self-motivation, self-awareness and self-management
- Respect dignity, value initiative and recognize merit
- Work together in harmony
- Sustain our physical and emotional health
Exhibit 7.5
The
Canon way
A
decade later, in 1985, Canon was well on the way to becoming a premier company
by world standards. Significant increases in investment and R&D had resulted
in a spate of new products, many of them 'firsts' in the marketplace. Canon's
product line ranged from 35 mm and video cameras to copiers, electronic
typewriters, laser printers, facsimile machines, and microlithographic equipment
for producing semiconductors and medical equipment. At this time, Canon's
manufacturing and marketing organisation spanned over 100 countries and employed
34,100 people (Exhibit 7.6). In 1985, profits rose to ¥ 37 billion on net sales
of close to ¥ 956 billion. Business machines accounted for 71% of sales, with
cameras and optical equipment generating 21% and 8%,
respectively.
The
response to endaka
But
new problems were on the horizon. Unlike most other Japanese companies, Canon
relied heavily on overseas markets for the bulk of its business, with North
America and Europe each accounting for 30% of sales. Although the process of
globalizing manufacturing was well underway, a high percentage of overseas sales
were still generated by exports, making Canon particularly vulnerable to
endaka (yen appreciation), which followed the Plaza Accords in 1986.[2]
Canon's
response to the rising yen was guided by past experience. R&D expenditures
were increased, cost reduction efforts were broadened and intensified, and
capital outlays for overseas production facilities were boosted. After posting
record profits for the previous ten years, Canon's income dropped 70% in 1986 to
¥ 10.7 billion, threatening a cut in dividends. Shinji Tatewaki, who had just
returned to Canon's copier division in Tokyo after heading up the company's
Chicago sales office for several years, recalled:
The US government
devalued the dollar and, within the space of virtually a day, the yen was worth
significantly more against other currencies. In 1984, the yen was strong at ¥
251 to one US dollar. Then the level dropped down to ¥ 150. Production costs
increased dramatically, and there was no way that we could recover the loss. We
had to reconstruct our entire operations. We launched a large-scale cost
reduction activity and a campaign to avoid waste. In Canon Tokyo, people soon
began pinning '¥ 150 badges' on their shirts. We were all focused on what we had
to do to live in a ¥ 150 world.
Because
of the strength of the yen, Canon products made in Japan had become more
expensive overseas. Further expansion of overseas production was essential. In
addition, as a Forbes reporter commented:[3]
Canon's strongest
defence against a rising yen is innovation. With innovative products, price is
less important than in commodity-type products . . . .
This means heavy
spending on research and development, of course. Canon's R&D amounts to some
11% of parent company sales, one of the highest ratios among Japanese companies
outside the chemical and pharmaceutical industries.
Given the
increasing trade friction in the US and European markets, Canon had further
cause to reposition itself to maintain future growth. Three-quarters of the
company's sales were in office equipment, including both standalone machines,
such as copiers, and the systems that would combine them in the
'office-of-the-future'. It was in this sector that globalization became
increasingly imperative.

Exhibit
7.6
The Canon organization (Source: Canon Handbook)
Canon’s
copier business
Canon
first entered the copier market in 1965 with a coated paper copier, based on
technology licensed from RCA. Realizing the limitations of this technology,
Canon formed a team of engineers led by Dr. Keizo Yamaji, to develop a copier
drum with an insulating layer that would be suitable for plain paper copying
using a more photosensitive chemical than the one then used in xerography. This
new design prolonged the drum's life and reduced the risk of discharging toxic
chemicals. Introduced in Japan in April 1968, Canon's 'New Process' (NP) plain
paper copying system was completely free of Xerox patents. Hiroshi Tanaka, who
was part of this effort, commented:[4]
Engineers working on
the plain paper copying project thoroughly investigated the patents held by
Xerox. In the process, we learned how not to violate patents and how to obtain
patents to protect our own technology. The NP technology was completely free of
Xerox's airtight patent network.
In 1972, the
company launched a second generation 'liquid dry' NP system, which used plain
paper and liquid toner and turned out dry copies. This new technology reduced
machine breakdowns by eliminating the complex heat-fusing mechanism and
simplifying the developing and cleaning process. These machines had lower
production costs, were more compact and more reliable than anything available at
the time, and they matched Xerox on copy quality. Canon subsequently licensed
out this technology to 20 manufacturers in Japan and three in the United
States.
NP
copiers were manufactured at the Toride factory on the outskirts of Tokyo, which
had been set up a decade earlier to make synchroreaders and later, cameras.
Toride used a flexible manufacturing system that could accommodate differences
in models and electrical specifications. The four assembly lines could handle
any NP model after a 2-day changeover. Each line had the capacity to turn out
between 3000 and 8000 units monthly. About 2000 parts were required to produce
the range of NP copiers.
Initially, copiers were sold in Japan
through Canon Business Machines Sales Inc., set up in 1968 to market
calculators. In 1971, this subsidiary was merged with Canon Camera Sales Inc. to
form Canon Sales Inc., whose shares were listed on the first Tokyo Stock
Exchange a decade later. Beginning with 200 people dedicated to the sale and
service of copiers, the new company sold Canon copiers outright and offered
customers a Total Guarantee System.
In the
early 1970s, Canon established a dealer network throughout Japan. Dealers
received extensive training and, within a few years, had completely taken over
the task of servicing copiers. Canon did not begin selling its NP systems in the
US until 1972, when a dealer sales network was established. However, these
copiers were being distributed in Europe through Canon's marketing unit in the
Netherlands as early as 1972, although sales were modest.
Canon's copier
strategy was formed largely by its camera strategy. 'A camera for everyone' was
translated into 'a copier for everyone'.
Canon's
copier line initially was aimed at small and medium-scale users, a market that
had been largely ignored by Xerox. The Xerox strategy focused on large users in
government, business, and universities. Following Canon's strategy, Dr. Keizo
Yamaji, who had become the General Manager of Canon's Reprographic Products
Division, wanted to open up an entirely new market for the PPC. Dr. Yamaji had
market data showing that there were over 4 million offices in Japan with fewer
than five employees that were not being addressed by the conventional copier
business. The lowest-priced unit available was ¥ 500,000, about US$2300, which
was too expensive for a small business. As well, professional service engineers
needed to come in regularly to maintain these machines. Again, this cost limited
their use to larger offices. The 'dream' was to come up with a compact,
maintenance-free copier that would cost about $1000 and could be sold to small
offices, for home use, or as a personal desk-side copier. This idea was totally
different from the Xerox system which, until this point, had dominated the world
copier business.
Introduced in late 1982, Canon's personal
copier (PC) represented a revolution in reprographic technology. The PC used a
replaceable cartridge that eliminated the need to maintain the machine
regularly. After making 2000 copies, the user simply replaced the cartridge,
which contained a photoreceptive drum, toner assembly, cleaner, and charging
device. Cartridges were available with four toner colours.
In
time, copier manufacturers around the world began purchasing Canon's personal
copier on an OEM basis. For example, Olivetti began importing Canon personal
copiers in late 1984. Increasingly, large firms operating internationally were
completing their product line by buying certain models from other
producers.
In 1957, Canon
Europa was established in Geneva as Canon's sole distributor for Europe. Over
the following decade, a network of national distributors was developed to
market, distribute and service Canon cameras and calculators. To better manage
the increasing volume of European business, especially in EC countries, the
European headquarters functions were transferred to Canon Amsterdam NV in 1968,
leaving Canon Geneva as a finance company.
With
the introduction of the Premier Company Plan and the Canon Marketing System, the
first task was to reorganize the complex system of multiple national
distributorships that had evolved over the first two post-war decades. Given the
rapid diversification of product lines and the increasing importance of global
rationalization of marketing, total control over the marketing and distribution
system became imperative to respond to customer needs. In 1975, Canon gradually
began the process of replacing distributors with integrated Canon marketing
subsidiaries in each country. Over the following years, Canon Europa NV was
established, with 19 subsidiaries, including Canon Amsterdam NV, to manage the
intricate European organization. A senior manager in Canon described the
process:
In some countries,
we had to start from scratch; in others, we already had relationships
with distributors. In France, for instance, Canon's camera importer wanted to
get into the copier business, and we quickly had the 200 people in this
organization selling copiers directly. In the UK, we were using Marubeni, a
sogo shosha or general trading house, which then sold products through
several companies. This arrangement lasted only 2-3 years. Then we had to build
something up ourselves. We put cameras and copiers together and distributed
through dealers. In Germany, Canon's camera distributor was not so interested in
selling copiers. Eventually, we were able to put together an arrangement, but it
was strictly a sales and marketing venture. In Italy, our camera importer was
also not interested in copiers. Cash recovery was a real problem. For copiers,
Canon couldn't expect to get payment for up to ten months after the sale. In the
camera business, payment was available within 30 days. We had a good business in
Italy with calculators, but it was clear that we needed more sophisticated
salespeople to market copiers.
In many cases, Canon
ended up buying out the distributors because of their limited financial strength
and cashflow problems. This put a major strain on Canon's own financial
resources.
Over
the next several years, Canon's marketing capabilities in Europe grew
substantially. Over time, the various national subsidiaries that were
established began to operate more independently and purchase products directly
from Japan.
Canon
begins producing copiers in Europe
In
1972, Canon acquired the assets of ECE GmbH, a small German R&D house
specializing in advanced electrostatic technology in Giessen, near Frankfurt.
ECE had contributed significantly to perfecting Canon's 'liquid dry' copy
technology, and Canon had been helping the firm financially since 1969. By
mid-1973, the ECE facility had been converted into a factory with the capacity to
turn out 1500 low-volume PPCs monthly, to be sold throughout Europe as well as
in some Middle East and African markets.[5]
ECE's
original management team had remained in place after the acquisition, and Canon
Giessen was staffed almost entirely by Germans. Tsukasa Kuge, one of the few
dispatched from Tokyo, arrived in 1973 and remained in the operation until 1975,
returning for an additional five months in 1977. Mr Kuge
recalled:
By acquiring a well-organized high technology
company with considerable experience and know-how in copier development, we were
able to start up a new production unit rather quickly. Much of the time usually
spent on the details of technology and transferring know-how was saved, which
reduced the drain on managerial and technical resources in Tokyo. After a time,
R&D activity was to be dedicated entirely to the development of Canon's
product, and the R&D activities both in Giessen and in Tokyo had to be
performed in conformity with each other, so I was sent
over.
In the early 1970s, the
copier market was not so segmented as it is today. We began making what we felt
would sell the best, and we planned to move up in quality. In the beginning, more than 30 people
were doing research and development, and they were creating many ideas that were
also implemented back in Japan. Over time, Giessen's R&D capability was made
smaller as production became more important.
After
two years in operation, a team of 130 people were manufacturing 500 NP machines
(20cpm) each month under a rigid quality control programme. Production was
slated to increase at a level of 20-35% annually. Giessen's assembly process was
similar to Toride's, but on a much smaller scale. In 1975, the production
capacity at Giessen was doubled, and new lines were added to produce copier
drums and toner.
A
second plant in Europe
In August 1983,
Canon responded to an invitation from the French government to establish a
personal copier factory in Liffre, in Bretagne. At this time, Canon was also
looking at the feasibility of establishing a PPC assembly plant in Virginia,
USA.
By the
end of 1984, the Liffre plant was turning out about 3000 copiers per month, and
lines were subsequently added to produce electronic typewriters and facsimile
transceivers.
Canon's
European presence in 1986
By
1986, Canon had become a leading player in the European market, placing more
than 200,000 units out of an estimated total market of 897,780 (Exhibit 7.7).
Canon's aim was to become the world's leading PPC manufacturer. To achieve this,
the company's goal was to obtain at least a 30% unit share in the three
major copier markets: Japan, Europe, and the US (Exhibits 7.8 and
7.9).
Canon
offered the full range of copiers, from its innovative personal copier to its
NP-8000 series (up to 70cpm), which competed head-on with Xerox and Kodak
machines. In the near future, Canon planned to introduce a digital colour copier
that many believed would not only transform the office environment, but also
revolutionize the whole industry. It was rumoured that newly-emerging domestic
competitors were also developing colour copiers based on a different product
concept.
Currently, Canon had sales subsidiaries
in virtually every European country (Exhibit 7.10), as well as independent
business machine distributors that dealt with a network of retailers. Many of
Canon's European distributors sold only Canon products. In the camera business,
they relied mostly on the retailers. In calculators, they used another channel.
Business equipment needed more support, and it was becoming apparent that more
sales channels were needed to sell copiers. At the same time, Canon's machines
were becoming more expensive because of the 15.8% duty that the European
Commission had placed on most copiers imported from Japan. This temporary rate
was set in 1986, but there was an expectation that the rate would be officially
set at 20% in 1987.
| Category | 1984 units | 1984 share | 1985 units | 1985 share | 1986 units (estimated) | 1986 share (estimated) |
|---|---|---|---|---|---|---|
| Personal copiers | 82,640 | 81.7% | 111,350 | 65.2% | 110,050 | 52.8% |
| Category 1 (up to 19 cpm) | 57,880 | 15.0% | 49,110 | 12.6% | 54,020 | 13.0% |
| Category 2 (20-39 cpm) | 30,370 | 16.2% | 28,500 | 15.2% | 31,410 | 14.6% |
| Category 3 (40-59 cpm) | 11,960 | 19.9% | 10,050 | 19.1% | 7,330 | 14.2% |
| Category 4 (60-89 cpm) | 0 | 0 | 0 | 0 | 640 | 8.5% |
| Total | 182,850 | 24.7% | 199,010 | 24.7% | 203,450 | 22.7% |

Source: InfoSource S.A.
Exhibit 7.7
Canon brand: sales quantity and market share in Europe, 1984-1986
| Company | 1985 | 1986 |
|---|---|---|
| Canon | 125 | 138 |
| Fuji Xerox | 97 | 111 |
| Konishiroku | 35 | 35 |
| Matsushita | 6 | 7 |
| Minolta | 34 | 33 |
| Mita | 28 | 27 |
| Ricoh | 168 | 162 |
| Sharp | 33 | 38 |
| Toshiba | 32 | 36 |
| Total | 558 | 587 |

Source: Dataquest Incorporated
Exhibit 7.8
Estimated PPC placements in Japan, 1985 and 1986, by brand (thousands of units)
In the
low end, Canon was also finding that its copiers were not competitive enough.
Other Japanese copier manufacturers were very price conscious. Moreover,
customers were getting more sophisticated. In the past, they would accept lower
copy quality but, increasingly, they wanted superior reproduction, easy-to-use
machines with low maintenance requirements, and customers were becoming more
concerned about environmental factors. Canon needed to get a new product in this
category, or new technology.
| Company | Personal copiers | Segment 1 | Segment 2 | Segment 3 | Segment 4 | Segment 5 | Segment 6 | Total |
|---|---|---|---|---|---|---|---|---|
| Adler-Royal | - | 16.0 | 3.2 | - | 0.4 | - | - | 19.6 |
| Canon | 165.0 | 52.0 | 48.3 | 4.5 | 13.5 | 1.4 | - | 284.7 |
| A.B. Dick | - | 3.2 | 0.8 | 0.5 | 0.3 | - | - | 4.8 |
| Gestetner | - | 8.1 | 2.5 | 0.3 | 0.4 | - | - | 11.3 |
| Harris/3M | - | 34.0 | 13.5 | 5.2 | - | - | - | 52.7 |
| Kodak | - | - | - | - | 7.2 | 3.4 | 1.1 | 11.7 |
| Konica | - | 26.0 | 7.5 | 4.1 | 5.0 | 0.1 | - | 42.7 |
| Minolta | 5.4 | 26.1 | 25.8 | 1.7 | 0.6 | - | - | 59.6 |
| Mita | - | 53.0 | 20.4 | - | 4.5 | - | - | 77.9 |
| Monroe | - | 11.5 | 5.0 | - | 0.6 | - | - | 17.1 |
| Océ | - | - | - | - | 2.6 | - | - | 2.6 |
| Panasonic | - | 19.2 | 12.1 | 0.8 | - | - | - | 32.1 |
| Pitney Bowes | - | 7.4 | 11.8 | 1.8 | 2.0 | - | - | 23.0 |
| Ricoh | 5.0 | 26.0 | 7.9 | 3.6 | 4.0 | - | - | 46.5 |
| Sanyo | 3.6 | 3.1 | - | 0.9 | - | - | - | 7.6 |
| Savin | - | 17.6 | 2.0 | 12.1 | 5.4 | - | - | 37.1 |
| Sharp | 28.0 | 64.3 | 8.2 | 10.1 | 10.0 | - | - | 120.6 |
| Toshiba | - | 43.2 | 13.0 | 7.0 | - | - | - | 63.2 |
| Xerox | - | 59.0 | 18.0 | 26.3 | 9.1 | 0.8 | 9.2 | 122.4 |
| Others | 7.0 | 4.9 | 2.4 | - | - | 0.6 | - | 14.9 |
| TOTALS | 214.0 | 474.6 | 202.4 | 78.9 | 68.4 | 12.5 | 10.3 | 1,061.1 |

This segmentation is based on the following criteria:

| Segment | Speed (copies per minute) | Typical monthly volume range |
|---|---|---|
| PC | under 20 | N/A |
| 1 | 0-20 | 0-10,000 |
| 2 | 21-30 | 5,000-20,000 |
| 3 | 31-45 | 5,000-30,000 |
| 4 | 40-75 | 10,000-75,000 |
| 5 | 70-90 | 25,000-125,000 |
| 6 | 91 + | 100,000 + |

Source: Dataquest Incorporated.
Exhibit 7.9
Estimated PPC placements in the US, 1986, by brand (thousands of units)

| Country | Canon subsidiaries | Canon affiliated companies |
|---|---|---|
| France* | 1,868 | - |
| UK | 1,071 | - |
| Germany | 812 | - |
| Spain | - | 387 |
| The Netherlands | 384 | - |
| Finland | 380 | - |
| Sweden | 332 | - |
| Austria | 193 | - |
| Italy | 187 | - |
| Belgium | 105 | - |
| Switzerland | 51 | - |
| Luxembourg | 9 | - |
| Total | 5,392 | 387 |
| Combined total | 5,779 | |

* Canon Bretagne is included in the French subsidiaries
Source: Canon Handbook.
Exhibit 7.10
Canon’s European distribution capabilities (number of employees as of December 1985)

Canon's Giessen facility was one of the
largest and most integrated copier plants in Europe, employing 400 people and
turning out 4000 PPCs each month. Giessen manufactured NP systems in Category 2
and Category 3, together with components like photosensitive drums, the heart of
the plain paper copier. About 80 suppliers were contracted locally to provide
services and parts, including moulded casings, lids, platen glass, print boards,
paper supply cassettes, fixing rollers, solenoids, DC controllers, halogen
lamps and low voltage electric sources. Likewise, Canon's Bretagne operation
employed about 430 people and used numerous local suppliers. Little R&D was
being carried out in either of these operations, aside from modifying designs
sent from Tokyo to meet local manufacturing and local market needs. In
principle, the R&D laboratories that Canon set up
abroad were linked with R&D in Japan and part of the global rationalization
of Canon's R&D effort. These laboratories were intended to serve Canon's
global operations, not local production. Currently the General Manager of
Canon's 145-person Peripheral Development division in Tokyo, Tsukasa Kuge had
also been directly
involved and was familiar with Canon's European operations. Kuge
remarked:
The idea was for
Giessen to concentrate on mid-range copiers. We had personal copiers being
produced in Liffre and Categories 1, 4 and 5 in Toride. We had significantly
fewer people working in R&D in Giessen than in the beginning. Over time,
production became much more important, and it was more effective to do the
R&D in Japan.
In developing products,
Canon follows a policy of mochi wa mochi-ya. The idea is to have the
proper development in the proper place. Mochi is the sticky rice cake
that is traditionally cooked for New Year's celebrations. The raw material is
popular and the cooking process is simple. Anyone can make rice cakes, but
mochi-making is a hard and time-consuming task. The job of making rice
cakes should belong to the most skilful rice cake maker; namely,
mochi-ya.
Ultimately, Canon needs
to have a greater R&D capability in Europe if we are to become an insider.
We could develop this capability with some incremental investment based on
Giessen's original potential or, alternatively, we could set up a new greenfield
site in Germany or Switzerland, for instance. As well, we need to further
investigate options for locally-produced parts.
Mr. Kaku
commented:
When we first
began production in Europe ... there were no compelling economic reasons to
transfer this original technology. But it is our established policy, in keeping
with our basic corporate purposes, to participate to the fullest in the
development of the societies which we serve through our
products.
In
late 1986, Elserino Piol, Executive Vice President Strategies and Development in
the Olivetti Group, travelled to Tokyo to speak with senior Canon management
about joining forces in the copier business. Olivetti had a firm hold on 85% of
the copier market in Italy - which represented about 5% of the total market in
Europe - and Olivetti was looking for a way to double its share. Canon had had
some difficulty serving the southern part of Europe, and it was possible to
conclude that combining the sales effort with Olivetti could expand the total
sales for both companies. However, it was also possible that such an alliance
would lead to conflicts between the two salesforces.
Currently,
Canon had the highest installed base in all of Europe. However, the market was
still relatively undeveloped. There was a huge potential for growth with the
coming developments in digital technology and colour copying and the further
integration of the copier into the office environment. At this point, the
question for Canon was whether it made sense to enter a venture with a company
that was ostensibly a competitor in the copier business. Olivetti was a leading
player in the office products market with a long history in the business, and
this was an area that Canon wanted to enter more strongly in the future. For
both partners, such a venture would be a way to learn the way of thinking,
history, technology and philosophy of the other.
In the
past, Japanese manufacturers had tended to manufacture products in Japan and
then export them to Europe and North America. Early on, Canon realized that this
tendency could not continue. Canon's philosophy was to produce products in the
market where they were used. In fact, Canon was the first Japanese company to
set up a factory for copiers in Europe, which was done to have some insurance
for the future. Over the years, Canon had set up many ventures, but they had
always been built up from ground zero. The transfer of technology was much
easier this way, and it was more secure. This would be the only major joint
venture for Canon in copiers that involved manufacturing and R&D, and it
would be only the second joint venture that Canon had entered into outside
Japan. The first one, Lotte Canon, was established in
1985.
Canon's
technology was ahead of Olivetti's, so its patents, know-how and projects would
probably be put into the joint venture. Canon had just started production of a
new Category 1 copier in Toride. In looking at Olivetti's R&D and
manufacturing capability and its sales channels, there was also the possibility
of transferring this production into such an operation. The question of
Olivetti's relationships with other OEM suppliers would still have to be
resolved. Furthermore, Olivetti's suppliers were quite different from Canon's
standards on quality, and significant improvements would probably have to be
sought on the product cost side. More than 20 years earlier, Canon had launched
programmes to study the potential of its suppliers. Although studies could be
expensive, the result often saved time and costs in terms of quality assurance.
As well, Canon came to understand the level of quality support it needed to
provide to its suppliers. As a result, Canon's suppliers had become involved in
developing Canon machines, and they operated on a just-in-time basis. This
collaboration was natural and ongoing. Moreover, through this arrangement, Canon
had gathered a lot of cost data and continually looked for ways to improve.
Typically, Canon's inventory level in Japan was less than five days. In Giessen
it was seven days, although work was ongoing to bring this level down
further.
In
Tokyo, Canon had a very different system from the one used by Olivetti and most
other European and North American manufacturers. Canon used a mass production
system, and the underlying driver was how to improve production volume within a
certain time frame. This was based on minutes and seconds, and the idea was to
look continually for ways to shorten the work cycle. Canon used conveyor belts,
and most people in the copier area worked on a 20-30 second cycle. In contrast,
in Olivetti, one person typically worked 25-30 minutes at a station and
assembled a lot of parts. The whole unit was manually pushed on a cart to the
next station, and there was usually some waiting time for the next
step.
There
were also differences in the development system. Traditionally, Canon's R&D
people concentrated on perfecting the design. There were no major modifications
once the drawing was completed and moved into production. Canon looked
continually for ways to improve the quality in each step, to make cost
reductions, and to develop products faster. In Canon, the objective was for
production costs to be reduced every year, which could be achieved by changing
the design to use cheaper parts, negotiating with suppliers for price discounts,
changing the production process in the factory to work more effectively, and so
on. This was the kind of thinking that Canon would need to transfer into a joint
venture.
Canon
had never entered into this kind of alliance before. The challenge for both
parties would be how to adapt and how to implement changes. The key would be how
to structure such a venture, how to leave the good parts of each partner's
culture and build on a common basis. Canon had always had a philosophy of
coexistence.
University of Limburg, Faculty
of Economics and Business Administration
OverAll-Test
1
Blocks 1.1
& 1.2
Friday,
January 26, 1996
Testbook
This exam
consists of:
·
7 pages
(not including answer sheets)
·
17 closed
(true/?/false) questions, numbered C1 to C17
·
8
open-ended questions, numbered O1 to O8.
·
1
computerized answer form for closed questions
·
8 answer
sheets for open-ended questions.
·
Case study
Nestlé S.A.
·
Case study
Procter and Gamble Europe.
·
Paul J.H.
Schoemaker, "Scenario
Planning: a Tool for Strategic Thinking", Sloan
Management Review, Winter 1995, pp. 25-39.
·
Case study
Olivetti
·
Case study
Canon
Time distribution and points awarded
We advise
you to carefully allocate your time (180 minutes), taking into account the
weight of each question. Closed
(true/?/false) questions are worth one point each.
The
weights (points) of the open-ended questions are as follows:
O1: 5
O2: 5
O3: 5
O4: 15
O5: 15
O6: 6
O7: 6
O8: 3
Total open-ended questions: 60 points
Total closed (true/?/false) questions: 30 points
Total OA: 90 points
Testing procedure
The closed
(true/?/false) questions must be marked on the computerized answer form, using
an HB pencil. The open-ended
questions must be written out on the enclosed answer sheets. Please use a black or blue
pen.
Grading
The mean test results obtained will determine the passing grade of this OAT.
However, any score below 30% (27 points out of 90) implies failure, while any
score above 55% (49.5 points out of 90) is a pass.
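The two fixed bounds stated above can be written out as a small check. This is only a sketch: the function name `grading_band` and the label for the middle band are hypothetical, and the exact passing grade inside the middle band is not modelled because it depends on the mean test results, which are not known in advance.

```python
def grading_band(score: float, total: float = 90.0) -> str:
    """Classify a raw score using only the two fixed bounds given in the text.

    Below 30% of the total is a fail, above 55% is a pass. Anything in
    between depends on the mean test results (not modelled here).
    """
    share = score / total
    if share < 0.30:
        return "fail"
    if share > 0.55:
        return "pass"
    return "depends on mean test results"


if __name__ == "__main__":
    for points in (20, 27, 40, 49.5, 50):
        print(points, grading_band(points))
```

Note that the bounds themselves (exactly 27 and exactly 49.5 points) fall in the middle band, since the text speaks of scores "below" 30% and "above" 55%.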
Answer Key
Shortly
after the test the answer key will be posted on the bulletin
boards.
Objection policy
Objections
to the nature of the questions or the answer keys must be filed WITHIN 5 WORKING
DAYS after the test date.
Well-founded objections to the grading of individual questions must be
filed WITHIN 5 WORKING DAYS after the inspection date.
Graded exams will be available for inspection on March 08, 1996, between 9:00 and
12:00. Each student is allowed to look into his or her graded exam for 15 minutes.
However, PRIOR REGISTRATION is required, so individual exams can be pulled.
Registration can take place from March 04 to 06, 1996, at the Education Desk, room 0008.
The OAT coordinator deals with students' written objections. His or her written
responses to these objections are kept on file at the Education Office, room 3064. They
will be available for review after March 22, 1996, during regular office hours.
Form requirements:
The form requirements for filing objections are:
· All objections must be typed, and submitted in duplicate.
· A separate form must be used for each objection.
The top left corner of each objection must indicate:
· student name and address;
· student ID number;
· study (and graduate) programme;
· name of test.
OBJECTIONS
THAT DO NOT MEET THESE REQUIREMENTS WILL NOT BE PROCESSED. Objections must be addressed to the OAT
coordinator, and can be submitted to the Education Office.
Publication of grades
Grades will be posted on the bulletin boards after February 23, 1996.
THE OVERALL TEST
Mien S. R. Segers
Universiteit Maastricht, School of Economics and Business Administration, Dept. of
Educational Development and Research, Maastricht, The Netherlands
Since the mid-1980s, many new terms have enriched the assessment literature, such as performance assessment, authentic assessment, direct assessment, curriculum-embedded assessment and a few more. Criteria for good instruction as well as good assessment practices are suggested, based on research-based models in the field of cognitive psychology and on expert-novice studies. This article first reports on the translation of these criteria into a set of characteristics of the assessment system of a problem-based curriculum in the field of Economics and Business Administration. Secondly, the article reports on three studies conducted to evaluate and improve the assessment and instructional practices. The first study concerns the fairness of the assessment instruments. The article presents a methodology to search for the congruence between the formal curriculum, the operational curriculum and the assessment goals. The findings of the first study suggest that within a problem-based curriculum it is possible to implement an assessment system which is fair to the students. Additionally, there is empirical evidence that the student outcomes can be used as one source of input for the evaluation of the instructional practice. The second study reports on the validity of the OverAll Test as an instrument to assess problem-solving skills. The analysis of thinking-aloud protocols suggests there is some empirical evidence for the validity of the OverAll Test as an instrument measuring problem-solving skills in the field of Economics and Business Administration. The third study addresses the question: is it important to map students’ knowledge profile as a remedial tool for problem-solving performances? The answer to this question depends on the extent to which a student’s problem-solving performance is influenced by the quality of his/her knowledge profile. Students’ knowledge profile is measured by a Knowledge Test and a Sorting Task. Students’ problem-solving skills are assessed by the OverAll Test. The results indicate that students with an organized knowledge base perform better in problem-solving situations than students whose conceptual models are loosely structured. The implications of these findings for instruction as well as for assessment are discussed.
Introduction
One task
that credit administrators, controllers, business managers and economists in
various professional contexts have in common is that they are expected to solve
complex problems regularly and effectively. For Economics degree programs, an
important question is: how does the graduate deal with the problems s/he faces when
starting his/her professional career? Do you hope and pray the organization
which hired her/him doesn't come tumbling down? Or do you have sufficient
evidence that the graduate will be capable of dealing with the informational
load that accompanies the problem and that s/he will use it in a coherent and
integrated way to reach a solution to the problem?
Three
distinct elements can inform you about the expert status of the graduate: the
content of the economics courses studied (syllabus content), the teaching
methods adopted and the methods and results of the assessment used to determine
the success of students in solving economics problems.
During the
past decades, Economics degree programs have been subject to change in their
content and instructional methods. This process of change has seldom been
matched with changes in the assessment methods used to determine students'
outcomes (Mallier, Morwood & Old, 1990). This paper aims to contribute to
the development of appropriate assessment methods in Economics Education. The
rationale of the assessment system implemented is described. It is informed by
the findings of cognitive research on the constituent cognitive features that
underlie expert problem solving and how experts acquire their expertise. It is
based on the cognitive learning theory postulating that all learning involves
thinking. The assessment approach that suits this teaching and learning theory
emphasizes the use of a set of measurement tools, integrating conceptual
understanding and performance skills to solve authentic
problems.
The
present contribution describes the results of a number of studies trying to find
empirical evidence for quality, in terms of the validity of the instruments
adopted in the Maastricht School of Economics and Business Administration. These
studies attempt to answer questions such as: do the measurement tools provide
a profile of students' conceptual understanding and problem solving skills? Are
they fair, that is, to what extent can students be expected to meet the goals
measured by the test? Knowledge about the extent of overlap between what is
tested and what is taught is critical to the interpretation of the test results.
Furthermore, this report presents empirical evidence for some of the basic
assumptions of the assessment system adopted.
The
Maastricht economics curriculum is intended to guide students to become academic
professionals: graduates who can recognize the problems of different disciplines
within the field of economics, who are capable of analysing and contributing to
the solutions of these problems. "Problem" is the key word within this goal
definition. The Maastricht School of Economics and Business Administration
adopted a problem-based educational approach to design its curriculum. This
approach is significantly influenced by the findings of cognitive psychological
research, especially results from expert vs. novice studies. Two general
characteristics of expert performance can be identified (Yekovich, 1993;
Feltovich et al., 1993; Glaser, 1990):
· Experts’ knowledge is coherent. Experts possess a well-structured network of concepts and
principles about the domain that accurately represents key phenomena and their
interrelationships. Beginners’ knowledge is patchy, consisting of isolated
definitions, and beginners also miss the principles that lie beneath the apparent
surface features of a problem presented. In contrast, experts’ knowledge is
structured and experts recognize underlying principles and patterns;
· Novices often know facts, concepts and principles without knowing the conditions under
which that knowledge applies and how it can be used most effectively. “Experts
and novices may be equally competent at recalling specific items of information,
but the more experienced relate these items to the goals of problem solution and
conditions for action” (Glaser, 1990, p. 477). Dochy and Alexander (1995)
identify this type of knowledge as conditional knowledge; experts are able to
use the relevant elements of knowledge in a flexible way in order to describe,
analyse and solve novel problems.
For students to become experts, this expert profile requires the development of a learning scheme aiming at analysing, solving and evaluating problems on the basis of a deep understanding of the subject domain studied. This can be illustrated by an example taken from industrial economics (Lawson, 1992). A student sets out to study the economics of running an airline. First, s/he needs to acquire access to what is known about airline operations and how economists analyse different market forms, using a range of models of the firm, from perfectly competitive firms through to monopolies. Secondly, the student needs to appreciate the purposes and limitations of the theories of the firm. S/he needs to be able to assemble the facts of the airline operation. Then s/he needs to link theories with the assembled facts of airline operation. For example, describing British Airways as a regulated carrier with considerable but still limited market power within the domestic UK market requires the student to recognise the appositeness of the regulated industry model to the facts of BA’s domestic operations.
This example suggests some principles for the instruction guiding the learning of the student. They can be summarized as follows:
· The curriculum should focus on clusters of related concepts. Developing conceptual
networks is enhanced when students are actively engaged in the learning process.
Students should be encouraged to manipulate and use the knowledge they are
acquiring by confronting them with authentic problems. Acquiring knowledge is
not the ultimate goal of instruction. A major goal of instruction should be
promoting understanding of important conceptual knowledge in such a way that it
can be used in analysing and working with realistic problems (Feltovich et al., 1993).
· Feltovich et al. (1993) stress that knowledge that will be used in many ways has to be learned,
represented, and tried out (in application) in many ways. Therefore, knowledge
(including concepts, models, theories) should be interrelated in diverse ways
and cases should be addressed in relation to other cases. The use of a
variety of cases involving similar concepts, and of similar cases embodying
different concepts, helps students to work with novel problems. Cases and
knowledge should be “revisited” from different relevant points of view and for
the purpose of answering different kinds of questions.
The changes which take place when proficiency develops not only define the criteria for instruction by which competence can be developed but also the criteria by which competence can be assessed. Furthermore, instruction and assessment must be linked for at least two reasons. First, students’ outcomes provide information to use in improving educational practice only when the instruments to measure students’ outcomes match the instructional practice (English, 1992). Secondly, tests are diagnostic aids only when they identify the extent to which the goals are attained. This means that tests must be sensitive to how well students are able to use knowledge in an interrelated way to analyse and solve authentic problems.
The instructional principles described lead to the following assessment principles:
· Assessment instruments should measure the extent to which students possess knowledge that
is organized in a way that facilitates fast and correct recognition of patterns.
A significant dimension for assessment of competence is the presence of
interrelated concepts. Additionally, the ability to recognize principles and
patterns underlying the problem or task presented is an indication of developing
competence that should be assessed (Glaser, 1990).
· The assessment system should substantially focus on measuring the extent to which
students are capable of flexibly applying their knowledge to analyse and solve
novel problems. These problems should be of a real-world type. Research in the
field of mathematics suggests that such problems offer opportunities to develop
understanding in context, to develop reasoning in the subject domain studied and
to develop the making of interdisciplinary connections (Blum & Niss, 1991;
de Lange, 1992; Lesh & Lamon, 1992).
The Maastricht School of Economics and Business Administration implemented problem-based learning as a way to design the curriculum. Students are confronted with authentic problems, that is, problems as they would find them in “real life”. Because authentic problems are often not solvable within mono-disciplinary constraints, the curriculum is organised on a multidisciplinary basis. This implies that problems are discussed from distinct points of view (disciplines) such as Marketing, Organisation as well as Micro-economics. The problems are the context within which students study the basic concepts and models within the field of Economics and Business Administration. Students acquire and apply knowledge simultaneously.
The
assessment system developed follows the core idea of the organisation of the
curriculum. The acquisition as well as the application of knowledge is assessed.
Therefore, two instruments are implemented: the Knowledge Test and the OverAll
Test.
The Knowledge Test.
The
Knowledge Test measures primarily the knowledge of facts, the meaning of symbols
and the concepts and principles of the four particular fields of study:
Marketing and Organization, Micro-economics, Macro-economics and Accounting and
Finance. This type of knowledge is often defined as declarative knowledge
(Anderson, 1983; Dochy & Alexander, 1995). The test items require students
to reproduce and/or demonstrate understanding of their knowledge about the main
subjects studied. It is not
sufficient for students to remember or even understand isolated definitions of
domain-related concepts. They need to understand the frame of reference which
organises the distinct subjects.
The Knowledge Test covers the domain studied within one instructional
period [1]. It consists of 100 to 150 multiple-choice items in true/?/false
format. To assure relatively even coverage of the domain, an analytic grid
is used for the construction of the test.
Figure 1
contains some examples of Knowledge Test items.
Question 1
A very important question is which management principles a manager should use to achieve organizational
excellence. During this century several different viewpoints have emerged.
true/?/false  According to the contingency viewpoint, managers should analyse and
understand situational differences and choose the best solution suited to the
firm and the individual in each situation.
(True)
Question 2
The ice cream company “Magnus” used to be only a producer of ice cream. Today, “Magnus” is a producer and seller
of ice cream.
true/?/false  When the ice cream company “Magnus” combines the producing and the
selling of ice cream under the same management, vertical integration takes place.
(True)
Question 3
After a recession, it was observed that employment did not rise at the same time as general economic
activity.
true/?/false  This can be explained by referring to organizational slack resources.
(True)
Question 4
Suppose that there is inflation, and that the Central Bank changes the rate of growth of the money supply
so as to equal the long-run annual growth rate of production. Suppose also that
people believe this money growth rate will continue to equal the growth rate of production. In the following,
several immediate effects are mentioned.
true/?/false  As an immediate effect, the nominal interest rate would fall.
(True)
true/?/false  As an immediate effect, actual inflation would temporarily be negative.
(True)
Figure 1: Examples of Knowledge Test Items
The four
examples given assess conceptual understanding. The first and the second
questions require students to be able to recognize the definition of the
contingency viewpoint and the definition of vertical integration. The second
question is embedded in a simplified authentic situation. It asks for more than
merely factual recall. Students not only have to reproduce the definition of the
concept of vertical integration, but apply it to the case of the ice
cream-company. Since only the relevant variables are mentioned, students do not
need to retrieve the relevant information from the case in order to be able to
identify the strategy used as “vertical integration”. For the third question, in
order to be able to give the right answer, students need to build the following
frame of reasoning: if economic activity grows, organizational activity will
grow. In that case, organizations will first use their slack human resources.
For example, by transferring people internally, they are able to meet their
increased need for personnel instead of hiring externally. As a result,
employment will not rise at the same rate as the general economic activity.
Being able to define the concept of “organizational slack resources” is not
sufficient. The conditions for the application of these resources and the
consequences in macro-economic terms need to be
understood.
The fourth
question starts from a macro-economics case which, like the second question,
presents the critical elements for solving the problem presented. In order to be
able to answer the questions, students need to understand the various relevant
concepts (nominal interest rate, inflation, rate of growth of the money supply,
long-run annual growth rate of production). Additionally, they are required to
master the interconnections between these concepts.
As is
clear from the examples, we introduced the “question mark” option. This option
allows the students to “pass”. Students encircling the question mark option
indicate they have not mastered the subject. They are not forced to give an
answer and therefore to guess if they do not know the answer. They are not
punished for not knowing: choosing the question mark option gives a score of 0
points. On the other hand, they lose 1 point (-1) when indicating the wrong
answer. Encircling the right answer means +1 score. The introduction of this
scoring system makes guessing only attractive for students if they are
reasonably sure of the answer and if they have mastered an important part of the
test items. Therefore, if students give the wrong answer, in most
cases [2] it reveals that
they “misunderstood” the objective measured. For the example in Figure 2, the
test results revealed that some students had constructed their own
interpretation of the meaning of a (un)differentiated marketing strategy.
Although the concept was studied during tutorials, the misconception persisted
that differentiating has to do with the target market instead of the product.
Since a quite large group of students took the risk of indicating the “false”
option, they seemed to be quite sure of their answer.
true/?/false
An undifferentiated as well as a differentiated marketing strategy is
directed at approximately the whole market.
(true)
Figure 2: An Example of a Knowledge Test Item
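The scoring rule for the question mark option described in this section (+1 for a right answer, 0 for the question mark, -1 for a wrong answer) can be sketched as follows. The function names and the representation of the question mark as `None` are illustrative assumptions, not part of the faculty's scoring software.

```python
from typing import Iterable, Optional


def score_item(answer: Optional[bool], key: bool) -> int:
    """Score one true/?/false item: +1 if right, -1 if wrong, 0 for '?'.

    The question mark option is represented here as None (an assumption
    made for this sketch).
    """
    if answer is None:
        return 0
    return 1 if answer == key else -1


def score_test(answers: Iterable[Optional[bool]], keys: Iterable[bool]) -> int:
    """Sum the item scores over a whole test."""
    return sum(score_item(a, k) for a, k in zip(answers, keys))


def expected_gain(p_correct: float) -> float:
    """Expected points from answering rather than choosing '?'.

    With probability p_correct the student gains 1 point, otherwise
    loses 1 point, so the expectation is 2 * p_correct - 1.
    """
    return p_correct * 1 + (1 - p_correct) * (-1)


if __name__ == "__main__":
    # One right answer, one question mark, one wrong answer.
    print(score_test([True, None, False], [True, True, True]))
    for p in (0.5, 0.75, 0.9):
        print(p, expected_gain(p))
```

Under this rule a blind guess between true and false (probability 0.5 of being right) has the same expected score as the question mark, so answering pays off only when the student is more likely right than wrong.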
The OverAll Test.
Figure 3
presents an example of an OverAll Test item which illustrates the difference
from the Knowledge Test item as explained in the previous
section.
Case Mexx
The case study presents the history and recent developments in the fashion company Mexx. Main trends within the European
clothing industry are described. Mexx Fashion as a company is illustrated by its
organizational structure, its product profile and market place, its business
system, its corporate culture and some current facts and figures.
Question 1
true/?/false  Mexx’s corporate culture and philosophy are consistent with the systems
viewpoint on management.
(False, it is consistent with the behavioural viewpoint)
Question 2
Benetton’s and Mexx’s corporate strategies are quite different. More specifically, there are two main
differences.
a. Identify these two main differences in corporate strategies. Illustrate your answer with examples mentioned in the
case.
b. What are the advantages of Benetton’s corporate strategy compared to Mexx’s approach?
Figure 3: Examples of OverAll Test Items
The first question parallels the first example question of the Knowledge Test: they
both refer to the different viewpoints on management. The Knowledge Test item
requires students to recognize the definition of one of the viewpoints.
Memorization of the definition is not sufficient to answer the OverAll Test
item. Students have to interpret the case and select the relevant information
for this test item. On the basis of a comparison of this information with
conceptual knowledge of the different viewpoints on management, they have to
deduce the answer. The second OverAll Test question resembles the second
Knowledge Test item: they both refer to the concept of vertical integration.
However, the OverAll Test item requires students to take more mental steps to
reach the solution of the problem posed than the Knowledge Test item does. For the
first part of the question (a), these can be schematized as follows:
· Define the concept of corporate strategies
· Select the relevant information for the Mexx company as described in the case study
· Compare it with the definition of the different possible strategies
· Select the relevant information for Benetton as described in the case study
· Compare it with the definition of the different possible strategies
· Compare the relevant information from both cases with the definition of the strategies
· Define, for each company, its strategy
· Compare both strategies by going back to the definition of the strategies and the
relevant information in the case study
For the second part (b), students have to evaluate. Therefore, they have to take some extra mental steps:
· Understand the conditions for efficiency and effectiveness for the different strategies
· Select the relevant information on the conditions for both companies
· Interpret the factual conditions by comparison with those studied in the textbooks
This
example illustrates that the OverAll Test measures whether students are able to
retrieve the relevant concept (model, principles) for the problem.
Furthermore, it measures if they can use these instruments to solve the problem. It measures if the knowledge is usable (Glaser, 1990) or if students know “when and where” (conditional knowledge). In short, the OverAll Test measures to what extent students are able to analyse problems and contribute to their solution by applying the relevant tools.
The OverAll Test is organised within the first year curriculum as follows. After two instructional periods (blocks), the students have two weeks free for self-study. During these weeks, they study on the basis of the study manual they receive at the beginning of this period. This manual presents information about the main goals of the OverAll Test, the parts of the curriculum which are relevant for the study of the material presented in the manual, an example of an elaborated case with test items, some practical (organizational) information and finally a set of articles. The articles differ in character. Some describe a case relating to innovations in or problems of a national or international firm, as published in a newspaper or a journal. Others express the theoretical considerations of a scientist, report on research, or comment on a theory or model. During the self-study period the students are expected to apply the knowledge they have acquired over the preceding weeks, with a view to being capable of explaining the new, complex problem situations which are presented in the set of articles. They are asked, while reading the articles, to try to explain spontaneously to themselves (i.e. without being explicitly prompted by a tutor) the ideas/theories described in these articles by relating them to previously acquired knowledge. This behaviour is often called 'self-explanation' (Chi, Feltovich & Glaser, 1981). In short, the self-study period can be described as an opportunity for students to practise the analysis and synthesis of economics problems as they have learned to do in the tutorial groups. Therefore, the study manual offers them a set of new problems as described in a set of articles. Figure 4 provides an example.
Article:
Schoemaker, P.J.H. (1995). Scenario Planning: a Tool for Strategic Thinking.
Sloan Management Review, pp. 25-39.
Study
Guideline: “...The first part that deserves additional attention is the
description of the two applications of scenario planning. On these pages
Schoemaker relates his method of “scenario planning” to the various statistical
techniques you have encountered in the Quantitative Methods blocks. Don’t
restrict yourself to the role of a passive consumer of his treatment of
statistics, but take a more active position by comparing Schoemaker’s use and
interpretation of statistical concepts with that to be found in our textbook
Wonnacott & Wonnacott (W&W). To give a simple example: what is
Schoemaker’s definition of a “correlation matrix”? And how does this view relate
to the interpretation of the concepts covariance (matrix) and correlation
(matrix) as to be found in W&W ? And next, what exactly is the relationship
between the correlation matrix (such as the one in table 3) and the scenario
profiles (given in figure 1)? To phrase the last question in other words: if we
give you an arbitrary correlation matrix, could you derive the corresponding
scenario profiles?” (OverAll Test I, Information and Study Guidelines,
1995-1996)
Figure 4: An Example of an OverAll Test Study Guideline
After the
two weeks of self-study, the OverAll Test is administered. The OverAll Test
questions refer to the articles: they assess whether the students are able to
interpret and analyse the problems as presented in the articles by applying the
concepts, models and tools they have acquired during the tutorials.
Figure 5
displays two questions for the Schoemaker article.
Question 1
In his introduction, Schoemaker compares the method of scenario planning with other approaches such
as contingency planning, sensitivity analysis and computer simulations.
Hellriegel & Slocum (1996; textbook) give a similar comparison of three
methods: scenarios, the Delphi technique and simulation. They stress that an
overlap exists between these approaches, and indeed, it is not difficult to
imagine how to use techniques like Delphi and simulation within Schoemaker’s
framework of scenario planning.
true/?/false  The Delphi technique fits better in phase 3 of scenario planning
(identifying basic trends) than in phase 9 of scenario planning (develop
quantitative models).
Question 2
The correlations in Table 3, Part B on p. 31 (Schoemaker,
1995) are nearly all positive, which makes the case rather specific.......Give a
new example of scenario planning by solving the following tasks:
a. Write down a hypothetical correlation matrix, having the same size as
that in table 3, but with the number of entries with ‘+’, ‘-’ and ‘0’ more equally
distributed;
b. Derive a scenario profile (as in figure 1) from this correlation matrix, giving special attention to
the existence of both positive and negative correlations. If necessary, make
additional assumptions in order to find the profile. Start with one single
scenario.
c. Derive a second scenario profile, assuming this second scenario to be the ‘reverse’ scenario of the
first (reverse in its literal meaning; if the first scenario is something like
‘recession’, then the second is that of ‘high economic activity’);
e. Give an interpretation of the outcomes of the scenario profiles you
constructed yourself. Schoemaker ends up with one scenario that performs best in
all possible aspects, whilst the third scenario is the worst one, again in all
possible aspects. Is the same true for the case you designed?
Figure 5: An Example of an OverAll Test Item
The
OverAll Test is administered twice a year. Each OverAll Test assesses the
application of knowledge from different disciplines which were studied during
the preceding two instructional periods. The Schoemaker article illustrates the
integration of knowledge in the field of Statistics with the discipline of
Organization. Knowledge from both disciplines has to be used to tackle the
problem of scenario planning.
The
OverAll Test is a paper-and-pencil test. The questions are based on the articles
studied at home. As is clear from the Schoemaker example, the OverAll Test
combines two item formats: true-false questions with the question mark option
and essay- or open-ended questions. The true-false items are mostly intended to
measure whether a student can apply the acquired knowledge in a new situation, that is,
whether he or she can use an abstract concept in a specific, quite complex
situation which is relevant for the 'real life of economists'.
In
Schoemaker (1995) the true/?/false questions ask students to use their knowledge
about three approaches (which were studied during the tutorials) in order to
interpret the method of scenario planning as presented in the article. It is not
sufficient for students to memorize the techniques as described in their
textbook. They are required to know the interconnections between these
approaches and how they can be effectively used within the distinct phases of
scenario planning. These kinds of multiple choice questions in the OverAll Test
are set in the context of authentic problems and they are focussed on the use of
knowledge in a new problem situation. Where the test’s goal is not to have
students elaborate on the relevance of the Delphi technique for
scenario planning, the multiple choice format is considered to be appropriate.
In contrast, the open ended question asks for elaboration which cannot be
accomplished with a multiple choice format. Students are asked to analyse a new
problem i.e. deriving two scenario profiles from a correlation matrix and to
evaluate the outcomes of the two scenario profiles. The essay subtest and the
true-false subtest have the same weight. The OverAll Test consists of seven to
twelve cases or articles, describing one or more related economic problems. The
choice of this number of cases is based on the finding that, when the sampling
breadth is limited, the generalizability of scores may be poor due to content
specificity (Swanson et al., 1991). These findings were confirmed by the results
of a pilot-study with the OverAll Test (Segers et al., 1991, 1992). Most
variability was explained by the interaction effect of persons and cases (35.41%
for the essay subtest and 65.48% for the true-false subtest). This means that
students who perform better for one case are not necessarily the ones who
perform well on other cases. It implies that one case has a low predictive value
for the other cases. The findings suggest that for an OverAll Test containing 12
cases, the generalizability coefficient is 0.67.
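The link between the number of cases and the generalizability coefficient can be illustrated with a short sketch. It assumes a simple persons-by-cases design in which the coefficient equals the person variance divided by the person variance plus the person-by-case interaction variance averaged over the cases. The text reports only the interaction percentages and the coefficient of 0.67 for 12 cases, so the variance ratio used below is back-calculated from that coefficient under this assumption; it is not a figure reported by the pilot study, and the function names are hypothetical.

```python
def g_coefficient(var_person: float, var_interaction: float, n_cases: int) -> float:
    """Generalizability coefficient for a simple persons x cases design.

    Assumes the relative error variance is the person-by-case interaction
    variance (confounded with residual error) divided by the number of cases.
    """
    return var_person / (var_person + var_interaction / n_cases)


def implied_variance_ratio(g: float, n_cases: int) -> float:
    """Interaction-to-person variance ratio implied by a coefficient g.

    Obtained by solving g = 1 / (1 + ratio / n_cases) for ratio.
    """
    return n_cases * (1.0 - g) / g


if __name__ == "__main__":
    # Back-calculate the ratio from the reported coefficient (0.67 at 12 cases),
    # then see how the coefficient would move with fewer or more cases.
    ratio = implied_variance_ratio(0.67, 12)
    for n in (7, 12, 20):
        print(n, round(g_coefficient(1.0, ratio, n), 2))
```

Under these assumptions the coefficient rises as cases are added, which is the reasoning behind sampling a relatively large number of cases when performance is strongly case-specific.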
Since it is the faculty’s intention to simulate real-world situations in the assessment system, the OverAll Test is not only based on authentic cases but also has an open book character. This means that students are allowed to bring with them the study material they think they will need. As in the real world, they have resource materials available. First, students have to be able to select the proper resource materials and equipment related to the subjects of the test. But if they cannot use these materials in an interpretative way, they will not be able to analyse and solve the problem posed (Feller, 1994).
Main Concerns About The Assessment Practice
During the
last five years the faculty has gained experience with the assessment system as
described. Although there is a lot of enthusiasm, empirical evidence to
interpret the effectiveness of the system implemented was lacking. A set of
questions emerged. In this paper I will elaborate upon three of
them.
- Are the assessment instruments fair? To what extent are the scores on the
OverAll Test and on the Knowledge Test influenced by the match between
instruction and test? For the OverAll Test, students’ evaluations (see note 3)
indicate that students experience difficulties
because they have not gained enough experience in applying the acquired
knowledge within a diverse set of realistic situations. However, the OverAll
Test is based on the faculty objectives as operationalized within the study
materials of the students and the guidelines for the tutors. The question
emerges as to whether there is a lack of match between the formal and the
operational curriculum. In that case, students might be expected to have serious
difficulties answering the test (Birenbaum, 1996; Pelgrum, 1989). Especially
within a problem-based curriculum, where tutors only guide the students in
generating learning issues and in self-study, it is important to obtain
information on this issue (Dolmans, 1993). This leads to two interrelated
questions. First, is there a match between the formal and the operational
curriculum? Secondly, to what extent does the test measure the formal and
operational curricular objectives?
- Does the OverAll Test measure the extent to which the students are able to
use a conceptual network to analyse authentic problems? Or is it just another
instrument for measuring factual recall?
- What is the use of a Knowledge Test? What is the added value of a more
traditional assessment instrument? Do the Knowledge Test scores provide
additional and indispensable information about the students’ level of
expertise?
These
questions were addressed in three studies. For each study, the theoretical
framework, research method and results will be described.
Study 1: Are The Assessment Instruments Fair?
This study
examines the curricular and instructional validity of the faculty assessment
instruments.
Rationale.
Is it fair
to expect the students to answer the test questions? If the test is valid, or in
other words, if the knowledge assessed is part of the curriculum, the answer is
yes. This means the test content matches the formal curriculum, i.e. the
curricular objectives and the curriculum material. To check this match is the
most common method test constructors use to establish test validity. In so
doing, they assume that the objectives are actually taught. Many studies
indicate that this assumption may be questioned (Calfee, 1983; De Haan, 1992; English, 1992; Leinhardt & Seewald, 1981; Pelgrum,
1990). The operational curriculum, what is actually taught in the
classrooms, can differ to a significant extent from the formal curriculum as
described in textbooks and syllabi. McClung (1979) introduced the term
instructional validity to describe the match between the operational
curriculum and what is tested. The overlap between the test content and the
formal curriculum is called curricular validity. A mismatch between the
formal and the operational curriculum has consequences to take into account. On
the basis of the assessment results, the faculty makes inferences on the extent
to which the faculty objectives are reached. They are one source of input for
the evaluation of the faculty practice. If the test does not measure what has
been taught, no inferences can be made about the quality of the teaching process
(English, 1992). In a summative context, when the tests are used as selection
instruments, the faculty only expects the students with a certain profile to
pass the tests. This profile is defined in congruence with the faculty
objectives. In the case of a lack of instructional validity of the assessment
instrument, how can the students be described in terms of knowledge and skills?
To what extent can the assessment results indicate if first year students will
be able to follow the second year courses which build on the first year
knowledge?
Although
instructional validity is a concern for all types of curricula, what about
problem-based curricula? In a Problem-Based Learning-setting, more than in a
conventional one, students are expected to take responsibility for their
learning process. In a conventional curriculum, teaching is the central process.
The teacher defines the objectives, the content of the courses and the ways to
reach the objectives. In most cases, the teacher directly “delivers” the
information to the students by lecturing. In the case of a local test (no
national standardized test), he constructs the test on the basis of his notes
for the lectures. In Problem-Based Learning, the path from objectives to the test is
longer and the students have more freedom to choose their own way.
Faculty objectives → Problem → Discussion in tutorial group → Student-generated learning issues → Self-study → Feedback in tutorial group → Assessment
Figure 6: The Learning Process in a Problem-based Learning Setting
Faculty objectives → Lectures/Syllabus → Self-study → Assessment
Figure 7: The Learning Process in a Tutor-centred Instructional Process
The
faculty objectives are operationalized in a set of tasks. These tasks present
the problem situation which students have to analyse and try to solve. Students
work on the problem in small tutorial groups. The result is a list of learning
objectives considered to be relevant in order to analyse and solve the problem
they are confronted with. Since in a lot of cases the problems are
ill-structured (complex as real problems mostly are), there may be differences
in learning objectives. The learning objectives are the starting point for
students’ self-study. They look for the relevant information to achieve their
learning objectives. In discussion with their colleagues and the tutor, they
check the relevance of the information for the problem and build the relevant
theoretical framework. At the end of the instructional period, a test is
administered to measure the extent to which they have mastered the basic
knowledge of that instructional period. How sure can we be the test is fair for
students with sometimes different paths for analysing and solving the single
problem posed? Various studies have tried to gain insight into the relation
between the formal and the operational curriculum within a Problem-Based
Learning setting (Coulson & Osborne, 1984; Dolmans, 1994; Shahabudin,
1987; Tans et al., 1986). They conclude that, to a significant extent, there is
an overlap between both curricula. Additionally, Dolmans (1994) investigated the
relation between the time spent during the tutorial groups on the core concepts
of the instructional period and the scores on the test referring to these
concepts. The correlation seems to be weak (r= .22, p<0.5, n=94). Probably the quality, more than the
quantity, of the time spent on problems affects test
scores.
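To make concrete how weak a correlation of this size is, the sketch below computes the shared variance (r squared) and the usual t statistic for a Pearson correlation from the values of r and n cited above. The function names are illustrative, and the calculation is a standard textbook check rather than anything taken from Dolmans (1994).

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two variables share, given their correlation."""
    return r ** 2


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.22, 94  # values cited in the text
print(f"shared variance: {shared_variance(r):.3f}")
print(f"t statistic with {n - 2} degrees of freedom: {t_statistic(r, n):.2f}")
```

A correlation of .22 corresponds to less than 5% shared variance, which fits the reading that time spent on a topic, taken alone, says little about test scores.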
Research method.
Procedure.
The formal
curriculum was described by analysing the textbooks, syllabi and tutorial
manuals. The analysis resulted in a list containing more than 500 detailed
topics for each period. This extended list has been screened by domain
specialists to get a workable list. They constructed a hierarchical schema of
the list of topics. The highest hierarchical levels of the networks of subjects
are included in the final version. For example, the concepts of entry
strategies, export, licensing and joint ventures were all included in the
draft version, whereas the final version included only the concept of
entry strategies. By this screening, the list of central
concepts has been reduced to 147 topics for the Marketing and Organization
period and 136 topics for the Macro-economics period. The curricular validity is
examined by comparing the formal curriculum with the test of the first
instructional period. The list of concepts is compared with the list of
objectives of the Knowledge Test and OverAll Test.
To examine
the instructional validity, two questionnaires were developed on the basis of
the lists of concepts. The questionnaires are a modified version of the Dolmans
Topic Checklist (1994). The first Topic Checklist (TOC 1) consists of 147 topics
and eight main themes of the disciplines Marketing and Organization. An example
of TOC 1 is presented in figure 8. The first column contains some examples of
the 147 topics of TOC 1. The upper row presents some examples of the eight main
themes. The relevance of the main themes will be
explained in study 3.
| Topics | organization + systems | marketing mix | consumers’ behavior |
| Structure follows strategy | 1 | 2 | 3 |
| Product attributes | 1 | 2 | 3 |
| Giffen good | 1 | 2 | 3 |
| Lorenz curve | 1 | 2 | 3 |
Figure 8: Examples of Topics and Main Themes in the Topic Checklist 1
Students
are asked to indicate whether or not the topic was discussed in their tutorial
groups by encircling it. In order to gain some insight into the
quality of the time spent on the topic, the second Topic Checklist (TOC 2), on
Macro-Economics, contains two additional questions. Students had to indicate
the level of comprehension they believed they had reached. For every respondent,
the number of topics mastered at each of the three levels of comprehension was
counted. The levels were the level of definition, the level of
comprehension and the level of analysis. The level of definition
indicates the student is (only) able to reproduce the meaning of the concept as
formulated in the textbooks. To comprehend the topic indicates that the student
is able to define the concept in his own words, describe its relevance and its
relation to other concepts. To master a topic on the level of analysis means
that the student is able to apply the concepts when being presented with a
problem to be analysed. In addition to the students, the staff members who
developed the course were asked to indicate for each topic the intended level of
comprehension. Finally, students were asked if a topic received much, moderate
or not much attention during the tutorial meetings.
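The per-respondent count described above can be sketched in a few lines. The data structure (a mapping from topic to self-reported level), the function name and the sample answers are hypothetical; the study does not describe how the checklists were processed.

```python
from collections import Counter

LEVELS = ("definition", "comprehension", "analysis")


def level_shares(responses: dict[str, str]) -> dict[str, float]:
    """Return the share of rated topics at each level of comprehension."""
    counts = Counter(responses.values())
    total = sum(counts[level] for level in LEVELS)
    return {level: counts[level] / total for level in LEVELS}


# Hypothetical answers of one respondent (topic -> self-reported level).
respondent = {
    "Giffen good": "comprehension",
    "Lorenz curve": "analysis",
    "Multiplier": "definition",
    "Phillips curve": "comprehension",
}
print(level_shares(respondent))
```

Averaging these shares over all respondents would give percentages per level of the kind reported for the students, under the assumption that every topic is rated at exactly one level.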
Sample.
The
sampling procedure employed in the study is a quota sample. The group of first
year students is, for organizational reasons, divided into four groups. Two
groups have their meetings in the morning, two groups in the afternoon. Students
were equally selected from these four groups. For the TOC 1, 34 student
volunteers participated, for TOC 2, 45 students.
Results
As the
results in table 1 indicate, there is a significant amount of overlap between
topics as planned for study by the staff and the topics indicated by the
students as being subject of discussions and study during the instructional
period. Table 1 indicates that on average 87% of the topics of TOC 1 and 77.4%
of the topics of TOC 2 have been subject of study (RT). Other studies
investigating the match between the formal and the operational curriculum in a
Problem-Based Learning-setting (Dolmans, 1994), show an overlap of 64.2%
(s=26.7).
Students perceive they have mastered on average 47% of the topics of TOC 2 on the level of comprehension. This means that they are able to explain in their own words the meaning of the topics, their relevance and their relation to other concepts. For, on average, 31% of the topics, students state that they are able to use these topics for the analysis of problems (level of analysis). For, on average, 22% of the topics, students indicate that they master them on the level of definition; this means “only” reproducing the definition. The correspondence with the intentions of the staff is considerable.
Table 1: The Degree of Overlap Between the Formal and the Operational Curriculum
| Variables | Mean | Standard Deviation | n |
| RT1* | 87% | 17.33 | 34 |
| NRT1* | 12% | 15.67 | 34 |
| RT2 | 77.4% | 12.64 | 45 |
| NRT2 | 22.6% | 25.67 | 45 |
| Definition | 22.1% (student), 20.6% (staff) | 21.24 | 45 |
| Comprehension | 47% (student), 40.4% (staff) | 22.64 | 45 |
| Analysis | 30.9% (student), 39% (staff) | 16.58 | 45 |
RT: Recognized Topics (1 = TOC 1, 2 = TOC 2)
NRT: Not Recognized Topics
This means
that they were missing in the Topic Checklist I. When the topics discussed
or not discussed (RT/NRT) are compared with the content of the test items, none
of the topics indicated as not discussed by more than 29% of the students
(percentile 25) are part of the tests. This result suggests high instructional
validity of the Knowledge Test as well as the OverAll Test.
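The comparison of not-discussed topics with test content can be sketched as a simple filter. The 29% threshold comes from the text; the data layout, the function names and the sample checklists are hypothetical and serve only to show the logic.

```python
def not_discussed_share(checklists: list[set[str]], topic: str) -> float:
    """Share of students who did not mark the topic as discussed."""
    missing = sum(1 for circled in checklists if topic not in circled)
    return missing / len(checklists)


def flagged_topics_in_test(checklists, topics, test_topics, threshold=0.29):
    """Topics that too many students did not discuss and that the test still covers."""
    flagged = {t for t in topics if not_discussed_share(checklists, t) > threshold}
    return flagged & set(test_topics)


# Hypothetical data: each set holds the topics one student circled.
checklists = [
    {"entry strategies", "marketing mix"},
    {"entry strategies", "marketing mix", "Lorenz curve"},
    {"entry strategies"},
    {"entry strategies", "marketing mix"},
]
topics = ["entry strategies", "marketing mix", "Lorenz curve"]
test_topics = ["entry strategies", "marketing mix"]

print(flagged_topics_in_test(checklists, topics, test_topics))
```

An empty result corresponds to the finding reported here: no topic that a large share of students left undiscussed appears in the tests.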
Additionally
for TOC2, the more topics students indicate as “received much of attention
during the meetings”, the higher their OverAll Test score (r= .40*). On
the other hand, the more topics students indicate as “received moderate
attention during the meetings”, the lower the OverAll Test scores are (r=
-.32*). Probably, students acquired partial knowledge through the “small talk”
they had about the topic. This partial knowledge might impede instead of enhance
successful problem analysis. There is virtually no correlation between the
number of topics which received not much attention and the test scores (r=
.01).
The second
study focuses on the evaluation of the criterion validity of the OverAll Test:
to what extent do the test scores reflect students’ abilities to analyse and
solve economics problems?
Despite
the enthusiasm for alternative forms of assessment, stressing complex cognitive
processes and analogy with the actual conduct of problem solving, they pose some
challenging problems. As compared to traditional tests, authentic assessment
instruments measuring problem solving are often thought to be better reflections
of the criterion performances that are of importance in the students’ future
professional careers (Linn & Burton, 1994; Magone et al., 1994). Until now,
there are only a few examples of studies offering empirical evidence for this
assumption (Burger & Burger, 1994; Magone et al., 1994). Magone et al. (1994)
report on the QUASAR project (Quantitative Understanding: Amplifying Student
Achievement and Reasoning) and more precisely on the QUASAR Cognitive Assessment
Instrument. It consists of a set of open-ended assessment tasks, asking students
not only to select or produce answers but also to show their work or to justify
or explain their solutions. Magone et al. used different sources of logical and
empirical evidence for judging the validity of the assessment instrument:
well-defined task specifications, systematic internal and external reviews of
each task and qualitative analysis of students’ responses. This qualitative
analysis focussed on the processes underlying task performance: does the
analysis of the students’ responses indicate their conceptual understanding and
their ability to use basic concepts to solve a problem? According to Magone et
al. (1994), the results support the validity of the instrument. They suggest
that the tasks require high-level thinking and reasoning processes.
Another
source of information for the validity of a test, is the relation of the test
scores to an external criterion. Shepard (1992) describes the empirical evidence
of relations to external criteria as an integral part of today’s definition of
validity. Tests are always simplifications of what we intend to measure.
Therefore, it is important to determine if and to what extent test scores
reflect other abilities than those intended. Writing skills, for example, might
confound open-ended assessment of the analysis of economics problems.
Criterion-related validity is especially important in practice for selection and
placement decisions. If the test is used to select students for graduation or
for entering postgraduate courses, a practically significant statistical
relationship should be evident between test score and relevant criterion. The
Burger and Burger study compares three instruments, two performance-based
assessment instruments measuring writing and reading skills and a
norm-referenced test series designed to measure achievement in basic skills
taught throughout the nation. The findings of the Burger & Burger study
provide some “encouraging” (p. 14) evidence for the validity of the performance
assessment instruments.
The
purpose of the study presented in this paper is to use the Burger & Burger
approach to determine the criterion validity of the OverAll Test: to what extent
does the OverAll Test measure the ability of students to define, analyse and
solve economics problems?
Procedure.
Student
performances on the OverAll Test and on a set of economics problems were
compared. Four problems were formulated by experts in the field of
macro-economics and finance. The problems deal with real-life situations. The
construction and review of the problems were guided by a set of criteria for
case writing (Leenders & Erskine, 1989; Van Vilsteren et al., 1993). The
problem descriptions range in length from 25 to 100 lines. Each
problem starts with an introduction, presenting information about the company
(context information), and the position of the student. The specific problem
situation is described in the body of the case. The problem description ends
with a set (max. 3) of analysis tasks. They refer to the analysis of the problem
presented as well as to the analysis of reasons (Messick,
1989).
In order
to analyse the processes underlying the problem analysis, the method of
think-aloud protocols is used. The participating students read the problem
description aloud. Then they are asked to think aloud as they analyse the
problem (Messick, 1989). In order to analyse the route students follow during
their analysis, students are asked to mention if they return to a previous
section of the problem description. Immediately after problem-analysis while
thinking aloud, students are asked to write down their response to the problem.
The
analysis of student responses (written and oral) focuses on the knowledge
structures that are used during problem solving. It does not only look at the
points of decisions between alternatives, but also attempts to map the whole
process from the formulation of hypothesis to the reaching of a solution to the
problem, the nature of the knowledge used and the cognitive operations used to
reach the solution (Patel & Arocha, 1995). The schemes for the analysis of
the responses were based on a detailed model of the analysis of the problems by
the expert-constructors. If necessary, the schemes were expanded and modified as
a sample of actual responses was reviewed and coded. Central criteria for the
coding were the number of correct concepts, the relationships between the
concepts used for problem analysis, and the correctness of the product (solution
of the problem). For the latter criterion, three categories were used: correct
answers, partially correct answers and wrong answers. Additionally, in the
analysis of student responses to the problems, two further aspects were
examined: the length of the reasoning process and the
degree to which students went straight to the aspects of the problem relevant to
the analysis (Flaherty, 1974). Finally, the results of the protocol analysis
are compared for the three groups of students.
Sample.
The
results of the analysis of the four problems were obtained for fifteen
first-year students. The sampling procedure used in the study is a quota sample.
From the 45 participants of the first study (TOC 2), 15 students were selected on
the basis of their scores on the OverAll Test. We divided the 37 participants
into three groups: the group of students with the 27% highest OverAll Test
scores, the group of students with the 27% lowest OverAll Test scores and the
group in between. Five students were selected from each of these three groups.
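The grouping step can be sketched as follows. The 27% cut-offs and the five students per group come from the text; the scores, the rounding of the group size and the random draw within each group are assumptions made for illustration, since the study does not say how the five students per group were picked.

```python
import random


def split_by_score(scores: dict[str, float], fraction: float = 0.27):
    """Split student ids into (high, middle, low) groups by test score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = round(len(ranked) * fraction)  # group size; the rounding rule is an assumption
    return ranked[:k], ranked[k:len(ranked) - k], ranked[len(ranked) - k:]


# Hypothetical scores for 37 participants (student id -> OverAll Test score).
scores = {f"s{i:02d}": 40 + i for i in range(37)}

high, middle, low = split_by_score(scores)
rng = random.Random(0)  # fixed seed so the illustrative draw is repeatable
selected = [rng.sample(group, 5) for group in (high, middle, low)]
print(len(high), len(middle), len(low))
```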
Results (see note 4)
In
general, the students with a high-score on the OverAll Test (high achievers)
performed better on the problem tasks. They identified more relevant concepts
and clusters of concepts (interrelated concepts). As presented in table 2, the
high achievers identified 63% of the relevant concepts, the low achievers 37%
and the moderate achievers 38%. Additionally, the number of correct answers to
the analysis tasks formulated for each problem (decisions in the problem solving
process) was significantly higher for the high achievers (6.2) than for the low
achievers (2.5) and moderate achievers (2.6). The differences between the low
achievers and the moderate achievers in the number of concepts as well as in
the correctness of the answers are negligible. If the three groups of students
are compared on the number of partially correct answers, there is an important
difference between the three groups. The moderate achievers especially make
partly correct decisions (1.4). The low achievers make the most wrong decisions
(9.2), although the difference between low achievers and the moderate achievers
is small. Table 2 presents the descriptive statistics for the two
protocol-analysis criteria: the average number of concepts used during problem
analysis and the correctness of the decisions made.
Table 2: The Average Amount of Concepts Used, the Average Amount of Correct Answers, the Average Amount of Partly Correct Answers and the Average Amount of Wrong Answers on a Set of Cases.
| | High achievers | Moderate achievers | Low achievers |
| Relevant concepts identified | 63% (3.0) | 38% (18.7) | 37% (16.7) |
| Amount of correct answers (max. 12) | 6.2 (1.3) | 2.6 (2.6) | 2.5 (3.1) |
| Amount of partly correct answers (max. 12) | 0 (0.0) | 1.4 (1.1) | 0.2 (0.5) |
| Amount of wrong answers (max. 12) | 5.8 (1.3) | 8.0 (2.9) | 9.2 (2.8) |
It can be
concluded that the preliminary findings of this study provide some evidence for
the criterion-related validity of the OverAll Test.
What are
the merits of assessing students’ knowledge structures when the main goal of
instruction is successful problem-solving? To what extent can student knowledge
profiles serve as feedback for their problem-solving abilities? The third study
presented investigates the influence of students’ knowledge structure on their
performance on problem-solving tasks as measured with the OverAll
Test.
Research
into the differences between experts and novices in performance on
problem-solving tasks, resulted in a profile of successful problem-solvers
(Glaser & Chi, 1988; Yekovich, 1993). Smith (1991) summarizes the internal
factors affecting problem-solving performance. Successful problem solving is
enhanced by
- affective variables, including self-confidence, motivation, beliefs, etc.
- the length of prior successful problem-solving experience
- knowledge of the domain from which the problem is drawn (factual, conceptual, procedural)
- knowledge of general problem-solving procedures such as means-ends analysis, trial-and-error, etc.
- knowledge which is adequate, organized, accessible, integrated and accurate (misconception free)
- other personal characteristics such as cognitive development, personality, etc.
The
importance of an adequate, well-organised and easily accessible conceptual
knowledge of the relevant content domain is confirmed by studies of Chi et al.
(1981) and Perkins, Schwartz, and Simmons (1988). The knowledge base serves as
the basis for the representation, analysis and solving of the problem presented.
In addition to this conceptual understanding, the successful problem solver
knows what to do, how and when to do it. Problem solving requires procedural
knowledge (Smith, 1991). The present study focuses on the influence of the
student’s declarative knowledge on his/her performance on problem-solving tasks
in the domain of economics. If the study provides empirical evidence for the
relevance of an organized knowledge structure for successful problem-solving,
examining students’ knowledge profile is a relevant instrument in the regular
instructional process aiming at successful problem-solving as well as for
remedial purposes.
Procedure.
The
procedure used is sorting concepts. According to Chi et al. (1981) and Shavelson
(1974), this method is a valid way to answer the question
of to what degree the student’s knowledge is structured. The respondents are
asked to sort the concepts presented in TOC 1 within the eight main themes which
are presented (see study 1). The student’s result from the sorting task is
compared with his performance on the OverAll Test. For TOC 2, the students were
not asked to classify but to indicate the level of competency they acquired for
each of the presented concepts (see study 1). This variable is correlated with
the student’s score on the OverAll Test. Finally, the student’s score on the
Knowledge test, covering the same content domain, is compared with the OverAll
Test score.
Sample.
The same sampling procedure is used as in study 1.
The
results of this study confirm previous research results on the influence of an
organised knowledge base on problem-solving performance. Correlation
coefficients (see table 3) indicate that the better students are able to
classify the concepts of the domain of marketing and organisation (TOC1), the
better they are able to analyse and solve problems within this domain. The more
concepts are wrongly classified, the lower the students’ performance on the
OverAll Test.
Correlation
of student’s score on the Knowledge Test with the OverAll Test score is even
more convincing.
Table 3: Pearson’s Correlation Coefficients between Students’ Sorting Performance and the OverAll Test Scores.
| | OverAll Test score (Total %) |
| CST | 0.49* |
| WST | -0.1338 |
| NST | -0.2452 |
| KT score (C-I) | 0.69** |
CST = correctly sorted topics
WST = wrongly sorted topics
KT score (C-I) = Knowledge Test score (Correct-minus-Incorrect score)
* statistically significant with a confidence level of 95%
** statistically significant with a confidence level of 99%
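The coefficients in Table 3 are Pearson correlations between a sorting or Knowledge Test measure and the OverAll Test score. A minimal sketch of both calculations follows. The data are invented for illustration, the function names are hypothetical, and the correct-minus-incorrect score is implemented as a plain difference, which is an assumption about how the Knowledge Test is scored.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correct_minus_incorrect(n_correct: int, n_incorrect: int) -> int:
    """Knowledge Test score as correct answers minus incorrect answers (assumed form)."""
    return n_correct - n_incorrect


# Hypothetical data for five students.
correctly_sorted = [100, 120, 90, 130, 110]  # CST per student
overall_test = [55, 62, 50, 70, 58]          # OverAll Test score (total %)

print(round(pearson_r(correctly_sorted, overall_test), 2))
print(correct_minus_incorrect(60, 15))
```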
For the
TOC 2, students were asked to indicate the level of competency they perceived
to have acquired for each of the concepts. Correlation of this variable with
students’ scores on the OverAll Test indicates that the more concepts are
mastered on the level of analysis, the higher the student’s score on the
OverAll Test (see table 4).
Table 4: Pearson’s Correlation Coefficients between Students’ Perception of the Level of Comprehension and the OverAll Test Scores.
| Level of Competency | OverAll Test Score (Total %) | OverAll Test Scores/Open-ended Questions |
| Definition | -0.43 | -0.12 |
| Comprehension | 0.06 | -0.11 |
| Analysis | 0.29* | 0.37* |
| KT-score (C-I) | 0.45** | 0.69** |
In summary, a well-organised knowledge base seems to affect successful problem-solving as measured by the OverAll Test. There is some empirical evidence that student’s perception of mastering the concepts on the level of analysis, relates to his performance on the OverAll Test.
Contemporary
cognitive psychology suggested several changes for instruction and assessment
(Calfee, 1995). For example, it stresses the importance of knowledge application
instead of knowledge consumption. Additionally, assessment and instruction must be
contextualized, reflective and social. On the basis of these ideas, a lot of
schools are looking for and experimenting with alternative ways to develop their
curricula. The Maastricht School of Economics and Business Administration
introduced a problem-based curriculum, intending to educate competent
problem-solvers. As a lot of schools do, the Maastricht school struggled with
the choice and the implementation of a congruent assessment system. We chose
two assessment instruments: a Knowledge Test and an OverAll Test. The present
article intended to describe the Maastricht assessment system as an
example of assessment within an innovative curriculum. One of the main concerns
of the faculty was to gain empirical evidence for the quality of the assessment
system in its broad sense. In this way, the article presented a second case: a
research methodology to search for empirical evidence for the quality of
assessment innovations. Finally, with the three studies presented, I hope to
contribute to the discussions about the feasibility of alternatives in
instruction and assessment.
I
addressed three questions. When introducing student-centred programs, there is a
lot of concern about student outcomes. Do students in settings such as
problem-based programs actually conduct learning activities that correspond with
the learning activities that were intended by the faculty (Dolmans, 1994)? Will
the students work on the topics the faculty describes as essential for a
competent professional in the field? If not, is it fair to assess students on
the basis of the formal goals? To investigate these questions, a Topic Checklist
was designed as a map of the formal curriculum. This map was presented to
students in order to describe the instructional practice. This map was also used
as a blueprint to analyse the assessment goals. The study presented suggests
there is an important degree of overlap between the formal and the operational
curriculum, in terms of concepts studied as well as in terms of the level of
mastery intended and achieved. Although learning in the problem-based curriculum
is highly self-directed, students address the issues the faculty describes as
essential. Additionally, there is a sufficient congruence between the assessment
practices in terms of goals assessed and the formal and operational curriculum.
This implies that, even when there is a lot of freedom for the students within
the program, it seems to be possible to make assessment instruments fair to the
student. Additionally, because of the match between the curriculum and the
assessment practices, student outcomes are a relevant source of information
about the teaching practices.
The second
question concerns one of the main issues of performance-based assessment
instruments. Even when the faculty develops case-based assessment instruments,
the question remains whether a student’s performance on the cases has anything
to do with professional problem-solving. The second study addressed the criterion
validity of the OverAll Test. Are high achievers successful problem-solvers? The
preliminary results of the analysis of the think-aloud protocols suggest there
is some empirical evidence to answer this question in the affirmative. It seems
that it is possible to assess students’ problem-solving with assessment
instruments based on a set of authentic cases with analysis tasks.
Finally,
one of the basic assumptions of the Maastricht assessment practices was
addressed: the influence of a student’s knowledge profile on his performance on
the OverAll Test. Students’
performance on a concept-mapping task seemed to relate to their performance
on the OverAll Test. Students’ performance on the Knowledge Test indicated the
same relation between knowledge profiles and OverAll Test performances. These
findings confirm the results of research in the field of cognitive psychology:
the possession of a well-organized knowledge base is important for successful
problem-solving.
Considering the findings,
some implications for assessment as well as instruction can be formulated. The
so-called innovative assessment movement has led to a growing interest in new
forms of assessment. Some examples are: open-book exams, take-away exams,
projects, real life tasks, simulation exercises, self and peer assessment.
Assessment instruments aiming to measure students’ conceptual understanding do
not seem to fit in these ideas. They are often condemned as traditional
instruments measuring on a low cognitive level such as conceptual understanding.
However, the results presented indicate the importance of the measurement of
conceptual understanding. They suggest that the breadth of the student’s
knowledge base and its degree of fragmentation and structure are a relevant dimension
of assessment. Although the assessment of problem solving skills is the ultimate
goal, we should not relinquish the traditional assessment techniques.
Alternative assessment techniques such as the OverAll Test should not replace
the Knowledge Test. The use of both instruments enables a triangulation based on
a wide range of evidence, thus increasing the quality and the validity of the
inferences drawn on the basis of the assessment (Birenbaum, 1996). If diagnosis
of the sources of poor problem-solving performance is one goal of assessment,
then the assessment should permit identification of the nature and the extent of
a student’s knowledge. If assessment can uncover more precise deficits in
students’ knowledge bases, then more specific guidelines for instructional
remediation can be made for individuals and groups with similar strengths and
weaknesses. Knowledge about the processes and products of successful reasoners
coupled with the same knowledge about less successful students provides some
instructional guidance regarding “what to teach” (Brown, Bransford, Ferrara,
& Campione, 1983).
For instruction, our results
imply that when problem-solving is a main goal, learning environments should be
designed to enable students to acquire a knowledge base whose nature
and extent form a sufficient basis to identify, define, analyze and solve authentic
problems. The extent to which students reach this goal is an important indicator
for the design as well as the review of the learning environment.
For assessment, the
findings suggest that feedback should follow two dimensions: the breadth and the
depth of a student’s knowledge profile and the extent to which this knowledge is
usable. No single assessment technique can satisfy both assessment dimensions
without presenting a distorted view of students’ capabilities (Birenbaum, 1996).
Therefore, a variety of assessment tools is preferable to a single tool.
Notes
1. The first year comprises four instructional periods, called blocks, each lasting for 8 weeks.
2. If the psychometric data (item-test correlation coefficients) do not indicate insufficient quality of the test item itself.
3. After each OverAll Test administration, students fill in a questionnaire asking for their study strategies, the match between instruction and the test and the difficulties they have experienced.
4. At the time of publication, the protocol analysis is not yet finished. Therefore, only preliminary results are presented.
References
Anderson, J. R. (1983). The architecture of
cognition. Cambridge, MA: Harvard University Press.
Birenbaum, M. (1996). Assessment 2000:
Towards a Pluralistic Approach to Assessment. In M. Birenbaum, & F. J. R. C.
Dochy, Alternatives in assessment of achievements, learning processes and
prior knowledge (pp. 3-30). Boston, Dordrecht, London: Kluwer Academic
Publishers.
Blum, W., & Niss, M. (1991). Applied
mathematical problem solving, modelling, applications and links to other
subjects. State, trends and issues in mathematics instruction. Educational
studies, 22 (1), 7-68.
Brown, A. L., Bransford, J. D., Ferrara, R.
A., & Campione, J. C. (1983). Learning,
remembering and understanding. In J. H.
Flavell, & E. M. Markman (Eds.), Carmichael’s manual of child psychology
(Vol.1, pp. 77-166). New York: Wiley.
Burger, S. & Burger, D. (1994).
Determining the Validity of Performance-based Assessment. Educational
Measurement: Issues and Practices, Spring 1994, pp. 9-15.
Calfee, R. (1983). Establishing instructional
validity for minimum competence programs. In G. F. Madaus, The courts,
validity, and minimum competence testing (pp. 95-114). Boston:
Kluwer-Nijhoff Publishing.
Calfee, R. (1995). Implications of Cognitive
psychology for Authentic Assessment and Instruction. In T. Oakland, & R. K.
Hambleton (Eds), Academic Assessment. Boston/London/ Dordrecht: Kluwer
academic Publishers.
Chi, M. T. H., Feltovich, P. J., &
Glaser, R. (1981). Categorization and representation of physics problems by
experts and novices. Cognitive Science, 5, 121-152.
Chi, M. T. H., & Van Lehn, K. A. (1991).
The content of physics self-explanation. Journal of the Learning Sciences,
1, 69-105.
Coulson, R. L., & Osborne, C. E. (1984).
Insuring Curricular Content in a Student-directed Problem-based Learning
Program. In H. G. Schmidt, & M. L. De Volder (Eds.), Tutorial in
Problem-Based Learning. A New Direction in Teaching the Health Professions
(pp. 225-229). The Netherlands: Van Gorcum.
De Haan, D. M. (1992). Measuring
test-curriculum overlap. Enschede: Febo.
de Lange, J. (1992). Assessing mathematical
skills, understanding and thinking. In R. Lesh, & S. Lamon (Eds.),
Assessment of authentic performance in school mathematics (pp. 195-214).
Washington, D.C.: American Association for the Advancement of
Science.
Dochy, F. J. R. C., & Alexander, P. A.
(1995). Mapping Prior Knowledge: A Framework for Discussion among Researchers.
European Journal for Psychology of Education, X, (3),225-242.
Dolmans, D. (1994). How students learn in
a problem-based curriculum. Maastricht: Universitaire
Pers.
English, F. W. (1992). Deciding what to
teach and test. Newbury Park California: Sage Publications Company,
Corwin Press, INC.
Feller, M. (1994). Open-book testing and
education for the future. Studies in Educational Evaluation, 20, pp.
235-238.
Feltovich, P. J., Spiro, R. J., &
Coulson, R. L. (1993). Learning, Teaching, and Testing for Complex Conceptual
Understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.),
Test theory for a New Generation of Tests. Hillsdale, New Jersey:
Lawrence Erlbaum Associates, Publishers.
Flaherty, E. G. (1974). The Thinking Aloud
Technique and Problem Solving Ability. Journal of Educational research,
68, pp. 223-225.
Glaser, R. (1990). Toward new models for
assessment. International Journal of Educational Research, 14,
475-483
Glaser, R., & Chi, M. H. T. (1988).
Overview. In M. H. T. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of
expertise (XV-XXVIII). Hillsdale, New Jersey: Lawrence Erlbaum
Associates, Publishers.
Lawson, C. (1992). On the relation between
course structure, teaching methods and evaluation procedures in economics.
Assessment and Evaluation in Higher Education, 17, (1), pp.
1-10.
Leenders, M. R., & Erskine, J. A. (1989).
Case Research: The case writing process. London, Ontario: University of
Western Ontario.
Leinhardt, G., & Seewald, A. M. (1981).
Overlap: What’s Tested, What’s Taught? Journal of Educational Measurement, 18
(2), 85-95.
Lesh, R., & Lamon, S. (1992).
Assessment of authentic performance in school mathematics. Washington,
D.C.: American Association for the Advancement of Science.
Linn, R. L., & Burton, E. (1994).
Performance-Based Assessment: Implications of Task Specificity.
Educational Measurement: Issues and Practice, spring 1994,
5-15.
Magone, M. E., Cai, J., Silver, E. A., &
Wang, N. (1994). Validating the cognitive complexity and content quality of a
mathematics performance assessment. International Journal of
Educational Research, 21, (4), 317-340.
Mallier, T., Morwood, S., & Old, J.
(1990). Assessment methods and economics degrees. Assessment and evaluation
in Higher Education, 15 (1), 22-44.
McClung, M. S. (1979). Competency testing
programs: Legal and educational issues. Fordham Law Review, 47, 651-712.
Messick, S. (1989). Validity. In R. L. Linn
(Ed.), Educational Measurement (pp. 13-104). New York:
Macmillan.
Patel, V. L., & Arocha, J. F. (1995).
Methods in the study of clinical reasoning. In J. Higgs, & J. Mark (Eds.),
Clinical reasoning in the health professions (pp. 35-48).Oxford:
Butterworth-Heinemann.
Pelgrum, W. J. (1990). Educational
assessment: monitoring, evaluation and the curriculum. Enschede:
Febo.
Perkins, D. N., Schwartz, S., & Simmons,
R. (1988). Toward a unified theory of problem-solving; a view from
programming. Paper presented at the meeting of the American Educational
Research Association, New Orleans, LA.
Schoemaker, P. J. H. (1995). Scenario Planning: a Tool for Strategic
Thinking. Sloan Management Review, pp. 25-39.
Segers, M. S. R., Tempelaar, D., Keizer, P.,
Schijns, J., Vaessen, E., & Van Mourik, A. (1991). De OverAll Toets : een
eerste experiment met een nieuwe toetsvorm. [The OverAll Test: A first
experiment]. Maastricht: University of Limburg.
Segers, M. S. R. , Tempelaar, D., Keizer, P.,
Schijns, J., Vaessen, E., & Van Mourik, A. (1992). De OverAll Toets : een
tweede experiment met een nieuwe toetsvorm. [The OverAll Test: A second experiment]. Maastricht: University of
Limburg.
Shahabudin, S. H. (1987). Content Coverage in
Problem-based Learning. Medical Education, 21,
310-313.
Shavelson, R. J. (1974). Methods for
examining representations of a subject-matter structure in a student’s memory.
Journal of Research in Science Teaching, 11,
231-249.
Shepard, L. A. (1992). Evaluating test
validity. Review of Research in Education, 19,
405-450.
Smith, M. U. (1991). A view from biology. In
M. U. Smith (ed.) Toward a
Unified Theory of Problem solving. (pp. 1-19). Hillsdale, New Jersey:
Lawrence Erlbaum Associates, Publishers.
Spiro, R. J., Coulson, R. L., Feltovich, P.
J., & Anderson, D. K. (1988). Cognitive flexibility theory: Advanced
knowledge acquisition in ill-structured domains. In The tenth annual
conference of the cognitive science society ( pp.375-383). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Swanson, D. B., Case, S. N., & van der
Vleuten, C. P. M. (1991). Strategies for student assessment. In D. Boud, &
G. Feletti, The challenge of problem-based learning
(pp. 260-274 ). London: Kogan
Page.
Tans, R. W., Schmidt, H. G.,
Schade-Hoogeveen, B. E. J., & Gijselaers, W. H. (1986). Sturing van het
onderwijsleerproces door middel van problemen: Een veldexperiment. [Directing
the Learning Process by Means of Problems: A Field Experiment]. Tijdschrift
voor Onderwijsresearch, 11 (1), 35-46.
Vilsteren, P. P. M. van, Heijden, M. P. van
der, & Arts, A. R. M. (1993). Het gebruik van casussen in cursussen van
de Open Universiteit (The use of cases in Open University courses).
COP-reeks 9301, Heerlen: Open Universiteit.
Yekovich, F. R. (1993). A theoretical View
of the Development of Expertise in Credit Administration. Paper
presented at the 1993 Annual Meeting of the American Educational Research
Association, Atlanta, Georgia.
The Author
MIEN SEGERS is Associate Professor of Assessment and Evaluation at the Department of Educational Development and Research, School of Economics and Business Administration, Universiteit Maastricht, The Netherlands. She received her PhD in the field of quality assurance in Higher Education. Her current research activities focus on the implementation of innovative assessment practices within problem-based curricula.
2. Maastricht Skills Test
Faculty of Medicine
- Examples of criteria lists
Criteria list: nr. 07158
Field: Gynaecology/Obstetrics
Station: CERVICAL SMEAR
Production date: September 1996
Drawn up by: Hieke Kruseman
Intended for: 4th year students 1996-1997
Date of examination: April 10th 1997
Time: 20 minutes
Simulated patient: female
Instruments needed:
STUDENT'S TASKS
In this station medical technical skills are assessed.
You are in your clerkship General Practice.
Mrs. Brown, 35 years old, visits the surgery because of vaginal discharge, which started approximately 2 weeks ago.
You will receive 4 tasks.
You have 20 minutes to fulfil these tasks.
Task I: Take a relevant history concerning this complaint.
Task II: Carry out a speculum examination on the model.
Task III: Make a cervical smear on the model.
Task IV: Interpret the photographs the examiner shows and report the most likely diagnosis to the examiner.
Throughout the examination, state what you are doing, what you are paying special attention to, and what your findings are.
Good luck!
INSTRUCTION FOR THE SIMULATED PATIENT
- You are a 35-year-old woman; you have no children and you have never been pregnant.
- You have had a vaginal discharge for approximately 2 weeks, which is grey-white in colour and has an unpleasant smell. The discharge is not bloody.
- You have no itching.
- Every day you have to change clothes, sometimes twice a day.
- You have never had these complaints before.
- You don't know the cause of the discharge.
- Micturition is not painful, and intercourse isn't painful either. You have no abdominal complaints.
- The last menstruation was 3 weeks ago. It was on time and lasted 5 days, which is normal for you.
- You don't use any medicines.
- You are married. Your husband has no complaints.
- Your husband had a vasectomy 7 years ago.
- The last cervical smear was carried out 4 years ago. PAP 1 (normal).
EXAMINER'S INSTRUCTIONS
In this criteria list a 6-point scale will be used.
The examiner's instruction gives a global description of the actions the student has to perform.
Task I: History
Item 1:
Amount of discharge? (much)
Itching? (no)
Unpleasant smell? (yes)
Colour of discharge? (grey-white)
Blood? (no)
Related to the menstrual cycle? (Unknown, started one week after last menstrual period)
Recent changes in sexual behaviour? (no)
Intercourse painful? (no)
Husband complaints? (no)
Micturition painful? (no)
Contraception? (sterilisation 7 years ago)
Last cervical smear? Result? (4 years ago. PAP 1)
Medicines? (no)
Had these complaints before? (no)
Task II: Speculum examination
Item 2: Preparation I (materials).
Student prepares all the material needed to carry out a cervical smear.
Item 3: Preparation II.
The patient's bladder should be empty.
Student sits, switches on the lamp and directs the light, puts on gloves and lubricates the speculum.
Item 4: Technique of the speculum examination.
Inserts the speculum:
- spreads the labia
- inserts the speculum at 45° in the vaginal axis
- turns the speculum to the neutral position
- shows the portio by slowly opening the speculum
Removes the speculum:
- pulls back the speculum slowly
- inspects the vagina
- removes the speculum while slightly opened, at 45° in the vaginal axis
Item 5: Findings.
Content of vagina: blood? discharge?
Portio: position, size, surface, erythroplakia, colour?
Ostium: closed, opened, discharge?
Vagina: colour?
(In the model no discharge can be seen.)
Task III: Cervical smear.
Item 6: Technique cervical smear.
Endocervix: cytobrush, turns round twice 360°.
Ectocervix: places Ayre spatula in ostium and turns round twice 360°.
Item 7: Technique slides.
Student marks one slide with E, the other one with P, for endo- and ectocervical material respectively.
Fixes the slides immediately after spreading the material on them.
Puts slides in dispatch box.
Task IV:
Item 8: (Examiner shows photographs)
1 > Macroscopic aspect of vaginal discharge. The student has to describe the colour, the amount and the aspect of the discharge, and the colour of vagina and portio.
2 > A microscopic preparation with bacteria, leucocytes and some 'clue cells'.
Item 9: Most likely diagnosis:
Bacterial vaginosis (or Gardnerella vaginalis, non-specific vaginitis)
The examiner collects all the material and prepares the station for the next student.
CHECKLIST
year: 1996-1997
year group: 4
station no: 07158
no. of items: 9
examiner no:
ID no. student:
______________________________________________________________________________
Field: gynaecology / obstetrics
Station: vaginal discharge
______________________________________________________________________________
                                good  suff.  neutr.  insuff.  poor  absent
Task I: History
1  Asks the right questions     0     0      0       0        0     0
Task II: Speculum examination
2  Preparation I (materials)    0     0      0       0        0     0
3  Preparation II               0     0      0       0        0     0
4  Technique                    0     0      0       0        0     0
5  Findings                     0     0      0       0        0     0
Task III: Cervical smear
6  Technique cervical smear     0     0      0       0        0     0
7  Technique slides             0     0      0       0        0     0
Task IV: Interpretation photographs
8  Photo 1                      0     0      0       0        0     0
9  Photo 2                      0     0      0
_________________________________________________________________________________
Evaluation:
Criteria list: nr 89008
Field: Integrated: abdomen and communication skills
Station: PAIN IN THE UPPER ABDOMEN
Production date: November 1996
Drawn up by: Jano Havas
Intended for: 6th year students 1996-1997
Date of examination: June 26th 1997
Time: 30 minutes
Simulated patient: female (age: 40 years)
Instruments needed: stethoscope
STUDENT'S TASKS
In this station medical technical skills as well as communication skills are assessed.
You are in your clerkship General Practice.
The patient waiting for you has come to see you for medical advice.
Kindly perform a consultation with this patient.
You can read information about the patient in the chart.
After the physical examination has finished, while the patient gets dressed, the examiner will ask you 3 questions:
1 What is the most likely diagnosis?
2 What is the differential diagnosis?
Now, the examiner will give you the right diagnosis.
3 What management (therapy and further investigations) do you think is needed for this diagnosis?
Now, you can finish the consultation with the patient. The patient has been informed about this procedure.
You have 30 minutes to fulfil your task.
If time is left, it can be used for feedback.
If you have understood this task, please call the patient.
Good luck!
INSTRUCTION FOR THE SIMULATED PATIENT
EXAMINER'S INSTRUCTIONS: MEDICAL TECHNICAL PART (I)
In this station two examiners will be present: one for the medical technical part (I) and one for the communication part (II).
In this criteria list a 6-point scale will be used.
The examiner's instruction gives a global description of the actions the student has to perform.
Item 27: Elaboration of chief complaint
- pain in the abdomen for one week
- continuous
- worse half an hour after eating
- a gnawing pain
- under the sternum, an area of about 3 centimeters.
Item 28: Associated symptoms
- nausea (yes)
- vomiting (once yesterday, no blood)
- pyrosis (for years)
- had these complaints before (yes)
Item 29: Past medical history / family history / intoxications
- operated before
- family history
- smoking (3-5 cigarettes/day)
- alcohol (1-2 beers at the weekend)
- coffee (normally: 5/day, this week: 1-2)
- medicines (none, only OTC antacid)
Item 30: The student pays attention to:
- general impression: Is patient in pain?
- colour of the sclerae?
- colour of the skin?
- posture of the patient?
Item 31: Inspection
The student pays attention to:
- defects of the skin over the abdomen (scars, icteric?)
- shape of the abdomen (symmetrical or not?)
Item 32: Auscultation
The student:
- listens in at least four regions of the abdomen
- pays attention to peristalsis and murmurs
Item 33: Percussion
The student:
- performs percussion in at least four regions of the abdomen and of the liver and spleen
- pays attention to abdominal sounds and percussion pain
Item 34: Palpation
The student:
- performs superficial and deep palpation in at least four regions of the abdomen, and finds that this is painful in the epigastric area and that there is some active guarding
- performs palpation of the colon, liver, gallbladder and kidneys
Item 36: The student:
- palpates Virchow's lymph nodes (supraclavicular) (if needed the examiner asks for it)
Item 37: The student:
- wants to perform a rectal examination (the examiner gives the findings: nothing particular)
Item 38: Most likely diagnosis
- ulcus ventriculi/duodeni
Item 39: Differential diagnosis
- non-specific gastritis
- malignant ulcer of the stomach
- pathology of the pancreas (pancreatitis, carcinoma)
- pathology of the gallbladder (cholecystitis, cholelithiasis)
- pathology of the colon (constipation, colitis, IBS)
Now, the examiner gives the right diagnosis to the student.
Item 40: Management
Advice
Stop:
- smoking
- drinking alcohol
- drinking coffee
- other food which worsens the complaints
The patient may not take:
- medication such as aspirin and NSAIDs
Therapy
2-6 weeks:
- cimetidine 1 d 800 mg or 2 d 400 mg
- ranitidine 1 d 300 mg or 2 d 150 mg
The student:
- asks the patient to visit the surgery after this period
If:
- complaints have disappeared: stop medication
- not: continue medication until the 8th week; the patient has to visit the surgery again after this period
Item 41: Further investigation:
Gastroscopy
Biopsies and search for Helicobacter pylori
EXAMINER'S INSTRUCTIONS: COMMUNICATION PART (II)
In this station two examiners will be present: one for the medical technical part (I) and one for the communication part (II).
In this criteria list a 6-point scale will be used.
CHECKLIST
year: 1996-1997
year group: 6
station no: 89008
no. of items:
examiner no:
ID no. student:
______________________________________________________________________________
Field: abdomen/communication
Station: pain in the upper abdomen
______________________________________________________________________________
                                          good  suff.  neutr.  insuff.  poor  absent
History
- Chief complaint                         0     0      0       0        0     0
- Associated symptoms                     0     0      0       0        0     0
- History                                 0     0      0       0        0     0
Physical examination
- General inspection                      0     0      0       0        0     0
Examination of the abdomen:
- Inspection                              0     0      0       0        0     0
- Auscultation                            0     0      0       0        0     0
- Percussion                              0     0      0       0        0     0
- Palpation                               0     0      0       0        0     0
Right order: insp.- ausc.- perc.- palp.   0     0      0       0        0     0
- Special examination                     0     0      0       0        0     0
- Rectal examination                      0     0      0       0        0     0
Diagnosis
- Most likely diagnosis                   0     0      0
- Differential diagnosis                  0     0      0       0        0     0
Management
- Advice and therapy                      0     0      0       0        0     0
- Further investigation                   0     0      0       0        0     0
______________________________________________________________________________
Evaluation:
MAAS-Global, score list
November 1, 1992
© J. van Thiel, H. Kraan, J. van Dalen
University of Limburg, Maastricht, The Netherlands
EVALUATION INFORMATION
1 2 3 4 5 6 7
Doctor:
name
group
number
case
patient
observer
Interpretation of scale 1 through 7:
1 = absent or very bad
2 = bad
3 = insufficient
4 = doubtful
5 = sufficient
6 = good
7 = excellent
Consult the criteria list MAAS-R2 and MAAS-Global (not yet available in English) if you are not sure
about the interpretation of an item. Score-boxes serve only as memory aid. The
ultimate scoring is by global judgement.
FOLLOW-UP CONSULTATION    1 2 3 4 5 6 7
o recapitulates complaints and questions of last consultation
o recapitulates management plan
o checks for performance of plan
o checks for effect on course
ENTRY    1 2 3 4 5 6 7
o tells name and function
o asks for or verifies personal details
GLOBAL ORIENTATION    1 2 3 4 5 6 7
o short orientation on complaint and degree of suffering
o questions other reasons for visit
REQUEST FOR HELP    1 2 3 4 5 6 7
o mentions/explores request for help, wishes or expectations
o reason for visit now
o explores openly within the patient's frame of reference
o responds to cues
QUESTIONING DURING HISTORY-TAKING    1 2 3 4 5 6 7
o variety in questions
o relevance of questions (made) clear
o is alert to whether the patient is following
PHYSICAL EXAMINATION    1 2 3 4 5 6 7
o instructs about undressing
o informs about examination
o treats patient with care and respect
EVALUATION INFORMATION    1 2 3 4 5 6 7
o informs about findings and (provisional) diagnosis
o tells aetiology and prognosis
MANAGEMENT PLAN    1 2 3 4 5 6 7
o deliberation as well as proposal
o alternatives, advantages and disadvantages
o feasibility and compliance
o arrangements who, what, when
EVALUATION OF THE CONSULTATION    1 2 3 4 5 6 7
o general question, answering the request for help, discussing our own working method
PROVIDING INFORMATION    1 2 3 4 5 6 7
o announcement, categorizing
o in small amounts, concrete explanation
o comprehensible language
o asks about reaction and comprehension
EMOTIONS    1 2 3 4 5 6 7
o asks for/explores feelings
o reflections of feelings (incl. nature and intensity)
o assimilative reactions: deals first with feelings
o sufficiently in entire consultation
SUMMARIES    1 2 3 4 5 6 7
(recapitulations, paraphrases or summaries)
o concise, in own words
o correct for content, complete
o checking
o sufficiently in entire consultation
ORDERING    1 2 3 4 5 6 7
o announcements (diagnostic procedure, history-taking, examination, other phases)
o distributes available time in a well-balanced way
o explores request for help mainly in the beginning
o management plan after evaluation/information
NATURALNESS    1 2 3 4 5 6 7
o flexible communicative behaviour
o no disruptive hesitations
o spontaneous and natural
o attunes own style to patient
EMPATHY    1 2 3 4 5 6 7
o attitude empathic, attentive, inviting by word, behaviour and eye contact
o when conflict: room for patient’s arguments as well as for own arguments
FURTHER FEEDBACK:
Production date: September 1996
Drawn up by: Bert Zonneveld
Time: 10 minutes
Instruments needed:
STUDENT'S TASK
In this station medical technical skills are assessed.
On the cushion you see a wound. The wound has been inspected and cleaned.
Perform infiltration anaesthesia.
Put on the sterile gloves and apply two superficial sutures.
Throughout the procedure, state aloud what you are doing.
Good luck!
EXAMINER'S INSTRUCTION
Please remove the sutures before the next student enters the room. Display all instruments in the same way as at the start of the test.
In this criteria list a 3-point scale will be used.
Judge whether the student performs each item correctly or incorrectly, or whether he/she did not perform it at all.
Item 16
Two sintels back, one forward and again one sintel back.
Item 18 and 19
The student performs in such a way that there is no chance of contaminating him/herself.
CHECKLIST
year: 1996-1997
year group: 4
station no: 02195
no. of items: 19
examiner no.:
ID no. student:
Field: therapeutical skills
Station: suturing
                                                        good  wrong  did not do
Material for anaesthetics
1   5 ml syringe                                        0     0      0
2   2 injection needles 0.8 x 40 mm                     0     0      0
3   disinfectant                                        0     0      0
Technique
4   The student checks the injection fluid              0     0      0
5   The top of the desk is disinfected                  0     0      0
6   The fluid is drawn into the syringe                 0     0      0
7   The needle is replaced by a sterile needle          0     0      0
    before injection
8   The syringe is clear of air bubbles                 0     0      0
The administration of the anaesthetics
9   The skin is disinfected                             0     0      0
The examiner now tells the student that he/she can continue suturing, supposing that
the anaesthesia has been administered
Suturing
10  The student aseptically puts on the sterile gloves  0     0      0
11  The wound is spread to determine its depth          0     0      0
12  The student grasps the edge of the wound with the   0     0      0
    tissue forceps at the point where the needle is to
    be inserted
13  The student inserts the needle perpendicularly to   0     0      0
    the skin
14  Across the bottom of the wound the student inserts  0     0      0
    the needle through the opposite edge of the wound
15  The second point of insertion lies directly         0     0      0
    opposite the first, at a similar distance from the
    edge of the wound (4 mm)
16* The suture is made with use of the fixation forceps 0     0      0
17  The knot is located on the point of insertion       0     0      0
Results
18* The sterile materials have remained uncontaminated  0     0      0
19* The wound has remained uncontaminated by clothing   0     0      0
    or skin
20  The edges of the wound have approximated properly   0     0      0
Feedback:
3. Maastricht Progress Test
Faculty of Medicine
- Extracts from progress test September 1990
STUDENT ASSESSMENT PROJECT (SAP)
Progress Test September 1990
UNIVERSITY OF LIMBURG
Faculty of Medicine
INSTRUCTIONS
- Read these instructions carefully before you start.
- Check if there are any pages missing in your copy of the test and ask for a new copy if necessary.
- Each question comprises one or more statements which must all be answered SEPARATELY. The numbers of the statements correspond to the numbers on the answer form.
- In questions 162 and 163, reference is made to photo no. 199. This photo is given on a supplementary sheet.
- Some questions contain a piece of text between brackets which serves to clarify the question. This is meant as supplementary and ALWAYS CORRECT information and as such does NOT need to be evaluated.
- The answer form contains your name, examination number and year of study. Please do not make any alterations! Any mistakes should be reported to the supervisor of the Office of Educational Administration (Bureau Onderwijs).
- Answer the questions by filling in one option box per question. This should be done with an HB (= soft) pencil. Read the relevant instructions on the answer form.
- The answer form should be handed in no later than 1:00 p.m.
- The result is calculated by subtracting the number of incorrect answers from the number of correct answers, with question marks being counted as zero; in other words:
  correct = +1
  incorrect = -1
  ? = 0
  Result = correct minus incorrect.
- Comments on INDIVIDUAL QUESTIONS should be handed in on a separate sheet. These comments will be taken into account in deciding whether or not to cancel a particular question before the final results are computed. They should be clearly legible and should be handed in as soon as possible, but no later than next TUESDAY AFTERNOON at the office of the STUDENT ASSESSMENT PROJECT (Project Evaluatie van Studieresultaten).
- The answer key can be obtained at the information desk of the Office of Educational Administration (Randwijck) from 1:00 p.m.
- To ensure that the examination runs smoothly, the relevant regulations are included on the final page of the booklet.
GOOD LUCK!
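The scoring rule stated in the instructions (correct = +1, incorrect = -1, ? = 0) can be illustrated with a short sketch. It is not the faculty's scoring software: the function name, the "T"/"F"/"?" encoding of the option boxes, the treatment of a blank statement as a question mark, and the five-statement key are all assumptions made up for illustration.

```python
from collections import Counter

# Weights as stated in the instructions: correct = +1, incorrect = -1, "?" = 0.
WEIGHTS = {"correct": 1, "incorrect": -1, "?": 0}


def progress_test_result(answers, key):
    """Score one answer form with the 'correct minus incorrect' rule.

    answers: dict mapping statement number -> "T", "F" or "?" (hypothetical encoding)
    key:     dict mapping statement number -> "T" or "F" (hypothetical answer key)
    Returns the result and the tally per category.
    """
    counts = Counter()
    for number, right_answer in key.items():
        # Assumption: a statement left blank is treated like a question mark.
        given = answers.get(number, "?")
        if given == "?":
            counts["?"] += 1
        elif given == right_answer:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    result = sum(WEIGHTS[category] * n for category, n in counts.items())
    return result, counts


# Made-up key and answers, purely for illustration (not the real test key).
key = {1: "T", 2: "T", 3: "F", 4: "F", 5: "T"}
answers = {1: "T", 2: "F", 3: "F", 4: "?", 5: "T"}

result, counts = progress_test_result(answers, key)
print(result, dict(counts))
```

With three correct answers, one incorrect answer and one question mark, the result is 3 - 1 = 2. Under this rule a wrong answer costs as much as a right answer earns, so answering with a question mark is preferable to guessing when the candidate has no idea.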
RESPIRATORY SYSTEM - CATEGORY I
questions 1 - 28
1. Narrowing of the middle meatus of the nose (e.g. as a result of swelling of the mucosa) leads to obstruction of drainage from the maxillary sinus.
lit.: Moore, Clin.Or.Anatomy, 2nd ed., pp. 957-958
Metal-fume fever is a not uncommon disorder in which the person affected complains of dry cough, nausea, sweating, shivering, malaise and fever and which is usually self-limiting.
2. This disorder more frequently occurs after exposure to zinc fumes than after exposure to mercury vapour.
lit.: W.R. Parker, Occupational Lung Disorders, 1982, p. 454
Every T1-2 N0M0 staged squamous-cell carcinoma of the lung should be treated surgically, provided that the patient's condition allows the operation.
3. In order to increase the chance of cure, pneumonectomy should, in the majority of cases, be given preference to lobectomy.
lit.: de Boer, chirurgie, 1988, p. 636
Possible treatments for acute airway obstruction in asthmatic patients are:
4. inhalation of cromolyn;
5. inhalation of a beta-2-adrenergic agonist;
6. intravenous injection of aminophylline.
lit.: Wesseling en Neef, Algemene Farmacotherapie, 1985, p. 527
A decrease in the elasticity of the lungs (compliance 4 L/kPa; normal value 2 L/kPa) leads to:
7. an increase in functional residual capacity;
8. an increase in tidal volume.
lit.: Bernards & Bouman, Fysiologie van de mens, 1988, chapter 16
A patient with a renal disorder shows metabolic acidosis.
9. Hypoventilation contributes to compensating for this acidosis.
lit.: Bernards & Bouman, Fysiologie van de mens, 1988, chapter 16
In most cases, acute bacterial inflammation of the nasal sinuses is accompanied by pain. This is usually pain above, behind or under the eye.
10. Maxillary sinusitis is, in the majority of cases, accompanied by pain above the eye.
lit.: Jongkees, Keel-, Neus- en Oorheelkunde, 1983, p. 84
A five-year-old boy is brought to his family doctor's surgery with fever (39.5 °C*), swelling of the right upper eyelid and swelling round the nose. An inflammation of one of the nasal sinuses is a likely diagnosis.
11. Frontal sinusitis is more probable than ethmoid sinusitis.
lit.: Jongkees, Keel-, Neus- en Oorheelkunde, 1983, p. 88
A thirty-year-old man tells his family doctor that he suddenly got the shivers, followed by a rapid rise in temperature and within a few hours severe pain, connected with breathing, over the left side of his chest. He also complains of coughing and bringing up rust-coloured sputum. Physical examination reveals a very ill man with rapid, shallow respiration. On examination of the thorax, there is dullness of percussion at the lower left side of his back and there is a pleural friction rub.
12. These symptoms are more likely a manifestation of bacterial pneumonia* than of pulmonary infarction.
lit.: Harrison's Principles of Internal Medicine, 7th ed., pp. 767, 936
The compliance of the lungs is
defined as the ratio of the change in total lung capacity to the change in
intrathoracic pressure.
13.
The
compliance of the lungs of normal babies is greater than that of normal
adults.
lit.:
Nelson's Textbook of Paediatrics, p. 926
14. Coxsackie viruses
more frequently cause infections of the upper respiratory tract than of the
lower respiratory tract.
lit.:
Nelson's Textbook of Paediatrics, 1979, p. 1172
15. In children
with a cleft palate there is an increased risk of functional disorders of the
auditory (Eustachian) tube.
lit.:
Gerlings, Keel-, Neus- en Oorheelkunde, 1979, p.
182
Spirometry is used to measure certain lung volumes. The total lung
capacity is defined as:
16. the sum of
residual volume plus inspiratory reserve volume;
17. the sum of
functional residual capacity plus tidal volume and inspiratory reserve
volume;
18. the sum of
residual volume plus inspiratory vital capacity*.
lit.:
Bernards & Bouman, Fysiologie van de mens, 1977, p. 352
An acute form of extrinsic allergic
alveolitis (e.g. farmer's lung) is accompanied by shortness of
breath.
19. Pulmonary
function tests are more likely to reveal an obstructive functional disorder than
a restrictive functional disorder in this case.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 413
Certain lung disorders can be
accompanied by an increase in the level of angiotensin-converting enzyme (ACE)
in the blood. Such disorders include:
20.
sarcoidosis;
21.
mucoviscidosis (cystic fibrosis).
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 711
Primary bronchial carcinomas are
often accompanied by infiltration of adjacent structures in the thoracic cavity.
Involvement of the sympathetic trunk leads to a complex of signs and symptoms
known as Horner's syndrome. These signs and symptoms
include:
22. ptosis of
the eyelid;
23.
hoarseness;
24.
miosis.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 258
A 24-year-old man is hospitalized
because of an acute, severe attack of bronchial asthma which does not respond to
his regular therapy (administration of salbutamol and beclomethasone by
aerosol). On admission, the patient is agitated, his pulse rate is 120/min and
the peak expiratory flow rate measures 80 l/min (normal value 600-650 l/min).
Blood gas analysis reveals a normal pH and a normal arterial PO2 and
PCO2.
25. These
findings form an indication to start artificial respiration.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 239
The position of the oxygen
dissociation curve is determined by the P50, the oxygen tension needed to
achieve 50% saturation of haemoglobin. A normal P50 value is 3.6 kPa. A P50
lower than the normal value indicates a shift of the oxygen dissociation curve
to the left; a P50 higher than the normal value indicates a shift of the oxygen
dissociation curve to the right.
26. A left
shift of the oxygen dissociation curve promotes oxygen release at tissue level
more than a shift of this curve to the right.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 54
Vascular resistance is influenced by
the arterial carbon dioxide tension.
27. A decrease
in the arterial carbon dioxide tension (hypocapnia) leads to increased vascular
resistance in the cerebral circulation.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 61
Arachidonic acid is converted into
prostaglandins and thromboxane A2 by the enzyme cyclooxygenase and into
leukotrienes by the enzyme lipoxygenase.
28.
Administration of nonsteroidal anti-inflammatory drugs inhibits the
synthesis of prostaglandins.
lit.:
Sluiter, Leerboek Longziekten, 1985, p. 121
4.
The Thesis Supervision
Experiment
THE THESIS SUPERVISION
EXPERIMENT
In:
REDISTRIBUTING POWER IN THE
CLASSROOM: THE MISSING LINK IN PROBLEM-BASED LEARNING
A. Georges L.
Romme
Maastricht University
Dept. of Management Sciences
P.O. Box 616, 6200 MD Maastricht, The
Netherlands.
E-mail: s.romme@mw.unimaas.nl
Forthcoming in:
Troy, J. et al. (eds.), Learning in a Changing
Environment
Dordrecht/London/Boston: Kluwer Academic Publishers,
1998.
In
order to find out whether applying the notion of circularity of power would
stimulate active learning by students, an experiment was set up in the area of
thesis supervision. From an experimental point of view thesis supervision
involves an interesting setting for exploring the relationship between power and
learning. It typically involves a formal relationship between student and
supervisor in which the supervisor has all formal authority regarding the final
assessment of the thesis. More specifically, the main proposition at the time
was that a thesis circle, in which a group of students working on their masters
thesis and their supervisor(s) collaborate on the basis of equivalence in
decision-making (or circularity of power), will provide a learning system in
which active learning, dialogue and collective problem-solving
prevail.
The experiment was started when
several students in the winter of 1995/96 expressed their preference to do a
masters thesis and/or internship project in the area of circular organizing.
These students were largely motivated by their experiences in an intensive
skills course on Circular Organizing (given by the author) which is part of the
3rd year curriculum. Key steps in the experiment were the adoption of
a set of rules for organizing the circle and circle meetings, and the
development of an evaluation procedure based on the consent or “no objection”
principle.
In April 1996 seven interested
students were invited by the author (as the supervisor of their thesis projects)
to participate in a start-up meeting. During this meeting the potential
objectives and procedures of the circle were discussed. The decision was taken
to focus on the supervision of thesis projects on the basis of circular
principles, which basically implied students and the supervisor would share the
responsibility for supervising, evaluating and grading a number of undergraduate
thesis projects.
Since its start, the circle has met
regularly over a total period of more than two years (with about fifteen
meetings per year, and each meeting taking about 3.5 hours). Starting with eight
members (including one supervisor) in April 1996, the circle grew to a
membership of eighteen to twenty members (incl. two supervisors) in April 1998.
In the first year of its existence, three students completed their thesis
(and masters degree). In the second year six students completed their
thesis projects. The formal arrangements of the circle are currently as
follows:
·
In addition to the supervisors as
permanent members, the membership of the circle includes students who are doing
(or intend to do) a thesis project in the area of the theory and practice of
circular organizing, under supervision of the circle.
·
In order to
get access to the circle, the student should have acquired knowledge and skills
in circular organizing on the level of a two-week intensive skills training
(which is part of the 3rd year curriculum) or a similar course
elsewhere.
·
New members can be proposed by each
current member of the circle.
·
The membership of the circle ends
with the completion and final assessment of the masters thesis, unless decided
otherwise.
·
Each circle meeting proceeds
according to a standard format involving four parts: (1) an opening round, (2)
determination of the agenda (in principle on the basis of a proposal sent out to
all members prior to the meeting by the secretary), (3) discussion and decisions
on the agenda issues, and (4) a closing round.
·
Decisions are taken by way of the
consent (“no argued objection”) principle; that is, a decision is taken when all
participants have no argued objection against the proposed decision. Decision
issues include, for example: the circle’s objectives, its work procedures (e.g.,
preparation of agenda), the election of the chairperson or secretary, proposals
for research projects, the general criteria for assessment of a thesis, and the
final assessment and grading of a masters thesis.
·
In the case of the final assessment
of a masters thesis, a decision procedure is followed in which the student who
wrote the thesis has no consent right; that is, (s)he can participate in the
discussion but not in the formal decision round(s) in which the chair asks
consent of each other member of the circle.
·
Both the chair and secretary are
chosen for a limited period (e.g., three months) by way of an election procedure
based on the consent principle.
·
The supervisors act as functional
leaders of the circle, within certain boundaries that are determined by the
university’s exam regulations and decisions taken by the circle. For example,
the circle can delegate the authority to accept new members to the
supervisor.
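The consent rule described above can be illustrated with a minimal sketch. It assumes one possible reading of the rule: only objections that come with an argument block a decision, and members without a consent right (such as the author of the thesis being assessed) are left out of the decision round. The names Objection, blocking_objections and decision_taken are hypothetical and serve only as illustration; they are not part of the circle's own arrangements.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Objection:
    member: str
    argument: str  # an empty string means the objection was not argued


def blocking_objections(
    objections: Iterable[Objection],
    no_consent_right: Iterable[str] = (),
) -> List[Objection]:
    """Return the objections that actually block a decision.

    Assumption: an objection blocks only if it is argued and is raised by
    a member who holds a consent right for this decision.
    """
    excluded = set(no_consent_right)
    return [
        o for o in objections
        if o.argument.strip() and o.member not in excluded
    ]


def decision_taken(
    objections: Iterable[Objection],
    no_consent_right: Iterable[str] = (),
) -> bool:
    """A decision is taken when no argued objection remains."""
    return not blocking_objections(objections, no_consent_right)


# The thesis author may speak, but her objection does not block the grade.
print(decision_taken([Objection("Maria", "grade too low")], ["Maria"]))
```

In this reading an unargued "no" does not stop a proposal, which is what distinguishes consent from a unanimous vote.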
In
this section the experiences with and (preliminary) outcomes of the thesis
circle experiment are described and evaluated. The observations described here
are those of researchers who are also actors in the system they are studying. In
order to generate valid data, we have tried to publicly test all observations
and conclusions presented. Part of this process was a special session of the
circle in which the first year of the circle was evaluated, and critical
episodes and situations were identified. This session focused on questions such
as ‘What is essential to the meetings of this circle?’ and ‘Which incidents or
events in the first year of this circle are noteworthy or
important?’
One of the observations made was
that the actual discussions on thesis projects during circle meetings developed
into an ongoing dialogue between the members of the circle, without any extra
effort needed to move in that direction. In the context of this dialogue, the
traditional difference between the roles of supervisors and students to some
extent disappeared. The conventional idea of the relationship between (thesis)
supervisor and student can be described as an expert who is leading the student
through her or his individual learning process. In the thesis circle this
traditional notion of the role of the leader/supervisor was abandoned almost
instantly, in favor of a role as educator, coach or facilitator. In other words,
the supervisors acted as a useful resource rather than as someone in charge.
Moreover, in this respect the supervisors appeared to act as a role model for
student members who also started to learn how to use these kinds of skills,
particularly in the area of coaching and facilitating problem solving by other
students. Evidently, some students very quickly began to learn how to use the
advocacy and inquiry skills of the supervisors.
Advocacy and inquiry in this respect
particularly involves interpersonal communication skills that serve to stimulate
the individual and the group to explore the deeper issues, images and problems
inhibiting the learning process of this particular student. For example, a
typical inquiry might be: “I’ve heard you talking about several ideas you have
in mind here, but I would like to know what really motivates you in this
project?”. A typical advocacy might involve recommendations for a certain
methodological approach, theoretical idea, case study, or time schedule. Note
that all kinds of defensive behaviours came into play during meetings: face
saving, stereotyping, intellectualization, victimization, etcetera. But the
point is that these defensive behaviours did not appear to inhibit learning, but
rather they appeared to stimulate the learning process because the anxiety, and
more generally the emotional side of learning, could be discussed openly.
Effective inquiry and advocacy here also involves remaining open, authentic and
vulnerable, and in this way serves to create awareness and disarm these
defensive behaviours.
During the first year of the
circle’s existence, a number of critical incidents can be identified. The
most important critical incidents involved the assessment and grading of the
first and second thesis projects that were presented to the circle in their
final version. These first assessments by consent rule can be seen as the first
critical tests of the extent to which the equivalence in decision-making would
not be undermined by, for example, group pressure or differences in expertise or
(informal) authority. In this respect, the experimental procedure we used in the
first assessment (in August 1996) was not perceived as adequate, particularly
regarding the interdependency and interaction between individual assessments at
a (too) early stage in the discussion. In order to create more clarity in this
area, we adapted the assessment procedure for the assessment of the second
thesis (two months later). Our main reference point here was the election
procedure developed for circular organizations (e.g., Endenburg,
1992).
This assessment procedure is based
on the following general ideas. First, the assessment should be based on
arguments and dialogue rather than authority. Second, the process leading to the
final assessment should start as open as possible; that is, interdependency and
interaction between the initial individual assessments should be diminished as
much as possible. Third, a process of argumentation and dialogue should
subsequently serve to increase interdependency and interaction in order to come
to a final grade that is well-argued and broadly accepted. The procedure
involves the following steps:
·
Before the meeting in which the
assessment takes place, all circle members are informed in writing about the
assessment (as an item on the agenda); they all receive a copy of the thesis at
least one week before the meeting.
·
The first step during the meeting is
to establish the criteria on which the assessment will be based; these criteria
should conform to the university’s formal regulations, although they will tend
to be more specific than the latter. In principle, a set of criteria has been
established before the individual student starts working on his or her thesis,
but in assessing the final version of a particular thesis these criteria may be
supplemented with more specific criteria related to the nature of this thesis
project.
·
Subsequently, each participant is
asked to write his/her own name and the proposed grade (the quantitative
assessment) of the thesis on a piece of paper. The formulation of this proposal
is, for example: “I, Paul, propose a 7.” This step is a crucial one, because it
reduces the initial interdependency among individual assessments to a
minimum.
·
All proposal forms are handed in
to the chairperson, who then starts asking each participant to state the
arguments which prompted him or her to propose this particular grade. For
example, the chair will ask: “John, you proposed a 6 for Maria’s thesis, could
you motivate this proposal?”. The chair makes sure there are no discussions when
each participant argues for his or her proposal.
·
The next step is for the chair to go
around the group once again, now asking whether anyone would like to change his or her
proposal in view of the arguments heard in the previous round. This step is the
first one where interaction between arguments and proposals is deliberately
allowed, and the discussion is led by the chair in order to make it as open and
visible as possible. Typically, most participants will stick to their initial
proposal, but some may want to change their proposed grade, for example, because
“having heard the arguments on the readability of Maria’s thesis given by John,
I would like to change my proposed grade from 7 to 6.”
·
Now an open, relatively unstructured
discussion may develop, in which arguments are tested, questioned, clarified,
compared, ranked, and so forth. This step is in practice sometimes skipped,
particularly when the chair feels the arguments and proposals are converging to
a large extent. (In that case, the chair moves on to the next
step.)
·
At some point, the chair will
propose to decide on a certain grade motivated by a summary of the main
arguments raised thus far. For example, the chair may propose to assess Maria’s
thesis with a “6 as final grade, in view of its high practical relevance and
well-designed structure as strong points, but its readability as the main weak
point.” The chair will go around the group to ask each individual participant to
give consent to this proposal. At this stage, at least one participant typically
withholds his or her consent. Depending on the (additional) arguments given and
any subsequent discussion(s) the chair will then either adapt his/her earlier
arguments for the same proposed grade or move to another proposal. This process
continues until the circle agrees on a proposal by consent of all those
participating in the same (or possibly the next) meeting.
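The steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not the circle's actual practice: the text does not say how the chair arrives at a proposal, so the sketch assumes the chair starts from the lower median of the current proposals and, when an objection is raised, moves one point towards the first objector's grade. The function names and the objection rule in the usage example are likewise hypothetical.

```python
from statistics import median_low
from typing import Callable, Dict, Optional


def assess_by_consent(
    initial: Dict[str, int],
    revised: Dict[str, int],
    objects: Callable[[str, int, int], bool],
    author: Optional[str] = None,
    max_rounds: int = 5,
) -> Optional[int]:
    """Return the agreed grade, or None if consent is deferred.

    initial: grades written down independently, before any discussion.
    revised: grades changed after hearing everyone's arguments.
    objects: called as objects(member, own_grade, proposal); True means
             the member has an argued objection to the proposal.
    author:  the thesis author, who has no consent right.
    """
    grades = dict(initial)
    grades.update(revised)  # the revision round overrides initial grades

    # The author may take part in the discussion but not in the consent round.
    voters = [m for m in grades if m != author]

    # Assumption: the chair's first proposal is the lower median grade.
    proposal = median_low([grades[m] for m in voters])

    for _ in range(max_rounds):
        objectors = [m for m in voters if objects(m, grades[m], proposal)]
        if not objectors:
            return proposal  # nobody has an argued objection
        # Assumption: move one point towards the first objector's grade;
        # if that grade equals the proposal, the grade itself stays the same.
        target = grades[objectors[0]]
        if target > proposal:
            proposal += 1
        elif target < proposal:
            proposal -= 1
    return None  # no consent yet; continue in the next meeting


def far_from_own_grade(member: str, own: int, proposal: int) -> bool:
    # Hypothetical objection rule: object if more than one point apart.
    return abs(own - proposal) > 1


result = assess_by_consent(
    initial={"Paul": 7, "John": 6, "Anna": 7},
    revised={"Paul": 6},  # Paul lowers his grade after John's argument
    objects=far_from_own_grade,
    author="Maria",
)
print(result)
```

Keeping the initial and revised grades as separate inputs mirrors the point made in the text: the first round is deliberately independent, and interaction between assessments is introduced only afterwards.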
This assessment procedure appeared
to work very well in case of the second thesis evaluated, and since then we have
been using it in all subsequent assessments. Note that most students, when
entering the circle, already were familiar with this kind of procedure, as a
result of having participated in a course in Circular Organizing in which a
similar procedure is studied and used, for example, to choose a chairperson.
Moreover, the latter procedure was also used several times during the first year
of the circle for choosing its (new) chairperson and secretary. Thus, by the
time we started developing and applying this assessment procedure, most
participants already were familiar with the key ideas behind this assessment
system.
It should be recognized that, given
the fact that most students working on their thesis tend to complete their
project close to a certain deadline (e.g., determined by bursary arrangements or
university restrictions, such as the maximum period of enrollment), there may be
a lot of “external” pressure for convergence toward a consent decision. In fact,
this kind of personal interest of the student submitting the thesis for
evaluation has played a substantial role in at least one masters thesis that was
evaluated in the circle’s first year.
Another kind of critical incident
involved a number of situations in which a student member criticized certain
initiatives or interventions of one of the supervisors. For example, one of
these incidents involved a student who had recently started exploring the area where
he wanted to do research for his masters thesis, but who felt that the
supervisor tried to discourage him from going in a certain direction he himself
strongly preferred, whereas at the same time the same supervisor appeared to be
moving in that same direction outside the circle! When this student raised this
issue during a meeting, the student and supervisor got the opportunity to
clarify, explore and adjust their intentions and expectations. What appeared to
be essential for students in this and several other cases was the real
opportunity they had (and felt) to express their anxiety, doubts, critical
questions, etcetera. For the supervisors, these situations serve to build
awareness of the implications of their actions inside and outside meetings,
particularly regarding the effectiveness of their interventions in the learning
processes of students. In more general terms, the experience of all participants
in this circle’s meetings is perhaps best described in the words of one of the
student members:
“In all tutorial groups I did, I was
always struggling with two problems: ‘how to get my ideas or points across’, and
‘how to make sure I’m being heard?’; the latter issue is no longer of concern to
me in this circle, I don’t have to worry about having and using opportunities to
participate in the discussion, so I can concentrate entirely on the
contents.”
Finally, it should be acknowledged
that this experiment may have profited from several beneficial conditions and
circumstances. For example, the experiment almost immediately raised the
interest of the community of practitioners, in the form of suggestions and
proposals for research projects and internships. In addition, the supervisors
had been experimenting with circular organizing and advocacy and inquiry skills
long before the start of this project. These conditions may have provided extra
support in trying to overcome the effects of a limited learning system in order
to move toward a learning system in the true sense of the
word.
Concluding
Remarks
To
a large extent both students and teachers are apparently thrown in at the deep
end in the PBL-based curriculum (cf. Keizer, 1995). That is, students as well as
teachers are far from well-prepared for the fundamental transformation that
should take place when adopting a problem-based learning approach based on the
idea of self-directed learning, both at the individual and the group level. The
work of Argyris and others suggests that this transformation from a limited to
an effective learning system requires shared leadership and control as well as
skills in the area of advocacy and inquiry which serve to confront and remove
defensive routines inhibiting learning.
Moreover, on the basis of ideas
about the circularity of power we argued that sharing leadership and control
between students and professors would reinforce the self-directed nature and
thus effectiveness of PBL, particularly in the area of dialogue and
problem-solving. This initial hypothesis was not falsified by the outcomes of
the thesis circle experiment described in the previous section. The discussions
during meetings can be described in terms of an ongoing flow of ideas, problems
and solutions, which is characteristic of dialogue (Senge, 1990). The nature of
collective problem-solving was perhaps the most unexpected outcome. Of crucial
importance appears to be that several students quickly learned to use inquiry
and advocacy skills, and also that a final assessment approach consistent with
the notion of circularity of power was developed.
Thus, the effectiveness of PBL (and
perhaps other learning approaches) as a learning system would
probably benefit from shared control over the learning process, including
assessment and evaluation. In this respect, an effective learning system is
based on shared leadership and control and core values such as valid
information, free and informed choice, and internal commitment. The principle of
shared control and leadership easily becomes an ideological device rather than a
practical tool to improve the effectiveness of the learning system, particularly
if it is not organized in a straightforward manner. The notion of circularity of
power, and more broadly the circular organization approach, appears to provide
such a practical tool.
Note in this respect that assessment is a
controversial and somewhat neglected issue in self-directed and problem-based
education (Rogers, 1969; Williams, 1992). Some programmes tend to rely on
self-evaluation as an input for the final decision of the instructor or tutor
(Rogers, 1969). In other programmes, there are no examinations at all, and
tutors carry out the assessment on the basis of direct observation of the
student’s learning process (Williams, 1992). There are also programmes that
apparently rely on examinations in the traditional sense, and again others have
developed examinations that try to test the problem-solving abilities students
have developed (Williams, 1992). Because the goal of PBL is discovery on the
part of the students, a self-managed approach which incorporates assessment by
all parties involved appears to be essential. The experiences in the thesis
circle show that an assessment procedure which incorporates elements from
self-assessment, peer assessment and assessment by the supervisor may provide an
instrument which reinforces rather than inhibits self-directed
learning.
In sum, the thesis circle experiment
suggests redistributing formal power on the basis of circularity can be an
important next step in the development of the problem-based learning method.
However, it should be noted that the outcomes of this single experiment may also
have been produced by a number of other (beneficial) conditions. Therefore, a
new thesis circle in another department of Maastricht University was recently
set up, and in addition, experiments in other parts of the undergraduate
curriculum in Maastricht are currently being
conducted.
5.
Reference list to the assessment procedures used in Maastricht
1.
Albano, M. G., Cavallo, F.,
Hoogenboom, R., Magni, F., Majoor, G., Manenti, F., Schuwirth, L., Stiegler, I.,
& Van der Vleuten, C. (1996). An international comparison of knowledge
levels of medical students: the Maastricht Progress Test. Medical
Education, 30, 239-245.
2.
Driessen, E. W., Van der
Vleuten, C. P. M., & Van Berkel, H. J. M. (1998 (in press)). Beyond the
Multiple-choice v. Essay Questions Controversy: combining the best of both
worlds. Journal of Legal Education.
3.
Jansen, J. J. M., Tan, L. H. C.,
Van der Vleuten, C. P. M., Van Luijk, S. J., Rethans, J. J., & Grol, R. P.
T. M. (1995). Assessment of competence in technical clinical skills of general
practitioners. Medical Education, 29, 247-253.
4.
Muijtjens, A. M. M., Hoogenboom,
R. J. I., Verwijnen, G. M., & Van der Vleuten, C. P. M. (1998). Relative or
absolute standards in assessing medical knowledge using progress tests.
Advances in Health Sciences Education, 3(2), 81-87.
5.
Newble, D., Dawson, B.,
Dauphinee, D., Page, G., Macdonald, M., Swanson, D., Mulholland, H., Thomson,
A., & Van der Vleuten, C. P. M. (1994). Guidelines for assessing clinical
competence. Teaching and Learning in Medicine, 6(3), 213-220.
6.
Perrenet, J. (1997). Between
Aalborg and Maastricht: student assessment at knowledge engineering. In
M.Wassenberg & H. Philipsen (Eds.), Placing the student at the centre:
current implementations of student-centred education (pp. 143-148).
Maastricht: Maastricht University.
7.
Romme, G. A. L. (1998 (in
press)). Redistributing power in the classroom: the missing link in
Problem-based learning. In J. Troy et al. (Eds.), Learning in a changing
environment. Dordrecht/London/Boston: Kluwer Academic Publishers.
8.
Segers, M. S. R. (1997). An
alternative for assessing problem-solving skills: the overall test. Studies in
Educational Evaluation, 23(4), 373-398.
9.
Segers, M. S. R., Dochy, F. J.
R. C., & Sluijsmans, D. (1999 (accepted)). The use of self-, peer- and
co-assessment in higher education: a literature review. Studies in Higher
Education.
10.
Van der Vleuten, C. P. M.
(1996). The assessment of professional competence: developments, research and
practical implications. Advances in Health Sciences Education, 1(1),
41-67.
11.
Verhoeven, B. H., Verwijnen, G.
M., Scherpbier, A. J. J. A., Holdrinet, R. S. G., Oeseburg, B., Bulte, J. A.,
& Van der Vleuten, C. P. M. (1998). An analysis of progress test results of
PBL and non-PBL students. Medical Teacher, 20(4), 310-316.
12.
Van der Vleuten, C. P. M., &
Swanson, D. B. (1990). Assessment of clinical skills with standardized patients:
State of the Art. Teaching and Learning in Medicine, 2(2), 58-76.
6.
Recommended reading
1. Van der Vleuten, C. P. M.,
Scherpbier, A.J.J.A., Wijnen, W.H.F.W., & Snellen, H.A.M. (1996).
Flexibility in learning: a case report on problem-based learning.
International Higher Education(2), 17-24.
2. Van der Vleuten, C. P. M.
(1996). The assessment of professional competence: developments, research and
practical implications. Advances in Health Sciences Education, 1(1),
41-67.
FLEXIBILITY
IN LEARNING:
a case
report on problem-based learning
C.P.M. van der Vleuten, A.J.J.A. Scherpbier, W.H.F.W. Wijnen, H.A.M. Snellen
University
of Limburg, Maastricht, The Netherlands
Abstract
The need
for change in higher education has received considerable attention in recent years.
Societal needs require educational systems to produce graduates better equipped
with highly specialized and qualitatively superior professional skills.
Economic needs require educational programmes to be efficient and
cost-effective. The vanishing boundaries between countries require educational
systems to be transparent and internationally orientated. Developments in
science have led to an explosion of knowledge, forcing educational systems to be
dynamic and flexible. The rapid change in knowledge forces educational systems
to emphasize learning skills and maintenance of competence, rather than the
provision of knowledge alone. Educational technology obliges educational
programmes to use multimedia and computer technology. Progression in educational
theory requires educational systems to activate the learner and to critically
reflect upon traditionally accepted adages of educational practice.
In this
context it is argued that flexible educational systems require a shift from
teaching programmes towards learning programmes. The distinctive characteristics
of both approaches will be outlined. Subsequently, these principles will be
illustrated by an explanation of an existing learning programme. This programme
uses problem-based learning as an instructional method. A week in the life of a medical
student will be used to illustrate a number of educational principles, including
self-directed learning, choice of teaching and learning formats, assessment of
achievement, and curricular and organisational management.
Introduction
Educational
programmes in higher education, particularly in Europe, have a long-standing
tradition. The fundamental building blocks of the teaching methods used in these
programmes have not changed for several hundred years, and perhaps even
longer. Teaching is an activity which has been modeled by our own teachers,
has been copied to our own teaching activities and will serve again as a model
to our students. Many teachers actually have not been trained or specifically
prepared for their teaching roles. With a certification in their discipline most
teachers are assumed to be qualified, usually for life, for their teaching
tasks. It is therefore not surprising that educational programmes and teaching
activities are mainly governed by tradition. As far as changes occur in
educational programmes, they are usually restricted to changes of content, but
are hardly ever related to the underlying concepts of
teaching.
The
question to be raised is whether this situation is desirable. In this article we
will reflect upon a number of reasons for the necessity of change in education
and challenge some of the more fundamental assumptions of regular teaching
programmes. We will subsequently discuss a new educational method called
problem-based learning (Barrows & Tamblyn, 1980). By no means should this new model
be considered the gold standard for innovative education; it is rather one
attempt to change educational programmes. The purpose is to critically reflect
upon education and not to 'sell' the educational model. It is truly a case
report in order to demonstrate the viability of educational innovation, and the
reader should realize that the model described is probably one among the many
options. Before describing the model we will review some of the reasons for
educational change and discuss general characteristics of existing and desirable
educational programmes.
Reasons
for educational change
In this
century, and particularly after the Second World War, virtually all Western
countries have undergone the same change in higher education for obvious
political and societal reasons: a larger part of the population has taken part
in higher education training. The number of students has therefore dramatically
increased. Not only did this increase require a further change from the classic
apprenticeship model in teaching used in the last centuries, it also required a
substantial investment of resources. Governments nowadays face the problem of
trying to control the continuously growing budget for (higher) education. These
economic reasons have led many governments to press for cost reduction
and for quality control in education. Economic pressure forces educational
programmes to consider change from a production perspective rather than an
academic perspective: how many graduates of a particular quality can we produce
in a particular amount of time? The consideration of efficiency and
effectiveness is a completely new issue for most educational programmes.
Although the various European countries still differ greatly in the extent to which
economic reasons affect education, it is only a matter of time before the effect
is universal. The exclusive reliance on academic criteria for defending the
quality of educational programmes will become increasingly difficult to
maintain.
In a similar way, changes in society induce changes in education. The
independent academic position of universities and other institutions of higher
education is increasingly challenged. In business, engineering, health care and
other fields particular expertise profiles and skills are emphasized, adapted to
modern needs in these fields. These needs are often poorly met by educational
programmes. These societal reasons will have an increasing impact on change in
education.
An
important reason for change in education is the advancement of science and the
explosion of knowledge. The problem of selection and coverage of content
is an emerging one and many educational programmes suffer from 'overload'.
Moreover, progressive advancement of scientific knowledge will make any
curriculum outdated within a few years. Therefore, life-long learning skills are
more essential than the consumption of temporary knowledge. The fostering of
these skills must be a task of educational programmes.
The European Community allows any graduate to work in any other country of the
Community. The vanishing boundaries will require educational programmes to
change. This will require critical appraisal of licensure requirements. The
preparation of professionals capable of operating in an international context
will become more important. Hence, internationalisation will require
education to change.
The rise
of information technology has an overall effect on society in general and
will provide particular challenges for education. Information technology
provides new carriers of information and can make learning less location and
time dependent.
Finally, the necessity of change in education is induced by progress in
educational theory. Considerable knowledge has accumulated about the conditions
that facilitate learning and about how individuals mature from novices to
professional experts. The need for meaningful contexts for storing and
retrieving information, the importance of repetition of content, the recognition
of student learning strategies, the educational impact of examinations, the
tools developed for quality control, and the utility of organisational
strategies for managing educational programmes are just a few areas where
educational theory has something to offer. Teachers, being professional
educators, should be aware of this kind of information. Its use should be part
of their professionalism and scholarship.
Teaching and learning
As the
above makes clear, we take the position that there are sufficient reasons that
point to the necessity of change in education. However, the question coming to
mind is the direction of that change. Where should it lead to; what is the
target or objective? In addressing that question we would like to make a
fundamental assertion. We would argue that a distinction is in order between
teaching and learning. We notice that the two concepts are used interchangeably:
we tend to take for granted that teaching leads to learning. In discussing
educational programmes we automatically speak of teaching activities. Yet we
would like to argue that the two concepts are quite different and that the
mission of educational change should emphasize the learning aspect rather than
the teaching aspect. After all, learning is what educational programmes should
be about; teaching is a vehicle, or rather one of the vehicles, to achieve
learning.
The
educational programme of the future should be a learning programme rather than a
teaching programme. To describe what we mean by a learning programme a number of
descriptors related to teaching and learning programmes are contrasted in figure
1. We will not discuss each entry in the figure but will restrict ourselves to
an overall characterisation.
Figure 1: Characteristics of teaching programmes versus learning programmes.
| Teaching Programmes | Learning Programmes |
| Knowledge transfer | Knowledge acquisition |
| Teacher centered | Student centered |
| Static and rigid | Dynamic and flexible |
| Teaching objectives | Learning objectives |
| Uniform | Individual |
| Reinforces passiveness | Reinforces activeness |
| Students are led | Students may discover |
| Learning paths are described | Learning paths are offered |
| Teachers provide answers | Teachers ask questions |
| Teachers direct students | Teachers guide students |
| Teaching is essential | Learning is essential |
| Lectures are essential | Assessment is essential |
| Lecture halls are essential | Library and learning facilities are essential |
| Supply is essential | Demand is essential |
| Location dependent | Location independent |
| Time dependent | Time independent |
| Uniform study pace | Individual study pace |
| Uniform study sequence | Variable study sequence |
| Uniform content | Variable content |
| Teachers work in isolation | Teachers work in collaboration |
In a learning programme the centre of the universe is the student. The key issue
is to create an environment that stimulates the student to actively acquire
knowledge (and skills, attitudes, etc.). Instead of being a (passive) consumer
of learning material prescribed by the teacher, the student should become
responsible for seeking information offered by the teacher. An active learning
attitude is essential in order to achieve self-directed learning skills. This
should be the basis for life-long learning. After graduation no teacher will be
available to provide further directions, while current knowledge will rapidly
decay and professional skills will need to be developed further. Instead of
lectures, individual learning and learning in peer groups become important.
Instead of lecture halls, library and learning facilities become essential.
Rather than stacking memorised information to pass the next examination,
information should be used to understand phenomena or problems, and knowledge
should be not displayed but applied in relevant contexts.
Many of
our current educational programmes are very distant from a learning programme as
envisioned here. Most of our programmes are a concatenation of topics prescribed
by teachers and consumed by the students. Not uncommonly, little communication
exists between teachers, sections or departments on the content provided.
Usually teachers or disciplinary units are fully autonomous. Yet it is hard to
believe that individual teachers can oversee the educational programme as a
whole. Moreover, teachers being specialists in their field are inclined, quite
understandably, to over-emphasize the importance of their discipline in relation
to the integral objectives of an education programme. In a system with many
individual autonomous elements there is little space for monitoring, quality
control, flexibility or, more importantly, synergy between elements. Moreover,
the attitude towards professional quality in education is remarkably different
from other academic areas. Professional quality in research, for example, is
defined, and unequivocally accepted, through rigorous peer review. Quality of
education, on the other hand, is left to the professional integrity of the
individual.
Until now our discussion of the need and direction for change has been quite
theoretical and perhaps perceived as somewhat utopian. To make some of these
issues more concrete we will subsequently describe an existing programme where
an attempt has been made to apply some of the learning environment
characteristics.
Problem-based learning
As a case report we will describe an educational method applied at the
University of Limburg in Maastricht, The Netherlands. Although all faculties of
this university use this educational method with variations adapted to the needs
of individual disciplines, we will describe the faculty in whose discipline
problem-based learning originated: medicine. The medical school has applied
problem-based learning since the faculty started in 1974 (Van der Vleuten &
Wijnen, 1990). Medicine in the Netherlands is a six-year programme in which the
last two years are spent in clinical attachments in ambulatory and
non-ambulatory settings. We will focus on the system as it is used in the first
four preclinical years of the study.
We will do
so by describing an exemplary week in the life of a student and discussing the
principles behind this program. This week is schematically represented in figure
2.
Figure 2: A week of a student in a medical problem-based learning programme.
|    | Monday | Tuesday | Wednesday | Thursday | Friday |
| am | Skills training | Tutorial group | Communication and attitude training | | Tutorial group |
| pm | | Lecture | | Health practice contact | |
The tutorial group
The heart of the matter is the meeting of the tutorial group. Twice a week a
group of approximately 8 students and one staff member, called the tutor, meet.
They have a so-called blockbook consisting of a number of problems related to
the content of that unit of the curriculum. Figure 3 provides a sample problem.
Figure 3: A sample problem as used in a tutorial group.
Mr. Brown,
aged 68, comes to your surgery and tells you that he has been feeling dizzy
recently. He is seriously worried because he has always been healthy; he has
never had any medical problems. But the complaints, which he has had for a few
months, are now getting worse and worse. The dizziness occurs when he gets out
of bed in the morning, but it can also be provoked by a sudden movement of his
head. "When it happens, everything swims before my eyes and I feel unwell, light
in the head and a little queasy. When I sit down for a moment, the dizziness
slowly disappears."
Problems
are used to ensure a meaningful context for learning. By providing this context
knowledge can be integrated with previous knowledge, and knowledge can be better
retrieved and applied when necessary (Schmidt, 1983; Norman & Schmidt,
1992). The problems also lead to an integration of disciplines. For the problem
presented in figure 3 the students may for example study the anatomy of the
brain as well as neurological aspects of dizziness. Basic sciences and applied
sciences are thus studied in an integrated way.
In one
tutorial session the students will analyse a single problem and discuss their
prior knowledge related to the problem. They will subsequently define what they
need to know to tackle the problem; they will define the learning objectives. In
this group discussion one of the students acts as a chairman and one keeps
minutes on the whiteboard. These tasks rotate within the group with every
session. The task of the tutor is to monitor the group process. He may for
example intervene when the discussion is unclear or when individual students do
not contribute to the discussion or when the objectives are too vaguely defined.
Often the tutor is not even an expert on the particular problem at hand. The
tutor does not teach but guides the students: he may ask specific questions,
probe particular topics, etc. After having defined the learning objectives as a
group, the students will pursue the required information individually. They are
taught to use multiple sources of information and to compare and synthesize
that information (e.g., different handbooks, recent articles). In the next
tutorial session they will discuss what they found. They are required to report
in such a way that they demonstrate understanding of the material learned, e.g.
not by reading their notes, but by presenting an overview or a schematic
summary. Unclear concepts are discussed. If necessary new learning objectives
are defined. A tutorial session lasts two hours, usually one hour for reporting
back and one hour for discussing a new problem. Tutorial group sessions are held
twice a week. A curriculum unit usually lasts six weeks. For every unit new
tutorial groups are formed through randomisation: the students have no choice in
the composition of the group. This forces the students to work effectively in
any team, as they will also have to do in their later career.
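The random formation of tutorial groups can be illustrated with a short sketch. It is not the faculty's actual allocation procedure; the function name form_tutorial_groups, the cohort of 100 students and the fixed seed are assumptions made for the example. Only the group size of approximately 8 and the principle of randomisation come from the text.

```python
import random


def form_tutorial_groups(students, group_size=8, seed=None):
    """Randomly partition students into groups of at most group_size.

    Hypothetical helper: the text only states that groups of about 8 are
    formed through randomisation, not how this is implemented.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    # Number of groups, rounded up so that no group exceeds group_size.
    n_groups = max(1, -(-len(shuffled) // group_size))
    # Deal students round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical cohort of 100 students for one six-week unit.
students = [f"student_{i:03d}" for i in range(1, 101)]
groups = form_tutorial_groups(students, group_size=8, seed=1)

for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {len(group)} students")
```

Calling the function again with a different seed for the next unit gives every student a new, unchosen set of group members.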
Each unit is interdisciplinary in nature and addresses a particular theme, such
as fatigue or blood loss. The units are scheduled according to a master plan in
which curricular objectives are defined in content areas deliberately arranged
in such a way that a number of desirable principles can be achieved.
The curricular architecture includes an increasing complexity, a spiral
hierarchy of recurring topics, and a transition from normal to abnormal
functioning.
To foster
internationalisation three units are fully in English (other units are in Dutch)
and students are encouraged to spend some study-time abroad. The English units
allow exchange with foreign students, for instance through the Erasmus
programme. A wide network is established with other schools for sending our
students abroad.
Practical skills
The intention of the programme is to integrate theory and practice as tightly as
possible. Therefore an elaborate skills training programme is arranged, starting
right at the beginning of the first year. The skills programme is integrated
with the content discussed in the tutorial groups. In our illustrative week two
training sessions are scheduled. For example, for the sample problem in figure 3
the skills training on Monday morning could consist of practising the
neurological examination on each other or on a patient. Attitude and
communication skills, a pressing societal demand for doctors, are also
considered important in skills training. In each curricular unit every student
will have an encounter with a (simulated) patient. In a safe laboratory
environment the student may practise his social skills and, as the curriculum
progresses, can practise applying knowledge in relation to a real (or simulated)
patient. The training on Wednesday morning could for example encompass breaking
bad news to a (simulated) patient with a neurological problem.
Health practice contact
The same
integrative objective is pursued with the health practice contact in the week of
our student. Throughout the curriculum a number of these contacts are organised.
They may include a tour on an ambulance, a week nursing patients in a hospital,
a day in a midwifery practice, etc. The health practice contacts and the skills
programme contribute greatly to the motivation of the student. Right from the
start students can act as 'real' professionals, and in the process they obtain
an accurate view of the demands of their later profession, allowing them to make
an informed choice to continue their training in the field.
Lectures
Traditional lectures are also part of the curriculum. However, they are
carefully planned and should have a specific added function in the learning
programme. They are used to introduce a curriculum unit, to activate prior
knowledge, to help students with difficult topics, to provide unique information
(e.g. from an invited speaker in the field), etc. On average, approximately two
lectures are held per week.
Non-scheduled time
The open space in the week of our student is significant. Problem-based learning
requires students to work independently. To facilitate self-study a substantial
investment is made in providing facilities for students. Next to a library a
so-called 'study-landscape' has been created. This facility provides a library
(although books cannot be borrowed) with multiple copies of all current
handbooks, a video and slide library, computer facilities for computer-assisted
learning and for other information technology applications (access to library
files, CD-ROM, word-processing and statistical facilities, the Internet, etc.),
copying facilities, and ample space to sit quietly for studying. Invariably
throughout the curriculum approximately 10 to 12 hours per week are taken up by
scheduled activities; the remaining time is for the student to fill in.
In summary, problem-based learning requires students to acquire knowledge by
using problems as a learning context, stimulates self-directed learning for
life-long learning and integrates disciplines both horizontally (multiple
disciplines integrated within one unit) and vertically (basic and applied
sciences; theory and practice are integrated).
Assessment
The way
student achievement is assessed is quite important in a problem-based learning
programme. Tests and examinations have a tremendous impact on how students
learn. A discipline oriented assessment programme would be detrimental in a
problem-based programme. Similarly, a classical system consisting of course
related examinations in which students go from hurdle to hurdle would not be
beneficial for problem-based learning. In a course-related examination system
students work to pass the test. Students in a problem-based programme are
expected to define their own (or group) learning objectives, i.e. their
self-directed learning is paramount. Test-directed studying is the opposite of
that. Moreover, the focus is on functional knowledge and little value is
attributed to the momentary knowledge of a student cramming for a test.
Next to integrated unit-related tests, the assessment programme rests heavily on
a different format of testing called progress testing. A progress test is a
comprehensive test (250 test items) covering the end objectives of the
curriculum just like a final examination, including all disciplines within the
programme. The same test is administered to all the students in the curriculum
(year 1 to 6) at the same time. Every three months a new test is constructed and
administered. First year students are able to answer only a small share of the
questions (approximately 20%), second year students somewhat more and so on. A
single student will take 24 (6 times 4) progress tests during his study and will
find himself growing gradually in different areas. The average overall growth
shows a near perfect linear incline until graduation. Test-directed studying is
difficult since it is not known to the student what to expect; any question can
be asked. Conversely, by simply working continuously on their own objectives
students will see automatic growth of knowledge. There is no need for cramming
or particular anxiety. The progress test allows students to concentrate on their
tutorial group work. Moreover, the test reinforces functional knowledge. Instead
of passing from one examination to the other, a progress testing system
continuously assesses the previously learned material. For example, when
biochemistry is learned in the first year the students are still required to
answer biochemistry questions upon graduation.
Other
parts of the assessment programme include performance-assessments of students
actually interacting with (simulated) patients using direct observation under
standardized conditions, and written or computer-based exercises and tests using
problems and patient cases. All tests are submitted to a careful review
procedure by interdisciplinary test review committees. Every test is public
after administration and open for critique from the students. Their comments are
reviewed by these test committees and final scores for students are calculated
after this process has taken place. Special attention is given to the feedback
function of tests by providing detailed information on profile scores, peer
reference information, and literature references and suggestions. Achievement
testing as a learning resource, i.e. as an integral aspect of the educational
process, is highly emphasized.
The educational organization
The task of the teacher in this programme is clearly different from that in a
traditional programme. There is relatively little classical teaching such as
lecturing. The role of the staff is more that of provider, developer, organiser
and facilitator. Examples of teaching roles include being a tutor in a tutorial
group, member of a unit planning group, member of a test review committee,
developer of a training programme in practical skills, trainer in a faculty
development programme, etc. All new staff members are required to take a number
of educational courses on problem-based learning and its specific teaching
skills before they are allowed to participate in the programme. The different
teaching roles have a certain hierarchy. For instance, to become a unit
coordinator one must have extensive experience as a tutor and as a member of a
planning group. When there are openings for teaching roles staff have to apply.
The selection decision is based in part on the quality of performance in
previous teaching tasks. In promotion decisions teaching performance is an important
criterion. Some of the teaching roles are formally evaluated by students. The
evaluations are brought to the attention of department chairs and are used in
yearly staff evaluation rounds.
To manage
all these activities a matrix-management system is used. The matrix is defined
by two axes: disciplines (departments) and educational activities. Depending on
the activity, a number of disciplines are involved and staff members of multiple
departments are allocated or linked to that activity. Planning an educational
unit is an illustration of one educational activity and a planning group will
typically consist of six to nine representatives of departments. There is a wide
variety in educational activities, including a number of educational support
activities. For instance, a group of people is responsible for library and study
facilities, another for systematic programme evaluation, another for faculty
development, etc. The roles of teachers can be quite
diverse.
All educational roles are quantified in educational hours. Different roles are
rewarded differently depending on their time involvement. Therefore, it is
relatively easy to monitor the contribution of departments. The sum of
educational credits per department should match the number of staff allocated to
teaching activities in that department. If that is not the case the department
will lose staff in the long run. If poor quality is delivered, individual staff
members will have difficulty in competing for educational roles, which will
burden the department because it does not achieve sufficient input in the
curriculum overall. On the other hand, the credit system provides flexibility
for departments because it allows them to plan variability in teaching load
across individual members within the department.
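The credit bookkeeping described above amounts to summing educational hours per department and comparing the sum with the teaching capacity allocated to that department. The sketch below shows that comparison under stated assumptions: the department names, roles, hour values and the function department_balance are all hypothetical and do not reproduce the faculty's actual administration.

```python
from collections import defaultdict


def department_balance(role_assignments, teaching_capacity):
    """Return credited hours minus allocated teaching hours per department.

    role_assignments: iterable of (department, role, hours) tuples.
    teaching_capacity: dict mapping department to allocated teaching hours.
    A negative balance means the department delivers less than its allocation.
    """
    credited = defaultdict(float)
    for department, _role, hours in role_assignments:
        credited[department] += hours
    return {
        department: credited[department] - capacity
        for department, capacity in teaching_capacity.items()
    }


# Hypothetical figures, for illustration only.
assignments = [
    ("Anatomy", "tutor", 120),
    ("Anatomy", "unit planning group", 60),
    ("Physiology", "tutor", 120),
    ("Physiology", "test review committee", 40),
]
capacity = {"Anatomy": 180, "Physiology": 200}

balance = department_balance(assignments, capacity)
print(balance)  # {'Anatomy': 0.0, 'Physiology': -40.0}
```

In this made-up example the second department falls 40 hours short of its allocation, which is the situation in which, according to the text, a department would lose staff in the long run.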
The
coordination of the curriculum as a whole is organised at a central level. An
educational committee with elected members from departments and students
determines to a large extent the overall educational policy. The operational
management is in the hands of a separate committee chaired by the dean for
educational affairs. Educational input and educational quality are the basis for
a yearly review session with all departments. The curriculum is systematically
monitored using student evaluation questionnaires reflecting all educational
activities. These evaluations are fed back to the responsible educational
project groups and changes are monitored. Review groups within the educational
committee periodically make an in-depth evaluation of educational activities. In
this way an attempt is made to build quality control and educational innovation
into the programme, i.e. to achieve a 'learning organisation'.
Conclusion
Problem-based
learning intends to create a flexible learning environment. It tries to meet the
demands of change as they were discussed in our introduction. As will be clear
now, learning can be much more than teaching. It is the learning which we try to
foster and teaching is only part of it. The teachers are the architects, the
managers, the controllers, and the helpers. It is intended as a dynamic and
flexible process: quality control, rationality, change and innovation are vital
elements of this approach to education.
Two questions come naturally to mind: is it any better, and what does it cost?
The effectiveness question is difficult to answer. If the spread of
problem-based learning is used as a criterion, then it is quite effective.
Problem-based learning has been introduced in many schools of higher education
and universities virtually all over the world, both in Western and in developing
countries. The answer is more difficult if outcome is the criterion. A number of
review articles have recently been published (Albanese & Mitchell, 1993;
Berkson, 1993). In general, knowledge examinations do not demonstrate systematic
differences between students in problem-based learning programmes and
conventional programmes. On the one hand this is a reassuring finding, on the
other hand one may alternatively question the need for all the effort. On
specific skills problem-based learning students are often rated to be superior.
These include for example library skills and practical professional skills.
However, the most consistent and conclusive finding in favour of problem-based
learning is 'fun': students in a problem-based learning programme have more
pleasure in studying and are more motivated. A final difference concerns
attrition rates. In the Netherlands, problem-based learning programmes show
consistently lower drop-out rates, and the discrepancy between nominal and
actual study time is smaller. In other words, more graduates are produced in
a shorter time with at least equal proficiency. Problem-based learning may
therefore be economically more efficient.
There are no published studies comparing the resource requirements of
problem-based learning and conventional programmes. However, in the Netherlands
the situation is quite simple: there are no differences in funding across
universities. The problem-based learning programmes are carried out with the
same budget as the programmes of other universities. The current popularity of
problem-based learning in so many institutions is another token of its
feasibility.
Although the case report here concerned medicine, problem-based learning has
been successfully applied in many other disciplines (Boud & Feletti, 1991;
Gijselaers et al., 1995; Bouhuijs, Schmidt & Van Berkel, 1995). Naturally,
the method may not work identically for every discipline and changes may be in
order. We would like to stress that the method itself is not so important. More
important is the creation of an adequate and flexible learning environment and
there may be many ways to achieve that.
Literature
Albanese, M.A. & Mitchell, S. (1993). Problem-based learning: a review of
literature on its outcomes and implementation issues. Academic Medicine, 68,
52-81.
Barrows, H.S. & Tamblyn, R.M. (1980). Problem-based learning: an approach to
medical education. New York: Springer.
Berkson, L. (1993). Problem-based learning: have the expectations been met?
Academic Medicine, 68 (Supplement), S79-S88.
Boud, D. & Feletti, G. (Eds.) (1991). The Challenge of Problem-based Learning.
London: Kogan Page.
Bouhuijs, P.A.J., Schmidt, H.G. & Van Berkel, H.J.M. (Eds.) (1995).
Problem-based learning as an Educational Strategy. Maastricht: Network
Publications.
Dolmans, D.H.J.M. (1994). How students learn in a problem-based curriculum. PhD
dissertation, Maastricht: University of Limburg.
Gijselaers, W.H., Tempelaar, D.T., Keizer, P.K., Blommaert, J.M. & Kasper, H.
(Eds.) (1995). Educational Innovation in Economics and Business Administration:
The Case of Problem-Based Learning. Dordrecht: Kluwer Academic Publishers.
Norman, G.R. & Schmidt, H.G. (1992). The psychological basis of problem-based
learning: a review of evidence. Academic Medicine, 67, 557-565.
Pochet, B. Le “Problem-based Learning”, une révolution ou un progrès attendu?
Revue Française de Pédagogie, 111, 95-107.
Schmidt, H.G. (1983). Problem-based learning: rationale and description. Medical
Education, 17, 11-16.
Van der Vleuten, C.P.M. & Wijnen, W.H.F.W. (Eds.) (1990). Problem-based
Learning: Perspectives from the Maastricht Experience. Amsterdam: Thesis-publ.
The assessment of professional competence: developments, research and practical
implications
C.P.M. van der Vleuten
University of Limburg, Maastricht, The Netherlands
Educational achievement testing is
an area of turmoil in the health sciences. Examinations are a constant source of
problems for many teachers, curriculum designers and educationalists. The
evaluation of student achievement is continuously debated at educational
meetings, conferences and workshops. It is also an area in which tradition,
personal values and experiences tend to dominate discussions. On the other hand,
the number of scientific publications on assessment over the last decade has
exploded. The proposed instruments, each preferably using an intriguing acronym,
are countless. The literature is however often difficult to access, since the
psychometrics usually involved in educational testing discourages the average
health professions reader. Assessment in health professions education is
nevertheless an area which is fortunately well-researched and has delivered a
number of well documented outcomes. The purpose of this article is to highlight
these outcomes and to attempt to translate them into practical implications and
research suggestions. We will not review individual methods and instruments in
detail, but will describe some classes of methods contingent on a (supposed)
theoretical framework. We will delineate what we consider paramount findings
within and across these classes and discuss their theoretical implications and
their effect on the evolution of new testing instruments. By using a simple
conceptual framework the utility of
assessment methods will subsequently be defined in a generic sense meant to be
helpful in deciding how to compromise and make trade-offs in assessment
practice. To improve the utility of assessment a number of practical suggestions
and research recommendations are proposed using this
framework.
The
search for instruments to assess clinical competence has been stimulated through
emerging logistical constraints and the dissatisfaction with ongoing assessment
practice in the health sciences. Particularly in the era after the Second World
War the number of students in higher education has grown exponentially, which
poses problems of logistics, since assessment (and teaching) was largely based
on, or derived from, the apprenticeship model (implicit assessment, holistic
judgements, unstandardized tests). The subjectivity and the poor measurement
characteristics of this approach were probably another reason to strive for new
pathways of assessment.
While searching for new instruments
an implicit conception of the nature of professional competence was used.
Competence was seen as an aggregate of different components or latent
attributes, which were seen as relatively distinct from each other. The
development of competence was regarded as being equal to the development in
each of the components, with growth defined as a monotonic process resulting
from learning experiences. The components were also considered to be relatively
stable across (clinical) situations and time. Expertise in a component allows a
person to act professionally regardless of the particular nature of the
situation or circumstances. In essence, the implicit conception of clinical
competence reflected a trait-conception as was quite prevalent in
contemporary personality and educational psychology. It is a very intuitive
approach in which professional behaviour is attributed to a set of latent
factors: they are within the person and cannot be directly observed, but
must be inferred from observed behaviour.
The
trait approach was also implicitly applied in the use of methods for measuring
the components: each of these components could (or should) be measured
separately and different methods or formats are appropriate to test different
traits. The validity of an assessment method would be demonstrated if low
correlations were found between scores on methods measuring different traits,
while high correlations were required between methods measuring similar traits.
The agenda became to develop methods appropriate for each of the components of
clinical competence such as is schematically outlined in figure 1: the 'jigsaw
puzzle' was 'merely' to find the right pieces. And again, in the course of
history, very many instruments have been proposed, each supposedly tapping into
different competency areas.
After these many years of research
and development, one should expect a review article to present a grid as in
figure 1, completed with a consensus list of well-defined components of
competence or traits and lists of instruments to be used, preferably with
extensive 'how-to-do information' on each of them. Unfortunately this is not the
state of the art. Despite the many proposed typologies, no consensus exists on
any taxonomy of clinical competence and none of the 'traits' are well-defined.
Even simple constructs such as 'knowledge' can be interpreted and subdivided in
infinite ways, with more complex constructs such as 'attitudes' and 'humanistic
skills' being total mind-breakers. Particularly when the conceptual level is
translated into operational terms, i.e. into test material, even the smallest
definition of a construct being measured tends to vary as much as the people
involved in the process. Similarly, there is no consensus on 'best' methods of
assessment. Although new measures of clinical competence were often eagerly
presented with an aura of a panacea, empirical evidence usually tempered the
original enthusiasm.
The
by now disappointed reader need not, however, be discouraged entirely. There
are good reasons for the deficiency of the conventional model and, compared with
a few decades ago, we know substantially more about assessment, both from a
theoretical and a practical perspective. Although we cannot present an overview
of best methods which would drown the turmoil and reduce the frustrations of
test development, we will try to clarify the difficulties in competence
assessment and perhaps provide some practical suggestions. In order to do this,
it is helpful to first review some significant developments which advanced our
understanding. To describe these we will review a few cells of figure 1 and
describe four classes of methods, each attempting to measure different
components of competence. To some extent they also reflect the history of the
research. They will not, by any means, cover all significant developments in
competence assessment, but they will disclose some major issues in educational
testing in the health professions. The four classes of methods are: multiple
choice questions, written simulations, learning process measures and live
simulations.
The
introduction of multiple choice questions (MCQs) accommodated the need to cope
with the increased logistical demands for educational testing in higher
education. Not surprisingly, multiple choice tests were massively introduced
after World War Two, additionally boosted by the introduction of computer
technology. Many different forms have been developed (single best answer,
multiple best answers, true/false questions, matching questions, short and long
menus of options, et cetera)[i]
[ii].
In all probability, all readers are familiar with them and there is probably
not a single educational institution which does not use MCQs in the assessment
program. Although multiple choice tests are time consuming to construct, they
are efficient in handling large numbers of examinees, and their reliability is
excellent. There are no subjective influences in scoring answers (as a result of
which they are called objective, which might however be a misleading term as we
will see later) and the content of interest in a particular domain can be
efficiently sampled, since a single test can easily contain many
items.
The
MCQ test is significant in this context because it was and is the subject of
many criticisms[iii]
[iv]
[v].
The MCQ is designed to measure
aspects of knowledge. However, according to the critics, selecting options from
a list tests only trivial knowledge. Instead of requiring
active generation of responses, such as in free response tests, examinees in
MCQ-tests are only required to recognize the correct answer or to eliminate the
incorrect ones (cueing effect). The MCQ is therefore supposedly only suitable to
measure lower taxonomic levels of cognitive functioning.
Despite the criticisms the use of
MCQ-tests remains widespread. Although it did not allay the critique, research
has shown that the effect of cueing is marginal. Cueing mainly has an overall
effect on the mean score, and may hence have consequences for standard setting,
but the rank-ordering of examinees usually remains unaffected[vi]
[vii].
However, the acceptability of MCQ-tests has always been their
Achilles' heel. It has stimulated test developers to look for alternatives,
particularly to devise tests to address higher cognitive taxonomic levels and
tests more closely linked to professional reality[viii].
In
the sixties, attempts were made to measure clinical reasoning ability or
problem-solving.[ix]
The typical approach was to present an examinee with a patient problem and then
ask for management decisions. The decisions and answers to questions were taken
as an index of an examinee's problem-solving ability. Sometimes ingenious
technical devices (invisible ink, latent image printing) were used to simulate a
dynamic and realistic discourse of a patient problem. The most prominent example
of this approach was the Patient Management Problem (PMP)[x].
The examinee was required to collect data on history, physical examination, and
investigations. Some PMPs allowed a branched pathway through the problem
depending on the choices being made. Other instruments with a similar purpose
were also introduced such as, among others, the Modified Essay Question (MEQ)[xi]
[xii],
and less known measures such as the 'P4-deck'[xiii]
and the 'Film Test' and 'Programmed Test'5. With emerging computer technology some
of the drawbacks of the paper-and-pencil formats could be circumvented and
computer simulations were introduced[xiv]
[xv]
making the simulations even more realistic. The common denominator in all these
instruments was the utilization of a realistic (patient) problem to simulate
reality in order to assess the process of problem-solving.
Through their realistic nature
written simulations became quite popular, despite their cost of production.
Because of their high acceptance they rapidly became part of many examination
programs, including some national licensing examinations.
Apart from the scoring problems
involved in these simulations (disagreement on correct options, complexity of
pathways, differential weighing of responses)[xvi]
[xvii]
three significant consistent empirical outcomes were found which cast doubt on
the existing conceptual framework of problem-solving. The first consistent
outcome was that a score derived from one problem was not very predictive for a
score on another problem. Apparently the ability to solve problems was dependent
on the (clinical) content of the problem. Even changes of content within limited
content areas or smaller contextual changes yielded different outcomes. The
correlation typically found between scores on different problems varied between
0.10 and 0.30[xviii].
This finding was quite a surprise and puzzled many researchers and test
developers since it contradicted the (implicit) trait conception of
problem-solving as a generic attribute: the transfer of ability from one problem
to another turned out to be very low. The phenomenon came to be known as
'case-specificity' or 'content-specificity' of problem-solving[xix]. A second surprising outcome was the
finding that experienced clinicians scored hardly better, and sometimes worse,
than less experienced clinicians or students[xx]
[xxi]
[xxii]
[xxiii].
Apparently, a monotonic growth of competence with increasing expertise, as was
hypothesized from the trait conception, did not exist. A third unexpected
finding was that once reliable scores on problem-solving tests were obtained
(either in reality or by statistical correction), very high correlations were
found with other measures including multiple choice tests[xxiv]
[xxv].
Problem-solving appeared to be much more closely linked to knowledge (and other
constructs) and is not as independent a construct as was originally
supposed.
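The 'statistical correction' mentioned above is presumably the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the reliabilities of the two measures. The sketch below illustrates that formula only. The function name and the example figures are hypothetical and are not taken from the studies cited.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Estimate the 'true' correlation between two measures by
    correcting the observed correlation for unreliability."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical example: a short, unreliable problem-solving test (0.40)
# correlated 0.45 with a reliable multiple choice test (0.85).
true_r = disattenuate(0.45, 0.40, 0.85)
print(f"observed 0.45 -> corrected {true_r:.2f}")
```

A modest observed correlation can thus conceal a substantial true correlation when one of the tests is short and unreliable, which is the situation described for the written simulations.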
Apart from the theoretical
implications, these empirical findings posed major practical problems. In
educational testing we prefer to make an inference of an examinee's ability
independent of the particular sample of items (questions, patient cases,
problems) used in the test. The items merely constitute a random sample from a
large domain of possible items. In a next test, or in later practice, this
sample will be different. When the ability to solve problems generalizes poorly
across problems one is required to incorporate many problems in a test before a
sample-independent conclusion can be drawn. In other words, the test length
needs to be increased. With a favorable (not often found) average inter-problem
correlation of 0.30, at least 10 problems are necessary for achieving minimal
reliability (i.e., to reach an arbitrary alpha of 0.80); with a correlation of
0.10 more than 35 cases are required. These lengthy tests would naturally have
major resource implications, both in terms of testing time as well as in terms
of cost to produce these tests. From a purely decision making perspective high
correlations imply redundancy of information: a score on one test is highly
predictive of a score on another method. Limiting the assessment to the most
resource-saving method is a logical consequence and it is not surprising that
most licensing institutes removed these expensive simulations from their
examination program.
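The test lengths quoted above are consistent with the Spearman-Brown formula applied to the average inter-problem correlation (equivalently, standardized alpha). The sketch below assumes that this is how the figures were derived, which the text does not state explicitly; the function names are illustrative.

```python
import math


def spearman_brown(n_problems, mean_r):
    """Reliability of a test of n_problems problems whose scores
    intercorrelate mean_r on average."""
    return n_problems * mean_r / (1 + (n_problems - 1) * mean_r)


def problems_needed(target, mean_r):
    """Smallest number of problems reaching the target reliability."""
    exact = target * (1 - mean_r) / (mean_r * (1 - target))
    # The small tolerance guards against floating-point noise when
    # the exact answer is a whole number.
    return math.ceil(exact - 1e-9)


for r in (0.30, 0.10):
    n = problems_needed(0.80, r)
    print(f"mean r = {r:.2f}: {n} problems, "
          f"reliability {spearman_brown(n, r):.2f}")
```

Under this assumption an inter-problem correlation of 0.30 requires 10 problems and a correlation of 0.10 requires 36, in line with the figures given in the text.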
As a reaction to the content
specificity problem a new direction was suggested in the mid-eighties. It was
argued that any (clinical) problem has one or more essential elements crucial to
the management of the problem. The other elements in the problem follow from
these key elements or are less important. For assessment purposes it was
suggested to limit the assessment to the key elements in order to use the saved
time for testing additional problems and to improve the
reliability18. This has been called the 'key feature approach' to the
assessment of problem-solving and several instruments were proposed using this
approach[xxvi]
[xxvii].
The problems encountered in the
development of written simulations are illustrative of the pitfalls of the
implicit and intuitive approach to assessment of professional competence. They
caused quite some disturbance in the conception of problem-solving and the way
competence should be assessed in general, including disbelief in and/or mistrust of
psychometric data analysis and its producers. They have, however, contributed
significantly to the understanding of competence
assessment.
In
the seventies and eighties educational reform called for new approaches to
teaching and learning. Instead of passively consuming learning material,
students were supposed to take a more active role in acquiring knowledge.
Instead of using rote learning strategies students should learn to understand,
synthesize and apply learning material. A concern was expressed that most
conventional methods of assessment and assessment programs tend to reinforce
unwanted learning behavior (where indeed the MCQ is often named). A need was
expressed to develop assessment instruments which measured the process of
learning more directly.
A number of these instruments were
proposed. A prominent example is the Triple Jump Exercise (TJE) which was
intended to measure problem-solving skills, and to evaluate the quality of
information gathering[xxviii]. The TJE consists of three steps (jumps):
a structured oral examination based on one or more patient problems, a
time-limited study assignment (mostly 24 hours) in relation to the patient
problems in the first oral, and a repeat oral examination in which the quality
of self learning around the assigned topics is assessed. In a similar way a
problem-based learning exercise has been proposed more recently which assesses
the quality of solving a task using a problem-based learning strategy: a tutorial group meeting for generation of
learning objectives and one week of self-study, followed by a written individual
report[xxix].
Other process evaluation measures
included self assessment measures, peer ratings and faculty ratings[xxx]
[xxxi]
[xxxii].
They evaluated competencies such as group interaction skills, task
orientation, leadership skills, communication skills and community
interaction skills.
Except for a few innovative schools
experimenting with their assessment program, learning process measures have
never been widely introduced. Probably this was due to their unfamiliar nature
as well as to their poor measurement characteristics[xxxiii]
[xxxiv]
[xxxv].
They are important here because they explicitly highlight the educational value
of assessment. Over the years, the
dramatic impact of examinations on learning became increasingly clear. The
learning process measures explicitly acknowledged this relationship by
attempting to use it strategically: they communicated to students the importance
of a number of educational
objectives through assessment.
A
new development emerged at the end of the seventies and in the eighties when the
previous 'in vitro' simulations were advanced one more step by assessing actual
performance of examinees in standardized live simulations of clinical
situations. Examinees were brought into a simulated clinical situation called a
'station', where an assignment was given to perform a particular skill or to
manage a patient. The skills may be demonstrated on real or simulated patients
or on special technical devices such as gynecological and cardio-pulmonary
resuscitation models. The performance of examinees is recorded by faculty staff
examiners or by trained (simulated) patients. Some stations have post-encounter
written stations, with written tests probing the previous clinical situation. A
single test typically consists of a number of different stations and examinees
rotate in a round-robin format through each of these. In order to achieve
maximal standardization, examiners and (simulated) patients are usually
extensively trained in preparation of their roles. The performance of examinees
is scored on precoded checklists and/or rating scales. Therefore these tests
were called Objective Structured Clinical Examinations (OSCE)[xxxvi];
however, several other names are also used (standardized patient-based testing,
performance-based testing, authentic assessment).
Since its introduction the multiple
station examination has dramatically conquered the world. Medical schools on all
continents use some kind of station examination in their assessment program[xxxvii].
In Canada multiple station examinations are nowadays part of the national
licensing examination[xxxviii]
and actually applied on a national scale to over 1500 candidates per year tested
across the country in a single weekend. Their popularity is probably due to the
combination of the close approximation to the real world and the use of
standardized testing procedures at the same time.
The multiple station examination is
intensively researched. Overall, the outcomes closely parallel the findings in
relation to the written simulations: content specificity is the major concern
for reliability, high (true) correlations are usually found with other measures,
and, depending on the checklists used, absence of differences between groups of
expertise is not an uncommon finding[xxxix]
[xl]
[xli].
However, the importance here is that they represent a next step to standardized
professional testing approximating real life.
The
above developments show a few significant evolutions in the history of clinical
competence assessment and, more importantly, disclose some general and
consistent findings in the research associated with the developments. They have
both practical and theoretical implications. We will discuss the research
consistencies in more detail below.
Reliability issues
The
variability of performance of candidates across tasks originally found in the
problem-solving research appeared to be one of the most consistent findings in all
measurements of clinical competence. Except for some very basic communication
skills[xlii],
it has been found in all measurements of professional competence, including oral
examinations[xliii],
essay tests[xliv],
chart-audits[xlv],
multiple station examinations39, and in practice performance[xlvi].
This does not appear to be unique to the health professions, since task variability has
also been found to be a dominant source of variability in mathematics and science[xlvii],
law[xlviii],
and in military jobs[xlix].
As we indicated, the direct practical consequence is that tests containing a
small sample of items (essays, stations, patient problems, tasks)[6]
produce unstable or unreliable scores. Naturally, this will also vary with the
size of the domain being tested, but even in smaller domains the required sample
size of test items is usually high. Sample size requirements also vary with the
efficiency of testing methods. In general, more efficient testing methods which
need less time to sample a single item will be more reliable per unit of testing
time than tests requiring more time per item. For instance, the MCQ is efficient
in this respect and can sample a few hundred questions in a relatively short
time span, whereas a computer simulation might require thirty minutes or longer
to test a single patient problem and will therefore need very long overall
testing time to produce reliable scores. To produce adequate reliabilities (i.e.
0.80 or more) one should take into account that even efficient tests usually
require several hours of testing time25. Less efficient methods such
as multiple station examinations most often require more than four hours of
testing time, and sometimes (much) more, depending on the context, purpose and interpretation
of scores of the examination17 39.
Other sources of variability
challenging the reliability of examinations such as rater, patient or examiner
variability are usually either less important or can be better controlled. In
general, some standardization and structuring of the assessment may have an
adequate impact on the improvement of reliability. For example, when (simple)
scoring keys are used to score essay tests adequate levels of reliability can be
achieved as compared to the use of free judgements[l].
By providing (simple) protocols to structure and score oral examinations they
can become significantly more reliable[li].
Even when less analytical methods are used reliable scores (or at least as
reliable as their 'objective' counterparts) can be obtained when the sample size
of test items is sufficiently large and the test design is adequate. The test
design is important. In general, the test design should be arranged in such a
way that potential sources of variability (e.g. of examiners or patients) are
adequately sampled in order to diminish or neutralize their effect on the
precision of the measurement.
Table 1: Reliability of role-playing oral examinations as a function of
patient-case and examiner sample-size using different examiner and case
allocation strategies.

| Testing time in hours | Number of patient cases | Same examiner for all cases | New examiner for each case | Two new examiners for each case |
| 2  | 4  | 0.45 | 0.69 | 0.76 |
| 4  | 8  | 0.47 | 0.82 | 0.86 |
| 6  | 12 | 0.47 | 0.87 | 0.90 |
| 8  | 16 | 0.48 | 0.90 | 0.93 |
| 10 | 20 | 0.48 | 0.92 | 0.94 |
| 20 | 40 | 0.48 | 0.96 | 0.97 |

As
an illustration, table 1 contains generalizability coefficients as a function of
the number of cases in an oral examination using different test designs as reported
by Swanson43. When the same examiner is used to test all cases for
each examinee, the reliability remains poor. By using a new examiner
for each case the final judgement over an examinee is based on more raters and
the bias introduced by examiners will average out across cases. Adding a second
examiner per case is hardly worthwhile. One might argue that the reliability is
still poor, unless large samples of cases and raters are used, but this is no
different for many other testing methods, including the in vivo and in vitro
simulations.
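The qualitative pattern in table 1 can be illustrated with a simplified error model in which an examinee's score is disturbed by an examiner effect and a case effect. This is a sketch under stated assumptions, not Swanson's analysis: the two functions are a reduced form of a generalizability (D-study) calculation, and the variance components are hypothetical values chosen for illustration only.

```python
def g_same_examiner(n_cases, var_person, var_examiner, var_case):
    """One examiner scores all cases of an examinee: the examiner
    effect stays in the error term however many cases are added."""
    error = var_examiner + var_case / n_cases
    return var_person / (var_person + error)


def g_new_examiner(n_cases, var_person, var_examiner, var_case):
    """A new examiner for every case: examiner and case effects are
    both averaged out over the cases."""
    error = (var_examiner + var_case) / n_cases
    return var_person / (var_person + error)


# Hypothetical variance components (illustration only).
VAR_PERSON, VAR_EXAMINER, VAR_CASE = 1.0, 1.05, 0.60

for n in (4, 8, 12, 16, 20, 40):
    same = g_same_examiner(n, VAR_PERSON, VAR_EXAMINER, VAR_CASE)
    new = g_new_examiner(n, VAR_PERSON, VAR_EXAMINER, VAR_CASE)
    print(f"{n:2d} cases: same examiner {same:.2f}, new examiner {new:.2f}")
```

With a single examiner the coefficient levels off below 0.50 regardless of test length, whereas with a new examiner per case it keeps rising as cases are added, which is the contrast the table shows.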
The illustration in table 1 also
shows that objectivity of testing methods as a means to classify test methods
can be misleading. While one view is that objectivity is equivalent to
reliability, objectivity as freedom from subjective influences may be different
from objectivity as a set of strategies to reduce measurement error (such as the
use of checklists). Depending on the sampling strategy applied, objective measures
such as checklists may produce unreliable test scores and subjective measures
such as more holistic and global professional judgements may yield reliable test
information[lii].
Assessment methods which are both
unstandardized and global are hopelessly unreliable[liii].
A prototypical example is the clinical ratings used in clerkships in
many medical schools. They usually consist of a number of ratings on global
categories of clinical performance, often judged over a lengthy period of
non-standardized performance of examinees, obtained from other sources and
seldom based on direct observation. It is not only difficult to pass judgement
on a candidate with whom one has (closely) worked for a period of time, but it
is also difficult for any person to make a judgement on performance which covers an
extensive period in the past, particularly when unstandardized or unstructured[liv].
Psychological research indicates that the human mind is easily led by what we
think we have seen, usually based on gross generalizations of a few cues or
samples of performance, which do not necessarily coincide with reality[lv]
[lvi].
For any measure to become reliable we need a sufficient sample of performance
gathered and scored with at least minimal standardization and structure.
Validity Issues
To
determine whether educational tests measure what they intend to measure,
criteria or standards are necessary. Validity research has always been
plagued by the absence of good criteria, and gold standards simply do not exist
(otherwise they would be used in the assessment). Validity research, including
our own, more often than not contains methodological weaknesses: absence of a
theoretical framework, explicit hypotheses about expected results, strength of
relationships or differences to be accepted or rejected, a lack of information
on the reliability of the instruments used, et cetera.[lvii]
As a result validity research in educational testing contains a plethora of
correlational studies, replete with mid-range correlations, which are more like
Rorschach tests for the creative researcher to interpret favorably regardless
of the outcome (glasses are always half empty or half full anyway). Validity
research is therefore characterized by variable and often uninterpretable
outcomes. Some remarkable trends are nevertheless worth mentioning.
The trait-approach suggesting
divergence and convergence of scores of tests measuring respectively different
and similar constructs has not yielded very encouraging empirical support.
Although comprehensive studies using multiple measures of the same and different
components of competence[7]
are extremely rare and apart from the methodological weaknesses of these studies
the conclusion seems warranted that the communality between different methods of
assessment found is usually larger than we intuitively are inclined to think.
The high correlations found between problem-solving tests and other measures have
been a recurrent finding with many methods of assessment (provided that the
tests involved were reliable). They have been found between scores of free-response
tests with MCQs7 44 [lviii]
[lix],
PMPs and MCQs25, between oral examinations, computer simulations and
written tests[lx]
[lxi],
and between written tests and multiple station examinations[lxii].
The finding is again not restricted to the health sciences[lxiii]
[lxiv].
Using a corollary of testing methods
a number of studies have found clear relationships between certification
examinations and performance in practice, suggesting validity of examination
methods in relation to later performance[lxv]
[lxvi]
[lxvii],
but there were no differential method interactions[lxviii].
Causal inferences are always
difficult to make with correlations and high correlations do not necessarily
imply that the same construct is being measured. However, these findings
indicate that method characteristics do not inherently determine what is being
measured. At least the attributed uniqueness of the methods of assessment for
measuring particular and unique aspects of competence is challenged. McGuire has
actually called the conception of methods dictating what is being measured
one of the most damaging myths in competence assessment which has significantly
delayed progress[lxix].
What is being measured depends more on the content of the method or the task
posed to the examinee, than any characteristic of the method itself. The
validity more likely depends on the stimulus format of a test item rather than the response
format with which the answer is captured (and the cognitive process involved
as we will see below). An MCQ does not measure factual knowledge because it
requires a selection from a list of options, but may measure factual knowledge
when the question probes for factual knowledge. It may however also measure
aspects of problem-solving if it, for example, provides a patient case scenario
and prompts for a management decision. The same holds for essay tests, oral
examinations or any other test format: what is in the method is more
important than its wrapping. This is not to say that any method may measure any
component of competence or skill and naturally some methods more easily assess
some competencies. For example, measuring communication skills will be hard
without some form of direct observation. The opposite reasoning is however
challenged here: what is being measured is not dictated by the method but rather
what is put into the method.
Several authors have cautioned against
sacrificing validity as a compromise to objectivity, particularly in complex
professions such as the health sciences[lxx]
[lxxi].
Assessment techniques that avoid professional judgement in the name of
objectivity may lead to an atomization of complex skills thereby trivializing
the content of the assessment. For example, to break down communication skills
into their smallest possible behavioral components in order to be able to check
them better on a performance list may enhance objectivity but will not reflect
the intended complexity of the skill[lxxii].
Trade-offs in validity are clearly to be made, and there are clear pitfalls
involved in fragmenting complex skills that require some holistic and
professional judgement71.
Educational issues
The
concern about the driving force of examinations on the learning and the
curriculum that stimulated the development of the learning process measures has
increasingly become an issue for test developers. Many authors have emphasized or
documented the tremendous impact that the assessment program has on the
learner[lxxiii]
[lxxiv]
[lxxv]
[lxxvi]
[lxxvii]
[lxxviii].
At the risk of adhering to a naive behavioristic view on learning[lxxix],
there is some haunting truth in the claim that students do whatever they are tested on
and are not likely to do what they are not tested on. Regardless of the
curriculum objectives, students in a learning program will follow the
examination program. This is the heart of the 'hidden curriculum'[lxxx]:
examinations define academic success and the students cannot be blamed for
optimizing their chances to achieve success. The challenge for test developers
is to use this phenomenon strategically and to reinforce desirable learning
behavior. This strategy, also referred to as 'measurement-driven instruction'[lxxxi]
may have powerful educational consequences. However, there are a couple of
pitfalls involved.
First, there are risks involved in
mere test-directed studying. Many assessment programs are structured in such a
way that examinations are in competition with each other and invite students to
peak from hurdle to hurdle. Particularly if the contents of the examinations
reward rote learning one can seriously question the retention-rate of the
information gathered and the ability to apply the information appropriately[lxxxii].
Second, the effects of assessment
are often difficult to predict and sometimes even opposite to original
expectations. For example, Van Luijk et al. reported a study in which a multiple
station examination after a number of years of usage deteriorated the competency
of students because they started a blossoming trade in previously used
checklists which were subsequently memorized by the students when preparing for
the examination[lxxxiii].
This was enhanced by the level of detail of the checklists (adopted for objectivity reasons), which consisted of
largely cognitively oriented items. Another study has shown that an intended
switch from multiple choice tests to free-response tests to avoid memorization
led to an expectation of students to actually memorize more44. Yet
another study reported that teachers being directly involved in small-group
learning may theoretically be the best resource persons for information on the
students' progress, but their judgmental role may conflict with their facilitating
role35. In conclusion, any assessment action will result in an
educational reaction. The unpredictability, however, of these educational
effects requires careful and continuous follow-up analysis of the
side-effects.
The
research evidence has clearly challenged the appropriateness of the trait-model
of professional competence. Components of competence show great variability
across tasks, they cannot be well differentiated empirically, and growth in
competence is more capricious than expected. This is not so much a surprise,
since the notion of inherent and robust traits has been similarly challenged
(and abandoned) in psychology already some time ago[lxxxiv].
In the health sciences, the empirical disillusions have stimulated more
fundamental cognitive psychological research into the nature of clinical
competence and development of expertise. In recent years major progress has been
made in this area which may explain a number of phenomena in the assessment
research with implications for the future[lxxxv]
[lxxxvi]
[lxxxvii]
[lxxxviii].
Expertise development of
professionals appears strongly connected to knowledge. However, the way in which
knowledge is stored, used and retrieved characterizes differences between
novices and experts. The accumulation of knowledge is necessary to be able to
handle concrete problems and to be able to reason; i.e. to explain (clinical)
phenomena by their underlying (pathophysiological) mechanisms. Knowledge being
accumulated (stored) in a relevant (problem) context provides the best chance
for retrieval (Ref Needham) when faced with a new problem. However, with
accumulating experience explicit reasoning as a cognitive process diminishes in
importance because it is no longer instrumental. Instead, clinical situations,
specific signs and symptoms - or at first sight irrelevant details,
specific cues or patient characteristics - are recognized immediately. The
reasoning process becomes automated and is condensed into 'chunks' or 'scripts'
of clinical and contextual information which are activated instantaneously at an
appropriate moment. Experienced doctors often formulate their diagnostic
hypotheses in the very first few instances with a patient, and they are usually
correct. In summary, the cognitive psychological model views professional
expertise developing as a transition from a conceptually rich and rational
knowledge base (acquired from educational experiences) to a non-analytical
ability to recognize and handle situations efficiently and effectively (acquired
from clinical experiences). The ability is not easily transferred from one
problem or situation to another, but remains relatively dependent on the
specific situation. One person may therefore function at several cognitive
levels at the same time depending on the problem at hand. With increasing experience
and specialization, this expertise will be further individualized.
This new theoretical framework may
explain a number of the unexpected
findings encountered. It provides a logical explanation for the dominance
of task variability influencing test scores. Instead of generic underlying
constructs responsible for consistent professional actions across tasks,
expertise is characterized by 'states' of development restricted to specific
content areas. They are based on previous personal experience, they hardly
generalize across situations or tasks and they change continuously as a result
of new experiences. In educational tests this is subsequently reflected in
substantial task variability.
What is being measured by an
individual item in a test will depend on the cognitive processes involved when
answering the question and this will not only vary from item to item but also
from person to person for a single item: a response to an item may be the result
of pure recognition for one person or the result of a reasoning process for the
other. A summation of item scores to a test score must therefore yield a very
heterogeneous aggregate. Perhaps this aggregate, once sufficiently sampled,
reflects a G-factor (general factor, such as claimed in intelligence research)
potentially responsible for the correlations across test methods59 60
62.
The unexpected absence of
differences between expertise groups is probably the result of different
cognitive processes. Both written and live simulations often reward thoroughness
rather than efficiency and efficacy, which may again lead to higher scores for
the less competent and less efficient examinees. In a study comparing multiple
station examination scores (assessing what doctors can do) with similar cases
tested in clinical practice by using hidden simulated patients (to assess what
doctors actually do) no correlation was found using the raw scores. However,
after a correction for efficiency and time needed to obtain vital information a
substantial association was found[lxxxix].
That assessment
programs have difficulty fostering retention of knowledge, and that students are often unable to apply knowledge
to new situations, is not a surprise once one realizes that retention is stimulated
by a meaningful context, repetition and resemblance to the original situation.
By contrast, tests often consist of
decontextualized test items, and total examination programs usually
involve little repetition and integration. More often than not, 'wiping the hard
disk' to prepare for the next examination increases the chances of
success.
The cognitive psychological model
may be useful for new directions in assessment. Perhaps it will provide new
measures of assessing expertise for the future[xc]
[xci]
(although by now the reader is probably aware of the relativity involved in the
promises of new measures), but at least the model may help us to better
understand some assessment phenomena.
It may also provide new pathways for research and test
development.
The
intention was to clarify a number of issues in the assessment of professional
competence. The discussion so far has made clear that assessment of professional
competence has indeed stumbled on many difficulties and that perfect assessment
is an illusion. Trade-offs between what is desirable and achievable are therefore
inevitable.
In order to arrive at some practical
implications and to clarify the compromises involved we will use a very simple
model to define the utility of assessment methods. So far we have discussed
three variables more extensively which should be part of the model: reliability
(R), validity (V) and educational impact (E). Two additional important variables
were implicitly addressed: acceptability (A) and cost (C). In educational
practice decisions are rarely based on research outcomes[xcii]
and particularly in assessment one has to deal with opinions, sentiments and
traditions of teachers, students and institutions. The extent to which an
assessment procedure is accepted by the people involved in the assessment is a
crucial element for consideration. The mere existence of so many examination
procedures with severe shortcomings in reliability is evidence of the
phenomenon[8].
The cost of assessment is an obvious variable hardly needing further
explanation: resource limitations are universal, even more so for single
institutions or individual test developers.
We will define the utility (U) of an
assessment method as a multiplicative function of these variables with
differential weights (w) associated with each of them:
$$U = R\,W_r \times V\,W_v \times E\,W_e \times A\,W_a \times C\,W_c$$
It
should be noted that this definition is purely intended as a conceptual model
and not meant as an actuarial
algorithm since it is clear that most of the elements can never be
quantified. However, as a model it makes the trade-offs clear. Perfect utility
is a utopia. In practice we will always be required to compromise and assign
different weights in different individual situations, depending on the context
and purpose of the assessment. For example, in a situation where the assessment
involves a high-stakes examination with decisions having marked consequences on
the future of examinees, reliability will probably have a heavier weight in the
decision to use an assessment method. On the other hand, in the context of
in-training assessment, where the final decision is based on many assessments,
one probably is prepared to compromise more on reliability in favor of
educational impact of the assessment. The relationship among variables is
however deliberately conceived as multiplicative. If one of the elements is zero
the utility will be zero. A reliable, valid and feasible test will have a short
life if it is accepted by no one.
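The trade-offs in this conceptual model can be made tangible with a small numerical sketch. The text does not specify how the weights enter the product, so the sketch assumes, purely for illustration, that each variable is scored between 0 and 1 and raised to the power of its weight. The scores, the weights and the function name `utility` are hypothetical, and cost is assumed to be scored so that a higher value means a more affordable method. It is not the actuarial algorithm that the text explicitly disclaims.

```python
from math import prod

# The five variables of the conceptual model:
# reliability, validity, educational impact, acceptability, cost.
CRITERIA = ("R", "V", "E", "A", "C")


def utility(scores, weights):
    """Multiplicative utility, assuming each score is raised to its weight.

    Assumptions (not from the source): scores lie in [0, 1], weights are
    strictly positive, and "C" is scored so that higher means cheaper.
    """
    for name in CRITERIA:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score {name} must lie between 0 and 1")
        if weights[name] <= 0:
            raise ValueError(f"weight {name} must be positive")
    return prod(scores[name] ** weights[name] for name in CRITERIA)


# Hypothetical scores for one assessment method.
method = {"R": 0.6, "V": 0.8, "E": 0.9, "A": 0.7, "C": 0.5}

# Hypothetical weightings for two contexts mentioned in the text.
high_stakes = {"R": 3.0, "V": 1.0, "E": 1.0, "A": 1.0, "C": 1.0}
in_training = {"R": 1.0, "V": 1.0, "E": 3.0, "A": 1.0, "C": 1.0}

print(utility(method, high_stakes))
print(utility(method, in_training))

# A method nobody accepts: acceptability set to zero.
print(utility(dict(method, A=0.0), in_training))
```

With these made-up numbers the same method scores lower under the high-stakes weighting, because its modest reliability counts more heavily there, and setting acceptability to zero drives the utility to zero whatever the other scores are.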
Having defined the elements of the
utility of assessment methods the question now is what we can do to improve it.
Using the research outcomes and the theoretical developments described above we
will derive a number of practical suggestions for each of the variables involved
and delineate some research requirements where applicable.
Reliability suggestions
The
obvious practical implication of the content specificity problem is that one
cannot rely on tests containing few cases. Traditional clinical examinations
such as the 'clinical viva' or 'long case', which often consist of no more than
a single patient are totally unreliable because of their limited sampling of
content (even when the examiner influence has been ruled out). Regardless of the
testing method, wide sampling of content across the area of interest is
imperative to allow for stable and reproducible scores on educational tests.
Several hours of testing time are usually required to sufficiently reduce the
error introduced by task variability. As an alternative to increasing the
content sample per test one may consider increasing sampling across time by
using multiple test occasions. However, (some) compensation of test scores
across occasions is then in order to allow decision errors per test - which are
sizable due to unreliability - to average out across tests. A third alternative
to increase reliability is to combine methods into a larger battery of different
subtests[xciii].
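The effect of wider sampling on reliability can be illustrated with the Spearman-Brown formula from classical test theory. The text does not name this formula; it is used here as an assumption, and it only holds when the added material is comparable (parallel) to the existing test. The starting reliability of 0.5 for one hour of testing is a made-up figure, and both function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added parts are parallel to the original test.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach `target` reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


one_hour = 0.5  # hypothetical reliability of a one-hour test

for hours in (1, 2, 4, 8):
    print(hours, round(spearman_brown(one_hour, hours), 2))

print(length_factor_needed(one_hour, 0.8))
```

Under these assumptions a test that is only moderately reliable after one hour needs about four hours to reach a reliability of 0.8, which is in line with the remark that several hours of testing time are usually required.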
To contain costs, efficiency is the
hallmark. This may be achieved through the selection of efficient testing
methods, through considering efficiency per test item and through careful test
administration procedures. MCQ-tests are very efficient for sampling across
content whereas simulation-based instruments are less efficient. The choice of
method will depend on the trade-offs to be made with the other
utility-variables. Efficiency per test item may be achieved by using a
key-feature approach: assess the key elements of a task only and use the saved
time to assess more tasks. Again, this will depend on the willingness to
compromise on the other variables.
As an illustration of compromises to
be made, the University of Limburg used multiple station examinations testing
skills in isolation (examination of the knee in one station, interviewing skills
in another, etcetera) each tested with detailed checklists. The examinees
started to memorize the checklists and students complained that the examination
did not reflect clinical reality ("monkeys doing tricks" as they expressed their
feelings). In reaction, stations were integrated, at the cost of an increase in
station time, and checklists were globalized using items that judge the quality
of integral elements of skills on rating scales, at the cost of a decrease in
inter-rater reliability[xciv].
Efficiency may also be achieved through
alternative test administration strategies that adjust the test length to
the ability of the examinees[xcv].
In 'sequential testing' a short test is given to all examinees as a screening
assessment. Examinees scoring distant from the cut-off score are excused from
further testing. The assessment is subsequently continued only for the remaining
examinees. In 'adaptive or tailored testing' each subsequent test item presented
to an examinee is dependent on the performance on the previous one. Tailored
testing may be quite efficient for saving testing time, but it places very
strict psychometric demands on the test material.
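The sequential testing procedure described above can be sketched in a few lines. The cut-off score, the margin that counts as "distant", the decision labels and the function name `sequential_decisions` are all hypothetical choices made for this illustration; the text does not prescribe them.

```python
def sequential_decisions(screen_scores, cutoff, margin):
    """Sort examinees after a short screening test.

    Examinees whose score lies at least `margin` points away from the
    cut-off are excused from further testing; the others continue.
    """
    decisions = {}
    for examinee, score in screen_scores.items():
        if score >= cutoff + margin:
            decisions[examinee] = "excused: clear pass"
        elif score <= cutoff - margin:
            decisions[examinee] = "excused: clear fail"
        else:
            decisions[examinee] = "continue testing"
    return decisions


# Hypothetical screening scores (percentage correct).
screen_scores = {"examinee_1": 85, "examinee_2": 62, "examinee_3": 41, "examinee_4": 55}

decisions = sequential_decisions(screen_scores, cutoff=60, margin=10)
for examinee, decision in decisions.items():
    print(examinee, decision)
```

Only the examinees scoring close to the cut-off receive the longer test, so testing time is spent where the pass/fail decision is least certain.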
Introducing standardization and
structure will improve reliability considerably. It is however not always
necessary to totally standardize the testing situation or to use analytical
scoring methods only52. In general, factors introducing error in a
measurement require more sampling within that factor as was illustrated in table
1. Careful test designs with efficient sampling strategies may substantially
improve reliability while saving resources at the same
time43.
From a research perspective,
reliability studies are continuously needed for support of individual testing
methods and their particular contexts of use. In general, however, most sources
of unreliability are well documented. Perhaps a research area of interest is the
closer study of the content specificity problem. When personality psychology
abandoned the trait approach it was replaced by the person-by-situation
interaction paradigm[xcvi], as
a way to better understand the phenomenon of variability across situations. In
educational testing for the health sciences the question could be posed whether
all examinee by task interactions are simply error variance, or whether certain
non-random consistencies exist. For example, if growing expertise is
characterized by accumulating individual experience the person by task
interaction could be correlated with level of training and experience (some
first evidence suggests it is not[xcvii]).
Similarly, type of scoring method could be expected to interact with task
variability: analytical methods are anchored to the specific task situation
whereas more holistic methods are not, therefore yielding more or less task
variability variance respectively (some first evidence suggests this is true[xcviii]).
Validity suggestions
No
single method provides a panacea to competence assessment. In educational
practice we tend to occupy ourselves a great deal with the method of assessing,
but we should rather concern ourselves more with what we put into the method.
Similarly, we tend to worry about the kind of competency we are measuring and the
theoretical validity of our instruments, while it is probably better to worry
more about the content and the relevance of the tasks we are posing to the
examinees. The historical developments in competence assessment could be
summarized as the continuous search for approximating professional or
educational reality as closely as possible while applying standardized test
conditions. It is this concern which we should translate to any measurement
procedure, irrespective of the method used.
The critique against MCQ-tests is
critique against badly written multiple choice questions. If the knowledge to be
asked is placed in an appropriate context, (e.g. a patient problem), the MCQ
might have considerably more acceptability. The response format may still
require recognition of an option rather than recall (although there is no
impediment to providing longer menus of options), but the impact of this cuing
effect is only marginal. Similarly, essay tests may assess bare factual
knowledge, multiple station examinations clinical rituals, and oral
examinations memorization. It all depends on what is in the test. Providing
professionally or educationally valid challenges to examinees is a general
requirement for any format of assessing professional competence.
To discuss relevant tasks rather
than the theoretical competency being measured is a strategy which will also
work more effectively in test development practice. It is quite difficult to
reach consensus about the definition of competency and its subsequent
operationalization into test material. However, agreement is more easily reached
when professionals discuss the kind of (clinical) problems that examinees should
be able to handle or which element of a (clinical) problem can be identified as
a key feature[xcix]
[c].
Providing appropriate context in test material is also fully in line with the
cognitive psychological framework. Storage and retrieval is contextually driven.
Recognition of information or patterns can only be achieved by providing a
relevant professional or educational context. By bringing these relevant contexts into
test items higher cognitive abilities are more likely to be addressed and
expertise differences will emerge.
Once the tasks have been defined it
is important to select the most appropriate format, and here again compromises
must be made, where elements of the other variables will play a major role. For
example, one could argue that since validity research demonstrated that scores
on MCQ-tests are able to predict scores on multiple station examinations, the
cheaper and more efficient MCQ-test is to be preferred. However, the application
of such an MCQ in a medical school will undoubtedly have undesirable
consequences on how students will prepare themselves. When however the purpose
of the test is to screen a large group of professionals (e.g., to determine
needs for CME) the MCQ might be best.
Validity is strongly enhanced when
test material is scrutinized by a review process (including test and item
analysis afterwards). It is virtually impossible to write flawless test material
regardless of the method. Even the simplest reviewing process will have
beneficial effects. It requires however a preparedness of item-writers to submit
their products to the critique of others. Although quite usual in research, this
willingness is not so common in education.
In 1961 Ebel wrote about validity
research that it is "universally praised, but the good works done in its name
are remarkably few" [ci].
Unfortunately, this observation is still true. In our view, the type of validity
studies required for the future needs to be different. Except for predictive
studies, the typical correlational research between different measures of
competence to infer conclusions on their validity is neither compelling nor
informative. A more fruitful line of inquiry may result from adoption of a
cognitive psychological framework (or any theory-based framework) in order to
study the theoretical validity of educational tests. This leads to validity questions like "what is the
relationship between various stimulus formats and their psychometric
characteristics?" and "does contextual information influence test scores and how
does it affect groups at differing levels of expertise?"
There is, however, another type of
research needed. Ebel has pleaded
for researchers to apply more direct validation procedures to achievement
testing (as opposed to 'derived validation' using the theoretical, correlational
approach)101 [cii]
[ciii].
Unlike psychological aptitude and personality tests, educational tests reflect
directly meaningful tasks and allow rational analysis in relation to the domain
of interest. Direct validation investigates the extent to which the tasks posed
by the test represent the real-world tasks of interest. Ebel suggested that
validity can be "built into" a test through careful operational definition of
the tasks and content to be assessed.
Although several other authors have expressed similar views in relation
to educational testing39 75 [civ],
the literature mainly reports derived validation studies. Yet, as we have
concluded, the content of an achievement test and the kind of tasks posed to the
examinee, the exact focus of direct validation studies, will primarily dictate
what is being measured. Therefore,
with Ebel we believe that direct validation studies are more
needed.
Direct validation studies need not
be restricted to descriptive or qualitative studies into the content validity
of tests. Empirical studies are required to validate the process between the task
given to the examinee and the test score reflecting the quality of performing
the task. Particularly where this relationship is more complex, such as in the
tests using written and live simulations, studies of this process are needed.
The validity of the resulting score will depend on the appropriateness of this
process: does the scoring system
reward efficiency or thoroughness; are non-indicated actions penalized; how
should scores be aggregated to a total; how do the raters/examiners/patients
influence the scoring; do questions sample the domain of interest; etc. When the
item scores are valid the total test scores should also be valid. This approach
has been called the 'microscopic approach' to validity39. We are not suggesting a discontinuance of
studies into the theoretical nature of tests for professional competence with a
macroscopic focus (for instance of the kind suggested above in relation to
cognitive psychological issues), but we need more studies at the microscopic
level. Continuing the correlational studies in the absence of a sound
theoretical framework, usually with flawed methodological designs, has been and
will be of little use for research and test development.
Educational suggestions
The
assessment objectives should clearly match educational objectives. When they do
not, the assessment objectives will prevail. The implication for practice is to
be constantly vigilant about the educational effects of assessment and to try to use
the driving force of assessment to achieve desirable educational effects. This
is more easily said than realized, however, because there are (again) no fixed
rules and there are, as we discussed, pitfalls involved. Assessment may drive
learning in at least four ways.
a) Assessment drives learning through
its content. If we want students to be able to manage problems we should not
give them tests of memory reproduction. The illustration above of the multiple
station exam of the University of Limburg shows that isolated skills-testing
achieves fragmented competence only: you will get out of it what you put into
it. This conclusion is perhaps trivial in its simplicity; looking
at educational practice, however, it apparently is not. Once more, the tasks should
reflect professional or educational reality as closely as
possible.
b) Assessment drives learning through
its format. The earlier reported unexpected negative effect of using detailed
checklists is an illustration of how format may influence learning. Another
illustration is an assessment procedure deliberately developed for its
educational effect: the progress test35 [cv]
[cvi],
a comprehensive test using MCQs covering the integral end-objectives of a
curriculum across all disciplines involved (including basic sciences). It is
periodically (e.g. every three or four months) administered to all students in
the curriculum regardless of their point in training. Since the test is not directly tailored
to course objectives it is difficult for students to prepare themselves
specifically for the test. It has proven to be effective by not interfering with
ongoing learning such as in problem-based learning programs35 [cvii].
c) Assessment drives learning through
the information given. Besides being a decision tool, assessment should also be a
learning exercise. Providing information is key to achieving that.
Feedback of assessment results, profile scores, literature references,
debriefing meetings and appeal procedures are elements which enhance the
information flow and increase the formative value of assessment. Similarly, assessment
results can be fed back to test developers, departments, educational committees
and other institutional bodies. Test results also reflect the quality of the
training program and may be used for quality monitoring and control, both at the
micro-level (i.e. the evaluation of courses) and at the macro-level (i.e. the
evaluation of complete programs or instructional methods).
d) Tests drive learning through their
programming. The frequency, the timing, the number of repeat examinations and the
regulation of student promotions are all elements of how the programming of
assessment drives learning. Examinations are often in continuous competition
with each other and with the ongoing educational program, and students jump from
hurdle to hurdle. To organize repeat examinations may seem quite fair to
examinees, but at the same time they encourage examinees to adopt minimal
learning strategies. They allow examinees to 'scout' at first attempts or invite
students to prepare minimally: there is always a chance of succeeding
(particularly with unreliable tests) and if not, there is always a next chance.
The student promotion regulations directly define the academic success and
students will react strategically (which are the most important exams, what is
to be done first, what can we skip?). Particularly in many European countries
university programs have problems with their attrition rates: unrealistic
numbers of students drop out or get substantially delayed. In part, this problem
is an assessment problem: too many hurdles of too high standards, few
compensatory rules across assessments, etcetera. A recent study showed that
variations in attrition rates in a medical school across 25 years were directly
linked to their examination rules while there was no evidence of variations in
ability between cohorts of students[cviii].
As was suggested and illustrated
earlier, educational effects of assessment are often unexpected. Moreover, the
dangers of mere test-directed studying were pointed out. An additional
complexity is the fact that the strategic use of assessment transcends the level
of the individual test developer, teacher or department. The integrated
assessment program as a total system will drive learning. Therefore, changes at
the micro level in parts of the system will not have major effects. Strategic
use of assessment is most effective at the macro-level. School-wide assessment
or centralized assessment programs are however very rare in educational
programs[cix].
In our view the impact of assessment
on the educational process is a variable which allows little compromise. We
would argue that educational impact is the heart of educational achievement
testing: assessment should be part of the learning process in order to achieve
educational objectives set out in the training program. Any compromise here
directly affects the quality of the educational training
program.
The educational use of assessment
should also have higher priority in research. Despite the wide recognition of
its importance, the empirical work reported is scarce. Methodologically it will
be difficult to carry out research. The complexity of context-bound interactions
may limit the relevance of an experimental approach. Survey research and
case-studies are more likely indicated.
Acceptability suggestions
Just as students or examinees
adhere to understandable behavioral patterns, faculty have similar human
patterns. For example, examiners usually hate strongly structured assessments.
These do not exploit their expertise and restrict their freedom as
professionals. Similarly, examiners usually value direct contact with examinees,
even through their written responses, rather than impersonal and mechanistic
judgements. These are important factors to be considered (and to be used). More
difficult is the set of values which are often brought to assessment. They are
based on personal experience, beliefs and (mis)conceptions. Although using
research results and empirical evidence is considered as professional behavior
in health practice and in research work, this attitude does not easily
generalize to education. Faculty are usually unaware of educational research or
do not consider it very important92. In addition, educational
traditions are often deeply
anchored in countries and institutions. Students naturally have similar beliefs
and attitudes towards assessment.
The practical implication is that,
regardless of their justification, elements influencing the acceptability of
achievement testing need to be considered in the choice and design of an
assessment procedure or program. Assessment not accepted by staff or students
will not survive. The issue is to attempt to strategically use the information
on faculty and student beliefs in order to get their commitment. Provision of
information is a key element in this strategy, but the willingness to compromise
is definitely another.
Cost
Good assessment is definitely
costly. Test construction with built-in review and control processes,
development of high fidelity simulations, training of examiners and patients,
test administration, data processing, feedback to students, staff and the
institution, monitoring of effects, are all resource intensive activities. The
cost of assessment requires compromises in practice. There are three remarks
however in this respect.
First, investing in assessment is
investing in teaching and learning. Given the lawful relationship between
assessment and learning, good assessment will facilitate good learning. In other
words, an investment in educational achievement testing is worthwhile and will
pay off. Second, a different perspective emerges when cost of assessment is
related to the cost of teaching. The expenditure for teaching is more easily
accepted than for assessment, but it remains a matter of priorities and
allocation of resources. With relatively small shifts in this balance
substantial improvement in test development could be achieved. Third, assessment
methods perceived as resource intensive turn out to be feasible in practice. The
widespread use of multiple station examinations, including initiatives for
nation-wide introduction38 [cx],
is proof. However, more studies would be useful reporting the precise costs
involved in assessment procedures[cxi].
The
current state of the art in the assessment of professional competence is
unfortunately more complex than a recipe book of agreed testing technology
options. Many intuitive beliefs about assessment appeared naive or incorrect. On
the other hand, clear progress has been made. The history of assessment is
characterized by continuous attempts to approximate the real professional or
educational world as closely as possible, while maintaining standardized
test-taking conditions. This is the essence of professional competence
assessment, and should be applied to any assessment of professional competence,
regardless of the format. A great deal of assessment technology has been developed in
the course of time and is available. However, there is more than the technology
of assessment. Assessment as an educational strategy should become more of a
concern of test developers and training institutions. Extending assessment
technology towards maximal fidelity and its planned educational use will be the
challenge for the future.
References
[1] In 1986, US$ 1 = L 1490.
[2] In
1985, the lowest rate was US$ 1 = ¥ 263.65; the highest rate was US$ 1 = ¥
199.80. In 1986, the lowest rate was US$ 1 = ¥ 203.30; the highest rate was US$ 1
= ¥ 152.55 in Tokyo.
[3] From
Fuji to Everest, Forbes, May 2, 1988.
[4] Harvard
Business School case study, Canon Inc., World-wide Copier
Strategy, 1983, page 2.
[5] InfoSource's
classification scheme was generally used as a way to segment the market, as
follows: Category 1 - less than 20 copies per minute (cpm); Category 2 - 20-39
cpm; Category 3 - 40-59 cpm; Category 4 - 60-89 cpm; Category 5 - 90+ cpm.
The Personal Copier category was
subsequently added for copiers generating less than 10 cpm.
[6] A test item is the smallest independent test unit, and may consist of multiple sub-units. For instance, a checklist for a particular station will probably contain a number of checklist items, but they are clustered together through the content of the station. If an examinee has no knowledge of the content area of the station, he or she will have more chance to fail all items, i.e. the items are dependent on each other. The station score is therefore the 'item' in the test.
[7] This is the multitrait-multimethod approach to validity, considered to be the strongest design in classical trait research, using multiple measures assessing multiple traits in a fully crossed way in order to separate trait variance from method variance (Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin 1959; 56: 81-105.)
[8] Some may call this face validity. Face validity is often used to indicate the validity of tests at first impression or at face value. Face validity will evidently influence acceptability, but the latter is meant here in a broader sense and includes the entire belief system of people in relation to assessment or an assessment method.
[i]. Ebel RL, Frisbie DA. Essentials of educational measurement. Englewood Cliffs, New Jersey: Prentice-Hall, 1986.
[ii]. Ory JC, Ryan KE. Tips for improving testing and grading. Newbury Park, California: Sage Publications, 1993.
[iii]. Pickering G. Against multiple-choice questions. Medical Teacher 1979; 1: 84-6.
[iv]. Newble, DI, Baxter A, Elmslie G. A comparison of multiple choice and free response tests in examinations of clinical competence. Medical Education 1979; 13: 263-8.
[v]. McGuire C. Perspectives in Assessment. Academic Medicine (Supplement) 1993; 68: S3-8.
[vi]. Case SM, Swanson DB. Extended-matching items: a practical alternative to free-response questions. Teaching and Learning in Medicine 1993; 5: 107-15.
[vii]. Schuwirth LWT, van der Vleuten CPM, Donkers HHLM. Open ended questions versus multiple choice questions: An analysis of cueing effects. In: Harden RM, Hart IR, Mulholland H (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 486-91.
[viii]. Linn RL. Educational assessment: expanded expectations and challenges. Educational Evaluation and Policy Analysis 1993; 15: 1-16.
[ix]. Van der Vleuten CPM, Newble DI. How can clinical reasoning be tested? The Lancet 1995; 345: 1032-4.
[x]. McGuire CH, Babbott D. Simulation technique in the measurement of problem solving skills. Journal of Educational Measurement 1967; 4: 1-10
[xi]. Hodgkin K, Knox JDE. Problem centered learning: The modified essay question in medical education. Edinburgh: Churchill-Livingstone, 1975.
[xii]. Feletti GI, Saunders NA, Smith AJ. Comprehensive assessment of final‑year medical student performance based on undergraduate programme objectives. Lancet 1983; 2(8340): 34-7.
[xiii]. Barrows HS, Tamblyn RM. The portable patient problem pack (P4). A problem-based learning unit. Journal of Medical Education 1977; 52:1002-4.
[xiv]. Williams RG, Vu NV, Barrows HS, Verhulst S. Profile of the Clinical Reasoning Test (CRT): An objective measure of problem solving skills and proficiency in using medical knowledge. In: Schmidt HG, De Volder ML (Eds.) Tutorials in Problem-Based Learning. Assen: Van Gorcum, 1984; 81-90.
[xv]. Norcini JJ, Meskauskas JA, Langdon LO, Webster GD. An evaluation of a computer simulation in the assessment of physician competence. Evaluation in the Health Professions 1986; 9: 286-304.
[xvi]. Bligh TJ. Written simulation scoring: comparison of nine systems. [dissertation] Urbana-Champaign (IL), University of Illinois, 1980.
[xvii]. Swanson D, Norcini J, Grosso L. Assessment of clinical competence: Written and computer-based simulations. Assessment and Evaluation in Higher Education 1987; 12: 220-46.
[xviii]. Norman G, Bordage G, Curry L et al. A review of recent innovations in assessment. In: Wakeford RE, ed. Directions in clinical assessment. Report of the First Cambridge Conference. Cambridge: Cambridge University School of Clinical Medicine, 1985; 9-27.
[xix]. Elstein A, Shulman LS, Sprafka SA. Medical problem solving: An analysis of clinical reasoning. Cambridge Massachusetts: Harvard University Press, 1978.
[xx]. Friedman R, Korst D, Schultz J, Beatty E, Entine S. Experience with the simulated patient physician encounter. Journal of Medical Education 1978; 53: 825-30.
[xxi]. McLeskey C, Ward R. Validity of written examinations. Anesthesiology 1978; 49: 224.
[xxii]. Marshall J. Assessment of problem-solving ability. Medical Education 1977; 11: 329-34.
[xxiii]. Newble DI, Hoare J, Baxter A. Patient management problems: issues of validity. Medical Education 1982; 16: 137-42.
[xxiv]. Norman GR, Feightner JW. A comparison of behaviour on simulated patients and patient management problems. Journal of Medical Education 1981; 55: 529-37.
[xxv]. Norcini JJ, Swanson DB, Grosso LJ, Shea JA, Webster GD. Reliability, validity and efficiency of multiple choice question and patient management problem item formats in the assessment of physician competence. Medical Education 1985; 19: 238-47.
[xxvi]. Bordage G, Page G. An alternative approach to PMPs: The "key features" concept. In: Hart IR, Harden RM, eds. Further Developments in Assessing Clinical Competence. Montreal: Can-Heal Publications, 1987; 59-75.
[xxvii]. De Graaff E, Post G, Drop M. Validation of a new measure of clinical problem-solving. Medical Education 1987; 21: 213-218.
[xxviii]. Powles ACP, Wintrup N, Neufeld VR, Wakefield JH, Coates G, Burrows J. The triple jump exercise: Further studies of an evaluative technique. Proceedings of the 20th Annual Conference on Research in Medical Education, Washington: American Association of Medical Colleges, 1981: 74-9.
[xxix]. Friedman CP, Murphy GC, Smith AC, Mattern WD. Exploratory study of an examination format for problem-based learning. Teaching and Learning in Medicine 1994; 6: 194-8.
[xxx]. Boud D. The role of self-assessment in student grading. Assessment and Evaluation in Higher Education 1989; 14: 20-30.
[xxxi]. De Grave W, De Volder M. Peer-evaluation and problem-based learning. In: Schmidt H, De Volder M. (Editors) Tutorials in Problem-based learning. Assen: Van Gorcum, 1984: 116-122.
[xxxii]. Magzoub M. Studies in Community-based Education [dissertation]. Maastricht: University of Limburg, 1994.
[xxxiii]. Case SM, Swanson DB, Van der Vleuten CPM. Student assessment in problem-based learning curricula. In: Boud D, Feletti G. (Editors) The Challenge of Problem-based Learning. London: Kogan Page, 1991: 260-73.
[xxxiv]. Gordon MJ. A review of the validity and accuracy of self-assessments in health professions training. Academic Medicine 1991; 66: 762-69.
[xxxv]. Blake JM, Norman GR, Smith EKM. Report card from McMaster: student evaluation at a problem-based medical school. Lancet 1995; 345: 899-902.
[xxxvi]. Harden R, Gleeson F. Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education 1979; 13: 41‑54.
[xxxvii]. Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 292-321.
[xxxviii]. Reznick R, Blackmore DE, Cohen R et al. An objective structured clinical examination for the licentiate of the Medical Council of Canada: From research to reality. Academic Medicine [Supplement] 1993; 68: S4-6.
[xxxix]. Van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: State of the art. Teaching and Learning in Medicine 1990; 2: 58-76.
[xl]. Vu NV, Barrows HS. Use of standardized patients in clinical assessments: recent developments and measurement findings. Educational Researcher 1994; 23: 23-30.
[xli]. Swanson DB, Norman GR, Linn RL. Performance-based assessment: Lessons from the health professions. Educational Researcher 1995; 24: 5-11,35.
[xlii]. Van Thiel J, Kraan HF, Van der Vleuten CPM. Reliability and feasibility of measuring interviewing skills using the revised Maastricht History Taking and Advice Checklist. Medical Education 1991; 25: 224-9.
[xliii]. Swanson DB. A measurement framework for performance-based tests. In: Hart IR, Harden RM, editors. Further developments in assessing clinical competence. Montreal: Can-Heal, 1987: 13-45.
[xliv]. Stalenhoef-Halling BF, Van der Vleuten CPM, Jaspers TAM, Fiolet JFBM. The feasibility, acceptability and reliability of open-ended questions in a problem-based learning curriculum. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publ, 1990: 552-7.
[xlv]. Erviti V, Templeton B, Bunce J, Burg F. The relationships of pediatric resident recording behavior across medical conditions. Medical Care 1980; 18: 1020-31.
[xlvi]. Rethans JJ, Sturmans F, Drop MJ, Van der Vleuten CPM. Assessment of performance in actual practice of general practitioners by use of standardized patients. British Journal of General Practice 1991; 41: 97-9.
[xlvii]. Shavelson RJ, Baxter GP, Gao X. Sampling variability of performance assessments. Journal of Educational Measurement 1993; 30: 215-32.
[xlviii]. Klein as cited in: Linn RL. Educational assessment: expanded expectations and challenges. Educational Evaluation and Policy Analysis 1993; 15: 1-16.
[xlix]. Shavelson RJ, Mayberry P, Li W, Webb NM. Generalizability of military performance measurements: Marine Corps rifleman. Military Psychology 1990; 2: 129-44.
[l]. Frijns PHAM, Van der Vleuten CPM, Verwijnen GM, Van Leeuwen YD. The effect of structure in scoring methods on the reproducibility of tests using open-ended questions. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publ, 1990: 466-71.
[li]. Van Ham I, Gerritsma J. The assessment of clinical competence in general practice with chart stimulated recall. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP, editors. Teaching and assessing clinical competence. Groningen: Boekwerk, 1990: 306-9.
[lii]. Van der Vleuten CPM, Norman GR, De Graaff E. Pitfalls in the pursuit of objectivity: Issues of reliability. Medical Education 1991; 25: 110-8.
[liii]. Streiner DL. Global Rating Scales. In: Neufeld VR, Norman GR (Editors) Assessing Clinical Competence. New York: Springer, 1985: 119-41.
[liv]. Streiner D. Clinical ratings - ward evaluation. In: Shannon S, Norman G. (Editors) Evaluation methods: A resource handbook. Hamilton: The Program for Educational Development, McMaster University, 1995: 29-31.
[lv]. Hastorf AH, Schneider DJ, Polefka J. Person perception. Reading, Massachusetts: Addison-Wesley, 1970.
[lvi]. Ross M. Relation of implicit theories to the construction of personal histories. Psychological Review 1989; 96: 341-57.
[lvii]. Norman GR, Swanson DB, Case SM. Conceptual and methodological issues in studies comparing assessment formats. Teaching and Learning in Medicine, in press.
[lviii]. Norman GR, Smith E, Powles A, Rooney P, Henry N, Dodd P. Factors underlying performance on written tests of knowledge. Medical Education 1987; 21: 297-304.
[lix]. Jean P, Schuwirth L, Van Santen M, Van der Vleuten C. Do problem analysis questions (PAQs) and true/false questions (TFQs) measure different skills? Medical Education, in press.
[lx]. Maatsch J, Huang R. An evaluation of the construct validity of four alternative theories of clinical competence. Proceedings of the Twenty-fifth Annual Conference on Research in Medical Education, American Association of Medical Colleges. Washington, DC, 1986.
[lxi]. Maatsch J. Model for a criterion-referenced medical specialty test. Final Report Grant No. HS-02038-02, Office of Medical Education Research and Development, Michigan State University, 1980.
[lxii]. Van der Vleuten CPM, Van Luijk S, Beckers HJM. A written test as an alternative to performance testing. Medical Education 1989; 23: 97-107.
[lxiii]. Ward W. A comparison of free-response and multiple choice forms of verbal aptitude tests. Applied Psychological Measurement 1982; 6: 1-11.
[lxiv]. Thissen D, Wainer H, Wang X. Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement 1994; 31: 113-23.
[lxv]. Ramsey PG, Carline JD, Inui TS et al. Predictive validity of certification by the American Board of Internal Medicine. Annals of Internal Medicine 1989; 110: 719-26.
[lxvi]. Solomon et al 1990 as cited in Norman GR. Can an examination predict competence? The role of recertification in maintenance of competence. Annals of the Royal College of Physicians and Surgeons of Canada 1991; 24: 121-124.
[lxvii]. Norman GR, Davis DA, Painvin A, Rath D, Ragbeer M. Comprehensive assessment of clinical competence of family-general physicians using multiple measures. Proceedings 28th Conference on Research in Medical Education. Washington: American Association of Medical Colleges, 1989.
[lxviii]. Norman GR. Can an examination predict competence? The role of recertification in maintenance of competence. Annals of the Royal College of Physicians and Surgeons of Canada 1991; 24: 121-124.
[lxix]. McGuire C. Written methods for assessing clinical competence. In: Hart IR, Harden RM, editors. Further developments in assessing clinical competence. Montreal: Can-Heal, 1987: 44-58.
[lxx]. Hager P, Gonczi A, Athanasou J. General issues about assessment of competence. Assessment & Evaluation in Higher Education 1994; 19: 3-16.
[lxxi]. Norman GR, Van der Vleuten CPM, De Graaff E. Pitfalls in the pursuit of objectivity: Issues of validity, efficiency and acceptability. Medical Education 1991; 25: 119-26.
[lxxii]. Van Thiel J, Van der Vleuten CPM, Kraan H. Assessment of medical interviewing skills: Generalizability of scores using successive MAAS-versions. In: Harden RM, Hart IR, Mulholland H. (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 536-40.
[lxxiii]. Newble D, Jaeger K. The effect of assessments and examinations on the learning of medical students. Medical Education 1983; 17: 165-71.
[lxxiv]. Popham WJ. Measurement as an instructional catalyst. In: Ekstrom RB. (editor) Measurement, technology and individuality in education. San Francisco: Jossey-Bass, 1983: 87-103.
[lxxv]. Frederiksen N. The real test bias: influences of testing on teaching and learning. American Psychologist 1984; 39: 193-202.
[lxxvi]. Entwistle N. Styles of Learning and Teaching. Chichester: John Wiley & Sons, 1981.
[lxxvii]. Stillman P, Swanson D. Ensuring the clinical competence of medical school graduates through standardized patients. Archives of Internal Medicine 1987; 147: 1049‑52.
[lxxviii]. Gibbs G. Improving the quality of student learning. Bristol: Technical & Educational Services, 1992.
[lxxix]. Shepard LA. Psychometricians' beliefs about learning. Educational Researcher 1991; 20: 2-16.
[lxxx]. Snyder BR. The hidden curriculum. New York: Knopf, 1971.
[lxxxi]. Popham WJ, Cruse KL, Rankin SC, Sandifer PD, Williams PL. Measurement-driven instruction: It's on the road. Phi Delta Kappan 1985; 66: 628-34.
[lxxxii]. Semb GB, Ellis JA. Knowledge taught in school: What is remembered? Review of Educational Research 1994; 64: 253-86.
[lxxxiii]. Van Luijk SJ, Van der Vleuten CPM, Schelven RM. The relation between content and psychometric characteristics in performance-based testing. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publications, 1990: 202-207.
[lxxxiv]. Mischel W. Personality and assessment. New York: John Wiley, 1968.
[lxxxv]. Schmidt H, Norman G, Boshuizen H. A cognitive perspective on medical expertise: Theory and implications. Academic Medicine 1990; 65: 611-21.
[lxxxvi]. Norman G, Allery L, Berkson L et al. Research in the psychology of clinical reasoning: implications for assessment. Paper from the Fourth Cambridge Conference, Cambridge University School of Clinical Medicine, Cambridge, 1989.
[lxxxvii]. Higgs J, Jones M. (editors). Clinical reasoning in the health professions. Oxford: Butterworth/Heinemann, 1995.
[lxxxviii]. Norman G, Regehr G. Contemporary issues in cognitive psychology: Implications for professional education. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 17-25.
[lxxxix]. Rethans JJ, Sturmans F, Drop MJ et al. Does competence of general practitioners predict their performance? British Medical Journal 1991; 303: 1377-80.
[xc]. Norman GR. Reliability and construct validity of some cognitive measures of clinical reasoning. Teaching and Learning in Medicine 1989; 1: 194-9.
[xci]. Newble DI, Raymond GA. The Pattern Completion Item (PCI): A potential measure of clinical problem-solving skills. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 191-2.
[xcii]. Nelson MS, Clayton BL, Moreno R. How medical school faculty regard educational research and make pedagogical decisions. Academic Medicine 1990; 65: 122-6.
[xciii]. Hays RB, Fabb WE, Van der Vleuten CPM. Reliability of the fellowship examination of the Royal Australian College of General Practitioners. Teaching and Learning in Medicine 1995; 7: 43-50.
[xciv]. Van Luijk SJ, Van der Vleuten CPM. A comparison of checklists and rating scales in performance-based testing. In: Hart IR, Harden RM, Des Marchais J, editors. Current Developments in Assessing Clinical Competence. Montreal: Can-Heal, 1992: 357-62.
[xcv]. Newble D, Dawson B, Dauphinee D et al. Guidelines for assessing clinical competence. Teaching and Learning in Medicine 1994; 6: 213-20.
[xcvi]. Endler NS, Magnusson D, editors. Interactional psychology and personality. Washington DC: Hemisphere, 1976.
[xcvii]. Van der Vleuten CPM, Schuwirth LWT, Ronteltap CFM. A cognitive psychological interpretation of a few remarkable psychometric findings. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 506-8.
[xcviii]. Frijns PHAM. Scoringsmodellen voor open-vraag vormen (Scoring models for free-response formats) [dissertation]. Maastricht: University of Limburg, 1992.
[xcix]. Brailovsky C, Bordage G, Carretier H, Page G. Content validity of the key features' approach of the Medical Council of Canada's Exam. In: Harden RM, Hart IR, Mulholland H (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 476-7.
[c]. Bordage G, Brailovsky C, Carretier H, Page G. Content validation of key features on a national examination of clinical decision-making skills. Academic Medicine 1995; 70: 276-81.
[ci]. Ebel RL. Must all tests be valid? American Psychologist 1961; 16: 640-7.
[cii]. Ebel R. Measuring Educational Achievement. Englewood Cliffs: Prentice‑Hall Inc, 1965.
[ciii]. Ebel R. The practical validation of tests of ability. Educational Measurement: Issues and practice 1983; 2: 7-10.
[civ]. Kane M. The validity of licensure examinations. American Psychologist 1982; 37: 911-8.
[cv]. Arnold L, Willoughby TL. The Quarterly Profile Examination. Academic Medicine 1990; 65: 515-6.
[cvi]. Van der Vleuten CPM, Verwijnen GM, Wijnen WHFW. Fifteen years of experience with progress-testing. Medical Teacher, in press.
[cvii]. Van Berkel HJM, Nuy HJP, Geerligs T. The influence of progress tests and block tests on study behavior. Instructional Science 1995; 22: 315-331.
[cviii]. Cohen-Schotanus J. Effecten van curriculumveranderingen. [dissertation, with English summary]. Groningen: University of Groningen, 1994.
[cix]. Van der Vleuten CPM, Verwijnen GM. A system for student assessment. In: Van der Vleuten CPM, Wijnen WHFW, editors. Problem-based learning: Perspectives from the Maastricht experience. Amsterdam: Thesis-publ., 1990: 27-49.
[cx]. Klass D, Clauser B, Fletcher E et al. Progress in developing a standardized patient test of clinical skills at the National Board of Medical Examiners: Prototype two. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 324-6.
[cxi]. Reznick RK, Smee S, Baumber JS, et al. Guidelines for estimating the real cost of an Objective Structured Clinical Examination. Academic Medicine 1993; 68: 513-17.