Maastricht University

Department of Educational Development
and Research

 

Contents

 


1.            Maastricht OverAll Test 1995 – 1996
School of Economics and Business Administration
Extracts from:

-        Study guidelines for case studies Olivetti and Canon

-        Case studies

-        Test items

-        Segers, M.S.R., An alternative for Assessing Problem-solving Skills: the Overall Test. Studies in Educational Evaluation, 23(4), 373-398.

 

2.            Maastricht Skills Test
Faculty of Medicine

-        Examples of criteria lists

3.            Maastricht Progress Test
Faculty of Medicine

-        Extracts from progress test September 1990

4.            The Thesis Supervision Experiment

-        Extract from Redistributing power in the classroom: the missing link in Problem-based learning. (see reference nr.7)

5.            Reference list to the assessment procedures used in Maastricht

6.            Recommended reading

 

-        Van der Vleuten, C. P. M., Scherpbier, A.J.J.A., Wijnen, W.H.F.W., & Snellen, H.A.M. (1996). Flexibility in learning: a case report on problem-based learning. International Higher Education (2), 17-24.

 

-        Van der Vleuten, C. P. M. (1996). The assessment of professional competence: developments, research and practical implications. Advances in Health Sciences Education, 1(1), 41-67.

 

 


1.            Maastricht OverAll Test 1995 – 1996

School of Economics and Business Administration

Extracts from:

-        Study guidelines for case studies Olivetti and Canon

-        Case studies

-        Test items

-        Segers, M.S.R., An alternative for Assessing Problem-solving Skills: the Overall Test.


UNIVERSITY OF LIMBURG

 

Faculty of Economics and Business Administration

International Business Studies

 
 

OVERALL TEST I

 

Information and Study materials

 
Academic year 1995-1996


1.         INTRODUCTION

 

The first Overall Test (OAT) will take place on January 26, 1996 (between 9:00 and 12:00). You have two weeks to prepare for this test, during which no tutorial groups are scheduled (please refer to the Study Guide, pp. 18 and 84). This reader describes the objectives, form and procedures of the test. It also includes the articles for the test and some study guidelines to assist you in your preparation.

 

2.         THE OVERALL TEST IN THE FIRST YEAR

 

The Faculty's main objective is to train people who are able to recognize, define, analyze and find ways to solve problems related to business economics and administration. Our aim is to offer an integrated curriculum in business studies. Four testing procedures are used to serve this objective: knowledge tests, practical exams, writing assignments and overall tests.

 

Subject matter

The OAT is designed to measure a student's ability to apply concepts, theories, formulas and problem-solving skills, and to assess the student's understanding of the relationships between the various disciplines. In other words, the OAT tries to answer questions such as: Is the student able to place the author's view within business/economic theory? Is the student able to interpret a diagram that is part of a case? Is the student able to solve a problem by arguing from different perspectives? The OAT covers the entire learning contents of the two blocks preceding it, including the contents of skills training but excluding language training.

The main objective of the knowledge test is to measure a student's knowledge (e.g. the definition of the product life cycle), whereas the overall test focuses on applying that knowledge by analyzing a problem (concept, model etc.) from different perspectives.

 

The importance of passing an overall test

Overall tests are part of the first-year examination process, together with the knowledge tests, the exams for Quantitative Methods and the writing assignments. For information on the determination of the pass/fail grade, the minimum result for passing an OAT and the compensation rules, please refer to the exam regulations.

 

Preparation

It is important that you gain a clear understanding of the learning contents of the previous blocks. After the eight-week block, you are given two weeks without group meetings to prepare for the OAT. During these two weeks you are expected to study the enclosed articles. Study guideline: try to grasp the contents of the articles and relate them to the knowledge and skills you acquired during the last two blocks. This includes the knowledge you acquired in the disciplines of marketing, organisation, microeconomics, international economic relations and economics of the public sector, as well as the knowledge and skills acquired during the various skills training sessions.

 

Study materials

Enclosed in this reader you will find the study materials for the OAT. They consist of a number of articles from journals and/or books. You are expected to study them using the learning contents of the past blocks. On the OAT, questions will refer to these articles. Some articles are accompanied by study guidelines, which indicate the topics to be dealt with in particular, so that you can concentrate on them.

 

Form

The OAT consists of two types of questions: open questions, marked 'O' (also called essay questions) the answer to which you must formulate yourself; and true/?/false questions, marked 'C' (similar to the questions on the knowledge test at the end of each block).

Questions typically refer to the articles. Articles may be used in different ways. For example:

·        the article forms the context within which questions concerning theory will be asked. In this case the article is similar to an 'extensive stem' (as in the knowledge test).

·        questions on the article itself are asked. The emphasis is on the ability to comprehend and interpret the article and to place it in relation to current literature. In this case the article will be more extensive and complex.

The test is an 'open book' test: you are allowed to bring with you and use all literature and other materials, ranging from textbooks to notes, from dictionaries to calculators. You may use a textbook as a 'reminder', in order to look up a term or model. However, you will not have enough time available to study the textbook in detail during the test.

You are not allowed to share study materials with other students during test administration. Be sure to bring the enclosed articles with you.

Questions requiring the interpretation of an article often refer to certain passages. Obviously, it would be difficult to answer such questions without the article.

Since the true/?/false questions have to be marked on a computer form, you are also requested to bring an HB pencil.

 

Notification

Shortly after the test administration the answers to the true/?/false questions and the model answers to the open questions will be published on the notice boards.

 

Make-up Exam (re-take)

The Make-up Exam is scheduled for March 26, 1996. Students receiving a failing grade for the first exam are expected to take part. Prior registration is required for all students.

 

Objections

Objections concerning the nature

Objections concerning the nature of true/?/false questions or answer keys as well as objections to open questions and model answers must be filed WITHIN 5 WORKING DAYS after the test date. Late objections will not be handled.

 

Objections concerning grading of open questions

Graded exams will be available for inspection on March 08, 1996 between 9:00 and 12:00. Each student is allowed to look at his or her graded exam for 15 minutes. However, PRIOR REGISTRATION is required so that individual exams can be pulled. Registration can take place from Monday, March 04, to Wednesday, March 06 at the Education Desk, room 0008. Well-founded objections to the grading of open questions must be filed WITHIN 5 WORKING DAYS after the inspection date. Late objections will not be handled.

The OAT coordinator deals with students' written objections. His or her written responses to these objections are kept on file at the Education Office, room 3064, and are available for review during regular office hours.

 

Form requirements

The form requirements for filing objections are:

·        objections must be typed, and submitted in duplicate;

·        a separate form must be used for each objection;

·        objections concerning content have to be argued on the basis of literature;

·        the top left-hand corner of each objection must indicate:

*    student name and address

*    student ID number

*    study (and graduate) programme

*    name of test.

Objections that do not meet these requirements will not be processed.

Objections must be addressed to the OAT coordinator and can be submitted to the Education Office (next to the Education Desk or next to the secretariat, room 3072).

 

Publication of final marks

The final marks will be published Friday, February 23, 1996.

 

 

 

 

3.         LITERATURE

 

·        Case study Nestlé S.A.

·        Case study Procter and Gamble Europe.

·        Paul J.H. Schoemaker, "Scenario Planning: a Tool for Strategic Thinking", Sloan Management Review, Winter 1995, pp. 25-39.

·        Case study Olivetti

·        Case study Canon

 

4.         STUDY GUIDELINE

 

·        Case studies Nestlé S.A. and Procter and Gamble Europe.

These case studies deal with several of the topics treated in block 1.1 "Introduction to Organization and Marketing". More specifically, they illustrate concepts such as corporate and business-level strategy, organizational structure, marketing and organizational control.

It is important that you read the cases very carefully. You should have a thorough understanding of the situations and problems faced by Nestlé and Procter and Gamble. This implies that if you do not understand certain concepts or words, you should search for additional literature in order to find adequate explanations.

The questions are designed to assess whether you can apply the knowledge you acquired during block 1.1 to new problem situations. This has two implications. First, we assume that you have sufficient, ready-to-use knowledge of the literature associated with block 1.1. If this is not the case, we advise you to review the literature. Second, you should study the cases with the literature of block 1.1 in mind; that is, try to link the situations and problems described in the cases to the relevant literature.

 

·        Paul J.H. Schoemaker, "Scenario Planning: a Tool for Strategic Thinking", Sloan Management Review, Winter 1995, pp. 25-39.

This article will be used as background for the questions on the QM subjects. To prepare optimally, take enough time to study, analyse and check two specific parts of this article.

The first part that deserves additional attention is the description of the two applications of scenario planning (pp. 30-36). How, on these pages, does Schoemaker relate his method of 'scenario planning' to the (matrix) and correlation (matrix) as found in W&W? And next, what exactly is the relation between a correlation matrix (such as the one in table 3) and the scenario profiles (given in figure 1)? To phrase the last question differently: given an arbitrary correlation matrix, could you derive the corresponding scenario profiles?

The second part of the article that deserves extra study time is Table 5 (p. 37), which describes in full detail the outcomes of several test experiments, together with a rather briefly worded comment on these outcomes by Schoemaker. The comment itself does not clarify the experiments much, but the contents of table 5 give enough information to 'reconstruct' the experiments. Reconstruction here means finding out the essential characteristics of the tests: What is their type (test on a single mean, test on a proportion, test on the difference in means, test on the difference in proportions, ...)? What is the null hypothesis being tested? What is the distribution of the test statistic? Is the test one-sided or two-sided? Some of these characteristics can only be discovered by working backwards from Schoemaker's outcomes in table 5. Make these calculations at home as part of your preparation; otherwise you will run short of time during the test!
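To illustrate what "working backwards" from reported outcomes can look like, here is a minimal sketch in Python for one of the candidate test types only: a test on the difference in two proportions with a pooled standard error and a normal approximation. The counts (60 of 100 versus 45 of 100) are hypothetical and are NOT taken from Schoemaker's Table 5, and the function names are illustrative. The sketch computes the test statistic, compares the one-sided and two-sided p-values, and then inverts the formula to recover the group size that would produce a given z.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def p_values(z: float) -> tuple[float, float]:
    """Return (one-sided, two-sided) p-values for an observed z.

    The one-sided value assumes the alternative H1: p1 > p2.
    """
    one_sided = normal_upper_tail(z)
    two_sided = 2.0 * normal_upper_tail(abs(z))
    return one_sided, two_sided


def implied_equal_n(p1: float, p2: float, z: float) -> float:
    """Working backwards: the common group size n that would produce the
    reported z, assuming equal group sizes and a pooled-proportion test."""
    pooled = (p1 + p2) / 2.0
    return 2.0 * pooled * (1.0 - pooled) * z ** 2 / (p1 - p2) ** 2


# Hypothetical data (NOT from Table 5): 60 of 100 versus 45 of 100.
z = two_proportion_z(60, 100, 45, 100)
one_sided, two_sided = p_values(z)
print(f"z = {z:.3f}, one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")

# Starting from the two proportions and z, recover the sample size.
print(f"implied n per group = {implied_equal_n(0.60, 0.45, z):.1f}")
```

The same reasoning applies in reverse to published results: if a reported p-value equals twice the upper-tail probability of the reported statistic, the test was most likely two-sided; if it equals the tail probability itself, it was probably one-sided.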

 

·        Case studies Olivetti and Canon

The cases, Olivetti and Canon, discuss the global corporate policies of two major companies. After reading the cases very carefully, you should be able to apply the theoretical issues dealt with in block 1.2 "Introduction to International Business" to the real-life business situations described in the cases. In order to evaluate the strategies of the two companies in today's global marketplace you should, among other things, understand important concepts such as internationalisation, strategic alliances, product life cycle theory, central coordination and local adaptation. Furthermore, you should be aware of the political and economic situations that affect the behaviour of (multinational) corporations. Read the cases at home and focus on those parts of the case studies that are relevant to block 1.2.

For the Olivetti case, pp. 254-261 have been omitted because they are not relevant to your study.


Global corporate policies

 

Case 7.1 Olivetti

 

Copyright © 1993 by the International Institute for Management Development (IMD), Lausanne, Switzerland. Not to be used or reproduced without written permission directly from IMD.

 

This case was prepared by Research Associate Joyce Miller, under the supervision of Professors George Taucher and Dominique Turpin, as a basis for class discussion rather than to illustrate either effective or ineffective handling of a business situation.

 

In late 1986, Elserino Piol, Executive Vice President Strategies and Development in the Olivetti Group, one of the world's foremost information technology companies and the second largest indigenous personal computer manufacturer in Europe, was concerned about the company's photocopier business. Its Agliè plant, located near Olivetti's headquarters in Ivrea, was producing about 20,000 units annually, most of which were sold in Italy. This operation was expected to be an important component for Olivetti in creating the 'integrated office', where several pieces of standalone equipment would be linked up in a multi-functional, automated system.

But the window of opportunity was closing. With the fast pace of development in the telecommunications technology that provided the networks and links between formerly disparate pieces, several new contenders were poised to enter the office-of-the-future market. A few months earlier, Mr. Piol had travelled to Tokyo to meet with senior management in Canon Inc., a major Japanese copier manufacturer, to sound out the possibilities for co-operation. At this point in time, Mr. Piol wondered whether it might make sense to form a basic technology alliance with a leader in the copier field.

 

Ing. C. Olivetti & Co., SpA

 

Ing. C. Olivetti & Co., SpA was the parent company of the Olivetti Group, whose product line included distributed data processing and office automation equipment, typewriters, calculators, cash registers and photocopiers (Exhibit 7.1).

In 1986, the Olivetti Group obtained a net income of L565.5 billion on sales of L7,317 billion, up 12.3% from the previous year[1]. At this time, Olivetti had manufacturing activities in 27 plants in seven countries.

 

 

Market sector                                                 1986     1985

Distributed data processing and office automation
   Electronic professional typewriters, videotyping systems   14.0     13.2
   Personal computers                                         28.5     29.5
   Minicomputers and terminals                                28.0     32.2
   Printers                                                    7.0      7.2
   Telecommunications equipment                                2.8      2.7
   Total                                                      80.3     84.8

Office products
   Portable and office manual and electric typewriters         8.2      5.9
   Calculators, cash registers                                 6.7      5.7
   Copiers                                                     3.6      2.3
   Office furniture                                            1.2      1.3
   Total                                                      19.7     15.2

Overall total                                                100.0    100.0

Source: Annual Report.

Exhibit 7.1 Olivetti Group revenue breakdown, by market sector in 1986 and 1985 (%)

 

Founded in 1908 and headquartered in the foothills of the Italian Alps, just over the border from Switzerland, Olivetti was known for many years as the family-owned company that turned out elegantly designed typewriters. By the mid-1960s, Olivetti was the sixth largest industrial organization in Italy, and 80% of its revenues from the sale of manual and electronic typewriters, calculators, accounting machines and office furniture were generated outside Italy.

In the following decade, as a result of its ambitious growth strategy, Olivetti became seriously undercapitalized, and it appeared that the company would either go bankrupt or fall into the hands of the Italian government. In April 1978, a dynamic leader from outside the family was brought in to turn the company around. Carlo de Benedetti, an Italian industrialist who had previously spent several months as Managing Director of Fiat, took over as Vice Chairman and CEO. De Benedetti invested over $17 million of his own personal fortune in the company (and thereby became the majority shareholder) and launched a programme to revitalize Olivetti.

 

 

Today, the ink-jet technology used in bubble jet printers is a much better approach, offering a standard of reproduction that was once thought impossible to achieve. Many companies, including Olivetti, are working to further develop ink-jet technology. Currently, Canon is using a lightweight printer head to spray ink through nozzles that are one-third the diameter of a human hair. The biggest obstacle to increasing the speed is finding a way to dry the ink fast enough. There is no solution yet, but many are working on it.

Olivetti's ink-jet area is expected to develop into a growing and profitable business over the next decade. Until this point, we've developed a technology similar to Hewlett-Packard's technology for making bubble jet printers, and we've gained a strong position with our dry-ink-jet non-impact printing calculator. Our bubble jet printers are very sophisticated electromechanical printers, and we're now the largest producer of printers in Europe. For Olivetti, this was a natural transition from the typewriter. We have a research lab of 70 people in Ivrea working on ink-jet physics, chemistry and application, and in addition, we have about 60 people in an R&D group in Yverdon, Switzerland, looking at how this technology could be implemented in new products.

 

 

Olivetti had put together a group of close to 70 engineers in Agliè who were involved in designing low-end copiers. These machines were fully developed by this group, and there was ongoing R&D concentrated on photoconductors and toners.

By late 1986, the Agliè operation was turning out about 20,000 units annually. However, several assembly line problems were occurring, and the source of these difficulties could often be traced back to externally supplied parts. The high reject rate was resulting in additional costs for Olivetti as well as its suppliers. Mr. Demonte remarked:

 

 

We're losing money in the copier business. But, closing up the operation entirely would certainly lead to additional expenses. We have a large infrastructure built up to support this business. We have a strong market position in Italy, and we can't just pull out of that. There would also be a question of what to do with the dealer channel and after-sales service organization. It isn't part of Olivetti's culture to just switch off something like this. Moreover, there are strong employment laws in Italy.

We have tried several times to enter a partnership in the copier business. Sometimes, the companies we contacted wanted to buy our operation outright. At one point, we approached one of our Japanese OEM suppliers, but they didn't want to be in a joint venture with an industrial operation. We always asked for R&D, management and production to be put into such a venture, and the Japanese counter-proposal was always to have the management and R&D in the venture and then subcontract out the production to a Japanese company. They were concerned about the quality of the end product as well as the level of production know-how.

 

 

Exploring the possibilities for cooperation

 

In late 1986, Elserino Piol, Executive Vice President, Strategies and Development, travelled to Tokyo and approached senior Canon management with the idea of cooperating in some way in the copier business. Mr. Piol was intrigued by Canon's replaceable cartridge technology, which was introduced in 1982 in the world's first personal copiers, and he believed that great potential benefits for both parties could be derived if the two companies could work together. Mr. Piol elaborated:

 

 

I strongly felt that we could mutually benefit from this kind of cooperation. In our initial meetings, I found Canon's top management to be quite open and willing to talk about cooperating with a foreign company. Before going to Tokyo, I had also initiated discussions with another large copier manufacturer that was not Japanese but had a large European presence. Olivetti needs a partner to share R&D with, one from whom we could acquire technology and who would give us access to an additional market in the copier business.

Olivetti was one of the first firms with a strategy to acquire technology not strictly by in-house development but also through joint ventures, alliances, venture capital companies, and so on. At present, we have close to 200 joint ventures in operation (Exhibit 7.4).

Olivetti has a lot of experience with this kind of arrangement.

 

 

In 1986, Canon was the dominant player in Europe, placing an average of 17,000 units each month, which represented a 22.7% share of the European copier market. For years, Canon had used an OEM strategy in Europe, while all other sales were handled by its Amsterdam-based regional headquarters, Canon Europe. This arrangement had enabled Canon to concentrate on cementing its position in the highly competitive domestic market. Over time, the larger of Canon's European sales subsidiaries that were subsequently put in place began to operate more independently. As of late 1986, Canon had only a small position in southern Europe and believed that it would be expensive and time-consuming to develop its own distribution channels there.

Filippo Demonte, who as head of the Office Products Group was directly responsible for Olivetti's copier business, remarked:

 

 

For whatever arrangement we might enter into, it is important that Olivetti be the majority shareholder. Any venture has to be 100% under Olivetti management so that we can guarantee to the government and to the company that we would not be selling Italian technology to a foreign company. In these things, it is important not only to show but also to be. Moreover, succeeding in Italy is more likely if you are a successful Italian company than if you are a successful foreign company: the same principle that exists everywhere. It is important to the policy makers, the opinion makers, the unions and other national bodies. Having majority ownership would also ensure that we could participate in Italian government, inter-government programmes, and Europe-wide programmes.

 

 

Mr. Piol believed that much could be learned from being in a partnership with a company like Canon, particularly with regard to production process, supplier relationships and basic copier technology. He explained:

 

 

If we were to put together some kind of joint venture with Canon - and I'm not sure just what that would look like in terms of ownership, structure, and the kind of assets and staff each partner would put into it - there could be some significant benefits on both sides.

This could be an opportunity for Canon to strengthen its presence in Europe, and we could learn about Japanese techniques. The Japanese have more exacting goals for quality and better control over development time. Right now, we're working with an inventory level of 45 days, and in Canon, it's five. We've used value engineering techniques many times in the past to improve this level, but not with the same success as the Japanese. They apply these techniques in a strict and methodical way, with a determination not to stop until good results - results which may seem impossible to obtain - are achieved. On the other side, it's hard to know how strongly Canon would ask us to adopt the Japanese way.

In the early stages of such a venture, I imagine that we would manufacture an Olivetti machine, which would be received by the Canon and Olivetti sales organizations, as well as their dealers in Europe. Over time, we would license the basic technology from Canon Japan and refine it for European needs. Perhaps we would also buy the photographic drums and mirror mechanisms from Canon factories in Japan and/or France, and Canon would presumably make a profit on these sales. One of the essential negotiating issues would be to determine the kind of R&D that would be done in Agliè and its scale, as well as whether we could eventually compete with other Canon design centres.

 

 

Mr. Demonte added:

 

 

It would be interesting to have Canon as a partner because then we would have a parent that is both a shareholder and a customer. When we're speaking with the shareholder, we'll be talking about profit and loss, net equity, and so on. When we're speaking with the customer, we'll be talking about the level of logistical and quality improvement. As well, we'll be trying to anticipate what the customer wants, which should help us with the product design specifications and in the production level we attain.

On the one hand, we would be an Olivetti company. On the other side, we would become part of Canon's copier machine division and part of the Canon family of copiers. One of the inherent challenges with any venture where two partners are involved is to manage the identity question. There will always be some people on both sides who will have difficulty making the distinction. Big companies are not made in such a way as to understand that they don't own a whole organization.

 

 

Case 7.2 Canon

 

This case was prepared by Research Associate Joyce Miller, under the supervision of Professors George Taucher and Dominique Turpin, as a basis for class discussion rather than to illustrate either effective or ineffective handling of a business situation.

We are grateful for the assistance of Professor Gene Gregory in the preparation of this case study.

 

 

From its humble beginnings in a small workshop in Tokyo's Roppongi district, Canon Inc. had become, by 1986, one of the world's leading manufacturers of cameras, business machines, and precision optical equipment. In the following year, Canon would celebrate its 50th anniversary, and President Ryuzaburo Kaku planned to use the occasion both to review the company's past achievements and to carefully plan for the future. Mr. Kaku's aim was to make Canon into a premier global corporation.

 

 

Well before the yen entered the steepest arc of its upward curve, Canon had seen the necessity of moving manufacturing into its markets, of putting production close to the place of consumption. The new phase of 'internationalization' was initially prompted by the trade imbalance (and trade friction) between Japan and the chief countries where Canon sells ... Canon has advanced quite briskly towards becoming truly global - and the intention is to take the global process further by establishing R&D centres in its markets as its national companies develop into free-standing businesses within the global corporation.

The imperatives of global rationalization - especially in copier operations - require Canon ownership and finely tuned management of R&D, production and marketing. As with all strategic alliances, the fine line between compelling necessity and expediency is not always readily apparent.

 

 

In the mid-1980s, Olivetti, a long-time player in the office equipment market with particular strength in Italy, was looking for a way to bolster its presence in the European copier market. Canon, at the time, was eager to expand its market share in Italy and to strengthen its European manufacturing base.

 

 

 

 

 

CANON INC.

 

A young company by Japanese standards, Canon traced its history back to November 1933 when a small group of camera enthusiasts led by Mr. Goro Yoshida founded Precision Optical Research Instruments Laboratory in Roppongi, then a suburb of Tokyo, to conduct research into quality compact cameras. Two years later, the Hansa Canon, Japan's first 35 mm focal plane shutter camera, remarkably resembling the German-made Leica, was introduced in Tokyo. In 1937, Precision Optical Industry Co., Ltd was established to manufacture the Hansa Canon, with Mr. Saburo Uchida as its first Executive Managing Director. When Mr. Uchida was drafted for service in the army in the late 1930s, Dr. Takeshi Mitarai, a practising physician who had invested in the new company and become its auditor, took over the company's management and became its president in September 1942.

During the war, Precision Optical was forced to abandon 35 mm camera production to become a supplier to the Japanese military. In this capacity, the company developed an indirect X-ray camera for mass-screening to detect tuberculosis infection. In 1944, the company diversified into binocular production with the acquisition of Yamato Kogaku Seisakusho. After a rapid reconversion to camera production at the war's end, the company changed its name to Canon Camera Co., Ltd in 1947. Over the next two decades, the company grew into the world's leading camera manufacturer.

Canon's international operations began modestly in 1951 with the appointment of Hong Kong-based Jardine Matheson as its sole worldwide agent. Responding to the growing US market for quality cameras, Canon established its first overseas branch office in New York in 1955, and two years later formed Canon Europa in Geneva as an exclusive distributor in Europe.

 

 

Vertical integration and product diversification

 

Early in the decade, Canon began the dual processes of vertical integration and product diversification that accounted for much of its strength in the domestic and world markets. Subsidiaries were established to produce micromotors and metal parts, and a supplier of precision components was acquired. Then, in 1956, the first major expansion of the product line was made with the addition of personal cine-cameras.

An overly ambitious diversification strategy led to Canon's first and, thus far, only major product failure. Introduced in 1958, the Synchroreader, designed to record voice messages on paper for educational use, proved to be technologically far ahead of its time. Within a year of its introduction, the product had to be withdrawn from the market, leaving the company with a division staffed with electronic engineers who could not be dismissed simply because management had made a serious strategic error in product planning and marketing.

Determined to transform adversity into advantage, Canon harnessed the skills of these people to make a major move into business machines with microfilm equipment for banking use in 1959 and, in a major new departure, with the development of the Canola 130 electronic calculator, introduced to the market in May 1964. Success in the calculator market set the stage for venturing into the copier market in 1968, with a 'New Process' plain paper system that challenged and eventually broke the tight hold of Xerox.

 

 

Competitive pressures intensify

 

In 1974, Canon found itself in serious trouble. Malfunctioning calculators with faulty light-emitting diode displays had to be recalled in large numbers, a mishap that could not have come at a more inauspicious time. Ferocious competition, led by Casio and Sharp, had driven prices to the ground, forcing many calculator makers to withdraw from the market. Those that remained were operating at the margin, with little or no profit. At the same time, the growth of camera sales slowed as markets became increasingly saturated. Exports of cameras and other products decreased under the pressure of a higher yen, and production costs were rising as a result of higher petroleum prices. In the first half of 1975, Canon was forced to suspend dividends for the first time in its history, an experience still regarded in the company with some horror almost 20 years later. The combination of forces battering the company exposed its structural and managerial weaknesses. Ryuzaburo Kaku, then in charge of Finance, recalled:

 

 

Canon's technical strength - demonstrated in a stream of pioneering that began with Japan's first 35 mm precision cameras - had not been backed by a coherent management strategy. Marketing was weak. Competitors were copying (our) products before (we) could fully exploit (our) sales potential. Canon was like a ship that constantly changed course and got nowhere ... Components were being manufactured in too many scattered locations ... As in many old Japanese companies, our people were so afraid of making mistakes that they did nothing. We've had to teach them not to fear being creative - or even failing.

 

 

Introduction of the premier company plan

 

Mr. Takeo Maeda, the new president who had assumed office just before the gale of misfortune swept over the company, responded with a 6-year premier company plan. Launched in 1976, the plan called for a restructuring and internationalization of the company, and the introduction of new, efficient production systems, to avoid the pressures of yen appreciation, protectionism and energy shortages in the future. The objectives of the plan were clear and ambitious. Canon was to become a leading corporation in Japan within three years, and a world leader in the subsequent three years. The new plan began by reducing operations, curtailing costs and undertaking efforts to strengthen camera, calculator and copier sales. An operating profit rate of 15% and freedom from debt became the principal tenets of financial management. Sales were targeted to increase 15% annually - considered to be a reasonable growth rate - with the goal of substantially increasing market share in all product lines. All this was to be achieved through more rapid and higher quality product development, improved production, and total marketing management.

A new matrix organization linked the three major product divisions - camera, business machines and optical products - with functional committees for new product development, production and marketing. The Canon Development System (CDS) was established to improve the efficiency of R&D, shortening the time to market for new products. The task of the Canon Production System (CPS) was to resolve quality problems, eliminate waste, and activate employees within the new rationalized organizational structure. The objective of the Canon Marketing System (CMS) was to relate the company's products and services to customer satisfaction in all of Canon's worldwide markets. Pushing responsibility down the line, the three product divisions were to operate as autonomous vertical profit centres.  Division chiefs were appointed and delegated the authority to act fairly independently.

The new plan had only just been put into action when Mr. Maeda suddenly passed away. Mr. Kaku, who as Managing Director had been largely responsible for shaping the new direction of the company, was elevated to the presidency and charged with the task of completing the reforms underway.

 

 

The Canon way

 

From the outset, Canon had been endowed with a strong corporate sense of purpose. Self-motivation, self-awareness and self-management were the three pillars on which the company had been created. Mr Kaku continued to give these philosophical principles primary importance, adapting and embellishing the company purposes for the task ahead (Exhibit 7.5). In his words:

 

 

(When I took over, Canon was) 'sluggish' and 'full of bureaucratic attitudes', which drained the organisation of its ability to respond to changes in the operating environment . . . (My basic philosophy was) to build a company which further upholds human rights and dignity, while striving to develop better technology and products through innovation.

 

 

 

 

 

 

 

 

 

 

Our corporate philosophy

- To be a global corporation providing

kyosei 'living and working together for the common good'

in all countries where we operate

 

 

 

Our mission

 

Our objectives

 

Our business

development

goals

 

 

Our values

 

 

§        To make a positive contribution through continued growth and reinvestment in the world's communities

 

§        To be a responsible global citizen

§        To have unique and quality products

§        To build an ideal company for continuing prosperity

 

§        To combine our traditional hardware strength with software systems development

§        To create information systems and networks which integrate hardware, software and services

§        To operate on a global scale

 

§        Respect cultural differences

§        Encourage self-motivation, self-awareness and self-management

§        Respect dignity, value initiative and recognize merit

§        Work together in harmony

§        Sustain our physical and emotional health

 

 

Exhibit 7.5 The Canon way

 

 

A decade later, in 1985, Canon was well on the way to becoming a premier company by world standards. Significant increases in investment and R&D had resulted in a spate of new products, many of them 'firsts' in the marketplace. Canon's product line ranged from 35 mm and video cameras to copiers, electronic typewriters, laser printers, facsimile machines, microlithographic equipment for producing semiconductors, and medical equipment. At this time, Canon's manufacturing and marketing organisation spanned over 100 countries and employed 34,100 people (Exhibit 7.6). In 1985, profits rose to ¥ 37 billion on net sales of close to ¥ 956 billion. Business machines accounted for 71% of sales, with cameras and optical equipment generating 21% and 8%, respectively.

 

 

The response to endaka

 

But new problems were on the horizon. Unlike most other Japanese companies, Canon relied heavily on overseas markets for the bulk of its business, with North America and Europe each accounting for 30% of sales. Although the process of globalizing manufacturing was well underway, a high percentage of overseas sales were still generated by exports, making Canon particularly vulnerable to endaka (yen appreciation), which followed the Plaza Accord of September 1985.[2]

Canon's response to the rising yen was guided by past experience. R&D expenditures were increased, cost reduction efforts were broadened and intensified, and capital outlays for overseas production facilities were boosted. After posting record profits for the previous ten years, Canon's income dropped 70% in 1986 to ¥ 10.7 billion, threatening a cut in dividends. Shinji Tatewaki, who had just returned to Canon's copier division in Tokyo after heading up the company's Chicago sales office for several years, recalled:

 

 

 

 

The US government devalued the dollar and, within the space of virtually a day, the yen was worth significantly more against other currencies. In 1984, the yen stood at ¥ 251 to one US dollar. Then the level dropped down to ¥ 150. Production costs increased dramatically, and there was no way that we could recover the loss. We had to reconstruct our entire operations. We launched a large-scale cost reduction activity and a campaign to avoid waste. In Canon Tokyo, people soon began pinning '¥ 150 badges' on their shirts. We were all focused on what we had to do to live in a ¥ 150 world.

 

 

Because of the strength of the yen, Canon products made in Japan had become more expensive overseas. Further expansion of overseas production was essential. In addition, as a Forbes reporter commented:[3]

 

 

Canon's strongest defence against a rising yen is innovation. With innovative products, price is less important than in commodity-type products . . . .

This means heavy spending on research and development, of course. Canon's R&D amounts to some 11% of parent company sales, one of the highest ratios among Japanese companies outside the chemical and pharmaceutical industries.

 

 

Given the increasing trade friction in the US and European markets, Canon had further cause to reposition itself to maintain future growth. Three-quarters of the company's sales were in office equipment, including both standalone machines, such as copiers, and the systems that would combine them in the 'office-of-the-future'. It was in this sector that globalization became increasingly imperative.

 



 

 


Exhibit 7.6 The Canon organization (Source: Canon Handbook)


Canon’s copier business

 

Canon first entered the copier market in 1965 with a coated paper copier, based on technology licensed from RCA. Realizing the limitations of this technology, Canon formed a team of engineers, led by Dr. Keizo Yamaji, to develop a copier drum with an insulating layer that would be suitable for plain paper copying using a more photosensitive chemical than the one then used in xerography. This new design prolonged the drum's life and reduced the risk of discharging toxic chemicals. Introduced in Japan in April 1968, Canon's 'New Process' (NP) plain paper copying system was completely free of Xerox patents. Hiroshi Tanaka, who was part of this effort, commented:[4]

 

 

Engineers working on the plain paper copying project thoroughly investigated the patents held by Xerox. In the process, we learned how not to violate patents and how to obtain patents to protect our own technology. The NP technology was completely free of Xerox's airtight patent network.

 

 

In 1972, the company launched a second generation 'liquid dry' NP system, which used plain paper and liquid toner and turned out dry copies. This new technology reduced machine breakdowns by eliminating the complex heat-fusing mechanism and simplifying the developing and cleaning process. These machines had lower production costs, were more compact and more reliable than anything available at the time, and they matched Xerox on copy quality. Canon subsequently licensed out this technology to 20 manufacturers in Japan and three in the United States.

NP copiers were manufactured at the Toride factory on the outskirts of Tokyo, which had been set up a decade earlier to make synchroreaders and later, cameras. Toride used a flexible manufacturing system that could accommodate differences in models and electrical specifications. The four assembly lines could handle any NP model after a 2-day changeover. Each line had the capacity to turn out between 3000 and 8000 units monthly. About 2000 parts were required to produce the range of NP copiers.

Initially, copiers were sold in Japan through Canon Business Machines Sales Inc., set up in 1968 to market calculators. In 1971, this subsidiary was merged with Canon Camera Sales Inc. to form Canon Sales Inc., whose shares were listed on the first section of the Tokyo Stock Exchange a decade later. Beginning with 200 people dedicated to the sale and service of copiers, the new company sold Canon copiers outright and offered customers a Total Guarantee System.

In the early 1970s, Canon established a dealer network throughout Japan. Dealers received extensive training and, within a few years, had completely taken over the task of servicing copiers. Canon did not begin selling its NP systems in the US until 1972, when a dealer sales network was established. However, these copiers were being distributed in Europe through Canon's marketing unit in the Netherlands as early as 1972, although sales were modest.

 

 

The personal copier breakthrough

 

Canon's copier strategy was formed largely by its camera strategy. 'A camera for everyone' was translated into 'a copier for everyone'.

Canon's copier line initially was aimed at small and medium-scale users, a market that had been largely ignored by Xerox, whose strategy focused on large users in government, business, and universities. Following Canon's strategy, Dr. Keizo Yamaji, who had become the General Manager of Canon's Reprographic Products Division, wanted to open up an entirely new market for the PPC. Dr. Yamaji had market data showing that there were over 4 million offices in Japan with fewer than five employees that were not being addressed by the conventional copier business. The lowest-priced unit available was ¥ 500,000, about US$2300, which was too expensive for a small business. In addition, professional service engineers needed to come in regularly to maintain these machines, and this cost, too, limited their use to larger offices. The 'dream' was to come up with a compact, maintenance-free copier that would cost about $1000 and could be sold to small offices, for home use, or as a personal desk-side copier. This idea was totally different from the Xerox system which, until this point, had dominated the world copier business.

Introduced in late 1982, Canon's personal copier (PC) represented a revolution in reprographic technology. The PC used a replaceable cartridge that eliminated the need to maintain the machine regularly. After making 2000 copies, the user simply replaced the cartridge, which contained a photoreceptive drum, toner assembly, cleaner, and charging device. Cartridges were available with four toner colours.

In time, copier manufacturers around the world began purchasing Canon's personal copier on an OEM basis. For example, Olivetti began importing Canon personal copiers in late 1984. Increasingly, large firms operating internationally were completing their product line by buying certain models from other producers.

 

 

Canon in Europe

 

In 1957, Canon Europa was established in Geneva as Canon's sole distributor for Europe. Over the following decade, a network of national distributors was developed to market, distribute and service Canon cameras and calculators. To better manage the increasing volume of European business, especially in EC countries, the European headquarters functions were transferred to Canon Amsterdam NV in 1968, leaving Canon Geneva as a finance company.

With the introduction of the Premier Company Plan and the Canon Marketing System, the first task was to reorganize the complex system of multiple national distributorships that had evolved over the first two post-war decades. Given the rapid diversification of product lines and the increasing importance of global rationalization of marketing, total control over the marketing and distribution system became imperative to respond to customer needs. In 1975, Canon gradually began the process of replacing distributors with integrated Canon marketing subsidiaries in each country. Over the following years, Canon Europa NV was established, with 19 subsidiaries, including Canon Amsterdam NV, to manage the intricate European organization. A senior manager in Canon described the process:

 

 

In some countries, we had to start from scratch; in others, we already had relationships with distributors. In France, for instance, Canon's camera importer wanted to get into the copier business, and we quickly had the 200 people in this organization selling copiers directly. In the UK, we were using Marubeni, a sogo shosha or general trading house, which then sold products through several companies. This arrangement lasted only 2-3 years. Then we had to build something up ourselves. We put cameras and copiers together and distributed through dealers. In Germany, Canon's camera distributor was not so interested in selling copiers. Eventually, we were able to put together an arrangement, but it was strictly a sales and marketing venture. In Italy, our camera importer was also not interested in copiers. Cash recovery was a real problem. For copiers, Canon couldn't expect to get payment for up to ten months after the sale. In the camera business, payment was available within 30 days. We had a good business in Italy with calculators, but it was clear that we needed more sophisticated salespeople to market copiers.

In many cases, Canon ended up buying out the distributors because of their limited financial strength and cashflow problems. This put a major strain on Canon's own financial resources.

 

 

Over the next several years, Canon's marketing capabilities in Europe grew substantially. Over time, the various national subsidiaries that were established began to operate more independently and purchase products directly from Japan.

 

 

Canon begins producing copiers in Europe

 

In 1972, Canon acquired the assets of ECE GmbH, a small German R&D house specializing in advanced electrostatic technology in Giessen, near Frankfurt. ECE had contributed significantly to perfecting Canon's 'liquid dry' copy technology, and Canon had been helping the firm financially since 1969. By mid-1973, the ECE facility had been converted into a factory with the capacity to turn out 1500 low-volume PPCs monthly, to be sold throughout Europe as well as in some Middle East and African markets.[5]

ECE's original management team had remained in place after the acquisition, and Canon Giessen was staffed almost entirely by Germans. Tsukasa Kuge, one of the few dispatched from Tokyo, arrived in 1973 and remained in the operation until 1975, returning for an additional five months in 1977. Mr Kuge recalled:

 

 

By acquiring a well-organized high technology company with considerable experience and know-how in copier development, we were able to start up a new production unit rather quickly. Much of the time usually spent on the details of technology and transferring know-how was saved, which reduced the drain on managerial and technical resources in Tokyo. After a time, R&D activity was to be dedicated entirely to the development of Canon's product, and the R&D activities both in Giessen and in Tokyo had to be performed in conformity with each other, so I was sent over.

In the early 1970s, the copier market was not as segmented as it is today. We began making what we felt would sell the best, and we planned to move up in quality. In the beginning, more than 30 people were doing research and development, and they were creating many ideas that were also implemented back in Japan. Over time, Giessen's R&D capability was made smaller as production became more important.

 

 

After two years in operation, a team of 130 people was manufacturing 500 NP machines (20 cpm) each month under a rigid quality control programme. Production was slated to increase at a rate of 20-35% annually. Giessen's assembly process was similar to Toride's, but on a much smaller scale. In 1975, the production capacity at Giessen was doubled, and new lines were added to produce copier drums and toner.

 

 

A second plant in Europe

 

In August 1983, Canon responded to an invitation from the French government to establish a personal copier factory in Liffre, in Bretagne. At this time, Canon was also looking at the feasibility of establishing a PPC assembly plant in Virginia, USA.

By the end of 1984, the Liffre plant was turning out about 3000 copiers per month, and lines were subsequently added to produce electronic typewriters and facsimile transceivers.

 

 

Canon's European presence in 1986

 

By 1986, Canon had become a leading player in the European market, placing more than 200,000 units out of an estimated total market of 897,780 (Exhibit 7.7). Canon's aim was to become the world's leading PPC manufacturer. To achieve this, the company's goal was to obtain at least a 30% unit share in the three major copier markets: Japan, Europe, and the US (Exhibits 7.8 and 7.9).

Canon offered the full range of copiers, from its innovative personal copier to its NP-8000 series (up to 70cpm), which competed head-on with Xerox and Kodak machines. In the near future, Canon planned to introduce a digital colour copier that many believed would not only transform the office environment, but also revolutionize the whole industry. It was rumoured that newly-emerging domestic competitors were also developing colour copiers based on a different product concept.

Currently, Canon had sales subsidiaries in virtually every European country (Exhibit 7.10), as well as independent business machine distributors that dealt with a network of retailers. Many of Canon's European distributors sold only Canon products. In the camera business, they relied mostly on the retailers. In calculators, they used another channel. Business equipment needed more support, and it was becoming apparent that more sales channels were needed to sell copiers. At the same time, Canon's machines were becoming more expensive because of the 15.8% duty that the European Commission had placed on most copiers imported from Japan. This temporary rate was set in 1986, but there was an expectation that the rate would be officially set at 20% in 1987.

 

 

Units sold (Canon market share in parentheses)

                             1984               1985               1986 (estimated)
Personal copiers             82,640 (81.7%)     111,350 (65.2%)    110,050 (52.8%)
Category 1 (up to 19 cpm)    57,880 (15.0%)     49,110 (12.6%)     54,020 (13.0%)
Category 2 (20-39 cpm)       30,370 (16.2%)     28,500 (15.2%)     31,410 (14.6%)
Category 3 (40-59 cpm)       11,960 (19.9%)     10,050 (19.1%)     7,330 (14.2%)
Category 4 (60-89 cpm)       0 (0%)             0 (0%)             640 (8.5%)

Total                        182,850 (24.7%)    199,010 (24.7%)    203,450 (22.7%)

 

 

 

Source: InfoSource S.A.

Exhibit 7.7 Canon brand: sales quantity and market share in Europe, 1984-1986

 

 

Company                                                       1985                                                      1986

 

 

Canon                                                                125                                                        138

Fuji Xerox                                                          97                                                        111

Konishiroku                                                       35                                                           35

Matsushita                                                            6                                                             7

Minolta                                                               34                                                           33

Mita                                                                     28                                                           27

Ricoh                                                                168                                                        162

Sharp                                                                   33                                                           38

Toshiba                                                               32                                                           36

Total                                                                  558                                                        587

 

 

Source: Dataquest Incorporated

Exhibit 7.8 Estimated PPC placements in Japan, 1985 and 1986, by brand (thousands of units)

 

 

In the low end, Canon was also finding that its copiers were not competitive enough. Other Japanese copier manufacturers were very price-conscious. Moreover, customers were getting more sophisticated. In the past, they would accept lower copy quality but, increasingly, they wanted superior reproduction and easy-to-use machines with low maintenance requirements, and they were becoming more concerned about environmental factors. Canon needed a new product, or new technology, in this category.

Canon's Giessen facility was one of the largest and most integrated copier plants in Europe, employing 400 people and turning out 4000 PPCs each month. Giessen manufactured NP systems in Category 2 and Category 3, together with components such as photosensitive drums, the heart of the plain paper copier. About 80 suppliers were contracted locally to provide services and parts, including moulded casings, lids, platen glass, print boards, paper supply cassettes, fixing rollers, solenoids, DC controllers, halogen lamps and low-voltage power supplies. Likewise, Canon's Bretagne operation employed about 430 people and used numerous local suppliers. Little R&D was being carried out in either of these operations, aside from modifying designs sent from Tokyo to meet local manufacturing and market needs. In principle, the R&D laboratories

 

 

 

                Personal  Segment  Segment  Segment  Segment  Segment  Segment    Total
                 copiers        1        2        3        4        5        6

Adler-Royal            -     16.0      3.2        -      0.4        -        -     19.6
Canon              165.0     52.0     48.3      4.5     13.5      1.4        -    284.7
A.B. Dick              -      3.2      0.8      0.5      0.3        -        -      4.8
Gestetner              -      8.1      2.5      0.3      0.4        -        -     11.3
Harris/3M              -     34.0     13.5      5.2        -        -        -     52.7
Kodak                  -        -        -        -      7.2      3.4      1.1     11.7
Konica                 -     26.0      7.5      4.1      5.0      0.1        -     42.7
Minolta              5.4     26.1     25.8      1.7      0.6        -        -     59.6
Mita                   -     53.0     20.4        -      4.5        -        -     77.9
Monroe                 -     11.5      5.0        -      0.6        -        -     17.1
Océ                    -        -        -        -      2.6        -        -      2.6
Panasonic              -     19.2     12.1      0.8        -        -        -     32.1
Pitney Bowes           -      7.4     11.8      1.8      2.0        -        -     23.0
Ricoh                5.0     26.0      7.9      3.6      4.0        -        -     46.5
Sanyo                3.6      3.1        -      0.9        -        -        -      7.6
Savin                  -     17.6      2.0     12.1      5.4        -        -     37.1
Sharp               28.0     64.3      8.2     10.1     10.0        -        -    120.6
Toshiba                -     43.2     13.0      7.0        -        -        -     63.2
Xerox                  -     59.0     18.0     26.3      9.1      0.8      9.2    122.4
Others               7.0      4.9      2.4        -        -      0.6        -     14.9

TOTALS             214.0    474.6    202.4     78.9     68.4     12.5     10.3  1,061.1

 

 

 

This segmentation is based on the following criteria:

 

 

 

Segment   Speed (copies per minute)   Typical monthly volume range

PC        under 20                    N/A
1         0-20                        0-10,000
2         21-30                       5,000-20,000
3         31-45                       5,000-30,000
4         40-75                       10,000-75,000
5         70-90                       25,000-125,000
6         91+                         100,000+

 

 

Source:   Dataquest Incorporated.

Exhibit 7.9 Estimated PPC placements in the US, 1986, by brand (thousands of units)

 

 

Country                             Canon subsidiaries                 Canon affiliated companies

France*                                                      1,868                                                                 -

UK                                                              1,071                                                                 -

Germany                                                        812                                                                 -

Spain                                                                   -                                                           387

The Netherlands                                            384                                                                 -

Finland                                                           380                                                                 -

Sweden                                                          332                                                                 -

Austria                                                           193                                                                 -

Italy                                                                187                                                                 -

Belgium                                                         105                                                                 -

Switzerland                                                      51                                                                 -

Luxembourg                                                      9                                                                 -

Total                                                           5,392                                                           387

                                                   Combined total                                                        5,779

* Canon Bretagne is included in the French subsidiaries

Source:      Canon Handbook.

Exhibit 7.10 Canon’s European distribution capabilities (number of employees as of December 1985)

that Canon set up abroad were linked with R&D in Japan and part of the global rationalization of Canon's R&D effort. These laboratories were intended to serve Canon's global operations, not local production. Tsukasa Kuge, currently the General Manager of Canon's 145-person Peripheral Development division in Tokyo, had also been directly involved in, and was familiar with, Canon's European operations. Kuge remarked:

 

 

The idea was for Giessen to concentrate on mid-range copiers. We had personal copiers being produced in Liffre and Categories 1, 4 and 5 in Toride. We had significantly fewer people working in R&D in Giessen than in the beginning. Over time, production became much more important, and it was more effective to do the R&D in Japan.

In developing products, Canon follows a policy of mochi wa mochi-ya. The idea is to have the proper development in the proper place. Mochi is the sticky rice cake that is traditionally cooked for New Year's celebrations. The raw material is popular and the cooking process is simple. Anyone can make rice cakes, but mochi-making is a hard and time-consuming task. The job of making rice cakes should belong to the most skilful rice-cake maker, namely the mochi-ya.

Ultimately, Canon needs to have a greater R&D capability in Europe if we are to become an insider. We could develop this capability with some incremental investment based on Giessen's original potential or, alternatively, we could set up a new greenfield site in Germany or Switzerland, for instance. As well, we need to further investigate options for locally-produced parts.

 

 

Mr. Kaku commented:

 

 

When we first began production in Europe... there were no compelling economic reasons to transfer this original technology. But it is our established policy, in keeping with our basic corporate purposes, to participate to the fullest in the development of the societies which we serve through our products.

 

 

A possibility for co-operation

 

In late 1986, Elserino Piol, Executive Vice President Strategies and Development in the Olivetti Group, travelled to Tokyo to speak with senior Canon management about joining forces in the copier business. Olivetti had a firm hold on 85% of the copier market in Italy (which represented about 5% of the total market in Europe), and Olivetti was looking for a way to double its share. Canon had had some difficulty serving the southern part of Europe, and it was possible to conclude that combining the sales effort with Olivetti could expand total sales for both companies. However, it was also possible that such an alliance would lead to conflicts between the two salesforces.

Currently, Canon had the highest installed base in all of Europe. However, the market was still relatively undeveloped. There was a huge potential for growth with the coming developments in digital technology and colour copying and the further integration of the copier into the office environment. At this point, the question for Canon was whether it made sense to enter a venture with a company that was ostensibly a competitor in the copier business. Olivetti was a leading player in the office products market with a long history in the business, and this was an area that Canon wanted to enter more strongly in the future. For both partners, such a venture would be a way to learn the way of thinking, history, technology and philosophy of the other.

In the past, Japanese manufacturers had tended to manufacture products in Japan and then export them to Europe and North America. Early on, Canon realized that this tendency could not continue. Canon's philosophy was to produce products in the market where they were used. In fact, Canon was the first Japanese company to set up a factory for copiers in Europe, which was done to have some insurance for the future. Over the years, Canon had set up many ventures, but they had always been built up from ground zero. The transfer of technology was much easier this way, and it was more secure. This would be the only major joint venture for Canon in copiers that involved manufacturing and R&D, and it would be only the second joint venture that Canon had entered into outside Japan. The first one, Lotte Canon, was established in 1985.

Canon's technology was ahead of Olivetti's, so its patents, know-how and projects would probably be put into the joint venture. Canon had just started production of a new Category 1 copier in Toride. In view of Olivetti's R&D and manufacturing capability and its sales channels, there was also the possibility of transferring this production into such an operation. The question of Olivetti's relationships with other OEM suppliers would still have to be resolved. Furthermore, Olivetti's suppliers differed considerably from Canon's quality standards, and significant improvements would probably have to be sought on the product cost side. More than 20 years earlier, Canon had launched programmes to study the potential of its suppliers. Although such studies could be expensive, the results often saved time and costs in terms of quality assurance. As well, Canon came to understand the level of quality support it needed to provide to its suppliers. As a result, Canon's suppliers had become involved in developing Canon machines, and they operated on a just-in-time basis. This collaboration was natural and ongoing. Moreover, through this arrangement, Canon had gathered a lot of cost data and continually looked for ways to improve. Typically, Canon's inventory level in Japan was less than five days. In Giessen it was seven days, although work was ongoing to bring this level down further.

In Tokyo, Canon had a very different system from the one used by Olivetti and most other European and North American manufacturers. Canon used a mass production system, and the underlying driver was how to improve production volume within a certain time frame. This was based on minutes and seconds, and the idea was to look continually for ways to shorten the work cycle. Canon used conveyor belts, and most people in the copier area worked on a 20-30 second cycle. In contrast, at Olivetti, one person typically worked 25-30 minutes at a station and assembled a lot of parts. The whole unit was manually pushed on a cart to the next station, and there was usually some waiting time for the next step.

There were also differences in the development system. Traditionally, Canon's R&D people concentrated on perfecting the design. There were no major modifications once the drawing was completed and moved into production. Canon looked continually for ways to improve the quality in each step, to make cost reductions, and to develop products faster. In Canon, the objective was for production costs to be reduced every year, which could be achieved by changing the design to use cheaper parts, negotiating with suppliers for price discounts, changing the production process in the factory to work more effectively, and so on. This was the kind of thinking that Canon would need to transfer into a joint venture.

Canon had never entered into this kind of alliance before. The challenge for both parties would be how to adapt and how to implement changes. The key would be how to structure such a venture: how to preserve the good parts of each partner's culture and build on a common basis. Canon had always had a philosophy of coexistence.

 


University of Limburg, Faculty of Economics and Business Administration

 

 

OverAll-Test 1

Blocks 1.1 & 1.2

Friday, January 26, 1996

Testbook

 

study International Business

 

 

Contents test book

 

This exam consists of:

·        7 pages (not including answer sheets)

·        17 closed (true/?/false) questions, numbered C1 to C17

·        8 open-ended questions, numbered O1 to O8.

·        1 computerized answer form for closed questions

·        8 answer sheets for open-ended questions.

 

Articles

·        Case study Nestlé S.A.

·        Case study Procter and Gamble Europe.

·        Paul J.H. Schoemaker, “Scenario Planning: a Tool for Strategic Thinking”, Sloan Management Review, Winter 1995, pp. 25-39.

·        Case study Olivetti

·        Case study Canon

 

Time distribution and points awarded

We advise you to carefully allocate your time (180 minutes), taking into account the weight of each question.  Closed (true/?/false) questions are worth one point each.

The weights (points) of the open-ended questions are as follows:  

 

O1       5                      O3       5                      O6       6

O2       5                      O4      15                      O7       6

                                O5      15                      O8       3

 

Total open-ended questions:                         60 points

Total closed (true/?/false) questions: 30 points

Total OA                                                         90 points
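The point totals above can be checked, and turned into a rough time budget, with a short Python sketch. It assumes (this is not stated in the test book) that the 180 minutes are divided in strict proportion to the points; the names `OPEN_WEIGHTS`, `CLOSED_TOTAL` and `minutes_for` are illustrative only.

```python
# Open-ended question weights (points) as listed in the test book.
OPEN_WEIGHTS = {"O1": 5, "O2": 5, "O3": 5, "O4": 15,
                "O5": 15, "O6": 6, "O7": 6, "O8": 3}
CLOSED_TOTAL = 30     # stated total for the closed (true/?/false) questions
TOTAL_MINUTES = 180   # length of the test

open_total = sum(OPEN_WEIGHTS.values())       # 60
overall_total = open_total + CLOSED_TOTAL     # 90


def minutes_for(points, total_points=overall_total, total_minutes=TOTAL_MINUTES):
    """Time budget for `points`, assuming minutes are split in proportion to points."""
    return total_minutes * points / total_points


if __name__ == "__main__":
    print("open-ended:", open_total, "overall:", overall_total)
    for question, weight in OPEN_WEIGHTS.items():
        print(question, weight, "points ->", minutes_for(weight), "min")
    print("closed questions ->", minutes_for(CLOSED_TOTAL), "min")
```

Under that proportional assumption each point is worth two minutes, so O4 and O5 would each get about 30 minutes and the closed questions about 60 minutes.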

 

Testing procedure

The closed (true/?/false) questions must be marked on the computerized answer form, using an HB pencil.  The open-ended questions must be written out on the enclosed answer sheets.   Please use a black or blue pen.

 

Grading

The mean test results obtained will determine the passing grade of this OAT. However, any score below 30% (27 points out of 90) implies failure, while any score above 55% (49.5 points out of 90) is a pass.
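The grading rule can be expressed as a small decision function. This is a sketch only: the test book does not say how the mean test results are turned into a passing grade, so the cutoff between 27 and 49.5 points is passed in as a hypothetical `cutoff` argument. Scores of exactly 27 or 49.5 are treated as falling in the mean-dependent band, which is how "below 30%" and "above 55%" read literally.

```python
from typing import Optional

TOTAL_POINTS = 90
# Multiplying before dividing keeps both bounds exact (27.0 and 49.5).
FAIL_BELOW = 30 * TOTAL_POINTS / 100   # 30% of 90
PASS_ABOVE = 55 * TOTAL_POINTS / 100   # 55% of 90


def outcome(score: float, cutoff: Optional[float] = None) -> str:
    """Classify an OAT score as 'pass', 'fail' or 'undetermined'.

    `cutoff` is a hypothetical, mean-based passing grade; the test book
    only fixes the bounds it must lie between.
    """
    if score < FAIL_BELOW:
        return "fail"
    if score > PASS_ABOVE:
        return "pass"
    if cutoff is None:
        # Between the bounds the result depends on the mean test results.
        return "undetermined"
    if not FAIL_BELOW <= cutoff <= PASS_ABOVE:
        raise ValueError("cutoff must lie between the fixed bounds")
    # Assumption: a score equal to the cutoff counts as a pass.
    return "pass" if score >= cutoff else "fail"


if __name__ == "__main__":
    for s in (20, 40, 55):
        print(s, outcome(s))
    print(40, outcome(40, cutoff=38))
```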

 

Answer Key

Shortly after the test the answer key will be posted on the bulletin boards.

 

Objection policy

Objections to the nature of the questions or the answer keys must be filed WITHIN 5 WORKING DAYS after the test date.  Well-founded objections to the grading of individual questions must be filed WITHIN 5 WORKING DAYS after the inspection date.

Graded exams will be available for inspection on March 08, 1996, between 9:00 and 12:00. Each student is allowed to inspect his or her graded exam for 15 minutes. However, PRIOR REGISTRATION is required so that individual exams can be pulled. Registration can take place from March 04 to March 06, 1996, at the Education Desk, room 0008.

The OAT coordinator deals with students' written objections. His or her written responses to these objections are kept on file at the Education Office, room 3064. They will be available for review after March 22, 1996, during regular office hours.
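The two five-working-day deadlines can be worked out mechanically. The sketch below assumes that the test day or inspection day itself is not counted, that working days are Monday to Friday, and that public holidays are ignored; none of these points is specified in the test book, and `add_working_days` is an illustrative helper, not part of any official procedure.

```python
from datetime import date, timedelta


def add_working_days(start: date, n: int) -> date:
    """Return the date n working days (Monday-Friday) after start.

    Assumptions: the start day itself is not counted and public
    holidays are ignored.
    """
    current = start
    added = 0
    while added < n:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            added += 1
    return current


TEST_DATE = date(1996, 1, 26)        # Friday, January 26, 1996
INSPECTION_DATE = date(1996, 3, 8)   # March 08, 1996

if __name__ == "__main__":
    print("objections to questions/answer key:", add_working_days(TEST_DATE, 5))
    print("objections to individual grading:", add_working_days(INSPECTION_DATE, 5))
```

Under those assumptions, objections to the questions or answer key would be due by Friday, February 2, 1996, and objections to individual grading by Friday, March 15, 1996.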

 

Form requirements:

The form requirements for filing objections are:

·        All objections must be typed, and submitted in duplicate.

·        A separate form must be used for each objection.

The top left corner of each objection must indicate:

·        student name and address;

·        student ID number;

·        study (and graduate) programme;

·        name of test.

 

OBJECTIONS THAT DO NOT MEET THESE REQUIREMENTS WILL NOT BE PROCESSED.  Objections must be addressed to the OAT coordinator, and can be submitted to the Education Office.

 

Publication of grades

Grades will be posted on the bulletin boards after February 23, 1996.

 

 

GOOD LUCK!


AN ALTERNATIVE FOR ASSESSING PROBLEM-SOLVING SKILLS:

THE OVERALL TEST

Mien S. R. Segers

Universiteit Maastricht, School of Economics and Business Administration, Dept. of Educational Development and Research, Maastricht, The Netherlands

 

 

Abstract

 

Since the mid-1980s, many new terms have enriched the assessment literature, such as performance assessment, authentic assessment, direct assessment, curriculum‑embedded assessment and a few more. Criteria for good instruction as well as good assessment practices have been suggested, based on research-based models in the field of cognitive psychology and on expert-novice studies. This article first reports on the translation of these criteria into a set of characteristics of the assessment system of a problem‑based curriculum in the field of Economics and Business Administration. Secondly, the article reports on three studies conducted to evaluate and improve the assessment and instructional practices. The first study concerns the fairness of the assessment instruments. The article presents a methodology to search for congruence between the formal curriculum, the operational curriculum and the assessment goals. The findings of the first study suggest that within a problem-based curriculum it is possible to implement an assessment system that is fair to the students. Additionally, there is empirical evidence that student outcomes can be used as one source of input for the evaluation of instructional practice. The second study reports on the validity of the OverAll Test as an instrument to assess problem-solving skills. The analysis of thinking-aloud protocols provides some empirical evidence for the validity of the OverAll Test as an instrument measuring problem-solving skills in the field of Economics and Business Administration. The third study addresses the question: is it important to map students’ knowledge profile as a remedial tool for problem-solving performance? The answer to this question depends on the extent to which a student’s problem-solving performance is influenced by the quality of his/her knowledge profile. Students’ knowledge profile is measured by a Knowledge Test and a Sorting Task. Students’ problem-solving skills are assessed by the OverAll Test. The results indicate that students with an organized knowledge base perform better in problem‑solving situations than students whose conceptual models are loosely structured. The implications of these findings for instruction as well as for assessment are discussed.

 

Introduction

 

One task that credit administrators, controllers, business managers and economists in various professional contexts have in common is that they are expected to solve complex problems regularly and effectively. For Economics degree programs, an important question is: how does the graduate deal with the problems s/he faces when starting his/her professional career? Do you hope and pray that the organization which hired her/him doesn't come tumbling down? Or do you have sufficient evidence that the graduate will be capable of dealing with the informational load that accompanies the problem and that s/he will use it in a coherent and integrated way to reach a solution?

Three distinct elements can inform you about the expert status of the graduate: the content of the economics courses studied (syllabus content), the teaching methods adopted and the methods and results of the assessment used to determine the success of students in solving economics problems.

During the past decades, Economics degree programs have been subject to change in their content and instructional methods. This process of change has seldom been matched with changes in the assessment methods used to determine students' outcomes (Mallier, Morwood & Old, 1990). This paper aims to contribute to the development of appropriate assessment methods in Economics Education. The rationale of the assessment system implemented is described. It is informed by the findings of cognitive research on the constituent cognitive features that underlie expert problem solving and how experts acquire their expertise. It is based on the cognitive learning theory postulating that all learning involves thinking. The assessment approach that suits this teaching and learning theory emphasizes the use of a set of measurement tools, integrating conceptual understanding and performance skills to solve authentic problems.

The present contribution describes the results of a number of studies seeking empirical evidence of quality, in terms of the validity of the instruments adopted in the Maastricht School of Economics and Business Administration. These studies attempt to answer questions such as: do the measurement tools provide a profile of students' conceptual understanding and problem-solving skills? Are they fair, that is, to what extent can students be expected to meet the goals measured by the test? Knowledge about the extent of overlap between what is tested and what is taught is critical to the interpretation of the test results. Furthermore, this report presents empirical evidence for some of the basic assumptions of the assessment system adopted.

 

The Maastricht Assessment System

 

Rationale

 

The Maastricht economics curriculum is intended to guide students to become academic professionals: graduates who can recognize the problems of different disciplines within the field of economics and who are capable of analysing these problems and contributing to their solutions. "Problem" is the key word within this goal definition. The Maastricht School of Economics and Business Administration adopted a problem-based educational approach to design its curriculum. This approach is significantly influenced by the findings of cognitive psychological research, especially results from expert vs. novice studies. Two general characteristics of expert performance can be identified (Yekovich, 1993; Feltovich et al., 1993; Glaser, 1990):

·        Experts’ knowledge is coherent. Experts possess a well-structured network of concepts and principles about the domain that accurately represents key phenomena and their interrelationships. Beginners’ knowledge is not only patchy, consisting of isolated definitions; beginners also miss the principles that lie beneath the apparent surface features of the problem presented. In contrast, experts’ knowledge is structured, and experts recognize underlying principles and patterns;

 

·        Novices often know facts, concepts and principles without knowing the conditions under which that knowledge applies and how it can be used most effectively. “Experts and novices may be equally competent at recalling specific items of information, but the more experienced relate these items to the goals of problem solution and conditions for action” (Glaser, 1990, p. 477). Dochy and Alexander (1995) identify this type of knowledge as conditional knowledge; experts are able to use the relevant elements of knowledge in a flexible way in order to describe, analyse and solve novel problems.

 

Helping students develop this expert profile requires a learning scheme aimed at analysing, solving and evaluating problems on the basis of a deep understanding of the subject domain studied. This can be illustrated by an example taken from industrial economics (Lawson, 1992). A student sets out to study the economics of running an airline. First, s/he needs to acquire access to what is known about airline operations and how economists analyse different market forms, using a range of models of the firm, from the perfectly competitive firm through to monopoly. Secondly, the student needs to appreciate the purposes and limitations of the theories of the firm. S/he needs to be able to assemble the facts of the airline operation. Then s/he needs to link the theories with the assembled facts of airline operation. For example, describing British Airways as a regulated carrier with considerable but still limited market power within the domestic UK market requires the student to recognise the appositeness of the regulated-industry model to the facts of BA’s domestic operations.

This example suggests some principles for the instruction guiding the learning of the student. They can be summarized as follows:

 

·        The curriculum should focus on clusters of related concepts. Developing conceptual networks is enhanced when students are actively engaged in the learning process. Students should be encouraged to manipulate and use the knowledge they are acquiring by being confronted with authentic problems. Acquiring knowledge is not the ultimate goal of instruction. A major goal of instruction should be to promote understanding of important conceptual knowledge in such a way that it can be used in analysing and working with realistic problems (Feltovich et al., 1993).

·        Feltovich et al. (1993) stress that knowledge that will be used in many ways has to be learned, represented and tried out (in application) in many ways. Therefore, knowledge (including concepts, models and theories) should be interrelated in diverse ways, and cases should be addressed in relation to other cases. Using a variety of cases involving similar concepts, and similar cases embodying different concepts, helps students to work with novel problems. Cases and knowledge should be “revisited” from different relevant points of view and for the purpose of answering different kinds of questions.

 

The changes which take place when proficiency develops not only define the criteria for instruction by which competence can be developed but also the criteria by which competence can be assessed. Furthermore, instruction and assessment must be linked for at least two reasons. First, students’ outcomes provide information to use in improving educational practice only when the instruments to measure students’ outcomes match the instructional practice (English, 1992). Secondly, tests are diagnostic aids only when they identify the extent to which the goals are attained. This means that tests must be sensitive to how well students are able to use knowledge in an interrelated way to analyse and solve authentic problems.

The instructional principles described lead to the following assessment principles:

 

·        Assessment instruments should measure the extent to which students possess knowledge that is organized in a way that facilitates fast and correct recognition of patterns. A significant dimension for the assessment of competence is the presence of interrelated concepts. Additionally, the ability to recognize principles and patterns underlying the problem or task presented is an indication of developing competence that should be assessed (Glaser, 1990).

·        The assessment system should substantially focus on measuring the extent to which students are capable of flexibly applying their knowledge to analyse and solve novel problems. These problems should be of a real-world type. Research in the field of mathematics suggests that such problems offer opportunities to develop understanding in context, to develop reasoning in the subject domain studied and to make interdisciplinary connections (Blum & Niss, 1991; de Lange, 1992; Lesh & Lamon, 1992).

 

The Key Features

 

The Maastricht School of Economics and Business Administration implemented problem-based learning as a way to design the curriculum. Students are confronted with authentic problems, that is, problems as they would encounter them in “real life”. Because authentic problems are often not solvable within mono-disciplinary constraints, the curriculum is organised on a multidisciplinary basis. This implies that problems are discussed from distinct points of view (disciplines) such as Marketing, Organisation and Micro-economics. The problems are the context within which students study the basic concepts and models within the field of Economics and Business Administration. Students acquire and apply knowledge simultaneously.

The assessment system developed follows the core idea of the organisation of the curriculum. The acquisition as well as the application of knowledge is assessed. Therefore, two instruments are implemented: the Knowledge Test and the OverAll Test.

 

The Knowledge Test.

 

The Knowledge Test measures primarily the knowledge of facts, the meaning of symbols and the concepts and principles of the four particular fields of study: Marketing and Organization, Micro-economics, Macro-economics and Accounting and Finance. This type of knowledge is often defined as declarative knowledge (Anderson, 1983; Dochy & Alexander, 1995). The test items require students to reproduce and/or demonstrate understanding of their knowledge about the main subjects studied.  It is not sufficient for students to remember or even understand isolated definitions of domain-related concepts. They need to understand the frame of reference which organises the distinct subjects.

The Knowledge Test covers the domain studied within one instructional period¹. It consists of 100 to 150 multiple-choice items in true/?/false format. To ensure relatively even coverage of the domain, an analytic grid is used for the construction of the test.
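The paper does not show the analytic grid itself. As an illustration only, the sketch below treats such a grid as a table of cells (field of study × type of item) with a relative weight per cell, and splits the desired number of items over the cells with the largest-remainder method. The two item types and the equal weights are hypothetical; in practice the weights would come from the content of the instructional period.

```python
from math import floor

FIELDS = ["Marketing and Organization", "Micro-economics",
          "Macro-economics", "Accounting and Finance"]
# Hypothetical item types, loosely following "reproduce and/or
# demonstrate understanding" in the text.
ITEM_TYPES = ["reproduction", "understanding"]


def allocate(n_items, weights):
    """Split n_items over grid cells in proportion to their weights.

    Uses the largest-remainder method so the counts always sum to n_items.
    """
    total = sum(weights.values())
    quotas = {cell: n_items * w / total for cell, w in weights.items()}
    counts = {cell: floor(q) for cell, q in quotas.items()}
    leftover = n_items - sum(counts.values())
    # Give the remaining items to the cells with the largest fractional parts.
    ranked = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for cell in ranked[:leftover]:
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    equal = {(f, t): 1 for f in FIELDS for t in ITEM_TYPES}  # hypothetical weights
    grid = allocate(125, equal)
    for (field, item_type), n in grid.items():
        print(f"{field:28} {item_type:14} {n}")
    print("total", sum(grid.values()))
```

The largest-remainder step matters because simple rounding of each cell's share can make the grid add up to more or fewer items than the test is meant to contain.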

Figure 1 contains some examples of Knowledge Test items.

 

Question 1

 

A very important question is which management principles a manager should use to achieve organizational excellence. During this century several different viewpoints have emerged.

 

true/?/false                     According to the contingency viewpoint, managers should analyse and understand situational differences and choose the best solution suited to the firm and the individual in each situation.

(True)

 

Question 2

 

The ice-cream company “Magnus” used to be only a producer of ice cream. Today, “Magnus” both produces and sells ice cream.

 

true/?/false                     When the ice-cream company “Magnus” combines the production and the selling of ice cream under the same management, vertical integration takes place.

(True)

 

Question 3

 

After a recession, it was observed that employment did not rise at the same time as general economic activity.

 

true/?/false                   This can be explained by referring to organizational slack resources.

(True)

 

Question 4

 

Suppose that there is inflation, and that the Central Bank changes the growth rate of the money supply so as to equal the long-run annual growth rate of production. Suppose also that people believe this money growth rate will continue to equal the growth rate of production. Several immediate effects are listed below.

 

true/?/false                      As an immediate effect, the nominal interest rate would fall. (True)

 

true/?/false                      As an immediate effect, actual inflation would temporarily be negative. (True)

 

Figure 1: Examples of Knowledge Test Items

 

The four examples given assess conceptual understanding. The first and second questions require students to recognize the definitions of the contingency viewpoint and of vertical integration. The second question is embedded in a simplified authentic situation and asks for more than factual recall: students not only have to reproduce the definition of vertical integration but also have to apply it to the case of the ice-cream company. Since only the relevant variables are mentioned, students do not need to retrieve the relevant information from the case in order to identify the strategy used as “vertical integration”. To answer the third question correctly, students need to build the following line of reasoning: if economic activity grows, organizational activity will grow. In that case, organizations will first use their slack human resources; for example, by transferring people internally, they are able to meet their increased need for personnel instead of hiring externally. As a result, employment will not rise at the same rate as general economic activity. Being able to define the concept of “organizational slack resources” is not sufficient: the conditions for the application of these resources and the consequences in macro-economic terms also need to be understood.

The fourth question starts from a macro-economics case which, like the second question, presents the critical elements for solving the problem presented. In order to be able to answer the questions, students need to understand the various relevant concepts (nominal interest rate, inflation, rate of growth of the money supply, long-run annual growth rate of production). Additionally, they are required to master the interconnections between these concepts.
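One standard textbook way to make these interconnections explicit combines the quantity equation in growth-rate form with the Fisher relation. It is offered as an illustration of the reasoning the item calls for, not as the official answer key, and it assumes that money velocity is roughly stable in the long run. Here $g_M$, $g_V$ and $g_Y$ are the growth rates of money, velocity and production, $\pi$ is inflation, $\pi^e$ expected inflation, $i$ the nominal and $r$ the real interest rate.

```latex
\begin{align}
  g_M + g_V &= \pi + g_Y          && \text{(quantity equation } MV = PY \text{ in growth rates)}\\
  \pi &= g_M + g_V - g_Y = 0      && \text{(with } g_M = g_Y \text{ and } g_V = 0\text{)}\\
  i &= r + \pi^e                  && \text{(Fisher relation)}
\end{align}
```

Because people believe the new money growth rate, $\pi^e$ drops towards zero and $i$ falls immediately (the first item). A lower $i$ raises the real money balances $M/P$ that people wish to hold; since $M$ is not raised to match, the increase in $M/P$ can only come from a fall in $P$, so actual inflation is temporarily negative (the second item).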

As is clear from the examples, we introduced a “question mark” option. This option allows students to “pass”: students circling the question mark indicate that they have not mastered the subject. They are not forced to give an answer, and therefore to guess, if they do not know the answer. They are not penalized for not knowing: choosing the question mark option scores 0 points. On the other hand, they lose 1 point (-1) for a wrong answer, while circling the right answer scores +1. This scoring system makes guessing attractive only if students are reasonably sure of the answer and have mastered an important part of the test items. Therefore, when students give a wrong answer, in most cases² it reveals that they have “misunderstood” the objective measured. For the example in Figure 2, the test results revealed that some students had constructed their own interpretation of the meaning of an (un)differentiated marketing strategy. Although the concept was studied during tutorials, the misconception persisted that differentiation has to do with the target market instead of the product. Since a fairly large group of students took the risk of choosing the “false” option, they seemed to be quite sure of their answer.

 

true/?/false            An undifferentiated as well as a differentiated marketing strategy is directed at approximately the whole market.

(true)

 

Figure 2: An Example of a Knowledge Test Item

 

 

The OverAll Test.

 

Figure 3 presents an example of an OverAll Test item which illustrates the difference from the Knowledge Test item as explained in the previous section.

 

Case Mexx

 

The case study presents the history and recent developments in the fashion company Mexx. Main trends within the European clothing industry are described. Mexx Fashion as a company is illustrated by its organizational structure, its product profile and market place, its business system, its corporate culture and some current facts and figures.

 

 

Question 1

 

true/?/false           Mexx’s corporate culture and philosophy is consistent with the systems viewpoint on management.

 

(False; it is consistent with the behavioural viewpoint)

 

Question 2

 

Benetton’s and Mexx’s corporate strategies are quite different. More specifically, there are two main differences.

 

a.   Identify these two main differences in corporate strategies. Illustrate your answer with examples mentioned in the case.

 

b.    What are the advantages of Benetton’s corporate strategy compared to Mexx’s approach?

 

Figure 3: Examples of OverAll Test Items

 

The first question corresponds to the first example question of the Knowledge Test: both refer to the different viewpoints on management. The Knowledge Test item requires students to recognize the definition of one of the viewpoints. Memorizing the definition, however, is not sufficient to answer the OverAll Test item. Students have to interpret the case and select the information relevant to this test item. By comparing this information with their conceptual knowledge of the different viewpoints on management, they have to deduce the answer. The second OverAll Test question resembles the second Knowledge Test item: both refer to the concept of vertical integration. However, the OverAll Test item requires students to take more mental steps to reach the solution of the problem posed than the Knowledge Test item does. For the first part of the question (a), these steps can be schematized as follows:

 

-   Define the concept of corporate strategies

-   Select the relevant information for the Mexx company as described in the case study

-   Compare it with the definitions of the different possible strategies

-   Select the relevant information for Benetton as described in the case study

-   Compare it with the definitions of the different possible strategies

-   Compare the relevant information from both cases with the definitions of the strategies

-   Identify each company’s strategy

-   Compare both strategies by going back to the definitions of the strategies and the relevant information in the case study

 

For the second part (b), students have to evaluate. Therefore, they have to take some extra mental steps:

 

-   Understand the conditions for efficiency and effectiveness of the different strategies

-   Select the relevant information on these conditions for both companies

-   Interpret the conditions found in the case by comparing them with those studied in the textbooks

 

This example illustrates that the OverAll Test measures whether students are able to retrieve the relevant concept (model, principles) for the problem.

Furthermore, it measures whether they can use these instruments to solve the problem, that is, whether their knowledge is usable (Glaser, 1990) and whether students know “when and where” to apply it (conditional knowledge). In short, the OverAll Test measures to what extent students are able to analyse problems and contribute to their solution by applying the relevant tools.

The OverAll Test is organised within the first-year curriculum as follows. After two instructional periods (blocks), the students have two weeks free for self-study. During these weeks, they study on the basis of the study manual they receive at the beginning of this period. This manual presents information about the main goals of the OverAll Test, the parts of the curriculum relevant to the study of the material presented in the manual, an example of an elaborated case with test items, some practical (organizational) information and, finally, a set of articles. The articles vary in character. Some describe a case relating to innovations in, or problems of, a national or international firm as published in a newspaper or journal. Others present a scientist’s theoretical considerations, a research report, or comments on a theory or model. During the self-study period, students are expected to apply the knowledge they have acquired over the preceding weeks so that they become capable of explaining the new, complex problem situations presented in the articles. While reading the articles, they are asked to explain to themselves spontaneously (i.e. without being explicitly prompted by a tutor) the ideas and theories described, by relating them to previously acquired knowledge. This behaviour is often called 'self-explanation' (Chi, Feltovich & Glaser, 1981). In short, the self-study period gives students an opportunity to practise the analysis and synthesis of economics problems as they have learned to do in the tutorial groups. To this end, the study manual offers them a set of new problems as described in the articles. Figure 4 provides an example.

 

Article: Schoemaker, P.J.H. (1995). Scenario Planning: a Tool for Strategic Thinking. Sloan Management Review, pp. 25-39.

 

Study Guideline: “...The first part that deserves additional attention is the description of the two applications of scenario planning. On these pages Schoemaker relates his method of “scenario planning” to the various statistical techniques you have encountered in the Quantitative Methods blocks. Don’t restrict yourself to the role of a passive consumer of his treatment of statistics, but take a more active position by comparing Schoemaker’s use and interpretation of statistical concepts with those found in our textbook Wonnacott & Wonnacott (W&W). To give a simple example: what is Schoemaker’s definition of a “correlation matrix”? And how does this view relate to the interpretation of the concepts covariance (matrix) and correlation (matrix) found in W&W? And next, what exactly is the relationship between the correlation matrix (such as the one in table 3) and the scenario profiles (given in figure 1)? To phrase the last question in other words: if we give you an arbitrary correlation matrix, could you derive the corresponding scenario profiles?” (OverAll Test I, Information and Study Guidelines, 1995-1996)

 

Figure 4: An Example of an OverAll Test Study Guideline

 

After the two weeks of self-study, the OverAll Test is administered. The OverAll Test questions refer to the articles: they assess whether the students are able to interpret and analyse the problems presented in the articles by applying the concepts, models and tools they have acquired during the tutorials.

Figure 5 displays two questions for the Schoemaker article.

 

Question 1

 

In his introduction, Schoemaker compares the method of scenario planning with other approaches such as contingency planning, sensitivity analysis and computer simulations. Hellriegel & Slocum (1996) (textbook) give a similar comparison of three methods: scenarios, the Delphi technique and simulation. They stress that the approaches overlap, and indeed, it is not difficult to imagine how techniques like Delphi and simulation could be used within Schoemaker’s framework of scenario planning.

 

true/?/false                     The Delphi technique fits better in phase 3 of scenario planning (identifying basic trends) than in phase 9 (developing quantitative models).

 

Question 2

 

The correlations in Table 3, Part B on p. 31 (Schoemaker, 1995) are nearly all positive, which makes the case rather specific. ... Give a new example of scenario planning by solving the following tasks:

 

a.   Write down a hypothetical correlation matrix of the same size as that in table 3, but with the numbers of ‘+’, ‘-’ and ‘0’ entries more equally distributed;

b.   Derive a scenario profile (as in figure 1) from this correlation matrix, giving special attention to the existence of both positive and negative correlations. If necessary, make additional assumptions in order to find the profile. Start with one single scenario.

c.   Derive a second scenario profile, assuming this second scenario to be the ‘reverse’ scenario of the first (reverse in its literal meaning: if the first scenario is something like ‘recession’, then the second is that of ‘high economic activity’);

d.   Give a description in words of the consistency requirements one has to observe in assignments b and c;

e.   Give an interpretation of the outcomes of the scenario profiles you constructed yourself. Schoemaker ends up with one scenario that performs best in all possible aspects, whilst the third scenario is the worst one, again in all possible aspects. Is the same true for the case you designed?

 

Figure 5: An Example of an OverAll Test Item
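Assignments b to d turn on the consistency between a sign matrix and a scenario profile. The sketch below encodes one possible reading of that requirement, assuming each variable in a scenario simply moves up (+1) or down (-1): a pair marked '+' must move in the same direction, a pair marked '-' in opposite directions, and a pair marked '0' is unconstrained. The variables and signs are hypothetical and are not Schoemaker's Table 3. Because flipping every direction leaves each pairwise product unchanged, the 'reverse' scenario of assignment c is consistent whenever the first scenario is.

```python
from itertools import combinations

# Hypothetical variables and sign matrix (upper triangle only); not Schoemaker's Table 3.
VARIABLES = ["GDP growth", "interest rate", "unemployment", "consumer confidence"]
SIGNS = {
    ("GDP growth", "interest rate"): "+",
    ("GDP growth", "unemployment"): "-",
    ("GDP growth", "consumer confidence"): "+",
    ("interest rate", "unemployment"): "0",
    ("interest rate", "consumer confidence"): "0",
    ("unemployment", "consumer confidence"): "-",
}

# A scenario profile: +1 means the variable is high/rising, -1 low/falling.
BOOM = {"GDP growth": 1, "interest rate": 1, "unemployment": -1, "consumer confidence": 1}


def inconsistencies(profile):
    """Return the variable pairs whose directions contradict the sign matrix."""
    bad = []
    for a, b in combinations(VARIABLES, 2):
        sign = SIGNS[(a, b)]
        product = profile[a] * profile[b]  # +1: same direction, -1: opposite
        if (sign == "+" and product < 0) or (sign == "-" and product > 0):
            bad.append((a, b))
    return bad


def reverse_scenario(profile):
    """The literal 'reverse' scenario: every variable moves the other way."""
    return {var: -direction for var, direction in profile.items()}


if __name__ == "__main__":
    print(inconsistencies(BOOM))                    # []
    print(inconsistencies(reverse_scenario(BOOM)))  # []
    mixed = {**BOOM, "interest rate": -1}
    print(inconsistencies(mixed))                   # [('GDP growth', 'interest rate')]
```

Not every sign matrix admits a fully consistent profile: three variables with two '+' pairs and one '-' pair cannot all be satisfied at once, which is one situation in which the additional assumptions allowed in assignment b become necessary.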

 

The OverAll Test is administered twice a year. Each OverAll Test assesses the application of knowledge from different disciplines which were studied during the preceding two instructional periods. The Schoemaker article illustrates the integration of knowledge in the field of Statistics with the discipline of Organization. Knowledge from both disciplines has to be used to tackle the problem of scenario planning.

The OverAll Test is a paper-and-pencil test. The questions are based on the articles studied at home. As the Schoemaker example shows, the OverAll Test combines two item formats: true-false questions with the question mark option, and essay or open-ended questions. The true-false items are mostly intended to measure whether a student can apply the acquired knowledge in a new situation, that is, whether he can use an abstract concept in a specific, quite complex situation that is relevant to the 'real life of economists'.

In the Schoemaker (1995) example, the true/?/false question asks students to use their knowledge about three approaches (studied during the tutorials) in order to interpret the method of scenario planning as presented in the article. It is not sufficient for students to memorize the techniques as described in their textbook. They are required to know the interconnections between these approaches and how they can be used effectively within the distinct phases of scenario planning. Multiple-choice questions of this kind in the OverAll Test are set in the context of authentic problems and focus on the use of knowledge in a new problem situation. Since the test does not aim to have students elaborate on the relevance of the Delphi technique for scenario planning, the multiple-choice format is considered appropriate. In contrast, the open-ended question asks for elaboration, which cannot be achieved with a multiple-choice format. Students are asked to analyse a new problem, i.e. to derive two scenario profiles from a correlation matrix, and to evaluate the outcomes of these two profiles. The essay subtest and the true-false subtest carry the same weight. The OverAll Test consists of seven to twelve cases or articles, describing one or more related economic problems. This number of cases was chosen because of the finding that, when sampling breadth is limited, the generalizability of scores may be poor due to content specificity (Swanson et al., 1991). These findings were confirmed by the results of a pilot study with the OverAll Test (Segers et al., 1991, 1992). Most variability was explained by the interaction of persons and cases (35.41% for the essay subtest and 65.48% for the true-false subtest). This means that students who perform well on one case are not necessarily the ones who perform well on other cases; in other words, performance on one case has low predictive value for performance on the others.
The findings suggest that for an OverAll Test containing 12 cases, the generalizability coefficient is 0.67.
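The practical meaning of a coefficient of 0.67 for 12 cases can be illustrated under a simplifying assumption that the text does not state: a crossed persons × cases design in which the person-by-case interaction (confounded with residual error) is the only source of relative error. The relative generalizability coefficient for n cases is then Eρ²(n) = σ²p / (σ²p + σ²pc,e / n), which is the Spearman-Brown relationship. The sketch backs out the variance ratio σ²pc,e / σ²p implied by 0.67 for 12 cases and projects other test lengths; the function names are hypothetical, and this is an illustration, not a reanalysis of the pilot data. Under these assumptions, reaching 0.80 would take about 24 cases.

```python
import math


def error_to_person_ratio(g: float, n_cases: int) -> float:
    """Ratio sigma2_pc,e / sigma2_p implied by a relative G coefficient g observed with n_cases."""
    return n_cases * (1.0 / g - 1.0)


def projected_g(ratio: float, n_cases: int) -> float:
    """Relative G coefficient of a persons x cases design with n_cases cases."""
    return 1.0 / (1.0 + ratio / n_cases)


def cases_needed(ratio: float, target: float) -> int:
    """Smallest number of cases whose projected G coefficient reaches the target.

    Solves 1 / (1 + ratio / n) >= target for n, i.e. n >= ratio * target / (1 - target).
    """
    return math.ceil(ratio * target / (1.0 - target))


if __name__ == "__main__":
    ratio = error_to_person_ratio(0.67, 12)  # coefficient reported for 12 cases
    for n in (7, 12, 20):
        print(n, round(projected_g(ratio, n), 2))
    print("cases for G >= 0.80:", cases_needed(ratio, 0.80))
```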

Since the faculty intends the assessment system to simulate real-world situations, the OverAll Test is not only based on authentic cases but also has an open-book character. This means that students are allowed to bring the study material they think they will need. As in the real world, they have resource materials available. Students first have to be able to select the proper resource materials and equipment related to the subjects of the test; but if they cannot use these resources in an interpretative way, they will not be able to analyse and solve the problem posed (Feller, 1994).

 

Main Concerns About The Assessment Practice

 

During the last five years, the faculty has gained experience with the assessment system described. Although there is a lot of enthusiasm, empirical evidence on the effectiveness of the system was lacking, and a set of questions emerged. In this paper I will elaborate upon three of them.

 

-   Are the assessment instruments fair? To what extent are the scores on the OverAll Test and on the Knowledge Test influenced by the match between instruction and test? For the OverAll Test, students’ evaluations³ indicate that students experience difficulties because they have not gained enough experience in applying the acquired knowledge within a diverse set of realistic situations. However, the OverAll Test is based on the faculty objectives as operationalized within the students’ study materials and the guidelines for the tutors. The question emerges whether there is a mismatch between the formal and the operational curriculum. In that case, students might be expected to have serious difficulties answering the test (Birenbaum, 1996; Pelgrum, 1989). Especially within a problem-based curriculum, where tutors are the only guide students have for generating learning issues and self-study, it is important to obtain information on this issue (Dolmans, 1993). This leads to two interrelated questions. First, is there a match between the formal and the operational curriculum? Secondly, to what extent does the test measure the formal and operational curricular objectives?

-   Does the OverAll Test measure the extent to which students are able to use a conceptual network to analyse authentic problems? Or is it just another instrument for measuring factual recall?

-   What is the use of a Knowledge Test? What is the added value of a more traditional assessment instrument? Do the Knowledge Test scores provide additional and indispensable information about the students’ level of expertise?

 

These questions were addressed in three studies. For each study, the theoretical framework, research method and results will be described.

 

Study 1: Are The Assessment Instruments Fair?

 

This study examines the curricular and instructional validity of the faculty assessment instruments.

 

Rationale.

 

Is it fair to expect the students to answer the test questions? If the test is valid, in other words, if the knowledge assessed is part of the curriculum, the answer is yes. This means that the test content matches the formal curriculum, i.e. the curricular objectives and the curriculum material. Checking this match is the most common method test constructors use to establish test validity. In so doing, they assume that the objectives are actually taught. Many studies indicate that this assumption may be questioned (Calfee, 1983; De Haan, 1992; English, 1992; Leinhardt & Seewald, 1981; Pelgrum, 1990). The operational curriculum, what is actually taught in the classroom, can differ to a significant extent from the formal curriculum as described in textbooks and syllabi. McClung (1979) introduced the term instructional validity to describe the match between the operational curriculum and what is tested. The overlap between the test content and the formal curriculum is called curricular validity. A mismatch between the formal and the operational curriculum has consequences that must be taken into account. On the basis of the assessment results, the faculty makes inferences about the extent to which the faculty objectives are reached; the results are one source of input for evaluating faculty practice. If the test does not measure what has been taught, no inferences can be made about the quality of the teaching process (English, 1992). In a summative context, when the tests are used as selection instruments, the faculty expects only students with a certain profile to pass the tests. This profile is defined in congruence with the faculty objectives. If the assessment instrument lacks instructional validity, how can the students who pass be described in terms of knowledge and skills? And to what extent can the assessment results indicate whether first-year students will be able to follow the second-year courses that build on first-year knowledge?

Although instructional validity is a concern for all types of curricula, it deserves particular attention in problem-based curricula. In a Problem-Based Learning setting, more than in a conventional one, students are expected to take responsibility for their learning process. In a conventional curriculum, teaching is the central process. The teacher defines the objectives, the content of the courses and the ways to reach the objectives. In most cases, the teacher directly “delivers” the information to the students by lecturing. In the case of a local test (as opposed to a national standardized test), he constructs the test on the basis of his lecture notes. In Problem-Based Learning, the path from objectives to test is longer, and the students have more freedom to choose their own way.

 

Faculty objectives
↓
Problem
↓
Discussion in tutorial group
↓
Student-generated learning issues
↓
Self-study
↓
Feedback in tutorial group
↓
Assessment

 

Figure 6: The Learning Process in a Problem-based Learning Setting

 

 

Faculty objectives
↓
Lectures/Syllabus
↓
Self-study
↓
Assessment

 

Figure 7: The Learning Process in a Tutor-centred Instructional Process

 

The faculty objectives are operationalized in a set of tasks. These tasks present problem situations which students have to analyse and try to solve. Students work on each problem in small tutorial groups. The result is a list of learning objectives considered relevant for analysing and solving the problem they are confronted with. Since the problems are in many cases ill-structured (complex, as real problems mostly are), there may be differences in learning objectives. The learning objectives are the starting point for students’ self-study, in which they look for the information relevant to their learning objectives. In discussion with their fellow students and the tutor, they check the relevance of this information for the problem and build the relevant theoretical framework. At the end of the instructional period, a test is administered to measure the extent to which they have mastered the basic knowledge of that period. How sure can we be that the test is fair for students who sometimes follow different paths in analysing and solving the same problem? Various studies have tried to gain insight into the relation between the formal and the operational curriculum within a Problem-Based Learning setting (Coulson & Osborne, 1984; Dolmans, 1994; Shahabudin, 1987; Tans et al., 1986). They conclude that, to a significant extent, the two curricula overlap. Additionally, Dolmans (1994) investigated the relation between the time spent in the tutorial groups on the core concepts of the instructional period and the test scores related to these concepts. The correlation seems to be weak (r = .22, p < .05, n = 94). Probably the quality, more than the quantity, of the time spent on problems affects test scores.
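The strength and significance of the correlation reported by Dolmans (r = .22, n = 94) can be checked with the Fisher z-transformation, a standard large-sample approximation; the function name is hypothetical. With these values the approximate two-tailed p is about .03, below the conventional .05 level, while r² ≈ .05 means that time spent accounts for only about 5% of the variance in test scores, which is why the relation is described as weak.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately standard normal under H0: rho = 0
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    r, n = 0.22, 94
    print(f"p = {correlation_p_value(r, n):.3f}, r^2 = {r * r:.3f}")
```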

 

Research method.

 

Procedure.

 

The formal curriculum was described by analysing the textbooks, syllabi and tutorial manuals. The analysis resulted in a list of more than 500 detailed topics for each period. Domain specialists screened this extended list to obtain a workable list. They constructed a hierarchical schema of the topics, and only the highest hierarchical levels of the networks of subjects were included in the final version. For example, the concepts of entry strategies, export, licensing and joint ventures were all included in the draft version, but only the concept of entry strategies was included in the final version. This screening reduced the list of central concepts to 147 topics for the Marketing and Organization period and 136 topics for the Macro-economics period. Curricular validity is examined by comparing the formal curriculum with the test of the first instructional period: the list of concepts is compared with the list of objectives of the Knowledge Test and the OverAll Test.
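The screening step (keeping only the highest level of each network of topics) and the subsequent comparison with the test objectives can be sketched as follows. The hierarchy fragment reuses the entry-strategies example from the text, but the data structure, the extra 'marketing mix' branch, and the overlap measure (the share of test objectives found in the formal-curriculum list) are assumptions made for illustration.

```python
# Hypothetical fragment of a topic hierarchy: topic -> parent topic (None = top level).
PARENT = {
    "entry strategies": None,
    "export": "entry strategies",
    "licensing": "entry strategies",
    "joint ventures": "entry strategies",
    "marketing mix": None,
    "pricing": "marketing mix",
}


def top_level(topic: str) -> str:
    """Walk up the hierarchy until a topic without a parent is reached."""
    while PARENT[topic] is not None:
        topic = PARENT[topic]
    return topic


def central_concepts(detailed_topics):
    """Collapse detailed topics onto their highest-level concepts."""
    return sorted({top_level(t) for t in detailed_topics})


def curricular_overlap(formal_concepts, test_objectives):
    """Share of test objectives that also appear in the formal-curriculum list."""
    objectives = set(test_objectives)
    return len(objectives & set(formal_concepts)) / len(objectives)


if __name__ == "__main__":
    formal = central_concepts(PARENT)  # ['entry strategies', 'marketing mix']
    print(formal)
    print(curricular_overlap(formal, ["entry strategies", "marketing mix", "Lorenz curve"]))
```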

To examine instructional validity, two questionnaires were developed on the basis of the lists of concepts. The questionnaires are a modified version of the Dolmans Topic Checklist (1994). The first Topic Checklist (TOC1) consists of 147 topics and eight main themes of the disciplines Marketing and Organization. An example of TOC1 is presented in figure 8. The first column contains some examples of the 147 topics of TOC1; the upper row presents some examples of the eight main themes. Their relevance will be explained in study 3.

 

 

Topics                        organization + systems    marketing mix    consumers' behavior

Structure follows strategy    1                         2                3

Product attributes            1                         2                3

Giffen good                   1                         2                3

Lorenz curve                  1                         2                3

Figure 8: Examples of Topics and Main Themes in the Topic Checklist 1

 

Students were asked to indicate whether or not a topic was discussed in their tutorial groups by encircling it or not. To gain some insight into the quality of the time spent on the topics, the second Topic Checklist (TOC2), on Macro-Economics, contains two additional questions. First, students had to indicate the level of comprehension they believed they had reached for each topic. Three levels were distinguished: the level of definition, the level of comprehension and the level of analysis. The level of definition indicates that the student is (only) able to reproduce the meaning of the concept as formulated in the textbooks. Comprehending a topic indicates that the student is able to define the concept in his own words and describe its relevance and its relation to other concepts. Mastering a topic at the level of analysis means that the student is able to apply the concept when presented with a problem to be analysed. For every respondent, the number of topics mastered at each of the three levels was counted. In addition to the students, the staff members who developed the course were asked to indicate the intended level of comprehension for each topic. Second, students were asked whether a topic received much, moderate or not much attention during the tutorial meetings.
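The per-respondent counting of comprehension levels can be sketched in a few lines. The topics and ratings are hypothetical, and the sketch assumes that percentages are taken over the topics a respondent rated and then averaged over respondents; the text does not say which denominator was used.

```python
from collections import Counter

LEVELS = ("definition", "comprehension", "analysis")


def level_shares(ratings):
    """ratings maps topic -> self-reported level for one respondent.

    Returns the share of rated topics at each level.
    """
    counts = Counter(ratings.values())
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        raise ValueError("respondent rated no topics")
    return {level: counts[level] / total for level in LEVELS}


def mean_shares(respondents):
    """Average the per-respondent shares over all respondents."""
    shares = [level_shares(r) for r in respondents]
    return {level: sum(s[level] for s in shares) / len(shares) for level in LEVELS}


if __name__ == "__main__":
    # Two hypothetical respondents rating hypothetical topics.
    respondents = [
        {"IS-LM": "definition", "inflation": "comprehension",
         "money supply": "comprehension", "Lorenz curve": "analysis"},
        {"IS-LM": "comprehension", "inflation": "analysis"},
    ]
    print(mean_shares(respondents))  # {'definition': 0.125, 'comprehension': 0.5, 'analysis': 0.375}
```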

 

Sample.

 

The sampling procedure employed in the study is a quota sample. For organizational reasons, the group of first-year students is divided into four groups: two have their meetings in the morning, two in the afternoon. Students were selected in roughly equal numbers from these four groups. For TOC1, 34 student volunteers participated; for TOC2, 45 students.

 

Results

 

As the results in table 1 indicate, there is a significant amount of overlap between the topics planned for study by the staff and the topics indicated by the students as having been discussed and studied during the instructional period. Table 1 indicates that on average 87% of the topics of TOC1 and 77.4% of the topics of TOC2 have been the subject of study (RT). Other studies investigating the match between the formal and the operational curriculum in a Problem-Based Learning setting (Dolmans, 1994) show an overlap of 64.2% (s = 26.7).

Students perceive that they have mastered on average 47% of the topics of TOC2 at the level of comprehension. This means that they are able to explain in their own words the meaning of the topics, their relevance and their relation to other concepts. For, on average, 31% of the topics, students state that they are able to use these topics to analyse problems (level of analysis). For, on average, 22% of the topics, students indicate that they master them at the level of definition, that is, they can “only” reproduce the definition. The correspondence with the levels intended by the staff (20.6% definition, 40.4% comprehension and 39% analysis) is considerable.

 

Table 1: The Degree of Overlap Between the Formal and the Operational Curriculum

 

Variables        Mean                              Standard Deviation    n

RT1*             87%                               17.33                 34
NRT1*            12%                               15.67                 34
RT2              77.4%                             12.64                 45
NRT2             22.6%                             25.67                 45
Definition       22.1% (student), 20.6% (staff)    21.24                 45
Comprehension    47% (student), 40.4% (staff)      22.64                 45
Analysis         30.9% (student), 39% (staff)      16.58                 45

RT: Recognized Topics (1 = TOC1, 2 = TOC2)
NRT: Not Recognized Topics

 

This means that they were missing in the Topic Checklist I. Comparing the topics discussed or not discussed (RT/NRT) with the content of the test items shows that none of the topics that more than 29% of the students (percentile 25) indicated as not discussed are part of the tests. This result suggests high instructional validity of both the Knowledge Test and the OverAll Test.
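The comparison between not-recognized topics and test content amounts to a filter followed by a set intersection. In the sketch below, each checklist is the set of topics one student encircled; the 29% cutoff comes from the text, while the data and function names are hypothetical.

```python
def not_recognized_rate(checklists, topic):
    """Share of students who did not encircle the topic on their checklist."""
    return sum(topic not in recognized for recognized in checklists) / len(checklists)


def flagged_topics_in_test(checklists, all_topics, test_topics, cutoff=0.29):
    """Topics reported as not discussed by more than `cutoff` of students that still appear in the test."""
    flagged = {t for t in all_topics if not_recognized_rate(checklists, t) > cutoff}
    return flagged & set(test_topics)


if __name__ == "__main__":
    # Hypothetical checklists of four students over three topics A, B and C.
    checklists = [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "B", "C"}]
    print(flagged_topics_in_test(checklists, ["A", "B", "C"], ["A", "B"]))  # set()
```

An empty result corresponds to the situation described above: no widely undiscussed topic is tested.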

Additionally, for TOC2, the more topics students indicate as having “received much attention during the meetings”, the higher their OverAll Test score (r = .40*). On the other hand, the more topics students indicate as having “received moderate attention during the meetings”, the lower their OverAll Test score (r = -.32*). Probably, students acquired partial knowledge from the “small talk” they had about these topics, and this partial knowledge might impede rather than enhance successful problem analysis. There is only a negligible correlation between the number of topics that received not much attention and the test scores (r = .01).
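Each of these coefficients relates, across TOC2 respondents, a per-student count of topics in one attention category to that student's OverAll Test score. The sketch assumes Pearson's product-moment correlation, which the text does not name explicitly, and uses made-up numbers for five students.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data for five students: number of topics rated as having
    # "received much attention" and OverAll Test score.
    much_attention = [12, 20, 8, 15, 25]
    overall_score = [55, 70, 50, 60, 72]
    print(round(pearson_r(much_attention, overall_score), 2))
```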

 
Study 2: Criterion Validity Of The OverAll Test

 

The second study focuses on the criterion validity of the OverAll Test: to what extent do the test scores reflect students’ abilities to analyse and solve economics problems?

 

Rationale

 

Alternative forms of assessment, which stress complex cognitive processes and analogy with the actual conduct of problem solving, are met with enthusiasm, but they also pose some challenging problems. Compared to traditional tests, authentic assessment instruments measuring problem solving are often thought to better reflect the criterion performances that matter in students’ future professional careers (Linn & Burton, 1994; Magone et al., 1994). To date, only a few studies offer empirical evidence for this assumption (Burger & Burger, 1994; Magone et al., 1994). Magone et al. (1994) report on the QUASAR project (Quantitative Understanding: Amplifying Student Achievement and Reasoning), and more precisely on the QUASAR Cognitive Assessment Instrument. It consists of a set of open-ended assessment tasks, asking students not only to select or produce answers but also to show their work or to justify or explain their solutions. Magone et al. used different sources of logical and empirical evidence for judging the validity of the assessment instrument: well-defined task specifications, systematic internal and external reviews of each task, and qualitative analysis of students’ responses. This qualitative analysis focussed on the processes underlying task performance: does the analysis of the students’ responses indicate their conceptual understanding and their ability to use basic concepts to solve a problem? According to Magone et al. (1994), the results support the validity of the instrument; they suggest that the tasks require high-level thinking and reasoning processes.

Another source of information on the validity of a test is the relation of the test scores to an external criterion. Shepard (1992) describes empirical evidence of relations to external criteria as an integral part of today’s definition of validity. Tests are always simplifications of what we intend to measure. Therefore, it is important to determine whether, and to what extent, test scores reflect abilities other than those intended. Writing skills, for example, might confound open-ended assessment of the analysis of economics problems. Criterion-related validity is especially important in practice for selection and placement decisions. If the test is used to select students for graduation or for entering postgraduate courses, a practically significant statistical relationship should be evident between test score and relevant criterion. The Burger and Burger study compares three instruments: two performance-based assessment instruments measuring writing and reading skills, and a norm-referenced test series designed to measure achievement in basic skills taught throughout the nation. The findings of the Burger & Burger study provide some “encouraging” (p. 14) evidence for the validity of the performance assessment instruments.

The purpose of the study presented in this paper is to use the Burger & Burger approach to determine the criterion validity of the OverAll Test: to what extent does the OverAll Test measure the ability of students to define, analyse and solve economics problems?

 

Research Method

 

Procedure.

 

Student performances on the OverAll Test and on a set of economics problems were compared. Four problems were formulated by experts in the field of macro-economics and finance. The problems deal with real-life situations. The construction and review of the problems were guided by a set of criteria for case writing (Leenders & Erskine, 1989; Vilsteren van et al., 1993). The problem descriptions range from 25 to 100 lines in length. Each problem starts with an introduction presenting information about the company (context information) and the position of the student. The specific problem situation is described in the body of the case. The problem description ends with a set of (at most three) analysis tasks, which refer to the analysis of the problem presented as well as to the analysis of reasons (Messick, 1989).

To analyse the processes underlying the problem analysis, the method of think-aloud protocols is used. The participating students read the problem description aloud and are then asked to think aloud as they analyse the problem (Messick, 1989). To trace the route students follow during their analysis, students are asked to mention when they return to a previous section of the problem description. Immediately after analysing the problem aloud, students are asked to write down their response to the problem.

The analysis of student responses (written and oral) focuses on the knowledge structures used during problem solving. It not only looks at the points of decision between alternatives, but also attempts to map the whole process from the formulation of hypotheses to the solution of the problem, the nature of the knowledge used and the cognitive operations used to reach the solution (Patel & Arocha, 1995). The schemes for the analysis of the responses were based on a detailed model of the analysis of the problems by the expert constructors. Where necessary, the schemes were expanded and modified as a sample of actual responses was reviewed and coded. The central coding criteria were the number of correct concepts, the relationships between the concepts used for problem analysis, and the correctness of the product (the solution of the problem). For the latter criterion, three categories were used: correct answers, partially correct answers and wrong answers. Additionally, two further aspects of the student responses were examined: the length of the reasoning process and the degree to which students went straight to the aspects of the problem relevant to the analysis (Flaherty, 1974). Finally, the results of the protocol analysis are compared for the three groups of students.
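The group comparison of coded decisions can be sketched as a tally per student followed by an average per group. The coding itself (assigning each decision to correct, partially correct or wrong) is done by hand from the protocols; the tuples below are hypothetical, and averaging per-student counts within each group is an assumption about how group averages such as those in Table 2 are formed.

```python
from collections import defaultdict

CATEGORIES = ("correct", "partially correct", "wrong")


def group_means(coded):
    """coded: iterable of (student, group, category) tuples, one per coded decision.

    Returns, for each group, the mean number of decisions per student in each category.
    A category outside CATEGORIES raises KeyError.
    """
    per_student = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    group_of = {}
    for student, group, category in coded:
        per_student[student][category] += 1
        group_of[student] = group

    members = defaultdict(list)
    for student, group in group_of.items():
        members[group].append(per_student[student])

    return {
        group: {c: sum(s[c] for s in counts) / len(counts) for c in CATEGORIES}
        for group, counts in members.items()
    }


if __name__ == "__main__":
    # Hypothetical coded decisions for three students.
    coded = [
        ("s1", "high", "correct"), ("s1", "high", "correct"), ("s1", "high", "wrong"),
        ("s2", "high", "correct"), ("s2", "high", "partially correct"),
        ("s3", "low", "wrong"), ("s3", "low", "wrong"), ("s3", "low", "partially correct"),
    ]
    print(group_means(coded))
```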

 

Sample.

 

The results of the analysis of the four problems were obtained for fifteen first-year students. A quota sampling procedure was used. From the 45 participants of the first study (TOC2), 15 students were selected on the basis of their scores on the OverAll Tests. We divided the 37 participants into three groups: the students with the 27% highest OverAll Test scores, the students with the 27% lowest OverAll Test scores, and the group in between. Five students were selected from each of these three groups.
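The grouping and selection step can be sketched as follows. This is a minimal illustration only. The helper name `quota_sample`, the rounding of the 27% cut-off (rounded up here) and the random draw within each group are assumptions, because the study does not say how the five students per group were chosen.

```python
import math
import random

def quota_sample(scores, fraction=0.27, per_group=5, seed=None):
    """Split participants into high, moderate and low achievers by OverAll
    Test score and draw `per_group` students from each group.

    Hypothetical helper. scores: dict mapping student ID -> OverAll Test score.
    The top and bottom `fraction` of the ranking form the high and low
    groups; everyone in between forms the moderate group.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    cut = math.ceil(fraction * len(ranked))  # assumption: round the cut-off up
    groups = {
        "high": ranked[:cut],
        "moderate": ranked[cut:len(ranked) - cut],
        "low": ranked[len(ranked) - cut:],
    }
    rng = random.Random(seed)
    # Assumption: students are drawn at random within each group.
    return {name: rng.sample(members, per_group) for name, members in groups.items()}
```

With 37 participants, rounding 27% up gives groups of 10, 17 and 10 students. Five students are then drawn from each group.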

 

Results (see note 4)

 

In general, the students with a high score on the OverAll Test (high achievers) performed better on the problem tasks. They identified more relevant concepts and clusters of concepts (interrelated concepts). As presented in Table 2, the high achievers identified 63% of the relevant concepts, the moderate achievers 38% and the low achievers 37%. In addition, the high achievers gave a significantly higher number of correct answers to the analysis tasks (decisions in the problem-solving process) (6.2) than the moderate achievers (2.6) and the low achievers (2.5). The differences between the low and the moderate achievers are negligible, both for the number of concepts and for the correctness of the answers. For the number of partially correct answers, however, the three groups differ noticeably. It is mainly the moderate achievers who make partly correct decisions (1.4, against 0.2 for the low achievers and none for the high achievers). The low achievers make the most wrong decisions (9.2), although the difference from the moderate achievers (8.0) is small. Table 2 presents the descriptive statistics for the two protocol-analysis criteria: the average number of concepts used during problem analysis and the correctness of the decisions made.

 

Table 2: The Average Amount of Concepts Used, the Average Amount of Correct Answers, the Average Amount of Partly Correct Answers and the Average Amount of Wrong Answers on a Set of Cases.

Results                                       High achievers   Moderate achievers   Low achievers
Amount of concepts                            63% (3.0)        38% (18.7)           37% (16.7)
Amount of correct answers (max. 12)           6.2 (1.3)        2.6 (2.6)            2.5 (3.1)
Amount of partly correct answers (max. 12)    0   (0.0)        1.4 (1.1)            0.2 (0.5)
Amount of wrong answers (max. 12)             5.8 (1.3)        8.0 (2.9)            9.2 (2.8)
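The group figures in Table 2 are averages over the coded protocols of the students in each group. The sketch below shows one way such a summary could be computed. It assumes a hypothetical record format: for each student, the set of relevant concepts identified and one code ('correct', 'partly', 'wrong') per analysis task. It also assumes that the figures in parentheses are standard deviations, which the table does not state. The function `summarise_group` is illustrative, not the study's actual coding tool.

```python
from statistics import mean, stdev

def summarise_group(protocols, n_relevant):
    """Mean and SD per criterion for one achievement group (hypothetical format).

    protocols: list of dicts, one per student, with
      "concepts": set of relevant concepts the student identified
      "answers":  list of codes 'correct', 'partly' or 'wrong', one per task
    n_relevant: number of relevant concepts in the expert model.
    Requires at least two students, because stdev needs two data points.
    """
    def pct(p):
        # Percentage of the expert model's relevant concepts identified.
        return 100 * len(p["concepts"]) / n_relevant

    rows = {
        "concepts %": [pct(p) for p in protocols],
        "correct": [p["answers"].count("correct") for p in protocols],
        "partly correct": [p["answers"].count("partly") for p in protocols],
        "wrong": [p["answers"].count("wrong") for p in protocols],
    }
    return {name: (mean(values), stdev(values)) for name, values in rows.items()}
```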

 

It can be concluded that the preliminary findings of this study provide some evidence for the criterion-related validity of the OverAll Test.

 
Study 3: The Influence of Knowledge on Problem-Solving

 

What are the merits of assessing students’ knowledge structures when the main goal of instruction is successful problem-solving? To what extent can student knowledge profiles serve as feedback on their problem-solving abilities? The third study investigates the influence of students’ knowledge structure on their performance on problem-solving tasks, as measured by the OverAll Test.

 

Rationale

 

Research into the differences between experts and novices in performance on problem-solving tasks has resulted in a profile of successful problem-solvers (Glaser & Chi, 1988; Yekovich, 1993). Smith (1991) summarises the internal factors affecting problem-solving performance. Successful problem solving is enhanced by:

-        affective variables, including self-confidence, motivation, beliefs, etc.

-        the length of prior successful problem-solving experience

-        knowledge of the domain from which the problem is drawn (factual, conceptual, procedural)

-        knowledge of general problem-solving procedures such as means-ends analysis, trial-and-error, etc.

-        knowledge which is adequate, organized, accessible, integrated and accurate (misconception-free)

-        other personal characteristics such as cognitive development, personality, etc.

 

The studies of Chi et al. (1981) and Perkins, Schwartz and Simmons (1988) confirm the importance of adequate, well-organised and easily accessible conceptual knowledge of the relevant content domain. This knowledge base serves as the basis for representing, analysing and solving the problem presented. In addition to this conceptual understanding, the successful problem solver knows what to do, and how and when to do it: problem solving requires procedural knowledge (Smith, 1991). The present study focuses on the influence of the student’s declarative knowledge on his or her performance on problem-solving tasks in the domain of economics. The study may provide empirical evidence that an organised knowledge structure matters for successful problem-solving. If it does, examining students’ knowledge profiles is a relevant instrument both in regular instruction aimed at successful problem-solving and for remedial purposes.

 

Research method

 

Procedure.

 

The procedure used is concept sorting. According to Chi et al. (1981) and Shavelson (1974), this method is a valid way of determining to what degree the student’s knowledge is structured. The respondents are asked to sort the concepts presented in TOC 1 under the eight main themes presented (see study 1). The student’s result on the sorting task is compared with his or her performance on the OverAll Test. For TOC2, the students were not asked to classify the concepts. Instead, they indicated the level of competency they had acquired for each concept (see study 1). This variable is correlated with the student’s score on the OverAll Test. Finally, the student’s score on the Knowledge Test, which covers the same content domain, is compared with the OverAll Test score.
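The sorting task yields the three counts reported in Table 3: correctly, wrongly and not sorted topics. The sketch below shows one way to derive these counts. It assumes that a faculty key assigns each TOC 1 concept to one of the eight themes, and that concepts the student leaves unsorted are simply missing from the student's answer. The function `score_sorting` and the concept and theme names in the usage example are hypothetical.

```python
def score_sorting(key, student_sort):
    """Count correctly (CST), wrongly (WST) and not sorted topics (NST).

    key: dict concept -> theme, as assigned by the faculty (assumed format).
    student_sort: dict concept -> theme chosen by the student; concepts
    the student did not place are absent from this dict.
    """
    cst = wst = nst = 0
    for concept, theme in key.items():
        chosen = student_sort.get(concept)
        if chosen is None:
            nst += 1  # concept left unsorted
        elif chosen == theme:
            cst += 1  # placed under the faculty's theme
        else:
            wst += 1  # placed under a different theme
    return {"CST": cst, "WST": wst, "NST": nst}
```

For example, take a key with three concepts. A student who places one concept correctly, one under the wrong theme and leaves the third unsorted scores one on each count.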

 

Sample.

 

The same sampling procedure is used as in study 1.

 

Results

 

The results of this study confirm previous research on the influence of an organised knowledge base on problem-solving performance. The correlation coefficients (see Table 3) indicate that the better students are able to classify the concepts of the domain of marketing and organisation (TOC1), the better they are able to analyse and solve problems within this domain. The more concepts are wrongly classified or left unsorted, the lower the students’ performance on the OverAll Test. These two negative correlations, however, are weak and not statistically significant.

The correlation between students’ Knowledge Test scores and their OverAll Test scores is even more convincing (r = 0.69, significant at the 99% confidence level).

 

Table 3: Pearson’s Correlation Coefficients between Students’ Sorting Performance and the OverAll Test Scores.

                                   CST       WST        NST        KT score (C-I)
OverAll Test score (Total %)       0.49*     -0.1338    -0.2452    0.69**

CST= correctly sorted topics

WST= wrongly sorted topics

NST= not sorted topics

KT score (C-I)= Knowledge test score (Correct-minus-Incorrect score)

* statistically significant with a confidence level of 95%

**statistically significant with a confidence level of 99%
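The entries in Tables 3 and 4 are Pearson product-moment correlations. The sketch below illustrates three steps: computing a correct-minus-incorrect (C-I) Knowledge Test score for each student, correlating a list of such scores with OverAll Test scores, and computing the t value used to test whether a correlation differs from zero. The helper names are hypothetical. The scoring rule (unanswered items count zero) is an assumption about how the C-I score treats omitted items.

```python
import math
from statistics import mean

def ci_score(responses):
    """Correct-minus-incorrect (C-I) score for one student.

    responses: list of item outcomes, each 'correct', 'incorrect' or
    'unanswered'. Unanswered items neither add nor subtract (assumption).
    """
    return responses.count("correct") - responses.count("incorrect")

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t value with n - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

Compare the t value with the critical value of Student's t distribution for n - 2 degrees of freedom, at the 0.05 or the 0.01 level (two-tailed tests are assumed). This gives the single and double asterisks used in the tables.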

 

For TOC2, students were asked to indicate the level of competency they perceived they had acquired for each of the concepts. The correlation of this variable with students’ OverAll Test scores indicates that the more concepts are mastered at the level of analysis, the higher the student’s score on the OverAll Test (see Table 4).

 

 

 

 

 

Table 4: Pearson’s Correlation Coefficients between Students’ Perception of the Level of Comprehension and the OverAll Test Scores.

Level of Competency     OverAll Test Score     OverAll Test Score/
                        (Total %)              Open-ended Questions
Definition              -0.43                  -0.12
Comprehension            0.06                  -0.11
Analysis                 0.29*                  0.37*
KT score (C-I)           0.45**                 0.69**

 

In summary, a well-organised knowledge base seems to affect successful problem-solving as measured by the OverAll Test. There is some empirical evidence that students’ perception of having mastered the concepts at the level of analysis relates to their performance on the OverAll Test.

 

 
Conclusions

 

Contemporary cognitive psychology has suggested several changes for instruction and assessment (Calfee, 1995). For example, it stresses the importance of knowledge application instead of knowledge consumption. In addition, assessment and instruction must be contextualised, reflective and social. On the basis of these ideas, many schools are looking for and experimenting with alternative ways to develop their curricula. The Maastricht School of Economics and Business Administration introduced a problem-based curriculum, intended to educate competent problem-solvers. Like many schools, the Maastricht school struggled with the choice and implementation of a congruent assessment system. We chose two assessment instruments: a Knowledge Test and an OverAll Test. This article set out to describe the Maastricht assessment system as an example of assessment within an innovative curriculum. One of the main concerns of the faculty was to gain empirical evidence for the quality of the assessment system in its broad sense. The article therefore also presented a second case: a research methodology for gathering empirical evidence on the quality of assessment innovations. Finally, with the three studies presented, I hope to contribute to the discussions about the feasibility of alternatives in instruction and assessment.

I addressed three questions. When student-centred programs are introduced, there is much concern about student outcomes. Do students in settings such as problem-based programs actually conduct the learning activities intended by the faculty (Dolmans, 1994)? Will the students work on the topics the faculty describes as essential for a competent professional in the field? If not, is it fair to assess students on the basis of the formal goals? To investigate these questions, a Topic Checklist was designed as a map of the formal curriculum. This map was presented to students in order to describe the instructional practice, and it was also used as a blueprint to analyse the assessment goals. The study suggests a substantial overlap between the formal and the operational curriculum. The overlap holds both for the concepts studied and for the level of mastery intended and achieved. Although learning in the problem-based curriculum is highly self-directed, students address the issues the faculty describes as essential. Moreover, the goals assessed are sufficiently congruent with both the formal and the operational curriculum. This implies that assessment instruments can be made fair to the student even when students have a lot of freedom within the program. Finally, because the curriculum and the assessment practices match, student outcomes are a relevant source of information about teaching practices.

The second question concerns one of the main issues of performance-based assessment instruments. Even when the faculty develops case-based assessment instruments, the question remains whether a student’s performance on the cases has anything to do with professional problem-solving. The second study addressed the criterion validity of the OverAll Test: are high achievers successful problem-solvers? The preliminary results of the analysis of the think-aloud protocols provide some empirical evidence for answering this question in the affirmative. It seems possible to assess students’ problem-solving with assessment instruments based on a set of authentic cases with analysis tasks.

Finally, one of the basic assumptions of the Maastricht assessment practices was addressed: the influence of a student’s knowledge profile on his or her performance on the OverAll Test. Students’ performance on a concept-sorting task seemed to relate to their performance on the OverAll Test. Students’ performance on the Knowledge Test indicated the same relation between knowledge profiles and OverAll Test performance. These findings confirm the results of research in the field of cognitive psychology: possessing a well-organised knowledge base is important for successful problem-solving.

Considering the findings, some implications for assessment as well as for instruction can be formulated. The so-called innovative assessment movement has led to a growing interest in new forms of assessment. Examples include open-book exams, take-home exams, projects, real-life tasks, simulation exercises, and self- and peer assessment. Assessment instruments aiming to measure students’ conceptual understanding do not seem to fit in with these ideas. They are often dismissed as traditional instruments that measure at a low cognitive level, such as conceptual understanding. However, the results presented indicate the importance of measuring conceptual understanding. They suggest that two aspects of the student’s knowledge base form a relevant dimension of assessment: its breadth, and its degree of fragmentation and structure. The assessment of problem-solving skills is the ultimate goal, but we should not relinquish the traditional assessment techniques. Alternative assessment techniques such as the OverAll Test should not replace the Knowledge Test. Using both instruments enables triangulation based on a wide range of evidence. This increases the quality and the validity of the inferences drawn from the assessment (Birenbaum, 1996). If one goal of assessment is to diagnose the sources of poor problem-solving performance, the assessment should permit identification of the nature and extent of a student’s knowledge. If assessment can uncover more precise deficits in students’ knowledge bases, more specific guidelines for instructional remediation can be drawn up. These guidelines can serve individuals and groups with similar strengths and weaknesses. Knowing the processes and products of successful reasoners, and comparing them with those of less successful students, provides some instructional guidance regarding “what to teach” (Brown, Bransford, Ferrara, & Campione, 1983).

For instruction, our results imply that when problem-solving is a main goal, learning environments should be designed enabling students to acquire a knowledge base which is by its nature and extent a sufficient basis to identify, define, analyze and solve authentic problems. The extent to which students reach this goal is an important indicator for the design as well as the review of the learning environment.

For assessment, the findings suggest that feedback should follow two dimensions: the breadth and depth of a student’s knowledge profile, and the extent to which this knowledge is usable. No single assessment technique can satisfy both dimensions without presenting a distorted view of students’ capabilities (Birenbaum, 1996). Therefore, a variety of assessment tools is preferable to a single tool.

 

Notes

 

1.    The first year comprises four instructional periods, called blocks, each lasting for 8 weeks.

2.    If the psychometric data (item-test correlation coefficients) do not indicate insufficient quality of the test item itself.

3.    After each OverAll Test administration, students fill in a questionnaire asking for their study strategies, the match between instruction and the test and the difficulties they have experienced.

4.    At the time of publication, the protocol analysis was not yet finished. Therefore, only preliminary results are presented.

 

 

 

 

 

 

 

References

 

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

 

Birenbaum, M. (1996). Assessment 2000: Towards a Pluralistic Approach to Assessment. In M. Birenbaum & F. J. R. C. Dochy (Eds.), Alternatives in assessment of achievements, learning processes and prior knowledge (pp. 3-30). Boston, Dordrecht, London: Kluwer Academic Publishers.

 

Blum, W., & Niss, M. (1991). Applied mathematical problem solving, modelling, applications and links to other subjects. State, trends and issues in mathematics instruction. Educational Studies in Mathematics, 22(1), 37-68.

 

Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering and understanding. In J. H. Flavell & E. M. Markman (Eds.), Carmichael’s manual of child psychology (Vol. 1, pp. 77-166). New York: Wiley.

 

Burger, S., & Burger, D. (1994). Determining the Validity of Performance-based Assessment. Educational Measurement: Issues and Practice, Spring 1994, 9-15.

 

Calfee, R. (1983). Establishing instructional validity for minimum competence programs. In G. F. Madaus, The courts, validity, and minimum competence testing (pp. 95-114). Boston: Kluwer-Nijhoff Publishing.

 

Calfee, R. (1995). Implications of Cognitive Psychology for Authentic Assessment and Instruction. In T. Oakland & R. K. Hambleton (Eds.), Academic Assessment. Boston/London/Dordrecht: Kluwer Academic Publishers.

 

Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.

 

Chi, M. T. H., & Van Lehn, K. A. (1991). The content of physics self-explanation. Journal of the Learning Sciences, 1, 69-105.

 

Coulson, R. L., & Osborne, C. E. (1984). Insuring Curricular Content in a Student-directed Problem-based Learning Program. In H. G. Schmidt & M. L. De Volder (Eds.), Tutorials in Problem-Based Learning. A New Direction in Teaching the Health Professions (pp. 225-229). The Netherlands: Van Gorcum.

 

De Haan, D. M. (1992). Measuring test-curriculum overlap. Enschede: Febo.

 

de Lange, J. (1992). Assessing mathematical skills, understanding and thinking. In R. Lesh, & S. Lamon (Eds.), Assessment of authentic performance in school mathematics (pp. 195-214). Washington, D.C.: American Association for the Advancement of Science.

 

Dochy, F. J. R. C., & Alexander, P. A. (1995). Mapping Prior Knowledge: A Framework for Discussion among Researchers. European Journal of Psychology of Education, 10(3), 225-242.

 

Dolmans, D. (1994). How students learn in a problem-based curriculum. Maastricht: Universitaire Pers.

English, F. W. (1992). Deciding what to teach and test. Newbury Park, California: Sage Publications Company, Corwin Press, Inc.

 

Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, pp. 235-238.

 

Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1993). Learning, Teaching, and Testing for Complex Conceptual Understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a New Generation of Tests. Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

 

Flaherty, E. G. (1974). The Thinking Aloud Technique and Problem Solving Ability. Journal of Educational Research, 68, 223-225.

 

Glaser, R. (1990). Toward new models for assessment. International Journal of Educational Research, 14, 475-483.

 

Glaser, R., & Chi, M. T. H. (1988). Overview. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. xv-xxviii). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

 

Lawson, C. (1992). On the relation between course structure, teaching methods and evaluation procedures in economics. Assessment and Evaluation in Higher Education, 17, (1), pp. 1-10.

 

Leenders, M. R., & Erskine, J. A. (1989). Case Research: The case writing process. London, Ontario: University of Western Ontario.

 

Leinhardt, G., & Seewald, A. M. (1981). Overlap: What’s Tested, What’s Taught? Journal of Educational Measurement, 18 (2), 85-95.

 

Lesh, R., & Lamon, S. (1992). Assessment of authentic performance in school mathematics. Washington, D.C.: American Association for the Advancement of Science.

 

Linn, R. L., & Burton, E. (1994). Performance-Based Assessment: Implications of Task Specificity. Educational Measurement: Issues and Practice, Spring 1994, 5-15.

 

Magone, M. E., Cai, J., Silver, E. A., & Wang, N. (1994). Validating the cognitive complexity and content quality of a mathematics performance assessment. International Journal of Educational Research, 21(4), 317-340.

 

Mallier, T., Morwood, S., & Old, J. (1990). Assessment methods and economics degrees. Assessment and Evaluation in Higher Education, 15(1), 22-44.

 

McClung, M. S. (1979). Competency testing programs: Legal and educational issues. Fordham Law Review, 47, 651-712.

 

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (pp. 13-104). New York: Macmillan.

 

Patel, V. L., & Arocha, J. F. (1995). Methods in the study of clinical reasoning. In J. Higgs & M. Jones (Eds.), Clinical reasoning in the health professions (pp. 35-48). Oxford: Butterworth-Heinemann.

 

Pelgrum, W. J. (1990). Educational assessment: monitoring, evaluation and the curriculum. Enschede: Febo.

 

Perkins, D. N., Schwartz, S., & Simmons, R. (1988). Toward a unified theory of problem-solving; a view from programming. Paper presented at the meeting of the American Educational Research Association, New Orleans, LA.

 

Schoemaker, P. J. H. (1995).  Scenario Planning: a Tool for Strategic Thinking. Sloan Management Review, pp. 25-39.

Segers, M. S. R., Tempelaar, D., Keizer, P., Schijns, J., Vaessen, E., & Van Mourik, A. (1991). De OverAll Toets: een eerste experiment met een nieuwe toetsvorm. [The OverAll Test: A first experiment]. Maastricht: University of Limburg.

 

Segers, M. S. R., Tempelaar, D., Keizer, P., Schijns, J., Vaessen, E., & Van Mourik, A. (1992). De OverAll Toets: een tweede experiment met een nieuwe toetsvorm. [The OverAll Test: A second experiment]. Maastricht: University of Limburg.

 

Shahabudin, S. H. (1987). Content Coverage in Problem-based Learning. Medical Education, 21, 310-313.

 

Shavelson, R. J. (1974). Methods for examining representations of a subject-matter structure in a student’s memory. Journal of Research in Science Teaching, 11, 231-249.

 

Shepard, L. A. (1992). Evaluating test validity. Review of Research in Education, 19, 405-450.

 

Smith, M. U. (1991). A view from biology. In M. U. Smith (Ed.), Toward a Unified Theory of Problem Solving (pp. 1-19). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

 

Spiro, R. J., Coulson, R. L., Feltovich, P. J., & Anderson, D. K. (1988). Cognitive flexibility theory: Advanced knowledge acquisition in ill-structured domains. In The tenth annual conference of the cognitive science society (pp. 375-383). Hillsdale, NJ: Lawrence Erlbaum Associates.

 

Swanson, D. B., Case, S. M., & van der Vleuten, C. P. M. (1991). Strategies for student assessment. In D. Boud & G. Feletti (Eds.), The challenge of problem-based learning (pp. 260-274). London: Kogan Page.

 

Tans, R. W., Schmidt, H. G., Schade-Hoogeveen, B. E. J., & Gijselaers, W. H. (1986). Sturing van het onderwijsleerproces door middel van problemen: Een veldexperiment. [Directing the Learning Process by Means of Problems: A Field Experiment]. Tijdschrift voor Onderwijsresearch, 11 (1), 35-46.

 

Vilsteren, P. P. M. van, Heijden, M. P. van der, & Arts, A. R. M. (1993). Het gebruik van casussen in cursussen van de Open Universiteit (The use of cases in Open University courses). COP-reeks 9301, Heerlen: Open Universiteit.

 

Yekovich, F. R. (1993). A theoretical View of the Development of Expertise in Credit Administration. Paper presented at the 1993 Annual Meeting of the American Educational Research Association, Atlanta, Georgia.

 

 

The Author

 

MIEN SEGERS is Associate Professor of Assessment and Evaluation at the Department of Educational Development and Research, School of Economics and Business Administration, Universiteit Maastricht, The Netherlands. She received her PhD in the field of quality assurance in Higher Education. Her current research focuses on the implementation of innovative assessment practices within problem-based curricula.


2. 

 


Maastricht Skills Test

Faculty of Medicine

-        Examples of criteria lists


Criteria list             :   nr. 07158

Field                     :   Gynaecology/Obstetrics

Station                   :   CERVICAL SMEAR

Production date           :   September 1996

Drawn up by               :   Hieke Kruseman

Intended for              :   4th year students 1996-1997

Date of examination       :   April 10th 1997

Time                      :   20 minutes

Simulated patient         :   female

Instruments needed        :


STUDENT'S TASKS

 

In this station medical technical skills are assessed.

 

You are doing your clerkship in General Practice.

Mrs. Brown, 35 years old, visits the surgery because of a vaginal discharge which started approximately 2 weeks ago.

 

You will receive 4 tasks.

You have 20 minutes to complete these tasks.

 

Task I:

Take a relevant history concerning this complaint.

 

Task II:

Carry out speculum examination on the model.

 

Task III:

Make a cervical smear on the model.

 

Task IV:

Interpret the photographs the examiner shows you and report the most likely diagnosis to the examiner.

 

Throughout the examination, state what you are doing, what you are paying special attention to, and what your findings are.

 

 

Good luck!


INSTRUCTION FOR THE SIMULATED PATIENT

 

-        You are a 35-year-old woman; you have no children and you have never been pregnant.

-        You have had a vaginal discharge for approximately 2 weeks. It is grey-white in colour and has an unpleasant smell. The discharge is not bloody.

-        You have no itching complaints.

-        Every day you have to change clothes, sometimes twice a day.

-        You have never had these complaints before.

-        You do not know what is causing the discharge.

-        Micturition is not painful, and neither is intercourse. You have no abdominal complaints.

-        Your last menstruation was 3 weeks ago. It came on time and lasted 5 days, which is normal for you.

-        You do not take any medication.

-        You are married. Your husband has no complaints.

-        Your husband had a vasectomy 7 years ago.

-        Your last cervical smear was taken 4 years ago: PAP 1 (normal).


EXAMINER'S INSTRUCTIONS

 

A 6-point scale is used in this criteria list.

The examiner's instructions give a global description of the actions the student has to perform.

 

 

Task-I:

History

 

Item 1 :

Amount of discharge?                     (much)

Itching?                                 (no)

Odour?                                   (yes)

Colour of discharge?                     (grey-white)

Blood?                                   (no)

Related to the menstrual cycle?          (unknown; started one week after last menstrual period)

Recent changes in sexual behaviour?      (no)

Intercourse painful?                     (no)

Husband has complaints?                  (no)

Micturition painful?                     (no)

Contraception?                           (sterilisation 7 years ago)

Last cervical smear? Result?             (4 years ago, PAP 1)

Medication?                              (no)

Had these complaints before?             (no)

 

 

Task II:

Speculum examination

 

Item 2:

Preparation I (materials).

Student prepares all the materials needed to carry out a cervical smear.

 

Item 3:

Preparation II.

The patient's bladder should be empty.

Student sits down, switches on the lamp and directs the light, puts on gloves and lubricates the speculum.

 

Item 4:

Technique of the speculum examination.

Inserts the speculum:

-        spreads the labia

-        introduces the speculum at a 45° angle along the vaginal axis

-        turns the speculum to the neutral position

-        shows the portio by slowly opening the speculum

Removes the speculum:

-        pulls the speculum back slowly

-        inspects the vagina

-        removes the speculum while slightly open, at a 45° angle along the vaginal axis

 

 

 

Item 5:

Findings.

Content vagina:           blood? discharge?

Portio:                        position, size, surface, erythroplakia, colour?

Ostium:                       closed, opened, discharge?

Vagina:                       colour?

 

(In the model no discharge can be seen.)

 

Task III:

Cervical smear.

Item 6:

Technique cervical smear.

Endocervix:     cytobrush, rotates it twice through 360°.

Ectocervix:      places the Ayre spatula in the ostium and rotates it twice through 360°.

 

Item 7:

Technique slides.

Student marks one slide with E and the other with P, for endocervical and ectocervical material respectively.

Fixes the slides immediately after spreading the material on them.

Puts slides in dispatch box.

 

Task IV:

Item 8:

(Examiner shows photographs)

1 > Macroscopic aspect of vaginal discharge. The student has to describe the colour, the amount and the aspect of the discharge, and the colour of the vagina and portio.

 

2 > A microscopic preparation with bacteria, leucocytes and some 'clue cells'.

 

Item 9:

Most likely diagnosis:

Bacterial vaginosis

(or Gardnerella vaginalis, non-specific vaginitis)

 

The examiner collects all the material and prepares the station for the next student.


CHECKLIST

                                                                       year:                           1996-1997

year group:                4

station no:                  07158

no. of items:               9

examiner no

ID no. Student

______________________________________________________________________________

Field:                          gynaecology / obstetrics

 

Station:                       vaginal discharge

_________________________________________________________________________________

                                                                        good   suff.     neutr.   insuff.  poor    absent

 

            Task I:

 

            History

1          Asks the right questions                       0         0          0          0          0          0

 

            Task II:

 

            Speculum examination

2          Preparation I (materials)                     0         0          0          0          0          0

 

3          Preparation II                                      0         0          0          0          0          0

 

4          Technique                                            0         0          0          0          0          0

 

5          Findings                                              0         0          0          0          0          0

 

            Task III:

 

            Cervical smear

6          Technique cervical smear                    0          0          0          0          0          0

 

7          Technique slides                                  0          0          0          0          0          0

 

            Task IV:

 

            Interpretation photographs

8          Photo 1                                                0          0          0          0          0          0

 

9          Photo 2                                                0                                              0          0

_________________________________________________________________________________

 

Evaluation:


Criterialist:                             nr 89008

 

Field:                                      Integrated: abdomen and communication skills

 

Station:                                   PAIN IN THE UPPER ABDOMEN

 

Production date:                     November 1996

 

Drawn up by:                          Jano Havas

 

Intended for:                           6th year students 1996-1997

 

Date of examination:              June 26th, 1997

 

Time:                                      30 minutes

 

Simulated patient:                   female (age: 40 years)

 

Instruments needed:                stethoscope


STUDENT'S TASKS

 

In this station medical technical skills as well as communication skills are assessed.

 

You are doing your clerkship in General Practice.

The patient waiting for you has come to you for medical advice.

 

Please conduct a consultation with this patient.

You can find information about the patient in the patient's chart.

 

After the physical examination, while the patient gets dressed, the examiner will ask you three questions:

 

1          What is the most likely diagnosis?

 

2          What is the differential diagnosis?

 

The examiner will now give you the correct diagnosis.

 

3          What management (therapy and further investigations) do you think is needed for this diagnosis?

 

Now, you can finish the consultation with the patient. The patient has been informed about this procedure.

 

You have 30 minutes to complete your task.

Any time remaining can be used for feedback.

 

When you have understood this task, please call in the patient.

 

Good luck!


INSTRUCTION FOR THE SIMULATED PATIENT


EXAMINER'S INSTRUCTIONS: MEDICAL TECHNICAL PART (I)

 

In this station two examiners will be present: one for the medical technical part (I) and one for the communication part (II).

 

In this criteria list a 6-point scale will be used.

The examiner's instructions give a global description of the actions the student has to perform.

 

 

Item 27:

Elaboration of chief complaint

-           pain in the abdomen for one week

-           continuous

-           worse half an hour after eating

-           a gnawing pain

-           under the sternum, an area of about 3 centimeters.

 

Item 28:

Associated symptoms

-          nausea                                                             (yes)

-          vomiting                                                         (once yesterday, no blood)

-          pyrosis                                                                       (for years)

-          had these complaints before                           (yes)

 

Item 29:

Past medical history / family history / intoxications

-           previous operations

-           family history

-           smoking                                                         (3-5 cigarettes/day)

-           alcohol                                                          (1-2 beers at weekends)

-           coffee                                                            (normally 5/day, this week 1-2)

-           medicines                                                     (none, only OTC antacids)

 

Item 30:

The student pays attention to:

-           general impression: Is patient in pain?

-           colour of the sclerae?

-           colour of the skin?

-           posture of the patient?

 

Item 31:

Inspection

The student pays attention to:

-          skin abnormalities over the abdomen (scars, jaundice?)

-          shape of the abdomen (symmetrical or not?)

 

 

 

 

Item 32:

Auscultation

The student:

-          listens at least in four regions of the abdomen

-          pays attention to peristalsis and murmurs

 

Item 33:

Percussion

The student:

-           percusses at least four regions of the abdomen, as well as the liver and spleen

-           pays attention to abdominal sounds and percussion pain

 

Item 34:

Palpation

The student:

-           performs superficial and deep palpation of at least four regions of the abdomen, and finds tenderness in the epigastric area with some guarding (active defence)

-           palpates the colon, liver, gallbladder and kidneys

 

Item 36:

The student:

-           palpates the lymph nodes of Virchow (supraclavicular) (if needed, the examiner asks for this)

 

Item 37:

The student:

-           wants to perform a rectal examination (the examiner gives the findings: nothing abnormal)

 

Item 38:

Most likely diagnosis

-           ulcus ventriculi/duodeni (gastric/duodenal ulcer)

 

Item 39:

Differential diagnosis

-           non specific gastritis

-           malignant ulcer of the stomach

-           pathology of the pancreas (pancreatitis, carcinoma)

-           pathology of the gallbladder (cholecystitis, cholelithiasis)

-           pathology of the colon (constipation, colitis, IBS)

 

Now the examiner gives the correct diagnosis to the student.

 

Item 40:

Management

Advice

Stop:

-          smoking

-          drinking alcohol

-          drinking coffee

-          eating other foods which worsen the complaints

 

The patient should not take:

-          medication such as aspirin and other NSAIDs

 

Therapy

For 2-6 weeks:

-        cimetidine 800 mg once daily or 400 mg twice daily

-        ranitidine 300 mg once daily or 150 mg twice daily

 

The student:

-          asks the patient to return to the surgery after this period

 

If:

-          the complaints have disappeared: stop medication

-          not: continue medication until the 8th week; the patient has to return to the surgery after this period

 

Item 41:

Further investigation:

Gastroscopy

Biopsies and testing for Helicobacter pylori


EXAMINER'S INSTRUCTIONS: COMMUNICATION PART (II)

 

In this station two examiners will be present: one for the medical technical part (I) and one for the communication part (II).

 

In this criteria list a 6-point scale will be used.


CHECKLIST                                       year                                1996-1997

                                                            year group                      6

                                                            station no                       89008

                                                            no. of items

                                                            examiner no

                                                            ID no. student

_________________________________________________________________________________

Field                                        abdomen/communication

 

Station                                     pain in the upper abdomen

_________________________________________________________________________________

                                                                        good   suff.     neutr.   insuff.  poor    absent

 

History

-    Chief complaint                                         0         0          0          0          0          0

-    Associated symptoms                                0         0          0          0          0          0

-    History                                                      0         0          0          0          0          0

Physical examination

-    General inspection                                    0         0          0          0          0          0

      Examination of the abdomen:

-    Inspection                                                  0         0          0          0          0          0

-    Auscultation                                              0         0          0          0          0          0

-    Percussion                                                 0         0          0          0          0          0

-    Palpation                                                   0         0          0          0          0          0

      Correct order:
      insp. - ausc. - perc. - palp.                        0         0          0          0          0          0

-    Special examination                                  0         0          0          0          0          0

-    Rectal examination                                    0         0          0          0          0          0

Diagnosis

-    Most likely diagnosis                                0                                            0          0

-    Differential diagnosis                              0          0          0          0          0          0

Management

-    Advice and therapy                                   0          0          0          0          0          0

-    Further investigation                                 0          0          0          0          0          0

 

______________________________________________________________________________

Evaluation:


     MAAS-Global, score list                                                                                                   November 1, 1992

     © J. van Thiel, H. Kraan, J. van Dalen

          University of Limburg, Maastricht, The Netherlands

Doctor:           name

                        group

                        number

 

case

patient

observer

 

Interpretation of scale 1 through 7:

1 = absent or very bad
2 = bad
3 = insufficient
4 = doubtful
5 = sufficient
6 = good
7 = excellent

 

Consult the criteria lists MAAS-R2 and MAAS-Global (not yet available in English) if you are not sure about the interpretation of an item. The score boxes serve only as a memory aid; the final score is based on global judgement.

 

SKILLS PER PHASE

 

FOLLOW-UP CONSULTATION                                          1  2  3  4  5  6  7

recapitulates complaints and

       questions of last consultation                              o

recapitulates management plan                                  o

checks for performance of plan                                  o

checks for effect on course                                        o

 

ENTRY                                                                                 1  2  3  4  5  6  7

tells name and function                                              o

asks for or verifies personal details                           o

 

GLOBAL ORIENTATION                                                    1  2  3  4  5  6  7

short orientation on complaint

            and degree of suffering                                   o

questions other reasons for visit                                o

 

REQUEST FOR HELP                                                          1  2  3  4  5  6  7

mentions/explores request for help,

            wishes or expectations                                  o

inducement of visit now                                             o

explores openly within patient's frame of reference   o

responds to cues                                                        o

 


QUESTIONING DURING HISTORY-TAKING                1  2  3  4  5  6  7

variety in questions                                                    o

relevancy of questions (made) clear                          o

is alert to whether patient is following                   o

 

PHYSICAL EXAMINATION                                               1  2  3  4  5  6  7

instructs about undressing                                          o

informs about examination                                         o

treats patient with care and respect                            o

 

EVALUATION INFORMATION                                          1  2  3  4  5  6  7

informs about findings and (provisional) diagnosis   o

tells aetiology and prognosis                                     o

 

MANAGEMENT PLAN                                                       1  2  3  4  5  6  7

deliberation as well as proposal                               o

alternatives, advantages and disadvantages               o

feasibility and compliance                                         o

arrangements who, what, when                                  o

 

 

EVALUATION OF THE CONSULTATION             1  2  3  4  5  6  7

general question, answering the request for               o

        help, discussing our own working method

 

GENERAL SKILLS

 

PROVIDING INFORMATION                                             1  2  3  4  5  6  7

announcement, categorizing                                       o

in small amounts, concrete explanation                      o

comprehensible language                                           o

asks about reaction and comprehension                     o

 

EMOTIONS                                                                          1  2  3  4  5  6  7

asks for/explores feelings                                          o

reflections of feelings (incl. nature and intensity)      o

assimilative reactions: deals first with feelings         o

sufficiently in entire consultation                               o

 

SUMMARIES                                                                        1  2  3  4  5  6  7

(recapitulations, paraphrases or summaries)

concise, in own words                                               o

correct for content, complete                                     o

checking                                                                     o

sufficiently in entire consultation                               o

 

 

 

 

 

ORDERING                                                                          1  2  3  4  5  6  7

announcements (diagnostic procedure,            o

   history-taking, examination, other phases)

distributes available time well-balanced                   o

explores request for help mainly in the beginning      o

management plan after evaluation/information           o

 

NATURALNESS                                                                   1  2  3  4  5  6  7

flexible communicative behaviour                             o

no disruptive hesitations                                            o

spontaneous and natural                                             o

attunes own style to patient                                        o

 

EMPATHY                                                                            1  2  3  4  5  6  7

attitude empathic, attentive, inviting

  by word, behaviour and eye contact             o

when conflict: room for patient’s arguments

   as well as for own arguments                                 o

 

FURTHER FEEDBACK:


 

Criteria list                  :           nr 02195

 

Field                            :           Therapeutic Skills

 

Station                        :           SUTURING

 

Production date           :           September 1996

 

Drawn up by               :           Bert Zonneveld

 

Intended for                :           4th year students 1996-1997

 

Time                           :           10 minutes

 

Instruments needed     :


STUDENT'S TASK

 

In this station medical technical skills are assessed.

 

On the cushion you see a wound. The wound has been inspected and cleaned.

 

Perform infiltration anaesthesia.

Put on the sterile gloves and apply two superficial sutures.

 

Throughout the procedure mention aloud what you are doing.

 

Good luck!

 


EXAMINER'S INSTRUCTION

 

Please remove the sutures before the next student enters the room. Lay out all instruments in the same way as at the start of the test.

 

In this criteria list a 3-point scale will be used.

Judge whether the student performs each item well or wrongly, or does not perform it at all.

 

Item 16

Two sintels back, one forward and again one sintel back.

 

Items 18 and 19

The student works in such a way that there is no chance of contaminating him/herself.

 


CHECKLIST

                                                                       year                 :           1996-1997

                                                                       yeargroup        :           4

                                                                       station no        :           02195

                                                                       no. of items     :           19

                                                                       examiner no.   :

                                                                       ID no. student  :          

 

Field:                          therapeutic skills

 

Station:                       suturing

 

                                                                                   good                wrong              did not do

 

            Material for anaesthetics

1          5 ml syringe                                                   0                      0                      0

 

2          2 injection needles 0.8 x 40 mm.                    0                      0                      0

 

3          disinfectant                                                     0                      0                      0

 

 

            Technique

4          The student checks the injection fluid             0                      0                      0

 

5          The top of the vial is disinfected                    0                      0                      0

 

6          The fluid is drawn into the syringe                 0                      0                      0

 

7          The needle is replaced by a sterile needle      0                      0                      0

            before injection

 

8          The syringe is clear of air bubbles                  0                      0                      0

 

Administration of the anaesthetic

9          The skin is disinfected                                   0                      0                      0

 

The examiner now tells the student that he/she can continue with suturing, assuming that the anaesthetic has been administered.

 

 

Suturing

10              The student aseptically puts on

the sterile gloves                                            0                      0                      0

 


11              The wound is spread to determine                 0                      0                      0

its depth

 

12        The student grasps the edge of the wound       0                      0                      0

            with the tissue forceps at the point where

            the needle is to be inserted

 

13        The student inserts the needle                        0                      0                      0

            perpendicularly to the skin

 

14        Passing across the bottom of the wound, the    0                      0                      0

            student inserts the needle through the

            opposite edge of the wound

 

15        The second point of insertion lies directly     0                      0                      0

            opposite the first, at a similar distance from

            the edge of the wound (4 mm)

 

16*      The suture is made with use of the                 0                      0                      0

            fixation forceps

 

17        The knot is located at the point of                 0                      0                      0

            insertion

 

Results

18*      The sterile materials have remained              0                      0                      0

            uncontaminated

 

19*      The wound has remained uncontaminated       0                      0                      0

            by clothing or skin

 

20        The edges of the wound have been properly   0                      0                      0

            approximated

 

 

Feedback:

 


3.        

 


Maastricht Progress Test

Faculty of Medicine

-        Extracts from progress test September 1990


STUDENT ASSESSMENT PROJECT (SAP)

 

Progress Test September 1990

 

UNIVERSITY OF LIMBURG

 

Faculty of Medicine


 

INSTRUCTIONS

 

·        Read these instructions carefully before you start.

 

·        Check if there are any pages missing in your copy of the test and ask for a new copy if necessary.

 

·        Each question comprises one or more statements which must all be answered SEPARATELY. The numbers of the statements correspond to the numbers on the answer form.

 

·        In questions 162 and 163, reference is made to photo no. 199. This photo is given on a supplementary sheet.

 

·        Some questions contain a piece of text between brackets which serves to clarify the question. This is meant as supplementary and ALWAYS CORRECT information and as such does NOT need to be evaluated.

 

·        The answer form contains your name, examination number and year of study. Please do not make any alterations! Any mistakes should be reported to the supervisor of the Office of Educational Administration (Bureau Onderwijs).

 

·        Answer the questions by filling in one option box per question. This should be done with an HB (= soft) pencil. Read the relevant instructions on the answer form.

 

·        The answer form should be handed in no later than 1:00 p.m.

 

·        The result is calculated by subtracting the number of incorrect answers from the number of correct answers, with question marks being counted as zero; in other words:

correct = +1

incorrect = -1

? = 0

Result = correct minus incorrect.

·        Comments on INDIVIDUAL QUESTIONS should be handed in on a separate sheet. These comments will be taken into account in deciding whether or not to cancel a particular question before the final results are computed. They should be clearly legible and should be handed in as soon as possible, but no later than next TUESDAY AFTERNOON, at the office of the STUDENT ASSESSMENT PROJECT (Project Evaluatie van Studieresultaten).

 

·        The answer key can be obtained at the information desk of the Office of Educational Administration (Randwijck) from 1:00 p.m.

 

·        To ensure that the examination runs smoothly, the relevant regulations are included on the final page of the booklet.

 

GOOD LUCK!
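The scoring rule stated in the instructions (correct = +1, incorrect = -1, question mark = 0) can be illustrated with a short Python sketch. It assumes each statement is answered "true", "false" or "?", and that cancelled questions are simply left out of the count; the names `Answer` and `progress_test_score` and the 0-based indexing are hypothetical and not part of the test materials.

```python
from enum import Enum
from typing import AbstractSet, Sequence


class Answer(Enum):
    """Hypothetical encoding of the options on the answer form."""
    TRUE = "true"
    FALSE = "false"
    DONT_KNOW = "?"


def progress_test_score(
    responses: Sequence[Answer],
    key: Sequence[Answer],
    cancelled: AbstractSet[int] = frozenset(),
) -> int:
    """Return correct minus incorrect; '?' answers and cancelled statements count as zero.

    `responses` and `key` are indexed by statement (0-based, an assumption);
    `cancelled` holds indices of statements withdrawn after reviewing comments.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same statements")
    score = 0
    for index, (given, correct) in enumerate(zip(responses, key)):
        if index in cancelled or given is Answer.DONT_KNOW:
            continue  # counted as zero
        score += 1 if given is correct else -1
    return score


if __name__ == "__main__":
    T, F, Q = Answer.TRUE, Answer.FALSE, Answer.DONT_KNOW
    # Two correct, one incorrect, one '?': 2 - 1 = 1
    print(progress_test_score([T, F, Q, T], [T, T, F, T]))  # 1
```

With +1/-1 scoring on two-option statements, a blind guess gains nothing on average compared with answering "?".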


RESPIRATORY SYSTEM - CATEGORY I

questions 1 - 28

 

1.         Narrowing of the middle meatus of the nose (e.g. as a result of swelling of the mucosa) leads to obstruction of drainage from the maxillary sinus.     lit.:       Moore, Clin.Or.Anatomy, 2nd ed., pp. 957-958

 

Metal-fume fever is a not uncommon disorder in which the person affected complains of dry cough, nausea, sweating, shivering, malaise and fever and which is usually self-limiting.

2.         This disorder more frequently occurs after exposure to zinc fumes than after exposure to mercury vapour.

            lit.:       W.R. Parker, Occupational Lung Disorders, 1982, p. 454

 

Every T1-2 N0M0 staged squamous-cell carcinoma of the lung should be treated surgically, provided that the patient's condition allows the operation.

3.         In order to increase the chance of cure, pneumonectomy should, in the majority of cases, be given preference to lobectomy.     lit.:       de Boer, chirurgie, 1988, p. 636

 

            Possible treatments for acute airway obstruction in asthmatic patients are:

4.         inhalation of cromolyn;

5.         inhalation of a beta-2-adrenergic agonist;

6.         intravenous injection of aminophylline.          lit.:       Wesseling & Neef, Algemene Farmacotherapie, 1985, p. 527

 

A decrease in the elasticity of the lungs (compliance 4 L/kPa; normal value 2 L/kPa) leads to:

7.         an increase in functional residual capacity;

8.         an increase in tidal volume.

            lit.:       Bernards & Bouman, Fysiologie van de mens, 1988, chapter 16

 

A patient with a renal disorder shows metabolic acidosis.

9.         Hypoventilation contributes to compensating for this acidosis.

            lit.:       Bernards & Bouman, Fysiologie van de mens, 1988, chapter 16

 

In most cases, acute bacterial inflammation of the nasal sinuses is accompanied by pain. This is usually pain above, behind or under the eye.

10.       Maxillary sinusitis is, in the majority of cases, accompanied by pain above the eye.

            lit.:       Jongkees, Keel-, Neus- en Oorheelkunde, 1983, p. 84

 

A five-year-old boy is brought to his family doctor's surgery with fever (39.5 °C*), swelling of the right upper eyelid and swelling round the nose. An inflammation of one of the nasal sinuses is a likely diagnosis.

11.       Frontal sinusitis is more probable than ethmoid sinusitis.

            lit.:       Jongkees, Keel-, Neus- en Oorheelkunde, 1983, p. 88

 

A thirty-year-old man tells his family doctor that he suddenly got the shivers, followed by a rapid rise in temperature and within a few hours severe pain, connected with breathing, over the left side of his chest. He also complains of coughing and bringing up rust-coloured sputum. Physical examination reveals a very ill man with rapid, shallow respiration. On examination of the thorax, there is dullness on percussion at the lower left side of his back and there is a pleural friction rub.

12.       These symptoms are more likely a manifestation of bacterial pneumonia* than of pulmonary infarction.

            lit.:       Harrison's Principles of Internal Medicine, 7th ed., pp. 767, 936

 

The compliance of the lungs is defined as the ratio of the change in total lung capacity to the change in intrathoracic pressure.

13.       The compliance of the lungs of normal babies is greater than that of normal adults.

            lit.:       Nelson's Textbook of Paediatrics, p. 926

 

14.       Coxsackie viruses more frequently cause infections of the upper respiratory tract than of the lower respiratory tract.

            lit.:       Nelson's Textbook of Paediatrics, 1979, p. 1172

 

15.       In children with a cleft palate there is an increased risk of functional disorders of the auditory (Eustachian) tube.    lit.:       Gerlings, Keel-, Neus- en Oorheelkunde, 1979, p. 182

 

            Spirometry is used to measure certain lung volumes. The total lung capacity is defined as:

16.       the sum of residual volume plus inspiratory reserve volume;

17.       the sum of functional residual capacity plus tidal volume and inspiratory reserve volume;

18.       the sum of residual volume plus inspiratory vital capacity*.   lit.:       Bernards & Bouman, Fysiologie van de mens, 1977, p. 352

 

An acute form of extrinsic allergic alveolitis (e.g. farmer's lung) is accompanied by shortness of breath.

19.       Pulmonary function tests are more likely to reveal an obstructive functional disorder than a restrictive functional disorder in this case.      lit.:       Sluiter, Leerboek Longziekten, 1985, p. 413

 

Certain lung disorders can be accompanied by an increase in the level of angiotensin-converting enzyme (ACE) in the blood. Such disorders include:

20.       sarcoidosis;

21.       mucoviscidosis (cystic fibrosis).         lit.:       Sluiter, Leerboek Longziekten, 1985, p. 711

 

Primary bronchial carcinomas are often accompanied by infiltration of adjacent structures in the thoracic cavity. Involvement of the sympathetic trunk leads to a complex of signs and symptoms known as Horner's syndrome. These signs and symptoms include:

22.       ptosis of the eyelid;

23.       hoarseness;

24.       miosis. lit.:       Sluiter, Leerboek Longziekten, 1985, p. 258

 

A 24-year-old man is hospitalized because of an acute, severe attack of bronchial asthma which does not respond to his regular therapy (administration of salbutamol and beclomethasone by aerosol). On admission, the patient is agitated, his pulse rate is 120/min and the peak expiratory flow rate measures 80 l/min (normal value 600-650 l/min). Blood gas analysis reveals a normal pH and a normal arterial PO2 and PCO2.

25.       These findings form an indication to start artificial respiration.         lit.:       Sluiter, Leerboek Longziekten, 1985, p. 239

 

The position of the oxygen dissociation curve is determined by the P50, the oxygen tension needed to achieve 50% saturation of haemoglobin. A normal P50 value is 3.6 kPa. A P50 lower than the normal value indicates a shift of the oxygen dissociation curve to the left; a P50 higher than the normal value indicates a shift of the oxygen dissociation curve to the right.

26.       A left shift of the oxygen dissociation curve promotes oxygen release at tissue level more than a shift of this curve to the right.     lit.:       Sluiter, Leerboek Longziekten, 1985, p. 54

 

           

Vascular resistance is influenced by the arterial carbon dioxide tension.

27.       A decrease in the arterial carbon dioxide tension (hypocapnia) leads to increased vascular resistance in the cerebral circulation.          lit.:       Sluiter, Leerboek Longziekten, 1985, p. 61

 

Arachidonic acid is converted into prostaglandins and thromboxane A2 by the enzyme cyclooxygenase and into leukotrienes by the enzyme lipoxygenase.

28.       Administration of nonsteroidal anti-inflammatory drugs inhibits the synthesis of prostaglandins.           lit.:       Sluiter, Leerboek Longziekten, 1985, p. 121

 


4.        

 


The Thesis Supervision Experiment

           

Extract from Redistributing power in the classroom: the missing link in Problem-based learning. (see reference no. 7)
 

 

 


THE THESIS SUPERVISION EXPERIMENT

 

In:

 

 

 

REDISTRIBUTING POWER IN THE CLASSROOM: THE MISSING LINK IN PROBLEM-BASED LEARNING

 

 

A. Georges L. Romme

 

Maastricht University

Dept. of Management Sciences

P.O. Box 616, 6200 MD Maastricht, The Netherlands.

E-mail: s.romme@mw.unimaas.nl

 

 

 

                                                                   Forthcoming in:

Troy, J. et al. (eds.), Learning in a Changing Environment

Dordrecht/London/Boston: Kluwer Academic Publishers, 1998.

 

 

In order to find out whether applying the notion of circularity of power would stimulate active learning by students, an experiment was set up in the area of thesis supervision. From an experimental point of view, thesis supervision provides an interesting setting for exploring the relationship between power and learning. It typically involves a formal relationship between student and supervisor in which the supervisor has all formal authority regarding the final assessment of the thesis. More specifically, the main proposition at the time was that a thesis circle, in which a group of students working on their master's theses and their supervisor(s) collaborate on the basis of equivalence in decision-making (or circularity of power), would provide a learning system in which active learning, dialogue and collective problem-solving prevail.

The experiment was started when, in the winter of 1995/96, several students expressed their preference to do a master's thesis and/or internship project in the area of circular organizing. These students were largely motivated by their experiences in an intensive skills course on Circular Organizing (given by the author) which is part of the 3rd year curriculum. Key steps in the experiment were the adoption of a set of rules for organizing the circle and circle meetings, and the development of an evaluation procedure based on the consent or “no objection” principle.

In April 1996 seven interested students were invited by the author (as the supervisor of their thesis projects) to participate in a start-up meeting. During this meeting the potential objectives and procedures of the circle were discussed. The decision was taken to focus on the supervision of thesis projects on the basis of circular principles, which basically implied that students and the supervisor would share the responsibility for supervising, evaluating and grading a number of undergraduate thesis projects.

Since its start, the circle has met regularly over a total period of more than two years (with about fifteen meetings per year, each taking about 3.5 hours). Starting with eight members (including one supervisor) in April 1996, the circle grew to a membership of eighteen to twenty members (including two supervisors) in April 1998. In the first year of its existence, three students completed their theses (and master’s degrees). In the second year, six students completed their thesis projects. The formal arrangements of the circle are currently as follows:

·        In addition to the supervisors as permanent members, the membership of the circle includes students who are doing (or intend to do) a thesis project in the area of the theory and practice of circular organizing, under supervision of the circle.

·        In order to gain access to the circle, the student should have acquired knowledge and skills in circular organizing at the level of a two-week intensive skills training (which is part of the third-year curriculum) or a similar course elsewhere.

·        New members can be proposed by each current member of the circle.

·        The membership of the circle ends with the completion and final assessment of the master’s thesis, unless decided otherwise.

·        Each circle meeting proceeds according to a standard format involving four parts: (1) an opening round, (2) determination of the agenda (in principle on the basis of a proposal sent out to all members prior to the meeting by the secretary), (3) discussion and decisions on the agenda issues, and (4) a closing round.

·        Decisions are taken by way of the consent (“no argued objection”) principle; that is, a decision is taken when no participant has an argued objection against the proposed decision. Decision issues include, for example: the circle’s objectives, its work procedures (e.g., preparation of agenda), the election of the chairperson or secretary, proposals for research projects, the general criteria for assessment of a thesis, and the final assessment and grading of a master’s thesis.

·        In the case of the final assessment of a master’s thesis, a decision procedure is followed in which the student who wrote the thesis has no consent right; that is, (s)he can participate in the discussion but not in the formal decision round(s) in which the chair asks each of the other members of the circle for consent.

·        Both the chair and secretary are chosen for a limited period (e.g., three months) by way of an election procedure based on the consent principle.

·        The supervisors act as functional leaders of the circle, within certain boundaries that are determined by the university’s exam regulations and decisions taken by the circle. For example, the circle can delegate the authority to accept new members to the supervisor.

 

Results of the Experiment: Experiences and Critical Incidents

 

In this section the experiences with and (preliminary) outcomes of the thesis circle experiment are described and evaluated. The observations described here are those of researchers who are also actors in the system they are studying. In order to generate valid data, we have tried to publicly test all observations and conclusions presented. Part of this process was a special session of the circle in which the first year of the circle was evaluated, and critical episodes and situations were identified. This session focused on questions such as ‘What is essential to the meetings of this circle?’ and ‘Which incidents or events in the first year of this circle are noteworthy or important?’

One of the observations made was that the actual discussions on thesis projects during circle meetings developed into an ongoing dialogue between the members of the circle, without any extra effort needed to move in that direction. In the context of this dialogue, the traditional difference between the roles of supervisors and students to some extent disappeared. The conventional idea of the relationship between (thesis) supervisor and student can be described as an expert who is leading the student through her or his individual learning process. In the thesis circle this traditional notion of the role of the leader/supervisor was abandoned almost instantly, in favor of a role as educator, coach or facilitator. In other words, the supervisors acted as a useful resource rather than as someone in charge. Moreover, in this respect the supervisors appeared to act as a role model for student members who also started to learn how to use these kinds of skills, particularly in the area of coaching and facilitating problem solving by other students. Evidently, some students very quickly began to learn how to use the advocacy and inquiry skills of the supervisors.

Advocacy and inquiry in this respect particularly involve interpersonal communication skills that serve to stimulate the individual and the group to explore the deeper issues, images and problems inhibiting the learning process of the student concerned. For example, a typical inquiry might be: “I’ve heard you talking about several ideas you have in mind here, but I would like to know what really motivates you in this project?” A typical advocacy might involve recommendations for a certain methodological approach, theoretical idea, case study, or time schedule. Note that all kinds of defensive behaviours came into play during meetings: face saving, stereotyping, intellectualization, victimization, etcetera. But the point is that these defensive behaviours did not appear to inhibit learning; rather, they appeared to stimulate the learning process because the anxiety, and more generally the emotional side of learning, could be discussed openly. Effective inquiry and advocacy here also involve remaining open, authentic and vulnerable, and in this way serve to create awareness and disarm these defensive behaviours.

 

Assessment of the Thesis

 

During the first year of the circle’s existence, a number of critical incidents can be identified. The most important critical incidents involved the assessment and grading of the first and second thesis projects that were presented to the circle in their final version. These first assessments by consent rule can be seen as the first critical tests of the extent to which the equivalence in decision-making would not be undermined by, for example, group pressure or differences in expertise or (informal) authority. In this respect, the experimental procedure we used in the first assessment (in August 1996) was not perceived as adequate, particularly regarding the interdependency and interaction between individual assessments at too early a stage in the discussion. In order to create more clarity in this area, we adapted the assessment procedure for the assessment of the second thesis (two months later). Our main reference point here was the election procedure developed for circular organizations (e.g., Endenburg, 1992).

This assessment procedure is based on the following general ideas. First, the assessment should be based on arguments and dialogue rather than authority. Second, the process leading to the final assessment should start as open as possible; that is, interdependency and interaction between the initial individual assessments should be diminished as much as possible. Third, a process of argumentation and dialogue should subsequently serve to increase interdependency and interaction in order to come to a final grade that is well-argued and broadly accepted. The procedure involves the following steps:

·        Before the meeting in which the assessment takes place, all circle members are informed in writing about the assessment (as an item on the agenda); they all receive a copy of the thesis at least one week before the meeting.

·        The first step during the meeting is to establish the criteria on which the assessment will be based; these criteria should conform to the university’s formal regulations, although they will tend to be more specific than the latter. In principle, a set of criteria has been established before the individual student starts working on his or her thesis, but in assessing the final version of a particular thesis these criteria may be supplemented with more specific criteria related to the nature of this thesis project.

·        Subsequently, each participant is asked to write his/her own name and the proposed grade (the quantitative assessment) of the thesis on a piece of paper. The formulation of this proposal is, for example: “I, Paul, propose a 7.” This step is a crucial one, because it reduces the initial interdependency among individual assessments to a minimum.

·        All proposal forms are handed in to the chairperson, who then asks each participant in turn to state the arguments that prompted him or her to propose this particular grade. For example, the chair will ask: “John, you proposed a 6 for Maria’s thesis, could you motivate this proposal?” The chair makes sure that no discussion takes place while each participant argues for his or her proposal.

·        The next step is for the chair to go around the group once again, now asking whether anyone would like to change his or her proposal in view of the arguments heard in the previous round. This step is the first one where interaction between arguments and proposals is deliberately allowed, and the discussion is led by the chair in order to make it as open and visible as possible. Typically, most participants will stick to their initial proposal, but some may want to change their proposed grade, for example, because “having heard the arguments on the readability of Maria’s thesis given by John, I would like to change my proposed grade from 7 to 6.”

·        Now an open, relatively unstructured discussion may develop, in which arguments are tested, questioned, clarified, compared, ranked, and so forth. This step is in practice sometimes skipped, particularly when the chair feels the arguments and proposals are converging to a large extent. (In that case, the chair moves on to the next step.)

·        At some point, the chair will propose to decide on a certain grade, justified by a summary of the main arguments raised thus far. For example, the chair may propose to assess Maria’s thesis with a “6 as final grade, in view of its high practical relevance and well-designed structure as strong points, but its readability as the main weak point.” The chair will go around the group to ask each individual participant to give consent to this proposal. At this stage, at least one participant typically withholds his or her consent. Depending on the (additional) arguments given and any subsequent discussion(s), the chair will then either adapt his/her earlier arguments for the same proposed grade or move to another proposal. This process continues until the circle agrees on a proposal by consent of all those participating in the same (or possibly the next) meeting.
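The consent logic of the steps above can be sketched in Python. This is a minimal sketch built on several assumptions; it is not the circle's actual rulebook. It assumes a 1–10 grading scale in half-point steps. Each member's "argued objection" is represented by a hypothetical range of acceptable grades (`lowest_ok`, `highest_ok`). As a simplification, the chair opens with the median of the sealed proposals and then moves to the nearest alternative grades. The argument and open-discussion rounds, which in practice change members' views, are not modelled. The names `Member` and `consent_assessment` are illustrative only.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Member:
    name: str
    proposal: float    # sealed written proposal, e.g. "I, Paul, propose a 7."
    lowest_ok: float   # hypothetical: lowest grade accepted without an argued objection
    highest_ok: float  # hypothetical: highest grade accepted without an argued objection

    def objects_to(self, grade: float) -> bool:
        """True if this member would raise an argued objection to the grade."""
        return not (self.lowest_ok <= grade <= self.highest_ok)


def consent_assessment(members: List[Member], author: str,
                       step: float = 0.5) -> Optional[float]:
    """Return a grade no consent-holder objects to, or None (defer to next meeting)."""
    # The student who wrote the thesis may discuss but has no consent right.
    voters = [m for m in members if m.name != author]
    if not voters:
        return None
    # Sealed proposals are collected before any discussion; as an assumption,
    # the chair opens with their median.
    start = median(m.proposal for m in voters)
    # Grades on an assumed 1-10 scale, tried nearest to the opening proposal first.
    candidates = sorted(
        (k * step for k in range(round(1 / step), round(10 / step) + 1)),
        key=lambda g: (abs(g - start), g),
    )
    for grade in candidates:
        # Consent round: the chair asks each voter in turn; one objection blocks.
        if not any(m.objects_to(grade) for m in voters):
            return grade
    return None


members = [
    Member("Paul", 7.0, 6.0, 7.5),
    Member("John", 6.0, 5.5, 6.5),
    Member("Ann", 6.5, 6.0, 7.0),
    Member("Maria", 8.0, 7.5, 9.0),  # the thesis author in the first call below
]

print(consent_assessment(members, author="Maria"))   # 6.5
print(consent_assessment(members, author="nobody"))  # None: John and Maria cannot both consent
```

The second call shows why excluding the author matters in this sketch. If Maria held a consent right, no grade would satisfy both her and John, and the decision would be deferred.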

 

This assessment procedure appeared to work very well in the case of the second thesis evaluated, and since then we have been using it in all subsequent assessments. Note that most students, when entering the circle, were already familiar with this kind of procedure, as a result of having participated in a course in Circular Organizing in which a similar procedure is studied and used, for example, to choose a chairperson. Moreover, the latter procedure was also used several times during the first year of the circle for choosing its (new) chairperson and secretary. Thus, by the time we started developing and applying this assessment procedure, most participants were already familiar with the key ideas behind this assessment system.

It should be recognized that, given the fact that most students working on their thesis tend to complete their project close to a certain deadline (e.g., determined by bursary arrangements or university restrictions, such as the maximum period of enrollment), there may be a lot of “external” pressure for convergence toward a consent decision. In fact, this kind of personal interest of the student submitting the thesis for evaluation played a substantial role in at least one master’s thesis that was evaluated in the circle’s first year.

 

Other Critical Incidents and Observations

 

Another kind of critical incident involved a number of situations in which a student member criticized certain initiatives or interventions of one of the supervisors. For example, one of these incidents involved a student who had recently started exploring the area in which he wanted to do research for his master’s thesis, but who felt that the supervisor tried to discourage him from going in a certain direction he himself strongly preferred, whereas at the same time the same supervisor appeared to be moving in that same direction outside the circle! When this student raised this issue during a meeting, the student and supervisor got the opportunity to clarify, explore and adjust their intentions and expectations. What appears to have been essential for students in this and several other cases was the real opportunity they had (and felt) to express their anxiety, doubts, critical questions, etcetera. For the supervisors, these situations serve to build awareness of the implications of their actions inside and outside meetings, particularly regarding the effectiveness of their interventions in the learning processes of students. In more general terms, the experience of all participants in this circle’s meetings is perhaps best described in the words of one of the student members:

“In all tutorial groups I did, I was always struggling with two problems: ‘how to get my ideas or points across’, and ‘how to make sure I’m being heard?’; the latter issue is no longer of concern to me in this circle, I don’t have to worry about having and using opportunities to participate in the discussion, so I can concentrate entirely on the contents.”

Finally, it should be acknowledged that this experiment may have profited from several beneficial conditions and circumstances. For example, the experiment almost immediately raised the interest of the community of practitioners, in the form of suggestions and proposals for research projects and internships. In addition, the supervisors had been experimenting with circular organizing and advocacy and inquiry skills long before the start of this project. These conditions may have provided extra support in trying to overcome the effects of a limited learning system in order to move toward a learning system in the true sense of the word.

 

 

 

Concluding Remarks

 

To a large extent both students and teachers are apparently thrown in at the deep end in the PBL-based curriculum (cf. Keizer, 1995). That is, students as well as teachers are far from well-prepared for the fundamental transformation that should take place when adopting a problem-based learning approach based on the idea of self-directed learning, both at the individual and the group level. The work of Argyris and others suggests that this transformation from a limited to an effective learning system requires shared leadership and control as well as skills in the area of advocacy and inquiry which serve to confront and remove defensive routines inhibiting learning.

Moreover, on the basis of ideas about the circularity of power we argued that sharing leadership and control between students and professors would reinforce the self-directed nature and thus effectiveness of PBL, particularly in the area of dialogue and problem-solving. This initial hypothesis was not falsified by the outcomes of the thesis circle experiment described in the previous section. The discussions during meetings can be described in terms of an ongoing flow of ideas, problems and solutions, which is characteristic of dialogue (Senge, 1990). The nature of collective problem-solving was perhaps the most unexpected outcome. Of crucial importance appears to be that several students quickly learned to use inquiry and advocacy skills, and also that a final assessment approach consistent with the notion of circularity of power was developed.

Thus, PBL (and perhaps other learning approaches) would probably become a more effective learning system if control over the learning process, including assessment and evaluation, were shared. In this respect, an effective learning system is based on shared leadership and control and core values such as valid information, free and informed choice, and internal commitment. The principle of shared control and leadership easily becomes an ideological device rather than a practical tool to improve the effectiveness of the learning system, particularly if it is not organized in a straightforward manner. The notion of circularity of power, and more broadly the circular organization approach, appears to provide such a practical tool.

Note in this respect that assessment is a controversial and somewhat neglected issue in self-directed and problem-based education (Rogers, 1969; Williams, 1992). Some programmes tend to rely on self-evaluation as an input for the final decision of the instructor or tutor (Rogers, 1969). In other programmes, there are no examinations at all, and tutors carry out the assessment on the basis of direct observation of the student’s learning process (Williams, 1992). There are also programmes that apparently rely on examinations in the traditional sense, and again others have developed examinations that try to test the problem-solving abilities students have developed (Williams, 1992). Because the goal of PBL is discovery on the part of the students, a self-managed approach which incorporates assessment by all parties involved appears to be essential. The experiences in the thesis circle show that an assessment procedure which incorporates elements from self-assessment, peer assessment and assessment by the supervisor may provide an instrument which reinforces rather than inhibits self-directed learning.

In sum, the thesis circle experiment suggests that redistributing formal power on the basis of circularity can be an important next step in the development of the problem-based learning method. However, it should be noted that the outcomes of this single experiment may also have been produced by a number of other (beneficial) conditions. Therefore, a new thesis circle in another department of Maastricht University was recently set up, and in addition, experiments in other parts of the undergraduate curriculum in Maastricht are currently being conducted.


5.       Reference list to the assessment procedures used in Maastricht

 

 


1.             Albano, M. G., Cavallo, F., Hoogenboom, R., Magni, F., Majoor, G., Manenti, F., Schuwirth, L., Stiegler, I., & Van der Vleuten, C. (1996). An international comparison of knowledge levels of medical students: the Maastricht Progress Test. Medical Education, 30, 239-245.

 

2.             Driessen, E. W., Van der Vleuten, C. P. M., & Van Berkel, H. J. M. (1998, in press). Beyond the Multiple-choice v. Essay Questions Controversy: combining the best of both worlds. Journal of Legal Education.

 

3.             Jansen, J. J. M., Tan, L. H. C., Van der Vleuten, C. P. M., Van Luijk, S. J., Rethans, J. J., & Grol, R. P. T. M. (1995). Assessment of competence in technical clinical skills of general practitioners. Medical Education, 29, 247-253.

 

4.             Muijtjens, A. M. M., Hoogenboom, R. J. I., Verwijnen, G. M., & Van der Vleuten, C. P. M. (1998). Relative or absolute standards in assessing medical knowledge using progress tests. Advances in Health Sciences Education, 3(2), 81-87.

 

5.             Newble, D., Dawson, B., Dauphinee, D., Page, G., Macdonald, M., Swanson, D., Mulholland, H., Thomson, A., & Van der Vleuten, C. P. M. (1994). Guidelines for assessing clinical competence. Teaching and Learning in Medicine, 6(3), 213-220.

 

6.             Perrenet, J. (1997). Between Aalborg and Maastricht: student assessment at knowledge engineering. In M. Wassenberg & H. Philipsen (Eds.), Placing the student at the centre: current implementations of student-centred education (pp. 143-148). Maastricht: Maastricht University.

 

7.             Romme, G. A. L. (1998, in press). Redistributing power in the classroom: the missing link in Problem-based learning. In J. Troy et al. (Eds.), Learning in a changing environment. Dordrecht/London/Boston: Kluwer Academic Publishers.

 

8.             Segers, M. S. R. (1997). An alternative for assessing problem-solving skills: the OverAll Test. Studies in Educational Evaluation, 23(4), 373-398.

 

9.             Segers, M. S. R., Dochy, F. J. R. C., & Sluijsmans, D. (1999, accepted). The use of self-, peer- and co-assessment in higher education: a literature review. Studies in Higher Education.

 

10.         Van der Vleuten, C. P. M. (1996). The assessment of professional competence: developments, research and practical implications. Advances in Health Sciences Education, 1(1), 41-67.

 

11.         Verhoeven, B. H., Verwijnen, G. M., Scherpbier, A. J. J. A., Holdrinet, R. S. G., Oeseburg, B., Bulte, J. A., & Van der Vleuten, C. P. M. (1998). An analysis of progress test results of PBL and non-PBL students. Medical Teacher, 20(4), 310-316.

 

12.         Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of clinical skills with standardized patients: State of the Art. Teaching and Learning in Medicine, 2(2), 58-76.

 

 

6.       Recommended reading

 

 


1.     Van der Vleuten, C. P. M., Scherpbier, A.J.J.A., Wijnen, W.H.F.W., & Snellen, H.A.M. (1996). Flexibility in learning: a case report on problem-based learning. International Higher Education(2), 17-24.

 

2.     Van der Vleuten, C. P. M. (1996). The assessment of professional competence: developments, research and practical implications. Advances in Health Sciences Education, 1(1), 41-67.

 


 

FLEXIBILITY IN LEARNING:

 

a case report on problem-based learning

 

 

C.P.M. van der Vleuten, A.J.J.A. Scherpbier, W.H.F.W. Wijnen, H.A.M. Snellen

 

University of Limburg, Maastricht, The Netherlands

 

 

Abstract

 

The need for change in higher education has received considerable attention in recent years. Societal needs require educational systems to produce graduates better equipped with highly specialized and qualitatively superior professional skills. Economic needs require educational programmes to be efficient and cost-effective. The vanishing boundaries between countries require educational systems to be transparent and internationally orientated. Developments in science have led to an explosion of knowledge, forcing educational systems to be dynamic and flexible. The rapid change in knowledge forces educational systems to emphasize learning skills and maintenance of competence, rather than the provision of knowledge alone. Educational technology obliges educational programmes to use multimedia and computer technology. Progress in educational theory requires educational systems to activate the learner and to reflect critically upon traditionally accepted adages of educational practice.

 

In this context it is argued that flexible educational systems require a shift from teaching programmes towards learning programmes. The distinctive characteristics of both approaches will be outlined. Subsequently, these principles will be illustrated by an explanation of an existing learning programme. This programme uses problem-based learning as an instructional method. A week in the life of a medical student will be used to explain a number of educational principles, including self-directed learning, choice of teaching and learning formats, assessment of achievement, and curricular and organisational management.

 

Introduction

 

Educational programmes in higher education, particularly in Europe, have a long-standing tradition. The fundamental building blocks of the teaching methods used in these programmes have not changed for several hundred years, and perhaps even longer. Teaching is an activity which has been modeled by our own teachers, has been copied to our own teaching activities and will serve again as a model to our students. Many teachers actually have not been trained or specifically prepared for their teaching roles. With a certification in their discipline most teachers are assumed to be qualified, usually for life, for their teaching tasks. It is therefore not surprising that educational programmes and teaching activities are mainly governed by tradition. As far as changes occur in educational programmes, they are usually restricted to changes of content and are hardly ever related to the underlying concepts of teaching.

 

The question to be raised is whether this situation is desirable. In this article we will reflect upon a number of reasons for the necessity of change in education and challenge some of the more fundamental assumptions of regular teaching programmes. We will subsequently discuss a new educational method called problem-based learning (Barrows & Tamblyn, 1980). By no means should this new model be considered the gold standard for innovative education; it is one attempt to change educational programmes. The purpose is to critically reflect upon education and not to 'sell' the educational model. It is truly a case report intended to demonstrate the viability of educational innovation, and the reader should realize that the model described is probably one among many options. Before describing the model we will review some of the reasons for educational change and discuss general characteristics of existing and desirable educational programmes.

 

Reasons for educational change

 

In this century, and particularly after the Second World War, virtually all Western countries have undergone the same change in higher education for obvious political and societal reasons: a larger part of the population has taken part in higher education. The number of students has therefore increased dramatically. Not only did this increase require a further move away from the classic apprenticeship model of teaching used in previous centuries, it also required a substantial investment of resources. Governments nowadays face the problem of trying to control the continuously growing budget for (higher) education. These economic reasons have led many governments to press for cost reduction and quality control in education. Economic pressure forces educational programmes to consider change from a production perspective rather than an academic perspective: how many graduates of a particular quality can we produce in a particular amount of time? The consideration of efficiency and effectiveness is a completely new issue for most educational programmes. Although the various European countries still differ greatly in the extent to which economic considerations affect education, it is only a matter of time before the effect becomes universal. The exclusive reliance on academic criteria for defending the quality of educational programmes will become increasingly difficult to maintain.

 

In a similar way, changes in society induce changes in education. The independent academic position of universities and other institutions of higher education is increasingly challenged. In business, engineering, health care and other fields particular expertise profiles and skills are emphasized, adapted to modern needs in these fields. These needs are often badly met by educational programmes. These societal reasons will have an increasing impact on change in education.

 

An important reason for change in education is the advancement of science and the explosion of knowledge. The problem of selection and coverage of content is an emerging one and many educational programmes suffer from 'overload'. Moreover, progressive advancement of scientific knowledge will make any curriculum outdated within a few years. Therefore, life-long learning skills are more essential than the consumption of temporary knowledge. The fostering of these skills must be a task of educational programmes.

 

The European Community allows any graduate to work within any other country of the Community. The vanishing boundaries will require educational programmes to change. This will require critical appraisal of licensure requirements. The preparation of professionals capable of operating in an international context will become more important. Hence, internationalisation will require education to change.

 

The rise of information technology has an overall effect on society in general and will provide particular challenges for education. Information technology provides new carriers of information and can make learning less dependent on location and time.

 

Finally, the necessity of change in education is induced by progress in educational theory. Considerable knowledge has accumulated regarding the conditions that facilitate learning and how individuals mature from novices to professional experts. The need for meaningful contexts for storing and retrieving information, the importance of repetition of content, the recognition of student learning strategies, the educational impact of examinations, the tools developed for quality control, and the utility of organisational strategies for managing educational programmes are just a few areas where educational theory has something to offer. As professional educators, teachers should be aware of this kind of information. Its use should be part of their professionalism and scholarship.

 

Teaching and learning

 

As the above makes clear, we take the position that there are sufficient reasons that point to the necessity of change in education. However, the question coming to mind is the direction of that change. Where should it lead; what is the target or objective? In addressing that question we would like to make a fundamental assertion. We would argue that a distinction is in order between teaching and learning. We notice that both concepts are used interchangeably: we tend to take for granted that teaching leads to learning. In discussing educational programmes we automatically speak of teaching activities. Yet we would like to argue that both concepts are quite different and that the mission of educational change should emphasize the learning aspect rather than the teaching aspect. After all, learning is what educational programmes should be about; teaching is a vehicle, or rather one of the vehicles, to achieve learning.

 

The educational programme of the future should be a learning programme rather than a teaching programme. To describe what we mean by a learning programme a number of descriptors related to teaching and learning programmes are contrasted in figure 1. We will not discuss each entry in the figure but will restrict ourselves to an overall characterisation.

 

Figure 1: Characteristics of teaching programmes versus learning programmes.

 

Teaching Programmes                 Learning Programmes

Knowledge transfer                  Knowledge acquisition
Teacher centered                    Student centered
Static and rigid                    Dynamic and flexible
Teaching objectives                 Learning objectives
Uniform                             Individual
Reinforces passiveness              Reinforces activeness
Students are led                    Students may discover
Learning paths are described        Learning paths are offered
Teachers provide answers            Teachers ask questions
Teachers direct students            Teachers guide students
Teaching is essential               Learning is essential
Lectures are essential              Assessment is essential
Lecture halls are essential         Library and learning facilities are essential
Supply is essential                 Demand is essential
Location dependent                  Location independent
Time dependent                      Time independent
Uniform study pace                  Individual study pace
Uniform study sequence              Variable study sequence
Uniform content                     Variable content
Teachers work in isolation          Teachers work in collaboration

 

In a learning programme the centre of the universe is the student. The key issue is to create an environment that stimulates the student to acquire knowledge (and skills, attitudes, etc.) actively. Instead of being a (passive) consumer of learning material prescribed by the teacher, the student should become responsible for seeking out information. An active learning attitude is essential for developing self-directed learning skills, which should be the basis for life-long learning: after graduation no teacher will be available to provide further direction, while current knowledge will rapidly decay and professional skills will need further development. Instead of lectures, individual learning and learning in peer groups become important. Instead of lecture halls, libraries and learning facilities become essential. Rather than stacking up memorised information to pass the next examination, students should use information to understand phenomena or problems, and knowledge should not merely be displayed but applied in relevant contexts.

 

Many of our current educational programmes are very distant from a learning programme as envisioned here. Most are a concatenation of topics prescribed by teachers and consumed by students. Not uncommonly, there is little communication between teachers, sections or departments on the content provided. Usually teachers or disciplinary units are fully autonomous, yet it is hard to believe that individual teachers can oversee the educational programme as a whole. Moreover, teachers, being specialists in their field, are inclined, quite understandably, to overemphasize the importance of their discipline relative to the overall objectives of an educational programme. In a system with many autonomous elements there is little room for monitoring, quality control, flexibility or, more importantly, synergy between elements. The attitude towards professional quality in education is also remarkably different from that in other academic areas. Professional quality in research, for example, is defined, and unequivocally accepted, through rigorous peer review. The quality of education, on the other hand, is left to the professional integrity of the individual.

 

So far, our discussion of the need for change and its direction has been quite theoretical and may have seemed somewhat utopian. To make some of these issues more concrete, we will now describe an existing programme in which an attempt has been made to apply some of the characteristics of a learning environment.

 

Problem-based learning

 

As a case report we will describe an educational method applied at the University of Limburg in Maastricht, The Netherlands. Although all faculties of this university use this method, with variations adapted to the needs of individual disciplines, we will describe the faculty of the discipline in which problem-based learning originated: medicine. The medical school has applied problem-based learning since the faculty started in 1974 (Van der Vleuten & Wijnen, 1990). Medicine in the Netherlands is a six-year programme in which the last two years are spent in clinical attachments in ambulatory and non-ambulatory settings. We will focus on the system as it is used in the first four, preclinical, years of the study.

 

We will do so by describing an exemplary week in the life of a student and discussing the principles behind this programme. This week is schematically represented in figure 2.

 

Figure 2: A week of a student in a medical problem-based learning programme.

 

 

 

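[Weekly timetable, Monday to Friday, morning (am) and afternoon (pm). Monday morning: skills training. Wednesday morning: communication and attitude training. Two further mornings: tutorial group. Afternoons: one lecture and one health practice contact. All remaining slots are unscheduled.]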

 

The tutorial group

The heart of the matter is the meeting of the tutorial group. Twice a week a group of approximately 8 students and one staff member, called the tutor, meet. They work with a so-called blockbook containing a number of problems related to the content of that unit of the curriculum. Figure 3 provides a sample problem.

 

 

Figure 3: A sample problem as used in a tutorial group.

 

Mr. Brown, aged 68, comes to your surgery and tells you that he has been feeling dizzy recently. He is seriously worried because he has always been healthy; he has never had any medical problems. But the complaints, which he has had for a few months, are now getting worse and worse. The dizziness occurs when he gets out of bed in the morning, but it can also be provoked by a sudden movement of his head. "When it happens, everything swims before my eyes and I feel unwell, light in the head and a little queasy. When I sit down for a moment, the dizziness slowly disappears."


 

Problems are used to ensure a meaningful context for learning. By providing this context, new knowledge can be integrated with prior knowledge and can be better retrieved and applied when necessary (Schmidt, 1983; Norman & Schmidt, 1992). The problems also lead to an integration of disciplines. For the problem presented in figure 3, for example, the students may study the anatomy of the brain as well as neurological aspects of dizziness. The study of basic and applied sciences is thus integrated.

 

In one tutorial session the students analyse a single problem and discuss their prior knowledge related to it. They subsequently define what they need to know to tackle the problem, i.e. they define the learning objectives. In this group discussion one student acts as chair and another keeps minutes on the whiteboard; these tasks rotate within the group with every session. The task of the tutor is to monitor the group process. He may, for example, intervene when the discussion is unclear, when individual students do not contribute, or when the objectives are too vaguely defined. Often the tutor is not even an expert on the particular problem at hand. The tutor does not teach but guides the students: he may ask specific questions, probe particular topics, etc. After defining the learning objectives as a group, the students pursue the required information individually. They learn to use multiple sources of information (e.g., different handbooks, recent articles) and to compare and synthesize that information. In the next tutorial session they discuss what they found. They are required to report in a way that demonstrates understanding of the material learned, i.e. not by reading out their notes, but by presenting an overview or a schematic summary. Unclear concepts are discussed and, if necessary, new learning objectives are defined. A tutorial session lasts two hours, usually one hour for reporting back and one hour for discussing a new problem. A curriculum unit usually lasts six weeks. For every unit, new tutorial groups are formed by random assignment: the students have no choice in the composition of their group. This forces them to learn to work effectively in any team, as they will also have to do in their later careers.

 

Each unit is interdisciplinary in nature and addresses a particular theme, such as fatigue or blood loss. The units are scheduled according to a master plan in which the curricular objectives are defined in content areas, deliberately arranged so that a number of desirable principles can be achieved. The curricular architecture includes increasing complexity, a spiral hierarchy of recurring topics, and a transition from normal to abnormal functioning.

 

To foster internationalisation, three units are taught entirely in English (the other units are in Dutch) and students are encouraged to spend some of their study time abroad. The English units allow exchanges with foreign students, for instance through the Erasmus programme, and a wide network has been established with other schools to which our students can be sent.

 

Practical skills

The programme intends to integrate theory and practice as tightly as possible. Therefore an elaborate skills training programme is arranged, starting at the very beginning of the first year. The skills programme is integrated with the content discussed in the tutorial groups. In our illustrative week two training sessions are scheduled. For the sample problem in figure 3, for example, the skills training on Monday morning could consist of practising the neurological examination on each other or on a patient. Attitude and communication skills, a pressing societal demand on doctors, are also given considerable weight in skills training. In each curricular unit every student has an encounter with a (simulated) patient. In a safe laboratory environment the student can practise social skills and, as the curriculum progresses, can practise applying knowledge in relation to a real (or simulated) patient. The training on Wednesday morning could, for example, cover breaking bad news to a (simulated) patient with a neurological problem.

 

Health practice contact

The health practice contact in our student's week pursues the same integrative objective. A number of these contacts are organised throughout the curriculum. They may include a tour on an ambulance, a week nursing patients in a hospital, a day in a midwifery practice, etc. The health practice contacts and the skills programme contribute greatly to student motivation. From the very start students can act as 'real' professionals, and in the process they obtain an accurate view of the demands of their future profession, allowing them to make an informed choice about continuing their training in the field.

 

Lectures

Traditional lectures are also part of the curriculum. However, they are carefully planned and should have a specific, complementary function in the learning programme. They are used to introduce a curriculum unit, to activate prior knowledge, to help students with difficult topics, to provide unique information (e.g. from an invited speaker in the field), etc. On average, approximately two lectures are held per week.

 

Non-scheduled time

The open space in our student's week is significant. Problem-based learning requires students to work independently, and to facilitate self-study a substantial investment has been made in facilities for students. In addition to the library, a so-called 'study landscape' has been created. This facility provides a library (although books cannot be borrowed) with multiple copies of all current handbooks, a video and slide library, computer facilities for computer-assisted learning and other information technology applications (access to library files, CD-ROM, word-processing and statistical facilities, the Internet, etc.), copying facilities, and ample space to study quietly. Throughout the curriculum, scheduled activities consistently take up approximately 10 to 12 hours per week; the remaining time is for the student to fill in.

 

In summary, problem-based learning requires students to acquire knowledge by using problems as a learning context, stimulates self-directed learning as a basis for life-long learning, and integrates disciplines both horizontally (multiple disciplines integrated within one unit) and vertically (basic and applied sciences, and theory and practice, are integrated).

 

Assessment

 

The way student achievement is assessed is very important in a problem-based learning programme, since tests and examinations have a tremendous impact on how students learn. A discipline-oriented assessment programme would be detrimental in a problem-based programme. Similarly, a classical system of course-related examinations, in which students go from hurdle to hurdle, would not be beneficial for problem-based learning. In a course-related examination system students work to pass the test. Students in a problem-based programme, by contrast, are expected to define their own (or their group's) learning objectives, i.e. their self-directed learning is paramount. Test-directed studying is the opposite of that. Moreover, the focus is on functional knowledge, and little value is attributed to the momentary knowledge of a student cramming for a test.

 

In addition to integrated unit-related tests, the assessment programme rests heavily on a different testing format called progress testing. A progress test is a comprehensive test (250 items) covering the end objectives of the curriculum, just like a final examination, and including all disciplines within the programme. The same test is administered at the same time to all students in the curriculum (years 1 to 6). A new test is constructed and administered every three months. First-year students are able to answer only a small proportion of the questions (approximately 20%), second-year students somewhat more, and so on. A single student therefore sits 24 progress tests (four per year for six years) during the course of study and sees gradual growth in different areas. The average overall growth shows a near-perfect linear increase until graduation. Test-directed studying is difficult, since students do not know what to expect; any question can be asked. Conversely, by simply working continuously on their own objectives, students will see their knowledge grow automatically. There is no need for cramming or particular anxiety, and the progress test allows students to concentrate on their tutorial group work. Moreover, the test reinforces functional knowledge. Instead of passing from one examination to the next, a progress testing system continuously assesses previously learned material. For example, when biochemistry is learned in the first year, students are still required to answer biochemistry questions upon graduation.

 

Other parts of the assessment programme include performance assessments, in which students are directly observed interacting with (simulated) patients under standardized conditions, and written or computer-based exercises and tests using problems and patient cases. All tests are submitted to a careful review procedure by interdisciplinary test review committees. Every test is made public after administration and is open to critique from the students. Their comments are reviewed by the test committees, and final scores are calculated only after this process has taken place. Special attention is given to the feedback function of tests, by providing detailed profile scores, peer-referenced information, and literature references and suggestions. Achievement testing as a learning resource, i.e. as an integral aspect of the educational process, is strongly emphasized.

 

The educational organization

 

The task of the teacher in this programme is clearly different from that in a traditional programme. There is relatively little classical teaching such as lecturing. Staff members act more as providers, developers, organisers and facilitators. Examples of teaching roles include tutor in a tutorial group, member of a unit planning group, member of a test review committee, developer of a practical skills training programme, trainer in a faculty development programme, etc. All new staff members are required to take a number of educational courses on problem-based learning and its specific teaching skills before they are allowed to participate in the programme. The different teaching roles have a certain hierarchy: to become a unit coordinator, for instance, one must have extensive experience as a tutor and as a member of a planning group. When teaching roles become available, staff members have to apply, and part of the selection decision is the quality of their performance in previous teaching tasks. Teaching performance is also an important criterion in promotion decisions. Some of the teaching roles are formally evaluated by students; these evaluations are brought to the attention of department chairs and are used in the yearly staff evaluation rounds.

 

To manage all these activities a matrix-management system is used. The matrix is defined by two axes: disciplines (departments) and educational activities. Depending on the activity, a number of disciplines are involved, and staff members from multiple departments are allocated or linked to that activity. Planning an educational unit, for example, is one educational activity, and a planning group will typically consist of six to nine representatives of departments. Educational activities vary widely and include a number of educational support activities: one group of people is responsible for library and study facilities, another for systematic programme evaluation, another for faculty development, etc. The roles of teachers can therefore be quite diverse.

 

All educational roles are quantified in educational hours, and each role is credited according to its time involvement. This makes it relatively easy to monitor the contribution of departments. The sum of educational credits per department should match the number of staff labelled on teaching activities in that department; if it does not, the department will lose staff in the longer run. If individual staff members deliver poor quality, they will have difficulty competing for educational roles, which burdens the department because it then fails to achieve sufficient input in the curriculum overall. On the other hand, the credit system gives departments flexibility, because it allows them to plan variation in teaching load across individual members within the department.
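The departmental check described above can be illustrated with a small sketch. This is not the faculty's actual administration. It rests on several assumptions, all hypothetical: roles are recorded as (department, hours) pairs, staff labelled on teaching are expressed as full-time equivalents (FTE), and `HOURS_PER_TEACHING_FTE`, the department names and the numbers are invented. A negative balance marks a department that delivers fewer credits than its teaching staff allocation implies.

```python
from collections import defaultdict

# Hypothetical conversion: educational hours expected per full-time-equivalent
# (FTE) staff member labelled on teaching. Not a figure from the text.
HOURS_PER_TEACHING_FTE = 600


def credits_per_department(role_assignments):
    """Sum the educational hours credited to each department.

    role_assignments: iterable of (department, hours) pairs, one per
    educational role filled by a member of that department.
    """
    totals = defaultdict(float)
    for department, hours in role_assignments:
        totals[department] += hours
    return dict(totals)


def credit_balance(role_assignments, teaching_fte):
    """Delivered minus expected educational hours for each department.

    A negative balance means the department delivers fewer credits than its
    teaching staff allocation implies.
    """
    delivered = credits_per_department(role_assignments)
    return {
        department: delivered.get(department, 0.0) - fte * HOURS_PER_TEACHING_FTE
        for department, fte in teaching_fte.items()
    }


if __name__ == "__main__":
    # Invented example data, for illustration only.
    roles = [("Anatomy", 400), ("Anatomy", 300), ("Physiology", 250)]
    fte = {"Anatomy": 1.0, "Physiology": 1.0}
    print(credit_balance(roles, fte))  # {'Anatomy': 100.0, 'Physiology': -350.0}
```

Summing hours per department, rather than per person, mirrors the point in the text that variation in load between individual staff members is left to the department to plan.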

 

The curriculum as a whole is coordinated at a central level. An educational committee with elected members from departments and students largely determines the overall educational policy. Operational management is in the hands of a separate committee chaired by the dean for educational affairs. Educational input and educational quality are the basis for a yearly review session with all departments. The curriculum is systematically monitored using student evaluation questionnaires covering all educational activities. These evaluations are fed back to the responsible educational project groups, and changes are monitored. Review groups within the educational committee periodically make an in-depth evaluation of educational activities. In this way an attempt is made to build quality control and educational innovation into the programme, in other words to achieve a 'learning organisation'.

 

Conclusion

 

Problem-based learning aims to create a flexible learning environment that meets the demands for change discussed in our introduction. As will now be clear, learning can be much more than teaching. It is learning that we try to foster, and teaching is only part of it. The teachers are the architects, the managers, the controllers, and the helpers. The approach is intended as a dynamic and flexible process: quality control, rationality, change and innovation are vital elements of it.

 

Two questions naturally come to mind: is it any better, and what does it cost? The effectiveness question is difficult to answer. If the spread of problem-based learning is used as a criterion, then it is quite effective: problem-based learning has been introduced in many schools of higher education and universities virtually all over the world, in western as well as developing countries. The answer is more difficult if outcome is the criterion. A number of review articles have recently been published (Albanese & Mitchell, 1993; Berkson, 1993). In general, knowledge examinations do not demonstrate systematic differences between students in problem-based learning programmes and those in conventional programmes. On the one hand this is a reassuring finding; on the other hand, one may question the need for all the effort. On specific skills, such as library skills and practical professional skills, problem-based learning students are often rated superior. However, the most consistent and conclusive finding in favour of problem-based learning is 'fun': students in a problem-based learning programme take more pleasure in studying and are more motivated. A final difference concerns attrition rates. In the Netherlands, problem-based learning programmes consistently show lower drop-out rates, and the discrepancy between nominal and actual study time is smaller. In other words, more graduates are produced in a shorter time with at least equal proficiency, so problem-based learning may be economically more efficient.

 

No studies have been published comparing the resource requirements of problem-based learning and conventional programmes. In the Netherlands, however, the situation is quite simple: there are no differences in funding across universities, so problem-based learning programmes are carried out with the same budget as programmes at other universities. The current popularity of problem-based learning in so many institutions is another token of its feasibility.

 

Although the case reported here concerns medicine, problem-based learning has been successfully applied in many other disciplines (Boud & Feletti, 1991; Gijselaers et al., 1995; Bouhuijs, Schmidt & Van Berkel, 1995). Naturally, the method may not work identically in every discipline, and changes may be in order. We would like to stress that the method itself is not so important. More important is the creation of an adequate and flexible learning environment, and there may be many ways to achieve that.

 

Literature

 

Albanese, M.A. & Mitchell, S. (1993). Problem-based learning: a review of literature on its outcomes and implementation issues. Academic Medicine, 68, 52-81.

 

Barrows, H.S. & Tamblyn, R.M. (1980). Problem-based learning: an approach to medical education. New York: Springer.

 

Berkson, L. (1993). Problem-based learning: have the expectations been met? Academic Medicine, 68 (Supplement), S79-S88.

 

Boud, D. & Feletti, G. (Eds.) (1991). The Challenge of Problem-based Learning. London: Kogan Page.

 

Bouhuijs, P.A.J., Schmidt, H.G. & Van Berkel, H.J.M. (Eds.) (1995). Problem-based Learning as an Educational Strategy. Maastricht: Network Publications.

 

Dolmans, D.H.J.M. (1994). How students learn in a problem-based curriculum. PhD dissertation. Maastricht: University of Limburg.

 

Gijselaers, W.H., Tempelaar, D.T., Keizer, P.K., Blommaert, J.M. & Kasper, H. (Eds.) (1995). Educational Innovation in Economics and Business Administration: The Case of Problem-Based Learning. Dordrecht: Kluwer Academic Publishers.

 

Norman, G.R. & Schmidt, H.G. (1992). The psychological basis of problem-based learning: a review of evidence. Academic Medicine, 67, 557-565.

 

Pochet, B. Le “Problem-based Learning”, une révolution ou un progrès attendu? Revue Française de Pédagogie, 111, 95-107.

 

Schmidt, H.G. (1983). Problem-based learning: rationale and description. Medical Education, 17, 11-16.

 

Van der Vleuten, C.P.M. & Wijnen, W.H.F.W. (Eds.) (1990). Problem-based Learning: Perspectives from the Maastricht Experience. Amsterdam: Thesis Publishers.


The assessment of professional competence: developments, research and practical implications

 

C.P.M. van der Vleuten

 

University of Limburg, Maastricht, The Netherlands

 

 

Introduction

 

Educational achievement testing is an area of turmoil in the health sciences. Examinations are a constant source of problems for many teachers, curriculum designers and educationalists. The evaluation of student achievement is continuously debated at educational meetings, conferences and workshops. It is also an area in which tradition, personal values and experiences tend to dominate discussions. On the other hand, the number of scientific publications on assessment has exploded over the last decade, and the instruments proposed, each preferably bearing an intriguing acronym, are countless. The literature is, however, often difficult to access, since the psychometrics usually involved in educational testing discourages the average health professions reader. Fortunately, assessment in health professions education is nevertheless a well-researched area that has delivered a number of well-documented outcomes. The purpose of this article is to highlight these outcomes and to translate them into practical implications and research suggestions. We will not review individual methods and instruments in detail, but will describe some classes of methods in relation to a (supposed) theoretical framework. We will delineate what we consider the paramount findings within and across these classes and discuss their theoretical implications and their effect on the evolution of new testing instruments. Using a simple conceptual framework, we will then define the utility of assessment methods in a generic sense, intended to help in deciding how to compromise and make trade-offs in assessment practice. Finally, this framework is used to propose a number of practical suggestions and research recommendations for improving the utility of assessment.

 

The traditional view of competence

 

The search for instruments to assess clinical competence has been stimulated by emerging logistical constraints and by dissatisfaction with current assessment practice in the health sciences. Particularly after the Second World War, the number of students in higher education grew exponentially. This posed logistical problems, since assessment (and teaching) was largely based on, or derived from, the apprenticeship model (implicit assessment, holistic judgements, unstandardized tests). The subjectivity and poor measurement characteristics of this approach were probably another reason to seek new pathways of assessment.

 

While new instruments were being sought, an implicit conception of the nature of professional competence was used. Competence was seen as an aggregate of different components or latent attributes, which were regarded as relatively distinct from each other. The development of competence was regarded as equivalent to development in each of the components, with growth defined as a monotonic process resulting from learning experiences. The components were also considered to be relatively stable across (clinical) situations and over time: expertise in a component allows a person to act professionally regardless of the particular nature of the situation or circumstances. In essence, the implicit conception of clinical competence reflected a trait conception, as was quite prevalent in contemporary personality and educational psychology. It is a very intuitive approach in which professional behaviour is attributed to a set of latent factors: these lie within the person and cannot be observed directly, but must be inferred from observed behaviour.

 

The trait approach was also implicitly applied to the methods used for measuring the components: each component could (or should) be measured separately, and different methods or formats would be appropriate for testing different traits. The validity of an assessment method would be demonstrated if low correlations were found between scores on methods measuring different traits, and high correlations between scores on methods measuring similar traits. The agenda became to develop methods appropriate for each of the components of clinical competence, as schematically outlined in figure 1: solving the 'jigsaw puzzle' was 'merely' a matter of finding the right pieces. And again, over the years a great many instruments have been proposed, each supposedly tapping into different competency areas.
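The validity criterion of the trait approach can be stated precisely with a small sketch, in the spirit of what is usually called a multitrait-multimethod comparison. The operationalisation here is an assumption and is deliberately strict: every correlation between two methods aimed at the same trait must exceed every correlation between methods aimed at different traits. The trait labels, method labels and correlation values are invented for illustration and are not research findings.

```python
def split_correlations(correlations):
    """Split correlations into same-trait and different-trait groups.

    correlations: dict mapping a pair of measures to a correlation r, where
    each measure is a (trait, method) tuple.
    """
    same_trait, different_trait = [], []
    for (measure_a, measure_b), r in correlations.items():
        trait_a, _method_a = measure_a
        trait_b, _method_b = measure_b
        if trait_a == trait_b:
            same_trait.append(r)
        else:
            different_trait.append(r)
    return same_trait, different_trait


def supports_trait_model(correlations):
    """True if every same-trait correlation exceeds every different-trait one.

    One strict reading of the criterion in the text: high correlations
    between methods measuring similar traits, low correlations between
    methods measuring different traits.
    """
    same_trait, different_trait = split_correlations(correlations)
    if not same_trait or not different_trait:
        raise ValueError("need both same-trait and different-trait correlations")
    return min(same_trait) > max(different_trait)


if __name__ == "__main__":
    # Invented correlations, for illustration only.
    corr = {
        (("knowledge", "MCQ"), ("knowledge", "oral exam")): 0.70,
        (("problem-solving", "PMP"), ("problem-solving", "MEQ")): 0.65,
        (("knowledge", "MCQ"), ("problem-solving", "PMP")): 0.20,
        (("knowledge", "oral exam"), ("problem-solving", "MEQ")): 0.25,
    }
    print(supports_trait_model(corr))  # True
```

A single cross-trait correlation above the lowest same-trait correlation makes the check return False. This shows how demanding the trait model's expectations are once real score data are entered.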

 

After so many years of research and development, one might expect a review article to present a grid as in figure 1, completed with a consensus list of well-defined components of competence or traits and lists of instruments to be used, preferably with extensive 'how-to-do' information on each of them. Unfortunately, this is not the state of the art. Despite the many proposed typologies, no consensus exists on any taxonomy of clinical competence, and none of the 'traits' is well defined. Even simple constructs such as 'knowledge' can be interpreted and subdivided in infinite ways, and more complex constructs such as 'attitudes' and 'humanistic skills' are real mind-breakers. Particularly when the conceptual level is translated into operational terms, i.e. into test material, even the narrowest definition of a construct tends to vary with the people involved in the process. Similarly, there is no consensus on 'best' methods of assessment. Although new measures of clinical competence were often eagerly presented with the aura of a panacea, empirical evidence usually tempered the original enthusiasm.

 

The by now disappointed reader need not, however, be discouraged entirely. There are good reasons for the deficiency of the conventional model and, compared with a few decades ago, we know substantially more about assessment, from both a theoretical and a practical perspective. Although we cannot present an overview of best methods that would settle the turmoil and reduce the frustrations of test development, we will try to clarify the difficulties in competence assessment and perhaps provide some practical suggestions. To do this, it is helpful first to review some significant developments that have advanced our understanding. To describe these we will review a few cells of figure 1 and describe four classes of methods, each attempting to measure different components of competence. To some extent they also reflect the history of the research. They will by no means cover all significant developments in competence assessment, but they will disclose some major issues in educational testing in the health professions. The four classes of methods are: multiple choice questions, written simulations, learning process measures and live simulations.

 

Multiple choice questions

 

The introduction of multiple choice questions (MCQs) accommodated the need to cope with the increased logistical demands of educational testing in higher education. Not surprisingly, multiple choice tests were introduced massively after World War Two, further boosted by the introduction of computer technology. Many different forms have been developed (single best answer, multiple best answers, true/false questions, matching questions, short and long menus of options, et cetera)[i] [ii]. In all probability all readers are familiar with them, and there is probably not a single educational institution that does not use MCQs in its assessment programme. Although multiple choice tests are time-consuming to construct, they are efficient in handling large numbers of examinees, and their reliability is excellent. There are no subjective influences in scoring answers (which is why they are called objective, although, as we will see later, the term may be misleading), and the content of interest in a particular domain can be sampled efficiently, since a single test can easily contain many items.

 

The MCQ test is significant in this context because it was, and still is, the subject of many criticisms[iii] [iv] [v]. The MCQ is designed to measure aspects of knowledge. According to the critics, however, selecting an answer from a list of options taps only trivial knowledge. Instead of actively generating responses, as in free-response tests, examinees in MCQ tests are only required to recognize the correct answer or to eliminate the incorrect ones (the cueing effect). The MCQ is therefore supposedly suitable only for measuring the lower taxonomic levels of cognitive functioning.

Despite the criticisms, the use of MCQ tests remains widespread. Research has shown that the effect of cueing is marginal, although this has not silenced the critics. Cueing mainly affects the mean score, and may therefore have consequences for standard setting, but the rank-ordering of examinees usually remains unaffected[vi] [vii]. Nevertheless, acceptability has always been the Achilles' heel of MCQ tests. This has stimulated test developers to look for alternatives, particularly tests that address higher cognitive taxonomic levels and tests more closely linked to professional reality[viii].
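The claim that cueing shifts the mean without disturbing the rank order can be made concrete with a small sketch. It assumes an idealized case in which cueing adds the same bonus to every examinee; the scores, the bonus and the pass mark are hypothetical, not data from the cited studies. Real cueing effects are not perfectly uniform, which is why the rank order usually, rather than always, remains unaffected.

```python
# Idealized illustration with hypothetical data: cueing modelled as a uniform
# bonus added to every examinee's score on an MCQ test.
true_scores = {"A": 52.0, "B": 61.0, "C": 47.0, "D": 70.0, "E": 58.0}
CUEING_BONUS = 5.0  # hypothetical uniform gain from recognizing options
CUTOFF = 55.0       # hypothetical fixed pass mark


def ranking(scores: dict[str, float]) -> list[str]:
    """Examinee IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def passers(scores: dict[str, float], cutoff: float) -> set[str]:
    """Examinees at or above the pass mark."""
    return {k for k, v in scores.items() if v >= cutoff}


cued_scores = {k: v + CUEING_BONUS for k, v in true_scores.items()}

print(ranking(true_scores))                  # ['D', 'B', 'E', 'A', 'C']
print(ranking(cued_scores))                  # ['D', 'B', 'E', 'A', 'C']
print(sorted(passers(true_scores, CUTOFF)))  # ['B', 'D', 'E']
print(sorted(passers(cued_scores, CUTOFF)))  # ['A', 'B', 'D', 'E']
```

The ordering is identical, but with a fixed pass mark one more examinee passes, which is the standard-setting consequence mentioned above.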

 

Written simulations

 

In the sixties, attempts were made to measure clinical reasoning or problem-solving ability[ix]. The typical approach was to present an examinee with a patient problem and then ask for management decisions. The decisions and the answers to questions were taken as an index of the examinee's problem-solving ability. Sometimes ingenious technical devices (invisible ink, latent image printing) were used to simulate the dynamic and realistic course of a patient problem. The most prominent example of this approach was the Patient Management Problem (PMP)[x]. The examinee was required to collect data on history, physical examination and investigations. Some PMPs allowed a branched pathway through the problem, depending on the choices made. Other instruments with a similar purpose were also introduced, such as the Modified Essay Question (MEQ)[xi] [xii], and less well-known measures such as the 'P4-deck'[xiii], the 'Film Test' and the 'Programmed Test'5. With emerging computer technology, some of the drawbacks of the paper-and-pencil formats could be circumvented, and computer simulations were introduced[xiv] [xv], making the simulations even more realistic. The common denominator in all these instruments was the use of a realistic (patient) problem to simulate reality in order to assess the process of problem-solving.

Because of their realistic nature, written simulations became quite popular despite their production costs. Their high acceptance meant that they rapidly became part of many examination programs, including some national licensing examinations.

Apart from the scoring problems involved in these simulations (disagreement on correct options, complexity of pathways, differential weighting of responses)[xvi] [xvii], three significant and consistent empirical outcomes were found which cast doubt on the existing conceptual framework of problem-solving. The first consistent outcome was that a score derived from one problem was not very predictive of the score on another problem. Apparently, the ability to solve problems depended on the (clinical) content of the problem. Even changes of content within limited content areas, or small contextual changes, yielded different outcomes. Typical correlations between scores on different problems ranged from 0.10 to 0.30[xviii]. This finding came as quite a surprise and puzzled many researchers and test developers, since it contradicted the (implicit) trait conception of problem-solving as a generic attribute: the transfer of ability from one problem to another turned out to be very low. The phenomenon came to be known as the 'case specificity' or 'content specificity' of problem-solving[xix]. A second surprising outcome was that experienced clinicians scored hardly better, and sometimes worse, than less experienced clinicians or students[xx] [xxi] [xxii] [xxiii]. Apparently, the monotonic growth of competence with increasing expertise hypothesized by the trait conception did not exist. A third unexpected finding was that once reliable scores on problem-solving tests were obtained (either in reality or by statistical correction), very high correlations were found with other measures, including multiple choice tests[xxiv] [xxv]. Problem-solving appeared to be much more closely linked to knowledge (and other constructs), and not to be as independent a construct as had originally been supposed.

Apart from the theoretical implications, these empirical findings posed major practical problems. In educational testing we prefer to make inferences about an examinee's ability that are independent of the particular sample of items (questions, patient cases, problems) used in the test. The items merely constitute a random sample from a large domain of possible items; in the next test, or in later practice, the sample will be different. When the ability to solve problems generalizes poorly across problems, many problems must be included in a test before a sample-independent conclusion can be drawn. In other words, the test length needs to be increased. With a favorable (and not often found) average inter-problem correlation of 0.30, at least 10 problems are necessary to achieve minimal reliability (i.e., to reach an arbitrary alpha of 0.80); with a correlation of 0.10, more than 35 problems are required. Such lengthy tests naturally have major resource implications, both in testing time and in the cost of producing the tests. From a purely decision-making perspective, high correlations imply redundancy of information: a score on one test is highly predictive of the score on another. Limiting the assessment to the most resource-saving method is a logical consequence, and it is not surprising that most licensing bodies removed these expensive simulations from their examination programs.
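The numbers of problems quoted above are consistent with the Spearman-Brown prophecy formula, which predicts the reliability of a test of n problems from the average correlation r between problems as n·r / (1 + (n - 1)·r), the standardized form of alpha. The text does not name the formula, so the sketch below is a reconstruction of the arithmetic under that assumption rather than the author's own calculation.

```python
def spearman_brown(n_problems: int, r: float) -> float:
    """Predicted reliability (standardized alpha) of a test with n_problems
    problems whose average inter-problem correlation is r."""
    return n_problems * r / (1 + (n_problems - 1) * r)


def problems_needed(r: float, target: float = 0.80, tol: float = 1e-9) -> int:
    """Smallest number of problems whose predicted reliability reaches target.
    The small tolerance guards against floating-point round-off at the boundary."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    n = 1
    while spearman_brown(n, r) < target - tol:
        n += 1
    return n


for r in (0.30, 0.10):
    print(f"r = {r:.2f}: {problems_needed(r)} problems")
# r = 0.30: 10 problems
# r = 0.10: 36 problems
```

With r = 0.10, exactly 36 problems reach 0.80, which matches "more than 35".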

As a reaction to the content specificity problem, a new direction was suggested in the mid-eighties. It was argued that any (clinical) problem has one or more essential elements that are crucial to its management; the other elements of the problem follow from these key elements or are less important. It was therefore suggested that assessment be limited to the key elements, so that the time saved could be used to test additional problems and thus improve reliability18. This has been called the 'key feature approach' to the assessment of problem-solving, and several instruments using this approach have been proposed[xxvi] [xxvii].

The problems encountered in the development of written simulations illustrate the pitfalls of an implicit and intuitive approach to the assessment of professional competence. They caused considerable disturbance in the conception of problem-solving and of how competence should be assessed in general, including disbelief in, and mistrust of, psychometric data analysis and those who produce it. They have, however, contributed significantly to our understanding of competence assessment.

 

Learning process measures

 

In the seventies and eighties, educational reform called for new approaches to teaching and learning. Instead of passively consuming learning material, students were supposed to take a more active role in acquiring knowledge. Instead of using rote learning strategies, students should learn to understand, synthesize and apply learning material. Concern was expressed that most conventional assessment methods and programs tend to reinforce undesirable learning behavior (the MCQ is often singled out in this respect). Hence a need was felt for assessment instruments that measured the process of learning more directly.

A number of such instruments were proposed. A prominent example is the Triple Jump Exercise (TJE), which was intended to measure problem-solving skills and to evaluate the quality of information gathering[xxviii]. The TJE consists of three steps (jumps): a structured oral examination based on one or more patient problems, a time-limited study assignment (usually 24 hours) related to the patient problems in the first oral, and a repeat oral examination in which the quality of self-directed learning on the assigned topics is assessed. Similarly, a problem-based learning exercise has been proposed more recently which assesses the quality of solving a task using a problem-based learning strategy: a tutorial group meeting to generate learning objectives and one week of self-study, followed by a written individual report[xxix]. Other process measures included self-assessment, peer ratings and faculty ratings[xxx] [xxxi] [xxxii]. These evaluated competencies such as group interaction skills, task orientation, leadership skills, communication skills and community interaction skills.

Apart from a few innovative schools experimenting with their assessment programs, learning process measures have never been widely introduced, probably because of their unfamiliar nature as well as their poor measurement characteristics[xxxiii] [xxxiv] [xxxv]. They are important here because they explicitly highlight the educational value of assessment. Over the years, the dramatic impact of examinations on learning became increasingly clear. The learning process measures explicitly acknowledged this relationship and attempted to use it strategically: through assessment they communicated to students the importance of a number of educational objectives.

 

Live simulations

 

A new development emerged at the end of the seventies and in the eighties, when the earlier 'in vitro' simulations were taken one step further by assessing the actual performance of examinees in standardized live simulations of clinical situations. Examinees were brought into a simulated clinical situation called a 'station', where they were given an assignment to perform a particular skill or to manage a patient. The skills may be demonstrated on real or simulated patients, or on special technical devices such as gynecological and cardiopulmonary resuscitation models. The performance of examinees is recorded by faculty examiners or by trained (simulated) patients. Some stations are followed by post-encounter written stations, in which written questions probe the preceding clinical situation. A single test typically consists of a number of different stations, and examinees rotate through each of them in a round-robin format. To achieve maximal standardization, examiners and (simulated) patients are usually trained extensively for their roles. The performance of examinees is scored on precoded checklists and/or rating scales. These tests were therefore called Objective Structured Clinical Examinations (OSCE)[xxxvi], although several other names are also used (standardized patient-based testing, performance-based testing, authentic assessment).

Since its introduction, the multiple station examination has spread dramatically across the world. Medical schools on all continents use some kind of station examination in their assessment programs[xxxvii]. In Canada, multiple station examinations are now part of the national licensing examination[xxxviii] and are administered on a national scale to over 1500 candidates a year, tested across the country in a single weekend. Their popularity is probably due to the combination of a close approximation of the real world with the use of standardized testing procedures.

The multiple station examination has been researched intensively. Overall, the outcomes closely parallel the findings for written simulations: content specificity is the major threat to reliability, high (true) correlations are usually found with other measures, and, depending on the checklists used, an absence of differences between expertise groups is not uncommon[xxxix] [xl] [xli]. Their importance here, however, is that they represent a further step toward standardized professional testing that approximates real life.

 

Research consistencies

 

The above developments show a few significant evolutions in the history of clinical competence assessment and, more importantly, disclose some general and consistent findings in the research associated with the developments. They have both practical and theoretical implications. We will discuss the research consistencies in more detail below.

 

Reliability issues

The variability of candidates' performance across tasks, originally found in the problem-solving research, appears to be one of the most consistent findings in all measurements of clinical competence. Except for some very basic communication skills[xlii], it has been found in all measurements of professional competence, including oral examinations[xliii], essay tests[xliv], chart audits[xlv], multiple station examinations39, and practice performance[xlvi]. It does not appear to be unique to the health professions, since task variability has also been found to be a dominant source of variance in mathematics and science[xlvii], law[xlviii] and military jobs[xlix]. As we indicated, the direct practical consequence is that tests containing a small sample of items (essays, stations, patient problems, tasks)[6] produce unstable or unreliable scores. Naturally, this will also vary with the size of the domain being tested, but even in smaller domains the required sample of test items is usually large. Sample size requirements also vary with the efficiency of testing methods. In general, more efficient methods, which need less time to sample a single item, will be more reliable per unit of testing time than methods requiring more time per item. For instance, the MCQ is efficient in this respect and can sample a few hundred questions in a relatively short time span, whereas a computer simulation may require thirty minutes or longer to test a single patient problem and will therefore need a very long overall testing time to produce reliable scores. To produce adequate reliabilities (i.e. 0.80 or more), one should take into account that even efficient tests usually require several hours of testing time25. Less efficient methods such as multiple station examinations usually require more than four hours of testing time, or (much) more, depending on the context, purpose and interpretation of scores of the examination17 39.
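The point about reliability per unit of testing time can be sketched by combining the Spearman-Brown prophecy formula with the time each item takes. The inter-item correlations below (0.02 for one-minute MCQs, 0.15 for thirty-minute simulation cases) are hypothetical values chosen only to illustrate the trade-off; they are not taken from the cited studies.

```python
def spearman_brown(n_items: int, r: float) -> float:
    """Predicted reliability of n_items items with average inter-item correlation r."""
    return n_items * r / (1 + (n_items - 1) * r)


def hours_for_reliability(minutes_per_item: float, r: float,
                          target: float = 0.80, tol: float = 1e-9) -> tuple[int, float]:
    """Return (items needed, testing hours needed) to reach the target reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    n = 1
    while spearman_brown(n, r) < target - tol:
        n += 1
    return n, n * minutes_per_item / 60


# Hypothetical method profiles: (minutes per item, average inter-item correlation)
methods = {
    "MCQ": (1.0, 0.02),
    "computer simulation": (30.0, 0.15),
}
for name, (minutes, r) in methods.items():
    n, hours = hours_for_reliability(minutes, r)
    print(f"{name}: {n} items, {hours:.1f} hours")
# MCQ: 196 items, 3.3 hours
# computer simulation: 23 items, 11.5 hours
```

Even though each simulation case is assumed to be far more informative than a single MCQ, the time cost per case dominates, so the simulation needs several times as many testing hours.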

Other sources of variability that challenge the reliability of examinations, such as rater, patient or examiner variability, are usually either less important or easier to control. In general, some standardization and structuring of the assessment may suffice to improve reliability. For example, when (simple) scoring keys are used to score essay tests, adequate levels of reliability can be achieved compared with the use of free judgements[l]. Oral examinations can become significantly more reliable when (simple) protocols are provided to structure and score them[li]. Even when less analytical methods are used, reliable scores (or at least scores as reliable as those of their 'objective' counterparts) can be obtained when the sample of test items is sufficiently large and the test design is adequate. The test design is important: in general, it should be arranged so that potential sources of variability (e.g. examiners or patients) are adequately sampled, in order to diminish or neutralize their effect on the precision of the measurement.

 

Table 1: Reliability of role-playing oral examinations as a function of patient-case and examiner sample size, using different examiner and case allocation strategies.

 

Testing time   Number of       Same examiner    New examiner     Two new examiners
(hours)        patient cases   for all cases    for each case    for each case

 2              4              0.45             0.69             0.76
 4              8              0.47             0.82             0.86
 6             12              0.47             0.87             0.90
 8             16              0.48             0.90             0.93
10             20              0.48             0.92             0.94
20             40              0.48             0.96             0.97

 

As an illustration, table 1 contains generalizability coefficients as a function of the number of cases in an oral examination using different test designs, as reported by Swanson43. When the same examiner tests all cases for an examinee, the reliability remains poor. When a new examiner is used for each case, the final judgement of an examinee is based on more raters, and the bias introduced by individual examiners averages out across cases. Adding a second examiner per case is hardly worthwhile. One might argue that the reliability remains poor unless large samples of cases and raters are used, but this is no different for many other testing methods, including the in vivo and in vitro simulations.
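Why a new examiner per case helps can be sketched with a simple generalizability (D-study) calculation. The sketch assumes hypothetical variance components, not Swanson's estimates, and a simplified design: in the 'same examiner' condition one examiner is crossed with all cases, so the examinee-by-examiner term is never divided by the number of cases; in the 'new examiner' conditions examiners are nested within cases, so their error terms shrink as cases are added. The values it prints therefore differ from table 1; only the pattern (a low ceiling for a single examiner, steady gains for new examiners, little extra from a second examiner) is meant to carry over.

```python
# Hypothetical variance components in arbitrary units (NOT Swanson's estimates).
VAR_P = 1.0    # true differences between examinees (universe-score variance)
VAR_PC = 2.0   # examinee x case interaction: case specificity
VAR_PE = 1.0   # examinee x examiner interaction: an examiner's bias for or against an examinee
VAR_PCE = 1.5  # examinee x case x examiner interaction plus residual error


def g_same_examiner(n_cases: int) -> float:
    """One examiner crossed with all cases: the examinee x examiner term
    is not divided by n_cases, so it sets a ceiling on reliability."""
    rel_error = VAR_PC / n_cases + VAR_PE + VAR_PCE / n_cases
    return VAR_P / (VAR_P + rel_error)


def g_new_examiners(n_cases: int, examiners_per_case: int = 1) -> float:
    """Fresh examiner(s) nested within each case: all examiner-related error
    is averaged over every case-examiner pairing."""
    rel_error = VAR_PC / n_cases + (VAR_PE + VAR_PCE) / (n_cases * examiners_per_case)
    return VAR_P / (VAR_P + rel_error)


print("cases  same  new  two new")
for n in (4, 8, 20, 40):
    print(n, round(g_same_examiner(n), 2), round(g_new_examiners(n), 2),
          round(g_new_examiners(n, 2), 2))
```

With these assumed components the single-examiner coefficient can never exceed VAR_P / (VAR_P + VAR_PE) = 0.5, however many cases are added.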

The illustration in table 1 also shows that objectivity can be a misleading criterion for classifying test methods. Objectivity is often equated with reliability, but objectivity in the sense of the absence of subjectivity differs from objectivity as a set of strategies to reduce measurement error (such as the use of checklists). Depending on the sampling strategy applied, objective measures such as checklists may produce unreliable test scores, while subjective measures such as more holistic, global professional judgements may yield reliable test information[lii].

Assessment methods that are both unstandardized and global are hopelessly unreliable[liii]. A prototypical example is the clinical rating as used in clerkships in many medical schools. Such ratings usually consist of a number of judgements on global categories of clinical performance, often made over a lengthy period of non-standardized performance, obtained from other sources and seldom based on direct observation. Not only is it difficult to pass judgement on a candidate with whom one has worked (closely) for a period of time, it is also difficult for anyone to judge performance that covers an extended period in the past, particularly when that performance was unstandardized or unstructured[liv]. Psychological research indicates that the human mind is easily led by what we think we have seen, usually on the basis of gross generalizations from a few cues or samples of performance, which do not necessarily coincide with reality[lv] [lvi]. For any measure to become reliable, we need a sufficient sample of performance, gathered and scored with at least minimal standardization and structure.

 

Validity Issues

To determine whether educational tests measure what they are intended to measure, criteria or standards are necessary. Validity research has always been plagued by the absence of good criteria, and gold standards simply do not exist (otherwise they would be used in the assessment). Validity research, including our own, more often than not has methodological weaknesses: the absence of a theoretical framework, of explicit hypotheses about expected results and of the strength of relationships or differences to be accepted or rejected; a lack of information on the reliability of the instruments used; et cetera[lvii]. As a result, validity research in educational testing contains a plethora of correlational studies, replete with mid-range correlations, which are more like Rorschach tests that the creative researcher can interpret favorably regardless of the outcome (the glass is always half empty or half full anyway). Validity research is therefore characterized by variable and often uninterpretable outcomes. Some remarkable trends are nevertheless worth mentioning.

The trait approach, which predicts convergence between scores on tests measuring similar constructs and divergence between scores on tests measuring different constructs, has received little empirical support. Comprehensive studies using multiple measures of the same and different components of competence[7] are extremely rare, and these studies have methodological weaknesses; nevertheless, the conclusion seems warranted that the commonality between different methods of assessment is usually larger than we are intuitively inclined to think. High correlations between problem-solving tests and other measures have been a recurrent finding with many methods of assessment (provided that the tests involved were reliable). They have been found between scores on free-response tests and MCQs7 44 [lviii] [lix], between PMPs and MCQs25, between oral examinations, computer simulations and written tests[lx] [lxi], and between written tests and multiple station examinations[lxii]. Again, the finding is not restricted to the health sciences[lxiii] [lxiv].

Using a variety of testing methods, a number of studies have found clear relationships between certification examinations and performance in practice, suggesting that examination methods are valid in relation to later performance[lxv] [lxvi] [lxvii], but no differential effects of method were found[lxviii].

Causal inferences are always difficult to draw from correlations, and high correlations do not necessarily imply that the same construct is being measured. However, these findings indicate that method characteristics do not inherently determine what is being measured. At the very least, the uniqueness attributed to methods of assessment in measuring particular aspects of competence is challenged. McGuire has called the conception that methods dictate what is being measured one of the most damaging myths in competence assessment, one that has significantly delayed progress[lxix]. What is being measured depends more on the content of the method, or the task posed to the examinee, than on any characteristic of the method itself. Validity more likely depends on the stimulus format of a test item than on the response format in which the answer is captured (and on the cognitive process involved, as we will see below). An MCQ does not measure factual knowledge because it requires a selection from a list of options; it measures factual knowledge when the question probes for factual knowledge. It may, however, also measure aspects of problem-solving if, for example, it presents a patient case scenario and prompts for a management decision. The same holds for essay tests, oral examinations or any other test format: what is in the method is more important than its wrapping. This is not to say that any method can measure any component of competence or skill; naturally, some methods assess some competencies more easily. For example, measuring communication skills will be hard without some form of direct observation. What is challenged here is the opposite reasoning: what is being measured is dictated not by the method but by what is put into it.

Several authors have cautioned against sacrificing validity for the sake of objectivity, particularly in complex professions such as the health sciences[lxx] [lxxi]. Assessment techniques that avoid professional judgement in the name of objectivity may lead to an atomization of complex skills, thereby trivializing the content of the assessment. For example, breaking down communication skills into their smallest possible behavioral components so that they can be checked more easily on a checklist may enhance objectivity, but will not reflect the intended complexity of the skill[lxxii]. Trade-offs in validity clearly have to be made, and there are clear pitfalls in fragmenting complex skills that require holistic, professional judgement71.

 

Educational issues

The concern about the driving force of examinations on learning and the curriculum, which stimulated the development of the learning process measures, has increasingly become an issue for test developers. Many authors have emphasized or documented the tremendous impact that the assessment program has on the learner[lxxiii] [lxxiv] [lxxv] [lxxvi] [lxxvii] [lxxviii]. At the risk of adhering to a naive behavioristic view of learning[lxxix], there is a haunting truth in the observation that students do whatever they are tested on and are not likely to do what they are not tested on. Regardless of the curriculum objectives, students in a learning program will follow the examination program. This is the heart of the 'hidden curriculum'[lxxx]: examinations define academic success, and students cannot be blamed for optimizing their chances of achieving it. The challenge for test developers is to use this phenomenon strategically and to reinforce desirable learning behavior. This strategy, also referred to as 'measurement-driven instruction'[lxxxi], may have powerful educational consequences. However, there are a couple of pitfalls involved.

First, there are risks in purely test-directed studying. Many assessment programs are structured in such a way that examinations compete with each other and invite students to peak from hurdle to hurdle. Particularly if the content of the examinations rewards rote learning, one can seriously question the retention of the information gathered and the ability to apply it appropriately[lxxxii].

Second, the effects of assessment are often difficult to predict and sometimes even opposite to the original expectations. For example, Van Luijk et al. reported a study in which a multiple station examination, after a number of years of use, led to a deterioration in students' competence, because students started a blossoming trade in previously used checklists, which they subsequently memorized when preparing for the examination[lxxxiii]. This was reinforced by the level of detail of the checklists (chosen for objectivity reasons), which consisted largely of cognitively oriented items. Another study has shown that an intended switch from multiple choice tests to free-response tests, meant to discourage memorization, led students to expect that they would actually have to memorize more44. Yet another study reported that teachers directly involved in small-group learning may in theory be the best source of information on students' progress, but that their judgmental role may conflict with their facilitating role35. In conclusion, any assessment action will result in an educational reaction. The unpredictability of these educational effects, however, requires careful and continuous follow-up analysis of side-effects.

 

A modern view of competence

The research evidence has clearly challenged the appropriateness of the trait model of professional competence. Components of competence show great variability across tasks, they cannot be well differentiated empirically, and growth in competence is more capricious than expected. This is not really surprising, since the notion of inherent and robust traits was similarly challenged (and abandoned) in psychology some time ago[lxxxiv]. In the health sciences, the empirical disillusionments have stimulated more fundamental cognitive psychological research into the nature of clinical competence and the development of expertise. In recent years major progress has been made in this area, which may explain a number of phenomena in the assessment research and has implications for the future[lxxxv] [lxxxvi] [lxxxvii] [lxxxviii].

The development of expertise in professionals appears strongly connected to knowledge. However, the way in which knowledge is stored, used and retrieved characterizes the differences between novices and experts. The accumulation of knowledge is necessary to be able to handle concrete problems and to reason, i.e. to explain (clinical) phenomena by their underlying (pathophysiological) mechanisms. Knowledge that is accumulated (stored) in a relevant (problem) context provides the best chance of retrieval (Ref Needham) when one is faced with a new problem. However, with accumulating experience, explicit reasoning as a cognitive process diminishes in importance because it is no longer instrumental. Instead, clinical situations and specific signs and symptoms (or details that seem irrelevant at first sight, specific cues or patient characteristics) are recognized immediately. The reasoning process becomes automated and is condensed into 'chunks' or 'scripts' of clinical and contextual information, which are activated instantaneously at the appropriate moment. Experienced doctors often formulate their diagnostic hypotheses within the very first moments with a patient, and they are usually correct. In summary, the cognitive psychological model views the development of professional expertise as a transition from a conceptually rich and rational knowledge base (acquired from educational experiences) to a non-analytical ability to recognize and handle situations efficiently and effectively (acquired from clinical experiences). This ability is not easily transferred from one problem or situation to another, but remains relatively dependent on the specific situation. One person may therefore function at several cognitive levels at the same time, depending on the problem at hand. With increasing experience and specialization, this expertise becomes further individualized.

This new theoretical framework may explain a number of the unexpected findings encountered. It provides a logical explanation for the dominance of task variability in test scores. Instead of generic underlying constructs being responsible for consistent professional actions across tasks, expertise is characterized by 'states' of development restricted to specific content areas. These states are based on previous personal experience, hardly generalize across situations or tasks, and change continuously as a result of new experiences. In educational tests this is reflected in substantial task variability.

What is measured by an individual test item will depend on the cognitive processes involved in answering it, and these will vary not only from item to item but also from person to person for a single item: a response may be the result of pure recognition for one person and of a reasoning process for another. Summing item scores into a test score must therefore yield a very heterogeneous aggregate. Perhaps this aggregate, once sufficiently sampled, reflects a G-factor (a general factor, such as that claimed in intelligence research) potentially responsible for the correlations across test methods59 60 62.

The unexpected absence of differences between expertise groups is probably the result of different cognitive processes. Both written and live simulations often reward thoroughness rather than efficiency and efficacy, which may again lead to higher scores for the less competent and less efficient examinees. In a study comparing multiple station examination scores (assessing what doctors can do) with similar cases tested in clinical practice by using hidden simulated patients (to assess what doctors actually do) no correlation was found using the raw scores. However, after a correction for efficiency and time needed to obtain vital information a substantial association was found[lxxxix].

The difficulty assessment programs have in fostering retention of knowledge, or the ability to apply knowledge to new situations, is not surprising when one realizes that retention is stimulated by a meaningful context, repetition and resemblance to the original situation. By contrast, tests often consist of decontextualized test items, and whole examination programs usually involve little repetition and integration. More often than not, 'wiping the hard disk' to prepare for the next examination increases the chances of success.

The cognitive psychological model may be useful in suggesting new directions in assessment. Perhaps it will provide new ways of assessing expertise in the future[xc] [xci] (although by now the reader is probably aware of how relative the promises of new measures are), but at least the model may help us to understand some assessment phenomena better. It may also open new pathways for research and test development.

 

The utility of assessment methods

The intention was to clarify a number of issues in the assessment of professional competence. The discussion so far has made clear that assessment of professional competence has indeed stumbled on many difficulties and that perfect assessment is an illusion. Trade-offs between what is desirable and what is achievable are therefore inevitable.

In order to arrive at some practical implications and to clarify the compromises involved, we will use a very simple model to define the utility of assessment methods. So far we have discussed at length three variables that should be part of the model: reliability (R), validity (V) and educational impact (E). Two additional important variables were addressed implicitly: acceptability (A) and cost (C).

In educational practice, decisions are rarely based on research outcomes[xcii]. In assessment in particular, one has to deal with the opinions, sentiments and traditions of teachers, students and institutions. The extent to which an assessment procedure is accepted by the people involved is therefore a crucial consideration. The mere existence of so many examination procedures with severe shortcomings in reliability is evidence of this phenomenon[8]. The cost of assessment is an obvious variable that hardly needs further explanation: resource limitations are universal, and even more pressing for single institutions or individual test developers.

We will define the utility (U) of an assessment method as a multiplicative function of these variables with differential weights (w) associated with each of them:

 

$$U = R^{w_r} \times V^{w_v} \times E^{w_e} \times A^{w_a} \times C^{w_c}$$

 

It should be noted that this definition is intended purely as a conceptual model and not as an actuarial algorithm, since most of the elements can clearly never be quantified. As a model, however, it makes the trade-offs clear. Perfect utility is a utopia. In practice we will always be required to compromise and to assign different weights in different situations, depending on the context and purpose of the assessment.

For example, in a high-stakes examination whose decisions have marked consequences for the future of examinees, reliability will probably carry a heavier weight in the decision to use an assessment method. In in-training assessment, on the other hand, the final decision is based on many assessments. There one is probably prepared to compromise more on reliability in favor of the educational impact of the assessment. The relationship among the variables is, however, deliberately conceived as multiplicative: if one of the elements is zero, the utility will be zero. A reliable, valid and feasible test will have a short life if it is accepted by no one.
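The multiplicative structure can be explored with a minimal Python sketch. It is not a measurement procedure. The text stresses that these elements can rarely be quantified, so the 0 to 1 ratings and the two weight profiles below are purely hypothetical. So are the names `utility`, `high_stakes` and `in_training`. Cost is entered here as an affordability rating (higher means cheaper). This is an assumption made so that every factor points in the same direction.

```python
from math import prod

VARIABLES = ("R", "V", "E", "A", "C")  # reliability, validity, educational impact, acceptability, cost


def utility(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Conceptual utility U = R^w_r * V^w_v * E^w_e * A^w_a * C^w_c.

    scores: hypothetical ratings in [0, 1]; for C, higher means more affordable.
    weights: hypothetical positive exponents expressing how much each variable matters.
    """
    for name in VARIABLES:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"rating for {name} must lie in [0, 1]")
        if weights[name] <= 0.0:
            raise ValueError(f"weight for {name} must be positive")
    # The factors are multiplied, so a zero rating on any variable gives zero utility.
    return prod(scores[name] ** weights[name] for name in VARIABLES)


# Hypothetical ratings for an MCQ test: reliable and cheap, weak educational impact.
mcq = {"R": 0.9, "V": 0.6, "E": 0.3, "A": 0.8, "C": 0.9}

# Hypothetical weight profiles for the two contexts discussed in the text.
high_stakes = {"R": 2.0, "V": 1.0, "E": 1.0, "A": 1.0, "C": 1.0}  # reliability counts most
in_training = {"R": 0.5, "V": 1.0, "E": 2.0, "A": 1.0, "C": 1.0}  # educational impact counts most

print(utility(mcq, high_stakes))
print(utility(mcq, in_training))
print(utility(dict(mcq, A=0.0), high_stakes))  # a test accepted by no one
```

With these made-up numbers, the same test looks considerably less attractive under the in-training profile, because its weak educational impact is now weighted heavily. Setting acceptability to zero wipes out utility regardless of the other ratings, which is the point of the multiplicative form.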

 

Implications for practice and research

Having defined the elements of the utility of assessment methods the question now is what we can do to improve it. Using the research outcomes and the theoretical developments described above we will derive a number of practical suggestions for each of the variables involved and delineate some research requirements where applicable.

 

Reliability suggestions

The obvious practical implication of the content specificity problem is that one cannot rely on tests containing few cases. Traditional clinical examinations such as the 'clinical viva' or 'long case' often consist of no more than a single patient. They are totally unreliable because of their limited sampling of content, even when examiner influence has been ruled out. Regardless of the testing method, wide sampling of content across the area of interest is imperative to obtain stable and reproducible scores on educational tests. Several hours of testing time are usually required to reduce the error introduced by task variability sufficiently.

As an alternative to increasing the content sample per test, one may consider increasing sampling across time by using multiple test occasions. (Some) compensation of test scores across occasions is then in order. The decision errors per test are sizable due to unreliability, and compensation allows them to average out across tests. A third alternative to increase reliability is to combine methods into a larger battery of different subtests[xciii].
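The effect of widening the content sample can be approximated with the Spearman-Brown prophecy formula from classical test theory. This is a standard psychometric result rather than a method taken from this paper. It assumes that the added cases, or the added occasions whose scores are averaged, are parallel to the existing ones. Content specificity makes that only approximately true. The single-case reliability of 0.10 below is a hypothetical value chosen for illustration, and the function names are likewise illustrative.

```python
def spearman_brown(rel_single: float, k: float) -> float:
    """Projected reliability when testing is lengthened by a factor k.

    Classical test theory result; assumes the added cases (or added occasions whose
    scores are averaged) are parallel to the existing ones.
    """
    return k * rel_single / (1 + (k - 1) * rel_single)


def lengthening_needed(rel_single: float, rel_target: float) -> float:
    """Inverse of the prophecy formula: lengthening factor k needed to reach rel_target."""
    return rel_target * (1 - rel_single) / (rel_single * (1 - rel_target))


# Hypothetical reliability of a single-case test such as a 'long case'.
single_case = 0.10

for n_cases in (1, 4, 12, 36):
    print(f"{n_cases:>2} cases: projected reliability {spearman_brown(single_case, n_cases):.2f}")

print(f"cases needed for 0.80: {lengthening_needed(single_case, 0.80):.1f}")
```

With this assumed single-case value, roughly 36 comparable cases would be needed to reach a reliability of 0.80. That is consistent with the observation that several hours of testing are usually required.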

To contain costs, efficiency is the hallmark. Efficiency may be achieved through the selection of efficient testing methods, through attention to efficiency per test item and through careful test administration procedures. MCQ-tests are very efficient for sampling across content, whereas simulation-based instruments are less efficient. The choice of method will depend on the trade-offs to be made with the other utility variables. Efficiency per test item may be achieved by using a key-feature approach: assess only the key elements of a task and use the time saved to assess more tasks. Again, this will depend on the willingness to compromise on the other variables.

As an illustration of the compromises to be made, the University of Limburg used multiple station examinations testing skills in isolation (examination of the knee in one station, interviewing skills in another, etcetera), each scored with detailed checklists. Examinees started to memorize the checklists, and students complained that the examination did not reflect clinical reality ("monkeys doing tricks", as they expressed their feelings). In response, stations were integrated, at the cost of an increase in station time. Checklists were also globalized, using items that judge the quality of integral elements of skills on rating scales, at the cost of a decrease in inter-rater reliability[xciv].

Efficiency may also be achieved through alternative test administration strategies that adjust the test length to the ability of the examinees[xcv]. In 'sequential testing', a short test is given to all examinees as a screening assessment. Examinees scoring far from the cut-off score are excused from further testing, and the assessment is continued only for the remaining examinees. In 'adaptive' or 'tailored' testing, each subsequent test item presented to an examinee depends on the examinee's performance on the previous one. Tailored testing may save considerable testing time, but it places very strict psychometric demands on the test material.
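Both administration strategies can be sketched in a few lines of Python, under simplifying assumptions.

- **Sequential testing.** The cut-off of 60 and the margin of 10 points are hypothetical. The margin defines what counts as "distant from the cut-off"; some programs may choose to excuse only clear passers.
- **Tailored testing.** This is only a crude up-and-down staircase over an item bank with assumed difficulty values. Operational tailored testing usually relies on item response theory with calibrated item parameters. That calibration is where the strict psychometric demands come in.

All function names and data are hypothetical.

```python
from typing import Callable


def sequential_screen(screen_scores: dict[str, float], cut_off: float, margin: float) -> dict[str, str]:
    """Sequential testing: decide on examinees far from the cut-off after a short screening test.

    Examinees within `margin` of the cut-off are sent on to the remaining test.
    """
    outcome = {}
    for examinee, score in screen_scores.items():
        if score >= cut_off + margin:
            outcome[examinee] = "pass"
        elif score <= cut_off - margin:
            outcome[examinee] = "fail"
        else:
            outcome[examinee] = "continue testing"
    return outcome


def staircase_adaptive_test(
    item_difficulties: list[float],
    answers_correctly: Callable[[float], bool],
    n_items: int,
    start: float = 0.0,
    step: float = 0.5,
) -> tuple[float, list[tuple[float, bool]]]:
    """Crude tailored test: give the unused item closest to the current ability estimate,
    then move the estimate up after a correct answer and down after an incorrect one."""
    ability = start
    remaining = list(item_difficulties)
    administered = []
    for _ in range(min(n_items, len(remaining))):
        item = min(remaining, key=lambda d: abs(d - ability))  # ties go to the earlier item
        remaining.remove(item)
        correct = answers_correctly(item)
        administered.append((item, correct))
        ability += step if correct else -step
    return ability, administered


# Hypothetical screening scores (percentages), cut-off 60, margin 10.
print(sequential_screen({"A": 82, "B": 58, "C": 64, "D": 45}, cut_off=60, margin=10))

# Hypothetical item bank and an examinee who answers every item easier than 1.0 correctly.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
estimate, log = staircase_adaptive_test(bank, lambda d: d < 1.0, n_items=5)
print(estimate, log)
```

In the screening example, only the two examinees near the cut-off (B and C) need the rest of the test. In the staircase example, the difficulty of the items given oscillates around the point where the simulated examinee starts to fail.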

Introducing standardization and structure will improve reliability considerably. It is however not always necessary to totally standardize the testing situation or to use analytical scoring methods only52. In general, factors introducing error in a measurement require more sampling within that factor as was illustrated in table 1. Careful test designs with efficient sampling strategies may substantially improve reliability while saving resources at the same time43.
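The principle that each error-introducing factor needs its own sampling can be made concrete with a decision (D) study from generalizability theory. The sketch assumes a fully crossed design in which every examinee (p) is scored on every case (c) by every rater (r). It uses the standard relative-error G coefficient for that design. The variance components below are hypothetical placeholders, not the values of table 1. They are chosen so that case variability dominates, in line with the content specificity findings.

```python
def g_coefficient(var: dict[str, float], n_cases: int, n_raters: int) -> float:
    """Relative G coefficient for a fully crossed persons x cases x raters design.

    var holds variance components: 'p' (persons), 'pc' (person-by-case),
    'pr' (person-by-rater) and 'pcr,e' (three-way interaction plus residual error).
    Each error term is divided by the number of conditions sampled for its facet.
    """
    relative_error = (
        var["pc"] / n_cases
        + var["pr"] / n_raters
        + var["pcr,e"] / (n_cases * n_raters)
    )
    return var["p"] / (var["p"] + relative_error)


# Hypothetical variance components in which case variability dominates rater variability.
components = {"p": 1.0, "pc": 4.0, "pr": 0.2, "pcr,e": 2.0}

for n_cases, n_raters in [(4, 2), (4, 4), (8, 1), (16, 1)]:
    g = g_coefficient(components, n_cases, n_raters)
    print(f"{n_cases:>2} cases x {n_raters} raters: G = {g:.2f}")
```

Under these assumed components, 8 cases with a single rater (8 scorings) give a higher coefficient than 4 cases with 4 raters (16 scorings). Spending resources on the facet that contributes most error is both more reliable and cheaper.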

From a research perspective, reliability studies are continually needed to support individual testing methods in their particular contexts of use. In general, however, most sources of unreliability are well documented. A research area of interest may be the closer study of the content specificity problem. When personality psychology abandoned the trait approach, it was replaced by the person-by-situation interaction paradigm[xcvi] as a way to better understand the phenomenon of variability across situations.

In educational testing for the health sciences, the question could be posed whether all examinee-by-task interactions are simply error variance, or whether certain non-random consistencies exist. For example, if growing expertise is characterized by accumulating individual experience, the person-by-task interaction could be correlated with level of training and experience (some first evidence suggests it is not[xcvii]). Similarly, the type of scoring method could be expected to interact with task variability. Analytical methods are anchored to the specific task situation whereas more holistic methods are not, so they would yield more and less task variability variance respectively (some first evidence suggests this is true[xcviii]).
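The person-by-task interaction at the heart of this question can be estimated from a persons x tasks score matrix with a two-way random-effects ANOVA. This is the usual starting point of a generalizability analysis. The sketch below assumes one score per person-task cell. In that design the interaction cannot be separated from random error (hence the label 'pt,e'). Distinguishing systematic consistencies from error, as the text proposes, would therefore require replicated or richer designs. Comparing the estimate between groups at different training levels is one possible approach to the expertise question. The function name and example matrices are hypothetical.

```python
def variance_components(scores: list[list[float]]) -> dict[str, float]:
    """Two-way random-effects ANOVA estimates for a persons x tasks matrix (one score per cell).

    Returns variance components for persons ('p'), tasks ('t') and the person-by-task
    interaction confounded with error ('pt,e'). Negative estimates are set to zero.
    """
    n_p, n_t = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_t)
    person_means = [sum(row) / n_t for row in scores]
    task_means = [sum(row[j] for row in scores) / n_p for j in range(n_t)]

    ss_p = n_t * sum((m - grand) ** 2 for m in person_means)
    ss_t = n_p * sum((m - grand) ** 2 for m in task_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_t  # what persons and tasks do not explain additively

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))
    return {
        "p": max((ms_p - ms_res) / n_t, 0.0),
        "t": max((ms_t - ms_res) / n_p, 0.0),
        "pt,e": ms_res,
    }


# Hypothetical matrices: purely additive scores versus examinees who excel on different tasks.
additive = [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
crossed = [[1, 0], [0, 1]]
print(variance_components(additive))
print(variance_components(crossed))
```

In the additive matrix every examinee ranks the tasks the same way, so the interaction term is zero. In the crossed matrix all variance is person-by-task interaction, which is the pattern content specificity produces.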

 

Validity suggestions

No single method provides a panacea for competence assessment. In educational practice we tend to occupy ourselves a great deal with the method of assessment, but we should rather concern ourselves more with what we put into the method. Similarly, we tend to worry about the kind of competency we are measuring and the theoretical validity of our instruments. It is probably better to worry more about the content and relevance of the tasks we pose to examinees. The historical developments in competence assessment could be summarized as a continuous search for approximating professional or educational reality as closely as possible while applying standardized test conditions. It is this concern that we should translate to any measurement procedure, irrespective of the method used.

The critique of MCQ-tests is a critique of badly written multiple choice questions. If the knowledge being asked about is placed in an appropriate context (e.g. a patient problem), the MCQ might have considerably more acceptability. The response format may still require recognition of an option rather than recall (although nothing prevents the use of longer menus of options), but the impact of this cueing effect is only marginal. Similarly, essay tests may assess bare factual knowledge, multiple station examinations may assess clinical rituals, and oral examinations may assess memorization. It all depends on what was put into the test. Providing professionally or educationally valid challenges to examinees is a general requirement for any format of assessing professional competence.

Discussing relevant tasks, rather than the theoretical competency being measured, is also a more effective strategy in test development practice. It is quite difficult to reach consensus about the definition of a competency and its subsequent operationalization into test material. Agreement is more easily reached, however, when professionals discuss the kind of (clinical) problems that examinees should be able to handle, or which element of a (clinical) problem can be identified as a key feature[xcix] [c]. Providing appropriate context in test material is also fully in line with the cognitive psychological framework. Storage and retrieval are contextually driven. Recognition of information or patterns can only be achieved by providing a relevant professional or educational context. By bringing these relevant contexts into test items, higher cognitive abilities are more likely to be addressed and expertise differences will emerge.

Once the tasks have been defined, it is important to select the most appropriate format. Here again compromises must be made, in which the other variables will play a major role. For example, validity research has demonstrated that scores on MCQ-tests predict scores on multiple station examinations. One could therefore argue that the cheaper and more efficient MCQ-test is to be preferred. However, using such an MCQ in a medical school will undoubtedly have undesirable consequences for how students prepare themselves. When the purpose of the test is to screen a large group of professionals (e.g. to determine needs for CME), the MCQ might be best.

Validity is strongly enhanced when test material is scrutinized by a review process (including test and item analysis afterwards). It is virtually impossible to write flawless test material, regardless of the method. Even the simplest review process will have beneficial effects. It does, however, require item writers to be prepared to submit their products to the critique of others. Although quite usual in research, this willingness is not so common in education.
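The item analysis mentioned above can be illustrated with a minimal sketch of two classical item statistics:

- **Difficulty (p-value):** the proportion of examinees answering the item correctly.
- **Discrimination (item-rest correlation):** the correlation between the item and the total score on the remaining items.

The response matrix and the screening thresholds (p outside 0.2 to 0.9, item-rest r below 0.1) are hypothetical rules of thumb, not standards from the text. A flag only marks an item for the reviewers' attention; it does not decide its fate.

```python
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns nan when either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def item_analysis(responses: list[list[int]]) -> list[dict[str, float]]:
    """Classical item statistics for a 0/1 scored examinee x item matrix."""
    report = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total score without this item
        report.append({"item": j, "p_value": mean(item), "item_rest_r": pearson(item, rest)})
    return report


def flag_for_review(report, p_low=0.2, p_high=0.9, r_min=0.1):
    """Hypothetical screening rule: very hard, very easy or poorly discriminating items.
    A nan correlation fails the `>= r_min` check, so constant items are flagged too."""
    return [
        row["item"]
        for row in report
        if not p_low <= row["p_value"] <= p_high or not row["item_rest_r"] >= r_min
    ]


# Hypothetical responses of five examinees to three items (1 = correct).
responses = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
report = item_analysis(responses)
for row in report:
    print(row)
print("flag for review:", flag_for_review(report))
```

In this toy matrix, the third item is answered correctly mainly by examinees who do poorly on the rest of the test. Such a pattern often points to a flawed key or an ambiguous stem.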

In 1961 Ebel wrote of validity research that it is "universally praised, but the good works done in its name are remarkably few" [ci]. Unfortunately, this observation still holds. In our view, the validity studies required in the future need to be of a different type. Except for predictive studies, the typical correlational research between different measures of competence, used to draw conclusions about their validity, is neither compelling nor informative. A more fruitful line of inquiry may come from adopting a cognitive psychological framework (or any theory-based framework) to study the theoretical validity of educational tests. This leads to validity questions such as "what is the relationship between various stimulus formats and their psychometric characteristics?" and "does contextual information influence test scores, and how does it affect groups at differing levels of expertise?"

Another type of research is, however, also needed. Ebel has pleaded for researchers to apply more direct validation procedures to achievement testing (as opposed to 'derived validation' using the theoretical, correlational approach)101 [cii] [ciii]. Unlike psychological aptitude and personality tests, educational tests reflect directly meaningful tasks and allow rational analysis in relation to the domain of interest. Direct validation investigates the extent to which the tasks posed by the test represent the real-world tasks of interest. Ebel suggested that validity can be "built into" a test through careful operational definition of the tasks and content to be assessed. Several other authors have expressed similar views in relation to educational testing39 75 [civ], yet the literature mainly reports derived validation studies.

As we have concluded, however, what is being measured is primarily dictated by the content of an achievement test and the kind of tasks posed to the examinee. These are exactly the focus of direct validation studies. Therefore, with Ebel, we believe that direct validation studies are more needed.

Direct validation studies need not be restricted to descriptive or qualitative studies of the content validity of tests. Empirical studies are required to validate the process that links the task given to the examinee to the test score reflecting the quality of task performance. Studies of this process are needed particularly where this relationship is more complex, as in tests using written and live simulations. The validity of the resulting score will depend on the appropriateness of this process:

- Does the scoring system reward efficiency or thoroughness?
- Are non-indicated actions penalized?
- How should scores be aggregated into a total?
- How do the raters, examiners or patients influence the scoring?
- Do the questions sample the domain of interest?

When the item scores are valid, the total test scores should also be valid. This approach has been called the 'microscopic approach' to validity39. I am not suggesting a discontinuation of studies into the theoretical nature of tests for professional competence with a macroscopic focus, for instance of the kind suggested above in relation to cognitive psychological issues. But we need more studies at the microscopic level. Correlational studies carried out without a sound theoretical framework, usually with flawed methodological designs, have been and will be of little use for research and test development.
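Two of the scoring questions listed above (rewarding efficiency or thoroughness, and penalizing non-indicated actions) can be made tangible with a small sketch of two scoring rules for a simulation. The case, the action labels and the penalty of 0.5 per non-indicated action are all hypothetical. In practice, a review panel would have to decide which actions are indicated and how heavily the others count.

```python
def raw_score(actions: set[str], indicated: set[str]) -> float:
    """Thoroughness-rewarding rule: count indicated actions taken, ignore everything else."""
    return float(len(actions & indicated))


def penalized_score(actions: set[str], indicated: set[str], penalty: float = 0.5) -> float:
    """Efficiency-sensitive rule: credit indicated actions, subtract a penalty per non-indicated one."""
    return len(actions & indicated) - penalty * len(actions - indicated)


# Hypothetical case: the actions a review panel judged to be indicated.
INDICATED = {"take_history", "order_test_a", "order_test_b", "start_treatment"}

# Hypothetical examinees: one orders almost everything, one goes straight to the key actions.
thorough = INDICATED | {"order_test_c", "order_test_d", "order_test_e", "refer_specialist"}
efficient = {"take_history", "order_test_a", "order_test_b"}

for name, actions in [("thorough", thorough), ("efficient", efficient)]:
    print(name, raw_score(actions, INDICATED), penalized_score(actions, INDICATED))
```

Under the raw rule the "shotgun" examinee outscores the efficient one (4 against 3). Under the penalized rule the order reverses (2 against 3). The ranking of examinees, and hence what the score means, depends on this microscopic scoring decision.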

 

Educational suggestions

The assessment objectives should clearly match the educational objectives. When they do not, the assessment objectives will prevail. The implication for practice is to remain constantly vigilant about the educational effects of assessment, and to try to use the driving force of assessment to achieve desirable educational effects. This is more easily said than done, however, because there are (again) no fixed rules and, as we discussed, there are pitfalls involved. Assessment may drive learning in at least four ways.

 

a)     Assessment drives learning through its content. If we want students to be able to manage problems, we should not give them tests of memory reproduction. The illustration above of the multiple station exam of the University of Limburg shows that isolated skills-testing achieves only fragmented competence: you will get out of it what you put into it. This conclusion may seem trivial in its simplicity; judging by educational practice, however, it apparently is not. Once more, the tasks should reflect professional or educational reality as closely as possible.

b)     Assessment drives learning through its format. The unexpected negative effect of using detailed checklists, reported earlier, illustrates how format may influence learning. Another illustration is an assessment procedure deliberately developed for its educational effect: the progress test35 [cv] [cvi]. This is a comprehensive test using MCQs that covers the integral end-objectives of a curriculum across all disciplines involved (including basic sciences). It is administered periodically (e.g. every three or four months) to all students in the curriculum, regardless of their point in training. Since the test is not directly tailored to course objectives, it is difficult for students to prepare specifically for it. It has proven effective in not interfering with ongoing learning, for example in problem-based learning programs35 [cvii].

c)     Assessment drives learning through the information given. Besides being a decision tool, assessment should also be a learning exercise. Providing information is key to achieving that. Feedback of assessment results, profile scores, literature references, debriefing meetings and appeal procedures all enhance the information flow and increase the formative value of assessment. Similarly, assessment results can be fed back to test developers, departments, educational committees and other institutional bodies. Test results also reflect the quality of the training program. They may be used for quality monitoring and control, both at the micro-level (i.e. the evaluation of courses) and at the macro-level (i.e. the evaluation of complete programs or instructional methods).

d)     Tests drive learning through their programming. The frequency and timing of tests, the number of repeat examinations and the regulation of student promotion are all elements of how the programming of assessment drives learning. Examinations are often in continuous competition with each other and with the ongoing educational program, and students jump from hurdle to hurdle.

Organizing repeat examinations may seem quite fair to examinees, but at the same time it encourages them to adopt minimal learning strategies. Repeat examinations allow examinees to 'scout' at first attempts, or invite them to prepare minimally: there is always a chance of succeeding (particularly with unreliable tests) and, if not, there is always a next chance. The student promotion regulations directly define academic success, and students will react strategically (which are the most important exams, what is to be done first, what can we skip?).

Particularly in many European countries, university programs have problems with their attrition rates: unrealistic numbers of students drop out or are substantially delayed. In part, this problem is an assessment problem: too many hurdles with too high standards, few compensatory rules across assessments, etcetera. A recent study showed that variations in attrition rates in a medical school across 25 years were directly linked to its examination rules, while there was no evidence of variations in ability between cohorts of students[cviii].
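The effect of compensatory rules on attrition can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real school:

- Abilities and measurement errors are normally distributed, and errors are independent across exams.
- All exams are weighted equally and share one cut-off.
- The parameter values are hypothetical.

The same simulated cohort is judged under two rules. Under a conjunctive rule every exam must be passed separately. Under a compensatory rule only the mean score must reach the cut-off.

```python
import random


def pass_rates(n_students: int = 10_000, n_exams: int = 10, cut: float = 0.0,
               error_sd: float = 1.0, seed: int = 1) -> tuple[float, float]:
    """Share of a simulated cohort passing under conjunctive vs compensatory rules.

    Each observed exam score is true ability plus independent measurement error,
    so the cohort's ability distribution is identical under both rules.
    """
    rng = random.Random(seed)
    conjunctive = compensatory = 0
    for _ in range(n_students):
        ability = rng.gauss(0.5, 0.5)  # hypothetical: the average student is above the cut
        scores = [ability + rng.gauss(0.0, error_sd) for _ in range(n_exams)]
        if all(s >= cut for s in scores):  # must clear every hurdle separately
            conjunctive += 1
        if sum(scores) / n_exams >= cut:  # low scores may be offset by high ones
            compensatory += 1
    return conjunctive / n_students, compensatory / n_students


for n_exams in (1, 5, 10):
    conj, comp = pass_rates(n_exams=n_exams)
    print(f"{n_exams:>2} exams: conjunctive {conj:.2f}, compensatory {comp:.2f}")
```

Passing every hurdle implies a mean at or above the cut-off, so the conjunctive pass rate can never exceed the compensatory one. With many unreliable hurdles, the gap widens even though the cohort's ability is unchanged. A compensatory rule also lets through some examinees with weak results on individual tests, so the choice remains a policy trade-off.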

 

As was suggested and illustrated earlier, the educational effects of assessment are often unexpected. Moreover, the dangers of mere test-directed studying were pointed out. An additional complexity is that the strategic use of assessment transcends the level of the individual test developer, teacher or department. The integrated assessment program as a total system drives learning. Changes at the micro level in parts of the system will therefore not have major effects. Strategic use of assessment is most effective at the macro level. School-wide or centralized assessment programs are, however, very rare in educational programs[cix].

In our view, the impact of assessment on the educational process is a variable that allows little compromise. We would argue that educational impact is the heart of educational achievement testing: assessment should be part of the learning process in order to achieve the educational objectives set out in the training program. Any compromise here directly affects the quality of the educational training program.

The educational use of assessment should also have higher priority in research. Despite the wide recognition of its importance, little empirical work has been reported. Such research will be methodologically difficult to carry out. The complexity of context-bound interactions may limit the relevance of an experimental approach, and survey research and case studies are more likely to be appropriate.

 

Acceptability suggestions

Just as students and examinees follow understandable behavioral patterns, so do faculty. For example, examiners usually hate strongly structured assessments, which do not exploit their expertise and restrict their freedom as professionals. Similarly, examiners usually value direct contact with examinees, even through their written responses, over impersonal and mechanistic judgements. These are important factors to be considered (and to be used). More difficult is the set of values that are often brought to assessment. They are based on personal experience, beliefs and (mis)conceptions. Using research results and empirical evidence is considered professional behavior in health practice and in research work, but this attitude does not easily generalize to education. Faculty are usually unaware of educational research or do not consider it very important92. In addition, educational traditions are often deeply anchored in countries and institutions. Students naturally have similar beliefs and attitudes towards assessment.

The practical implication is that, regardless of their justification, elements influencing the acceptability of achievement testing need to be considered in the choice and design of an assessment procedure or program. Assessment not accepted by staff or students will not survive. The issue is to attempt to strategically use the information on faculty and student beliefs in order to get their commitment. Provision of information is a key element in this strategy, but the willingness to compromise is definitely another.

 

Cost

Good assessment is definitely costly. Test construction with built-in review and control processes, development of high fidelity simulations, training of examiners and patients, test administration, data processing, feedback to students, staff and the institution, and monitoring of effects are all resource-intensive activities. The cost of assessment requires compromises in practice. Three remarks are, however, in order here.

First, investing in assessment is investing in teaching and learning. Given the lawful relationship between assessment and learning, good assessment will facilitate good learning. In other words, an investment in educational achievement testing is worthwhile and will pay off.

Second, a different perspective emerges when the cost of assessment is related to the cost of teaching. Expenditure on teaching is more easily accepted than expenditure on assessment, but it remains a matter of priorities and allocation of resources. With relatively small shifts in this balance, substantial improvements in test development could be achieved.

Third, assessment methods perceived as resource-intensive have turned out to be feasible in practice. The widespread use of multiple station examinations, including initiatives for nation-wide introduction38 [cx], is proof. However, more studies reporting the precise costs involved in assessment procedures would be useful[cxi].

 

Conclusion

The current state of the art in the assessment of professional competence is unfortunately more complex than a recipe book of agreed testing technology options. Many intuitive beliefs about assessment have proved naive or incorrect. On the other hand, clear progress has been made. The history of assessment is characterized by continuous attempts to approximate the real professional or educational world as closely as possible while maintaining standardized test-taking conditions. This is the essence of professional competence assessment, and it should be applied to any assessment of professional competence, regardless of the format. A great deal of assessment technology has been developed over time and is now available. However, there is more to assessment than its technology. Assessment as an educational strategy should become more of a concern for test developers and training institutions. Extending assessment technology towards maximal fidelity, together with its planned educational use, will be the challenge for the future.

 

References

 



[1] In 1986, US$1 = L 1490.

[2] In 1985, the lowest rate was US$1 = ¥263.65; the highest rate was US$1 = ¥199.80. In 1986, the lowest rate was US$1 = ¥203.30; the highest rate was US$1 = ¥152.55 in Tokyo.

 

[3] From Fuji to Everest. Forbes, May 2, 1988.

 

[4] Harvard Business School case study, Canon Inc., World-wide Copier Strategy, 1983, page 2.

 

[5] InfoSource's classification scheme was generally used as a way to segment the market, as follows: Category 1 - less than 20 copies per minute (cpm); Category 2 - 20-39 cpm; Category 3 - 40-59 cpm; Category 4 - 60-89 cpm; Category 5 - 90+ cpm. The Personal Copier category was subsequently added for copiers generating less than 10 cpm.

 

[6] A test item is the smallest independent test unit, and may consist of multiple sub-units. For instance, a checklist for a particular station will probably contain a number of checklist items, but they are clustered together through the content of the station. If an examinee has no knowledge of the content area of the station, he or she is more likely to fail all items, i.e. the items are dependent on each other. The station score is therefore the 'item' in the test.

[7] This is the multitrait-multimethod approach to validity, considered to be the strongest design in classical trait research. It uses multiple measures assessing multiple traits in a fully crossed way in order to separate trait variance from method variance (Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin 1959; 56: 81-105.)

[8] Some may call this face validity. Face validity is often used to indicate the validity of tests at first impression or at face value. Face validity will evidently influence acceptability, but acceptability is meant here in a broader sense that includes the entire belief system of people in relation to assessment or an assessment method.



[i]. Ebel RL, Frisbie DA. Essentials of educational measurement. Englewood Cliffs, New Jersey: Prentice-Hall, 1986.

[ii]. Ory JC, Ryan KE. Tips for improving testing and grading. Newbury Park, California: Sage Publications, 1993.

[iii]. Pickering G. Against multiple-choice questions. Medical Teacher 1979; 1: 84-6.

[iv]. Newble DI, Baxter A, Elmslie G. A comparison of multiple choice and free response tests in examinations of clinical competence. Medical Education 1979; 13: 263-8.

[v]. McGuire C. Perspectives in Assessment. Academic Medicine (Supplement) 1993; 68: S3-8.

[vi]. Case SM, Swanson DB. Extended-matching items: a practical alternative to free-response questions. Teaching and Learning in Medicine 1993; 5: 107-15.

[vii]. Schuwirth LWT, van der Vleuten CPM, Donkers HHLM. Open ended questions versus multiple choice questions: An analysis of cueing effects. In: Harden RM, Hart IR, Mulholland H (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 486-91.

[viii]. Linn RL. Educational assessment: expanded expectations and challenges. Educational Evaluation and Policy Analysis 1993; 15: 1-16.

[ix]. Van der Vleuten CPM, Newble DI. How can clinical reasoning be tested? The Lancet 1995; 345: 1032-1034.

[x]. McGuire CH, Babbott D. Simulation technique in the measurement of problem solving skills. Journal of Educational Measurement 1967; 4: 1-10.

[xi]. Hodgkin K, Knox JDE. Problem centered learning: The modified essay question in medical education. Edinburgh: Churchill-Livingstone, 1975.

[xii]. Feletti GI, Saunders NA, Smith AJ. Comprehensive assessment of final-year medical student performance based on undergraduate programme objectives. Lancet 1983; 2(8340): 34-7.

[xiii]. Barrows HS, Tamblyn RM. The portable patient problem pack (P4). A problem-based learning unit. Journal of Medical Education 1977; 52:1002-4.

[xiv]. Williams RG, Vu NV, Barrows HS, Verhulst S. Profile of the Clinical Reasoning Test (CRT): An objective measure of problem solving skills and proficiency in using medical knowledge. In: Schmidt HG, De Volder ML (Eds.) Tutorials in Problem-Based Learning. Assen: Van Gorcum, 1984: 81-90.

[xv]. Norcini JJ, Meskauskas JA, Langdon LO, Webster GD. An evaluation of a computer simulation in the assessment of physician competence. Evaluation in the Health Professions 1986; 9: 286-304.

[xvi]. Bligh TJ. Written simulation scoring: comparison of nine systems. [dissertation] Urbana-Champaign (IL), University of Illinois, 1980.

[xvii]. Swanson D, Norcini J, Grosso L. Assessment of clinical competence: Written and computer-based simulations. Assessment and Evaluation in Higher Education 1987; 12: 220-46.

 

[xviii]. Norman G, Bordage G, Curry L et al. A review of recent innovations in assessment. In: Wakeford RE, ed. Directions in clinical assessment. Report of the First Cambridge Conference. Cambridge: Cambridge University School of Clinical Medicine, 1985; 9-27.

[xix]. Elstein A, Shulman LS, Sprafka SA. Medical problem solving: An analysis of clinical reasoning. Cambridge Massachusetts: Harvard University Press, 1978.

[xx]. Friedman R, Korst D, Schultz J, Beatty E, Entine S. Experience with the simulated patient physician encounter. Journal of Medical Education 1978; 53: 825-30.

[xxi]. McLeskey C, Ward R. Validity of written examinations. Anesthesiology 1978; 49: 224.

[xxii]. Marshall J. Assessment of problem-solving ability. Medical Education 1977; 11: 329-334.

[xxiii]. Newble DI, Hoare J, Baxter A. Patient management problems: issues of validity. Medical Education 1982; 16: 137-42.

[xxiv]. Norman GR, Feightner JW. A comparison of behaviour on simulated patients and patient management problems. Journal of Medical Education 1981; 55: 529-37.

[xxv]. Norcini JJ, Swanson DB, Grosso LJ, Shea JA, Webster GD. Reliability, validity and efficiency of multiple choice question and patient management problem item formats in the assessment of physician competence. Medical Education 1985; 19: 238-47.

[xxvi]. Bordage G, Page G. An alternative approach to PMPs: The "key features" concept. In: Hart IR, Harden RM, eds. Further Developments in Assessing Clinical Competence. Montreal: Heal-Publications, 1987: 59-75.

[xxvii]. De Graaff E, Post G, Drop M. Validation of a new measure of clinical problem-solving. Medical Education 1987; 21: 213-218.

[xxviii]. Powles ACP, Wintrup N, Neufeld VR, Wakefield JH, Coates G, Burrows J. The triple jump exercise: Further studies of an evaluative technique. Proceedings of the 20th Annual Conference on Research in Medical Education, Washington: American Association of Medical Colleges, 1981: 74-9.

[xxix]. Friedman CP, Murphy GC, Smith AC, Mattern WD. Exploratory study of an examination format for problem-based learning. Teaching and Learning in Medicine 1994; 6: 194-8.

[xxx]. Boud D. The role of self-assessment in student grading. Assessment and Evaluation in Higher Education 1989; 14: 20-30.

[xxxi]. De Grave W, De Volder M. Peer-evaluation and problem-based learning. In: Schmidt H, De Volder M. (Editors) Tutorials in Problem-based learning. Assen: Van Gorcum, 1984: 116-122.

[xxxii]. Magzoub M. Studies in Community-based Education [dissertation]. Maastricht: University of Limburg, 1994.

[xxxiii]. Case SM, Swanson DB, Van der Vleuten CPM. Student assessment in problem-based learning curricula. In: Boud D, Feletti G. (Editors) The Challenge of Problem-based Learning. London: Kogan Page, 1991: 260-73.

[xxxiv]. Gordon MJ. A review of the validity and accuracy of self-assessments in health professions training. Academic Medicine 1991; 66: 762-69.

[xxxv]. Blake JM, Norman GR, Smith EKM. Report card from McMaster: student evaluation at a problem-based medical school. Lancet 1995; 345: 899-902.

[xxxvi]. Harden R, Gleeson F. Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education 1979; 13: 41-54.

[xxxvii]. Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 292-321.

[xxxviii]. Reznick R, Blackmore DE, Cohen R et al. An objective structured clinical examination for the licentiate of the Medical Council of Canada: From research to reality. Academic Medicine [Supplement] 1993; 68: S4-6.

[xxxix]. Van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: State of the art. Teaching and Learning in Medicine 1990; 2: 58-76.

[xl]. Vu NV, Barrows HS. Use of standardized patients in clinical assessments: recent developments and measurement findings. Educational Researcher 1994; 23: 23-30.

[xli]. Swanson DB, Norman GR, Linn RL. Performance-based assessment: Lessons from the health professions. Educational Researcher 1995; 24: 5-11, 35.

[xlii]. Van Thiel J, Kraan HF, Van der Vleuten CPM. Reliability and feasibility of measuring interviewing skills using the revised Maastricht History Taking and Advice Checklist. Medical Education 1991; 25: 224-9.

[xliii]. Swanson DB. A measurement framework for performance-based tests. In: Hart IR, Harden RM, editors. Further developments in assessing clinical competence. Montreal: Can-Heal, 1987: 13-45.

[xliv]. Stalenhoef-Halling BF, Van der Vleuten CPM, Jaspers TAM, Fiolet JFBM. The feasibility, acceptability and reliability of open-ended questions in a problem-based learning curriculum. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publ, 1990: 552-7.

[xlv]. Erviti V, Templeton B, Bunce J, Burg F. The relationships of pediatric resident recording behavior across medical conditions. Medical Care 1980; 18: 1020-31.

[xlvi]. Rethans JJ, Sturmans F, Drop MJ, Van der Vleuten CPM. Assessment of performance in actual practice of general practitioners by use of standardized patients. British Journal of General Practice 1991; 41: 97-9.

[xlvii]. Shavelson RJ, Baxter GP, Gao X. Sampling variability of performance assessments. Journal of Educational Measurement 1993; 30: 215-32.

[xlviii]. Klein, as cited in: Linn RL. Educational assessment: expanded expectations and challenges. Educational Evaluation and Policy Analysis 1993; 15: 1-16.

[xlix]. Shavelson RJ, Mayberry P, Li W, Webb NM. Generalizability of military performance measurements: Marine Corps rifleman. Military Psychology 1990; 2: 129-44.

[l]. Frijns PHAM, Van der Vleuten CPM, Verwijnen GM, Van Leeuwen YD. The effect of structure in scoring methods on the reproducibility of tests using open-ended questions. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publ, 1990: 466-471.

[li]. Van Ham I, Gerritsma J. The assessment of clinical competence in general practice with chart stimulated recall. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP, editors. Teaching and assessing clinical competence. Groningen: Boekwerk, 1990: 306-9.

[lii]. Van der Vleuten CPM, Norman GR, De Graaff E. Pitfalls in the pursuit of objectivity: Issues of reliability. Medical Education 1991; 25: 110-8.

[liii]. Streiner DL. Global Rating Scales. In: Neufeld VR, Norman GR (Editors) Assessing Clinical Competence. New York: Springer, 1985: 119-41.

[liv]. Streiner D. Clinical ratings - ward evaluation. In: Shannon S, Norman G. (Editors) Evaluation methods: A resource handbook. Hamilton: The Program for Educational Development, McMaster University, 1995: 29-31.

[lv]. Hastorf AH, Schneider DJ, Polefka J. Person perception. Reading, Massachusetts: Addison-Wesley, 1970.

[lvi]. Ross M. Relation of implicit theories to the construction of personal histories. Psychological Review 1989; 96: 341-57.

[lvii]. Norman GR, Swanson DB, Case SM. Conceptual and methodological issues in studies comparing assessment formats. Teaching and Learning in Medicine, in press.

[lviii]. Norman GR, Smith E, Powles A, Rooney P, Henry N, Dodd P. Factors underlying performance on written tests of knowledge. Medical Education 1987; 21: 297-304.

[lix]. Jean P, Schuwirth L, Van Santen M, Van der Vleuten C. Do problem analysis questions (PAQs) and true/false questions (TFQs) measure different skills? Medical Education, in press.

[lx]. Maatsch J, Huang R. An evaluation of the construct validity of four alternative theories of clinical competence. Proceedings of the Twenty-fifth Annual Conference on Research in Medical Education, Association of American Medical Colleges. Washington, DC, 1986.

[lxi]. Maatsch J. Model for a criterion-referenced medical specialty test. Final Report, Grant No. HS-02038-02, Office of Medical Education Research and Development, Michigan State University, 1980.

[lxii]. Van der Vleuten CPM, Van Luijk S, Beckers HJM. A written test as an alternative to performance testing. Medical Education 1989; 23: 97-107.

[lxiii]. Ward W. A comparison of free-response and multiple choice forms of verbal aptitude tests. Applied Psychological Measurement 1982; 6: 1-11.

[lxiv]. Thissen D, Wainer H, Wang X. Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement 1994; 31: 113-23.

[lxv]. Ramsey PG, Carline JD, Inui TS, et al. Predictive validity of certification by the American Board of Internal Medicine. Annals of Internal Medicine 1989; 110: 719-26.

[lxvi]. Solomon et al 1990 as cited in Norman GR. Can an examination predict competence? The role of recertification in maintenance of competence. Annals of the Royal College of Physicians and Surgeons of Canada 1991; 24: 121-124.

[lxvii]. Norman GR, Davis DA, Painvin A, Rath D, Ragbeer M. Comprehensive assessment of clinical competence of family-general physicians using multiple measures. Proceedings 28th Conference on Research in Medical Education. Washington: Association of American Medical Colleges, 1989.

[lxviii]. Norman GR. Can an examination predict competence? The role of recertification in maintenance of competence. Annals of the Royal College of Physicians and Surgeons of Canada 1991; 24: 121-124.

[lxix]. McGuire C. Written methods for assessing clinical competence. In: Hart IR, Harden RM, editors. Further developments in assessing clinical competence. Montreal: Can-Heal, 1987: 44-58.

[lxx]. Hager P, Gonczi A, Athanasou J. General issues about assessment of competence. Assessment & Evaluation in Higher Education 1994; 19: 3-16.

[lxxi]. Norman GR, Van der Vleuten CPM, De Graaff E. Pitfalls in the pursuit of objectivity: Issues of validity, efficiency and acceptability. Medical Education 1991; 25: 119-126.

[lxxii]. Van Thiel J, Van der Vleuten CPM, Kraan H. Assessment of medical interviewing skills: Generalizability of scores using successive MAAS-versions. In: Harden RM, Hart IR, Mulholland H. (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 536-540.

[lxxiii]. Newble D, Jaeger K. The effect of assessments and examinations on the learning of medical students. Medical Education 1983; 17: 165-71.

[lxxiv]. Popham WJ. Measurement as an instructional catalyst. In: Ekstrom RB. (editor) Measurement, technology and individuality in education. San Francisco: Jossey-Bass, 1983: 87-103.

[lxxv]. Frederiksen N. The real test bias: influences of testing on teaching and learning. American Psychologist 1984; 39: 193-202.

[lxxvi]. Entwistle N. Styles of Learning and Teaching. Chichester: John Wiley & Sons, 1981.

[lxxvii]. Stillman P, Swanson D. Ensuring the clinical competence of medical school graduates through standardized patients. Archives of Internal Medicine 1987; 147: 1049-52.

[lxxviii]. Gibbs G. Improving the quality of student learning. Bristol: Technical & Educational Services, 1992.

[lxxix]. Shepard LA. Psychometricians' beliefs about learning. Educational Researcher 1991; 20: 2-16.

[lxxx]. Snyder BR. The hidden curriculum. New York: Knopf, 1971.

[lxxxi]. Popham WJ, Cruse KL, Rankin SC, Sandifer PD, Williams PL. Measurement-driven instruction: It's on the road. Phi Delta Kappan 1985; 66: 628-34.

[lxxxii]. Semb GB, Ellis JA. Knowledge taught in school: What is remembered? Review of Educational Research 1994; 64: 253-86.

[lxxxiii]. Van Luijk SJ, Van der Vleuten CPM, Schelven RM. The relation between content and psychometric characteristics in performance-based testing. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP (Editors) Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publications, 1990: 202-207.

[lxxxiv]. Mischel W. Personality and assessment. New York: John Wiley, 1968.

[lxxxv]. Schmidt H, Norman G, Boshuizen H. A cognitive perspective on medical expertise: Theory and implications. Academic Medicine 1990; 65: 611-21.

[lxxxvi]. Norman G, Allery L, Berkson L, et al. Research in the psychology of clinical reasoning: implications for assessment. Paper from the Fourth Cambridge Conference, Cambridge University School of Clinical Medicine, Cambridge, 1989.

[lxxxvii]. Higgs J, Jones M. (editors). Clinical reasoning in the health professions. Oxford: Butterworth-Heinemann, 1995.

[lxxxviii]. Norman G, Regehr G. Contemporary issues in cognitive psychology: Implications for professional education. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 17-25.

[lxxxix]. Rethans JJ, Sturmans F, Drop MJ, et al. Does competence of general practitioners predict their performance? British Medical Journal 1991; 303: 1377-80.

[xc]. Norman GR. Reliability and construct validity of some cognitive measures of clinical reasoning. Teaching and Learning in Medicine 1989; 1: 194-9.

[xci]. Newble DI, Raymond GA. The Pattern Completion Item (PCI): A potential measure of clinical problem-solving skills. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 191-2.

[xcii]. Nelson MS, Clayton BL, Moreno R. How medical school faculty regard educational research and make pedagogical decisions. Academic Medicine 1990; 65: 122-6.

[xciii]. Hays RB, Fabb WE, Van der Vleuten CPM. Reliability of the fellowship examination of the Royal Australian College of General Practitioners. Teaching and Learning in Medicine 1995; 7: 43-50.

[xciv]. Van Luijk SJ, Van der Vleuten CPM. A comparison of checklists and rating scales in performance-based testing. In: Hart IR, Harden RM, Des Marchais J, editors. Current Developments in Assessing Clinical Competence. Montreal: Can-Heal, 1992: 357-62.

[xcv]. Newble D, Dawson B, Dauphinee D, et al. Guidelines for assessing clinical competence. Teaching and Learning in Medicine 1994; 6: 213-220.

[xcvi]. Endler NS, Magnusson D, editors. Interactional psychology and personality. Washington DC: Hemisphere, 1976.

[xcvii]. Van der Vleuten CPM, Schuwirth LWT, Ronteltap CFM. A cognitive psychological interpretation of a few remarkable psychometric findings. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 506-8.

[xcviii]. Frijns PHAM. Scoringsmodellen voor open-vraag vormen (Scoring models for free-response formats) [dissertation]. Maastricht: University of Limburg, 1992.

[xcix]. Brailovsky C, Bordage G, Carretier H, Page G. Content validity of the key features' approach of the Medical Council of Canada's Exam. In: Harden RM, Hart IR, Mulholland H (Editors) Approaches to assessment of clinical competence - Part II. Norwich: Page Brothers, 1992: 476-7.

[c]. Bordage G, Brailovsky C, Carretier H, Page G. Content validation of key features on a national examination of clinical decision-making skills. Academic Medicine 1995; 70: 276-81.

[ci]. Ebel RL. Must all tests be valid? American Psychologist 1961; 16: 640-7.

[cii]. Ebel R. Measuring Educational Achievement. Englewood Cliffs: Prentice-Hall Inc, 1965.

[ciii]. Ebel R. The practical validation of tests of ability. Educational Measurement: Issues and Practice 1983; 2: 7-10.

[civ]. Kane M. The validity of licensure examinations. American Psychologist 1982; 37: 911-918.

[cv]. Arnold L, Willoughby TL. The Quarterly Profile Examination. Academic Medicine 1990; 65: 515-6.

[cvi]. Van der Vleuten CPM, Verwijnen GM, Wijnen WHFW. Fifteen years of experience with progress-testing. Medical Teacher, in press.

[cvii]. Van Berkel HJM, Nuy HJP, Geerligs T. The influence of progress tests and block tests on study behavior. Instructional Science 1995; 22: 315-331.

[cviii]. Cohen-Schotanus J. Effecten van curriculumveranderingen. [dissertation, with English summary]. Groningen: University of Groningen, 1994.

[cix]. Van der Vleuten CPM, Verwijnen GM. A system for student assessment. In: Van der Vleuten CPM, Wijnen WHFW, editors. Problem-based learning: Perspectives from the Maastricht experience. Amsterdam: Thesis-publ., 1990: 27-49.

[cx]. Klass D, Clauser B, Fletcher E et al. Progress in developing a standardized patient test of clinical skills at the National Board of Medical Examiners: Prototype two. In: Rothman AI, Cohen R. (Editors) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Bookstore Custom Publishing, 1995: 324-6.

[cxi]. Reznick RK, Smee S, Baumber JS, et al. Guidelines for estimating the real cost of an Objective Structured Clinical Examination. Academic Medicine 1993; 68: 513-17.