Rating the Wine Rating Systems
People turn to wine critics to tell them what’s really inside that expensive bottle (or that cheap one) and how various wines compare. Some critics are famous for their detailed wine tasting notes (Michael Broadbent comes to mind here) that provide comprehensive qualitative evaluation of wines, but with so many choices in today’s global market it is almost inevitable that quantitative rating scales would evolve. They simplify wine evaluation, which is what many consumers are looking for, but they have complicated matters, too, because there is no single accepted system to provide the rankings.
I’m interested in the variety of wine rating systems and scales that wine critics employ and the controversies that surround them. This blog entry is intended to be a brief guide for the perplexed, an analysis of the practical and theoretical difficulties of making and using wine ranking systems.
Wine Rating Scales: 100-points, 20-points, Three Glasses and More
The first problem is that different wine critic publications use different techniques to evaluate wine and different rating scales to compare them. Click on this image to see a useful comparison of wine rating systems compiled by De Long Wine (click here to download the pdf version, which is easier to read).
Robert Parker’s Wine Advocate, the Wine Spectator and Wine Enthusiast all use a 100-point rating scale, although the qualitative meanings associated with the numbers are not exactly the same. It is perhaps not an accident that these are all American publications and that American wine readers are familiar with 100-point ratings from their high school and college classes.
In theory a 100-point system allows wine critics to be very precise in their relative ratings (an 85-point syrah really is better than an 84-point syrah), although in practice many consumers may not be able to appreciate the distinction. Significantly, it is not really a 100-point scale: 50 points is functionally the lowest grade and it is rare to see wines rated lower than 70, so the scale is not as precise as it might seem. (As any professor or teacher will tell you, there has been both grade inflation and grade compression in recent years, and this applies to wine critics too, I believe.)
The 100-point scale is far from universal. The enologists at the University of California at Davis use a 20-point rating scale, as do British wine critic Jancis Robinson and Decanter, the leading global wine magazine. The 20-point scale actually corresponds to how students are graded in French high schools and universities, so perhaps that says something about its origins.
The Davis 20-point scale gives up to 4 points for appearance, 6 points for smell, 8 points for taste and 2 for overall harmony, according to my copy of The Taste of Wine by Emile Peynaud. The Office International du Vin’s 20-point scale has different relative weights for wine qualities; it awards 4 points for appearance, 4 for smell and 12 for taste. Oz Clarke’s 20-point system assigns 2, 6 and 12 points for look, smell and taste. It’s easy to understand how the same wine can receive different scores when different critics use different criteria and different weights.
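To make the effect of the weights concrete, here is a minimal Python sketch that scores one wine under the three 20-point rubrics just described. The maximum points per component come from the paragraph above; the wine itself (the fraction of available points it earns on each component) is invented purely for illustration, and the helper name total_score is my own.

```python
# Maximum points per component for each 20-point rubric, as listed in the text.
RUBRICS = {
    "Davis": {"appearance": 4, "smell": 6, "taste": 8, "harmony": 2},
    "OIV": {"appearance": 4, "smell": 4, "taste": 12},
    "Oz Clarke": {"appearance": 2, "smell": 6, "taste": 12},
}

# Hypothetical wine: fraction of the available points earned on each component.
# These numbers are made up for the example, not taken from any critic.
wine = {"appearance": 0.5, "smell": 1.0, "taste": 0.5, "harmony": 1.0}


def total_score(rubric, quality):
    """Add up (maximum points x fraction earned) over the rubric's components."""
    return sum(points * quality[component] for component, points in rubric.items())


scores = {name: total_score(rubric, wine) for name, rubric in RUBRICS.items()}
for name, score in scores.items():
    print(f"{name}: {score:g} / 20")
```

Under these assumed component judgments the same wine lands on 14, 12 and 13 points, even though nothing about the wine has changed. Only the weights have.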
A 20-point scale (which is often really a 10-point scale) offers less precision in relative rankings, since only whole- and half-point ratings are available, but this may be appropriate depending upon how the ratings are to be used. Wines rated 85, 86 and 87 on a 100-point scale, for example, might all receive scores of about 16 on a 20-point scale. It’s up to you to decide if the finer evaluative grid provides useful information.
Decanter uses both a 20-point scale and a simple guide of zero to five stars to rate wines, where one star is “acceptable”, two is quite good, three is recommended, four is highly recommended and five is, well, I suppose an American would say awesome, but the British are more reserved. Dorothy J. Gaiter and John Brecher (who write an influential wine column for the Wall Street Journal) also use a five-point system; they rate wines from OK to Good, Very Good, Delicious and Delicious(!).
The five-point system allows for less precision but it is still very useful – it is the system commonly used to rate hotels and resorts, for example. Vini d’Italia, the Italian wine guide published by Gambero Rosso, uses a three-glasses scale that will be familiar to European consumers who use the Michelin Guide’s three-star scale to rate restaurants.
Which System is Best?
It is natural to think that the best system is the one that provides the most information, so a 100-point scale must be best, but I’m not sure that’s true. Emile Peynaud makes the point that how you go about tasting and evaluating wine is different depending upon your purpose. Critical wine evaluation to uncover the flaws in wine (to advise a winemaker, for example) is different in his book from commercial tasting (as the basis for ordering wine for a restaurant or wine distributor or perhaps buying wine as an investment), which is different from consumer tasting to see what you like.
Many will disagree, but it seems to me that the simple three or five stars/glasses/points systems are probably adequate for consumer tasting use while the 20- and 100-point scales are better suited for commercial purposes. I’m not sure that numbers or stars are useful at all for critical wine evaluation – for that you need Broadbent’s detailed qualitative notes. Wine critic publications often try to serve all three of these markets, which may explain why they use the most detailed systems or use a dual system like Decanter.
In any case, it seems to me that greater transparency would be useful. First, it is important that the criteria and weights are highlighted and not buried in footnotes. And I don’t see why a 20-point rating couldn’t be disaggregated like this: 15 (3/6/6) for a 20-point system that gives up to 4 points for appearance, 6 for smell and 10 for taste. That would tell me quickly how this wine differs from a 15 (4/3/8). Depending upon how much I value aroma in a wine and what type of wine it is, I might prefer the first “15” wine to the second.
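The disaggregated notation is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, assuming the 4/6/10 split described above; the Rating class and its field names are hypothetical, not any publication's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """A 20-point rating kept as its parts (assumed split: 4 / 6 / 10)."""
    appearance: int  # out of 4
    smell: int       # out of 6
    taste: int       # out of 10

    @property
    def total(self) -> int:
        return self.appearance + self.smell + self.taste

    def __str__(self) -> str:
        # Total first, then the components in brackets.
        return f"{self.total} ({self.appearance}/{self.smell}/{self.taste})"


first = Rating(appearance=3, smell=6, taste=6)
second = Rating(appearance=4, smell=3, taste=8)
print(first)   # 15 (3/6/6)
print(second)  # 15 (4/3/8)

# A reader who cares most about aroma can choose between two "15" wines.
preferred = max([first, second], key=lambda rating: rating.smell)
print("Aroma lover's pick:", preferred)
```

The two totals are identical, so the headline number alone cannot separate the wines. The components can.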
Wine and Figure Skating?
So far I’ve focused on the practical problems associated with having different evaluation scales with different weights for different purposes, but there are even more serious difficulties in wine rating scales. In economics we learn that numerical measures are either cardinal or ordinal. Cardinal measures have constant units of measurement that can be compared and manipulated mathematically with ease. Weight (measured by a scale) and length (measured in feet or meters) are cardinal measures. Every kilogram or kilometer is the same.
Ordinal measures are different – they provide only a rank ordering. If I asked you to rate three wines from your most preferred to your least favorite, for example, that would be an ordinal ranking. You and I might agree about the order (rating wines A over C over B, for example), but we might disagree about how much better A was compared to C. I might think it was a little better, but for you the difference could be profound.
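A small Python sketch shows the difference. The scores below are invented for two hypothetical tasters (me and you) on a 20-point scale; the point is only that the rank order can match perfectly while the gaps between wines do not.

```python
# Hypothetical 20-point scores from two tasters for the same three wines.
mine = {"A": 16.0, "B": 14.0, "C": 15.5}
yours = {"A": 19.0, "B": 11.0, "C": 13.0}


def ranking(scores):
    """Return wine names ordered from most to least preferred (ordinal information)."""
    return sorted(scores, key=scores.get, reverse=True)


print(ranking(mine))   # ['A', 'C', 'B']
print(ranking(yours))  # ['A', 'C', 'B']

# The cardinal reading (how much better is A than C?) differs sharply.
my_gap = mine["A"] - mine["C"]
your_gap = yours["A"] - yours["C"]
print(f"My A-over-C gap: {my_gap:g} points; yours: {your_gap:g} points")
```

We agree completely on the order, yet treating the numbers as cardinal would suggest that A beats C by half a point for me and by six points for you.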
To use a familiar example from sports, they give the Olympic gold medal in the long jump based upon a cardinal measure of performance (length of jump) and they give the gold medal in figure skating based upon ordinal judges’ scores, which are relative not absolute measures of performance (in the U.S. they actually call the judges’ scores “ordinals”). Figure skating ratings are controversial for the same reason wine scores are.
So what kind of judgment do we make when we taste wine: do we evaluate against an absolute standard like in the long jump or a relative one like the figure skating judges? The answer is both, but in different proportions. An expert taster will have an exact idea of what a wine should be and can rate accordingly, but you and I might only be able to rank order different wines, since our abilities to make absolute judgments aren’t well developed.
This is one reason why multi-wine social blind tasting parties almost always produce unexpected winners or favorites. The wines we like better [relative] are not always the ones we like best [absolute] when evaluated on their own.
Ordinal and cardinal are just different, like apples and oranges (or Pinot Gris and Chardonnay). Imagine what the long jump would look like if ordinal “style points” were awarded. Imagine what figure skating would look like if the jumps and throws were rated by cardinal measures of distance and hang time. No, it wouldn’t be a pretty sight.
Economists are taught that it is a mistake to treat ordinal rankings as if they are cardinal rankings, but that’s what I think we wine folks do sometimes. I’ve read that Jancis Robinson, who studied Mathematics at Oxford, isn’t entirely comfortable with numeric wine ratings. Perhaps it is because she appreciates this methodological difficulty.
Lessons of the Judgment of Paris
Or maybe she’s just smart. Smart enough to know that your 18-point wine may be my 14-pointer. It’s clear that people approach wine with different tastes, tasting skills, expectations and even different taste buds, so relative rankings by one person need not be shared by others. This is true of even professional tasters, as the Judgment of Paris made clear.
The Judgment of Paris (the topic of a great book by George M. Taber – see below – and two questionable forthcoming films) was a 1976 blind tasting of French versus American wines organized (in Paris, of course) by Steven Spurrier. It became famous because a panel of French wine experts found to their surprise that American wines were as good as or even better than prestigious wines from France.
A recent article by Dennis Lindley (professor emeritus at University College London – see below) casts doubt on this conclusion, however. Read the article for the full analysis, but for now just click on the image above to see the actual scores of the 11 judges. It doesn’t take much effort to see that these experts disagreed as much as they agreed about the quality of the wines they tasted. The 1971 Mayacamas Cabernet, for example, received scores as low as 3 and 5 on a 20-point scale along with ratings as high as 12, 13 and 14. It was simultaneously undrinkable (according to a famous sommelier) and pretty darn good (according to the owner of a famous wine property). If the experts don’t agree with each other, what is the chance that you will agree with them?
Does this mean that wine critics and their rating systems are useless and should disappear? Not likely. Wine ratings are useful to consumers, who face an enormous range of choices and desperately need information, even if it is practically problematic and theoretically suspect. Wine ratings are useful commercially, too. Winemakers need to find ways to reduce consumer uncertainty and therefore increase sales and wine ratings serve that purpose.
And then, of course, there is the wine critic industry itself, which knows that ratings sell magazines and drive advertising. Wine ratings are here to stay. We just need to understand them better and use them more effectively.
Dennis V. Lindley, “Analysis of a Wine Tasting.” Journal of Wine Economics 1:1 (May 2006) 33-41.
George M. Taber, Judgment of Paris: California vs. France and the Historic 1976 Paris Tasting That Revolutionized Wine. Scribner, 2005.