Wine by the Numbers

Rating the Wine Rating Systems

People turn to wine critics to tell them what’s really inside that expensive bottle (or that cheap one) and how various wines compare. Some critics are famous for their detailed wine tasting notes (Michael Broadbent comes to mind here) that provide comprehensive qualitative evaluation of wines, but with so many choices in today’s global market it is almost inevitable that quantitative rating scales would evolve. They simplify wine evaluation, which is what many consumers are looking for, but they have complicated matters, too, because there is no single accepted system to provide the rankings.

I’m interested in the variety of wine rating systems and scales that wine critics employ and the controversies that surround them. This blog entry is intended to be a brief guide for the perplexed, an analysis of the practical and theoretical difficulties of making and using wine ranking systems.

Wine Rating Scales: 100-points, 20-points, Three Glasses and More

The first problem is that different wine critic publications use different techniques to evaluate wine and different rating scales to compare them. De Long Wine has compiled a useful comparison of wine rating systems (shown in the accompanying image and also available as a PDF version, which is easier to read).

Robert Parker’s Wine Advocate, the Wine Spectator and Wine Enthusiast all use a 100-point rating scale, although the qualitative meanings associated with the numbers are not exactly the same. It is perhaps not an accident that these are all American publications and that American wine readers are familiar with 100-point ratings from their high school and college classes.

In theory a 100-point system allows wine critics to be very precise in their relative ratings (an 85-point syrah really is better than an 84-point syrah) although in practice many consumers may not be able to appreciate the distinction. Significantly, it is not really a 100-point scale since 50 points is functionally the lowest grade and it is rare to see wines rated lower than 70, so the scale is not really as precise as it might seem. (As any professor or teacher will tell you, there has been both grade inflation and grade compression in recent years, and this applies to wine critics too, I believe.)

The 100-point scale is far from universal. The enologists at the University of California at Davis use a 20-point rating scale, as do British wine critic Jancis Robinson and Decanter, the leading global wine magazine. The 20-point scale actually corresponds to how students are graded in French high schools and universities, so perhaps that says something about its origins.

The Davis 20-point scale gives up to 4 points for appearance, 6 points for smell, 8 points for taste and 2 for overall harmony, according to my copy of The Taste of Wine by Emile Peynaud. The Office International du Vin’s 20-point scale has different relative weights for wine qualities; it awards 4 points for appearance, 4 for smell and 12 for taste. Oz Clarke’s 20-point system assigns 2, 6 and 12 points for look, smell and taste. It’s easy to understand how the same wine can receive different scores when different critics use different criteria and different weights.
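To make the effect of weighting concrete, here is a minimal Python sketch. The point allocations are the ones listed above for the three systems; the taster's impressions (expressed as a fraction of the maximum for each attribute) are made-up numbers for illustration only, and the function name `score` is my own invention.

```python
# Maximum points per attribute for each 20-point system described above.
SYSTEMS = {
    "Davis": {"appearance": 4, "smell": 6, "taste": 8, "harmony": 2},
    "OIV": {"appearance": 4, "smell": 4, "taste": 12},
    "Oz Clarke": {"appearance": 2, "smell": 6, "taste": 12},
}

# Hypothetical impressions of one wine, as a fraction of the maximum
# for each attribute (illustrative numbers, not real tasting data).
impressions = {"appearance": 1.0, "smell": 0.5, "taste": 1.0, "harmony": 0.5}


def score(weights, impressions):
    """Total points: each attribute's maximum scaled by the taster's impression."""
    return sum(max_points * impressions[attr] for attr, max_points in weights.items())


for name, weights in SYSTEMS.items():
    print(f"{name}: {score(weights, impressions):g} out of 20")
```

With these invented impressions (a wine that looks and tastes great but has a modest nose) the same wine earns 16, 18 and 17 points under the three systems, simply because each one weights smell and taste differently.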

A 20-point scale (which is often really a 10-point scale) offers less precision in relative rankings, since only whole and half point ratings are available, but this may be appropriate depending upon how the ratings are to be used. Wines rated 85, 86 and 87 on a 100-point scale, for example, might all receive scores of about 16 on a 20-point scale. It’s up to you to decide if the finer evaluative grid provides useful information.

Decanter uses both a 20-point scale and a simple guide of zero to five stars to rate wines, where one star is “acceptable”, two is quite good, three is recommended, four is highly recommended and five is, well, I suppose an American would say awesome, but the British are more reserved. Dorothy J. Gaiter and John Brecher (who write an influential wine column for the Wall Street Journal) also use a five-point system; they rate wines from OK to Good, Very Good, Delicious and Delicious(!).

The five-point system allows for less precision but it is still very useful – it is the system commonly used to rate hotels and resorts, for example. Vini d’Italia, the Italian wine guide published by Gambero Rosso, uses a three-glasses scale that will be familiar to European consumers who use the Michelin Guide’s three-star scale to rate restaurants.

Which System Is Best?

It is natural to think that the best system is the one that provides the most information, so a 100-point scale must be best, but I’m not sure that’s true. Emile Peynaud makes the point that how you go about tasting and evaluating wine is different depending upon your purpose. Critical wine evaluation to uncover the flaws in wine (to advise a winemaker, for example) is different in his book from commercial tasting (as the basis for ordering wine for a restaurant or wine distributor or perhaps buying wine as an investment), which is different again from consumer tasting to see what you like.

Many will disagree, but it seems to me that the simple three or five stars/glasses/points systems are probably adequate for consumer tasting use while the 20- and 100-point scales are better suited for commercial purposes. I’m not sure that numbers or stars are useful at all for critical wine evaluation – for that you need Broadbent’s detailed qualitative notes. Wine critic publications often try to serve all three of these markets, which may explain why they use the most detailed systems or use a dual system like Decanter.

In any case, however, it seems to me that greater transparency would be useful. First, it is important that the criteria and weights are highlighted and not buried in footnotes. And I don’t see why a 20-point rating couldn’t be disaggregated like this: 15 (3/6/6) for a 20-point system that gives up to 4 points for appearance, 6 for smell and 10 for taste. That would tell me quickly how this wine differs from a 15 (4/3/8). Depending upon how much I value aroma in a wine and what type of wine it is, I might prefer the first “15” wine to the second.
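Here is a rough Python sketch of what such a disaggregated rating might look like in practice. The `Rating` class and its field names are hypothetical helpers of my own, built around the example system above (4 points for appearance, 6 for smell, 10 for taste).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """A 20-point rating split into its components (hypothetical helper)."""

    appearance: int  # out of 4 in the example system
    smell: int       # out of 6
    taste: int       # out of 10

    @property
    def total(self):
        return self.appearance + self.smell + self.taste

    def __str__(self):
        # Render in the "15 (3/6/6)" style suggested in the text.
        return f"{self.total} ({self.appearance}/{self.smell}/{self.taste})"


first = Rating(appearance=3, smell=6, taste=6)
second = Rating(appearance=4, smell=3, taste=8)

print(first)
print(second)
# Same total, but the first wine does better on smell.
print(first.total == second.total, first.smell > second.smell)
```

The two wines tie on the headline number, but the component scores show at a glance which one suits a drinker who cares most about aroma.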

Wine and Figure Skating?

So far I’ve focused on the practical problems associated with having different evaluation scales with different weights for different purposes, but there are even more serious difficulties in wine rating scales. In economics we learn that numerical measures are either cardinal or ordinal. Cardinal measures have constant units of measurement that can be compared and manipulated mathematically with ease. Weight (measured by a scale) and length (measured in feet or meters) are cardinal measures. Every kilogram or kilometer is the same.

Ordinal measures are different – they provide only a rank ordering. If I asked you to rate three wines from your most preferred to your least favorite, for example, that would be an ordinal ranking. You and I might agree about the order (rating wines A over C over B, for example), but we might disagree about how much better A was compared to C. I might think it was a little better, but for you the difference could be profound.

To use a familiar example from sports, they give the Olympic gold medal in the long jump based upon a cardinal measure of performance (length of jump) and they give the gold medal in figure skating based upon ordinal judges’ scores, which are relative not absolute measures of performance (in the U.S. they actually call the judges’ scores “ordinals”). Figure skating ratings are controversial for the same reason wine scores are.

So what kind of judgment do we make when we taste wine — do we evaluate against an absolute standard like in the long jump or a relative one like the figure skating judges? The answer is both, but in different proportions. An expert taster will have an exact idea of what a wine should be and can rate accordingly, but you and I might only be able to rank order different wines, since our abilities to make absolute judgements aren’t well developed.

This is one reason why multi-wine social blind tasting parties almost always produce unexpected winners or favorites. The wines we like better [relative] are not always the ones we like best [absolute] when evaluated on their own.

Ordinal and cardinal are just different, like apples and oranges (or Pinot Gris and Chardonnay). Imagine what the long jump would look like if ordinal “style points” were awarded. Imagine what figure skating would look like if the jumps and throws were rated by cardinal measures such as distance and hang time. No, it wouldn’t be a pretty sight.

Economists are taught that it is a mistake to treat ordinal rankings as if they are cardinal rankings, but that’s what I think we wine folks do sometimes. I’ve read that Jancis Robinson, who studied Mathematics at Oxford, isn’t entirely comfortable with numeric wine ratings. Perhaps it is because she appreciates this methodological difficulty.
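A small Python sketch may help show why the mistake matters. All of the scores below are invented for illustration, and the helper names `ranking` and `average_winner` are my own. The idea is that a taster can express exactly the same preferences with tighter or wider gaps between the numbers, and if we average the scores as though they were cardinal, the "winning" wine can change even though nobody's ranking has.

```python
def ranking(scores):
    """Wines ordered from most to least preferred (the ordinal information)."""
    return sorted(scores, key=scores.get, reverse=True)


def average_winner(*tasters):
    """Wine with the highest mean score when scores are treated as cardinal."""
    wines = tasters[0].keys()
    means = {w: sum(t[w] for t in tasters) / len(tasters) for w in wines}
    return max(means, key=means.get)


# Hypothetical 20-point scores for three wines (made-up numbers).
taster_x = {"A": 17, "B": 15, "C": 10}
taster_y = {"A": 12, "B": 17, "C": 11}

# Same preferences for taster Y (B over A over C), but spaced more tightly.
taster_y_compressed = {"A": 12, "B": 13, "C": 11}

print(ranking(taster_y) == ranking(taster_y_compressed))
print(average_winner(taster_x, taster_y))
print(average_winner(taster_x, taster_y_compressed))
```

In this toy case taster Y ranks the wines identically both times, yet the averaged scores pick wine B in the first case and wine A in the second. Only the spacing of the numbers, which an ordinal ranking says nothing about, has changed.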

Lessons of the Judgment of Paris

Or maybe she’s just smart. Smart enough to know that your 18-point wine may be my 14-pointer. It’s clear that people approach wine with different tastes, tasting skills, expectations and even different taste buds, so relative rankings by one person need not be shared by others. This is true of even professional tasters, as the Judgment of Paris made clear.

The Judgment of Paris (the topic of a great book by George M. Taber – see below – and two questionable forthcoming films) was a 1976 blind tasting of French versus American wines organized (in Paris, of course) by Steven Spurrier. It became famous because a panel of French wine experts found to their surprise that American wines were as good as or even better than prestigious wines from France.

A recent article by Dennis Lindley (professor emeritus at University College London – see below) casts doubt on this conclusion, however. Read the article for the full analysis, but for now just look at the actual scores of the 11 judges (shown in the image above). It doesn’t take much effort to see that these experts disagreed as much as they agreed about the quality of the wines they tasted. The 1971 Mayacamas Cabernet, for example, received scores as low as 3 and 5 on a 20-point scale along with ratings as high as 12, 13 and 14. It was simultaneously undrinkable (according to a famous sommelier) and pretty darn good (according to the owner of a famous wine property). If the experts don’t agree with each other, what is the chance that you will agree with them?

Does this mean that wine critics and their rating systems are useless and should disappear? Not likely. Wine ratings are useful to consumers, who face an enormous range of choices and desperately need information, even if it is practically problematic and theoretically suspect. Wine ratings are useful commercially, too. Winemakers need to find ways to reduce consumer uncertainty and therefore increase sales and wine ratings serve that purpose.

And then, of course, there is the wine critic industry itself, which knows that ratings sell magazines and drive advertising. Wine ratings are here to stay. We just need to understand them better and use them more effectively.

References:

Dennis V. Lindley, “Analysis of a Wine Tasting.” Journal of Wine Economics 1:1 (May 2006) 33-41.

George M. Taber, Judgment of Paris: California vs. France and the Historic 1976 Paris Tasting that Revolutionized Wine. Scribner, 2005.

Globalization, Wine Value and the Two Buck Chuck Index

Has the globalization of the wine industry given us the best of wines, as many wine drinkers believe, or the worst of wines, as the film Mondovino suggests?

Two economists from the Whitehead School of Diplomacy at Seton Hall University address this question in the December 2007 issue of the Journal of Wine Economics (see full reference below). Their conclusion? Globalization has benefited American wine drinkers, who have a broader choice of quality wines at lower prices.

That’s pretty much what my supermarket empiricism leads me to conclude, but can it be proven scientifically? Here’s how the article’s authors arrived at their results.

First you need to define what it is that American wine drinkers are buying. The authors decided to focus on the Wine Spectator annual Top 100 list of wines. This has the advantage of limiting the study to a reasonable number of widely available wines. The Top 100 list is chosen each year on the basis of price, wine rating, availability and “excitement.” Many people use rankings like the WS 100 to guide their purchases, so I suspect that there really is some correlation between what is on the list and what is on store shelves and restaurant wine menus. The disadvantage of limiting the study to the Top 100 is of course that most of the wine sold in America — the inexpensive Gallo, Yellow Tail and Two Buck Chuck wine — does not make it to this or any other “top” list. If we want to know if globalization has improved choice at the middle and bottom of the market we will need more research.

The authors examined the WS 100 lists from 1988 – 2005 to determine (1) where the wines came from, (2) how much they cost and (3) their quality as measured by the WS ratings. They then calculated measures to determine changes in the geographical concentration of the wines (more or less choice in terms of countries of origin), the average quality rating and the relative value to consumers as measured by rating points per dollar.

What we learn from this is that the overall quality of the top wines has stayed relatively constant over the years, but the real price has fallen and the range of offerings has increased. It cost $4313 (in today’s dollars) to purchase the entire WS100 in 1988, for example, but just $2622 to buy the Top 100 wines in 2005. The cost per “point” of ratings in 1988 was 46 cents, so a hypothetical average 90-point wine cost $41.40. The per point cost was 28 cents in 2005 and so a hypothetical average 90-point wine cost just $25.20.
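The arithmetic here is simple enough to check in a few lines of Python. The sketch below uses only the figures quoted above; the variable names are my own.

```python
# Figures reported above (prices in inflation-adjusted dollars).
cost_per_point = {1988: 0.46, 2005: 0.28}   # dollars per rating point
list_cost = {1988: 4313, 2005: 2622}        # dollars to buy all 100 wines

POINTS = 90  # the hypothetical average wine used in the text

for year in (1988, 2005):
    bottle = cost_per_point[year] * POINTS
    print(f"{year}: a {POINTS}-point wine costs about ${bottle:.2f}")

# Percentage decline in the cost of buying the whole list.
decline = 1 - list_cost[2005] / list_cost[1988]
print(f"Cost of the full list fell by about {decline:.0%}")
```

Both measures tell the same story: the price of a 90-point bottle and the price of the whole list each dropped by a bit less than 40 percent between 1988 and 2005.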

The top wines came from just six countries in 1988 versus 11 countries in 2005, an indication of the globalization effect. A great majority of the WS100 wines over the years have come from four core wine countries: Australia, France, Italy and the U.S., but the proportion of non-core wines has increased, too, from just 5 percent in 1988 to 24 percent in 2005.

The authors divide the wine world of this study into Old World (France, Italy), New World (Australia and the U.S.) and “New-New World” (New Zealand, South Africa, Argentina and so on). Globalization has brought American wine drinkers more and more excellent New World and now especially New-New World wines that provide the same quality at lower average prices, according to the study.

Research like this is interesting both for the questions that it answers and for the new questions that are raised. It would be interesting, for example, to find how important the four criteria for selection are — price, rating, availability and “zing” — and if the relative weight they are given has changed. As the wine market has expanded, for example, greater emphasis may have been put on price and availability, leading to a Top 100 that leans more toward (global) good value wines.

It would also be interesting to see if the editors respond in any way to external forces. A lot of people read and study the Top 100 list, so perhaps they use it as a way to build the wine market (and thereby indirectly build their potential subscriber base). A focus on value would be consistent with this goal. A Top 100 list that you can’t find or can’t afford doesn’t build the wine market and won’t sell many magazines. The fact that there are more New-New World wines might reflect rising quality and availability of these wines or it could indicate that the WS editors desire to add these wines to keep costs down, value up and the market growing. In other words, the WS100 might show more choice and continuing good value because that’s what the WS editors want it to show. I suspect that the truth is that the market has evolved toward global good value and that WS has been part of that process, encouraging people to try New-New World wines by putting them on the Top 100 list.

Exchange rates could also play a role here. The dollar has fallen against most currencies (increasing the cost of imported wine), but the depreciation is not uniform. The Euro is much more expensive but the Argentine peso has not changed as much. It would be interesting to see to what extent the WS100’s New-New World globalization has offset exchange rate driven increases in Old World wine costs.

Another interesting question relates to the idea of value in wine purchases. It does seem to me that people often find themselves buying WS points or Parker points more than the wine itself because they are unsure of their ability to judge quality. One local wine merchant had a sale of wines rated 90 points or more for $20 or less. The idea was that the wines must be good value because of the low cents per point ratio. But there is more to wine than rating scores, as anyone who has tasted high-scoring wines will tell you.

It might be interesting to try to put together a slightly more sophisticated wine value index using WS and other ratings. I don’t think that cents per point is a good measure because it assumes a linear relationship between money and quality — and we all know that is not the case. Very expensive wines frequently receive much lower ratings than their cheaper competitors. I understand that a $100 Chardonnay came in last at the tasting where Two Buck Chuck won the Gold Medal.

Even where price and quality are correlated, the relationship isn’t necessarily linear. The average price difference between an 86-point wine and an 88-point wine may be pretty small, for example, but it might cost a great deal to go from 92 to 94 points if the demand for the very best wines is particularly strong, as is often the case in winner-take-all markets.

The price-quality relationship, even using imperfect wine scores as a measure of quality, is certainly non-linear. No wonder wine buyers are so confused — and depend so much on ratings and lists like the WS100.

Here is a simple alternative to cents per point as a measure of value. Let’s adjust price and quality for a baseline wine: Two Buck Chuck. You could call it the TBC index. Suppose that you can purchase a 70-point (to just make up a number) TBC Chardonnay for $2 (or $3 here in Washington State). The question we want to answer is how much does it cost to improve on TBC? A wine that gives you a lot of additional value for only a little additional money is a good deal.

In other words, the TBC index would be a relative index of value, calculated by asking how many points in excess of 70 (or whatever the quality of the baseline wine you choose) you can buy for the dollars you spend in excess of the baseline cost. Here’s a numerical example. An 88-point wine for $20 would have a TBC rating of (88 – 70 points)/($20 – $2) = 18 points/$18, or one point per dollar. An 86-point wine for $10 would be a better value because (86 – 70 points)/($10 – $2) = 16 points/$8 = two points per dollar. It seems to me that this is a better (but still badly flawed) indicator of relative value. (Economics students have already realized that I am applying the principle of decision-making on the margin to this problem.)
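For anyone who wants to play with the idea, here is a minimal Python sketch of the TBC index as described above. The function name `tbc_index` and its parameter names are my own, and the default baseline (a 70-point wine at $2) is the same made-up number used in the text, not a real rating.

```python
def tbc_index(points, price, base_points=70, base_price=2.0):
    """Rating points gained over the baseline wine per extra dollar spent.

    The defaults are the made-up baseline from the text: a 70-point
    Two Buck Chuck at $2.
    """
    extra_dollars = price - base_price
    if extra_dollars <= 0:
        raise ValueError("wine must cost more than the baseline")
    return (points - base_points) / extra_dollars


print(tbc_index(88, 20))  # 18 points for $18 extra
print(tbc_index(86, 10))  # 16 points for $8 extra
```

The two calls reproduce the worked example: one point per extra dollar for the $20 wine and two points per extra dollar for the $10 wine. Passing a different `base_price` (say 3.0 for Washington State) or a different `base_points` shows how sensitive the index is to the choice of baseline.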

Perhaps I will find some students to work on the TBC index, perhaps using a different base wine for each varietal or wine type. I predict that their research would find that the “optimal” TBC point is pretty close to the heart of the premium wine market (right on the center shelf in the supermarket), where so many wine brands compete for your wine dollars.

Wine ratings are very important in some parts of the wine market and very controversial, too, so I think I will see what I can learn about them. With this in mind I have subscribed to six different wine-rating publications: Wine Advocate (Robert Parker), Wine Spectator, Wine & Spirits, Wine Enthusiast, the British Decanter and Wine Press Northwest (for Washington/Oregon wine news and ratings). Watch this space for a comparative analysis of these influential publications.

References: Omer Gokcekus and Andrew Fargnoli, “Is Globalization Good for Wine Drinkers in the United States?” Journal of Wine Economics 2:2 (December 2007) 187-195.