
Position Paper on US Slalom Rankings

Rich Kulawiec

Revision 2.3, November, 2000

1. Introduction

Slalom rankings are used for multiple purposes within the paddling community: for assessment of personal progress, for eligibility for certain competitions, as evidence of qualification for training funds/sponsorship, and other reasons. It is in the best interest of the community as a whole, as well as its members, that these rankings be made as accurate as possible -- especially because the more accurate the rankings are, the more fair they are, and there is no higher obligation on any sports organization than promoting fairness.

This paper explains some of the basic concepts of sports ranking systems and then applies those concepts to the current ranking system, with the purpose of identifying problems with the current system. A proposal is made for a ranking system which substantially resembles the current one, but which attempts to remedy those problems. Finally, directions for future study (and hopefully thus for further improvements) are discussed.

2. Sports ranking systems in general - goals, issues, assessment

2.1. Goals

All sports ranking systems have the same goal: to produce an ordered list of competitors based on observed performance. Roughly speaking, they all have the same inputs (scores/results) and the same outputs (a list). Most make some sort of attempt to "level the playing field", whether by factoring out human bias or by adjusting scores for circumstances.

All sports ranking systems can be checked against reality in the same way, by looking at the correlation between expected results (as predicted by the ranking) and observed results (as actually experienced in competition). This correlation is rarely, if ever, perfect; but it does provide a first-order estimate of the accuracy of the ranking system.

2.2. Issues

There are a number of problems faced by any ranking system. Here are a few of the ones that relate to slalom rankings.

2.2.1 Multivariate performance

Sports ranking systems attempt to reduce the performance of a competitor to a single number. This is a drastic oversimplification of reality -- necessary as it may be to produce an ordered list.

Let me try to explain that by example. If you were trying to characterize the performance of a car, you might start with its elapsed time accelerating from 0 to 60 MPH, a popular metric. But then you could add stopping distance from 55 MPH...turning radius... MPG city and highway...ground clearance...drag coefficient... wheelbase...until you finally had a collection of thousands of numbers, each one of which describes some aspect of the vehicle's performance.

Now which one -- or combination of ones -- should be used to "rank" the car against its peers?

The answer to that question is tricky, because in part it depends on what the "rank" is designed to indicate. A rank designed to indicate safety will be quite different than a rank designed to indicate performance: in fact, it might use a completely different set of numbers.

The problem is the same in the sport of slalom. We could try to come up with metrics that describe the performance of paddlers on big-water courses and tight technical ones; natural and artificial; ones with and without major holes, left-handed or right-handed offset moves, cold or hot weather conditions, and so on. Eventually we'd have any number of measurements, each one of which describes some aspect of the paddler's performance. But which one or ones should we use for comparison?

2.2.2. Absence of standard

Some sports have a fixed competition which varies little, if any, and can be used as a universal metric for assessing performance. For example, the 100-meter dash (which is essentially the same wherever it's contested, perhaps with an adjustment for altitude or wind-assist), or the pole vault. This makes ranking relatively easy, because it means that competitors anywhere in the world can be compared against each other solely on the basis of a single number (elapsed time or height).

Slalom is not like this (with the sole exception of the national pool slalom). Courses vary dramatically, sometimes even on the same river on the same day. (E.g. Amoskeag/Junior Olympic Qualifier 1998, where the water was visibly rising throughout the race.) At any given site, water level, gate placement, weather conditions, and other factors all influence how difficult the course is.

This makes comparisons between races problematic -- even if they're held on the same site on consecutive days. We simply can't assume that such multi-day events are of the same difficulty. Nor can we assume that "major" events are difficult, and that "minor" events aren't, or vice versa. (Yes, in general, this is roughly true; but there are numerous exceptions, enough so that the standard deviation is large and therefore the assumption is invalid for rankings purposes.)

2.2.3. Repeatability

Paddlers are human beings and don't always perform the same way every time they race. So which performance(s) actually reflect their true ability? Their best? Worst? Median? Average? Average with the best and worst removed? Or something else?

2.2.4. Human bias

Polls are notoriously subject to error due to missing information, disregard for available information, personal bias, geographic bias, and a myriad of other problems. Such systems just can't be taken seriously; they're just thinly-disguised popularity contests.

2.2.5. Head-to-head only

Some sports attempt to do rankings solely by head-to-head competition. This is less reliable than perhaps it might appear on the surface. For example, the NCAA basketball tournament provides 63 head-to-head matchups. It's thus sometimes cited as an example of a system which provides a thorough assessment based on head-to-head competition. But of the 2016 possible head-to-head matchups available to a field of 64 teams, it only encompasses 3.1% of them; and half the teams involved play only one game, yielding a single data point each. This is hardly a significant data sample, and certainly not one which would support any but the most general conclusions.
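The arithmetic in that example is easy to verify. The following is a minimal sketch (the function name is hypothetical) that counts the games played in a single-elimination bracket and the number of distinct pairings available in the same field:

```python
from math import comb


def matchup_coverage(teams: int) -> tuple[int, int, float]:
    """Return (games played, possible pairings, fraction of pairings played)
    for a single-elimination bracket with `teams` entrants."""
    games = teams - 1            # every game eliminates exactly one team
    pairings = comb(teams, 2)    # distinct head-to-head pairs in the field
    return games, pairings, games / pairings


games, pairings, coverage = matchup_coverage(64)
print(games, pairings, coverage)
```

For a 64-team field this gives 63 games out of 2016 possible pairings, a little over 3%.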

Head-to-head results are not always the best indicator of how two athletes or teams actually compare, because they may be heavily influenced by the particular match-ups which do or don't occur -- and in the case of a sport as geographically dispersed as slalom, most of the possible head-to-head matchups will never happen.

2.2.6. Algorithms

Systems which attempt to use an algorithm to determine which athletes/teams are better than others are widespread. Probably the most famous and successful of these systems are the ones devised by Jeff Sagarin -- an MIT grad who has been refining his algorithms for many years.

The problem isn't the output of the system -- Sagarin's is quantifiably superior to its competitors. The problem is that increasing the accuracy of a ranking system requires increasing the complexity of the algorithm, which may make it difficult for participants in the sport to understand.

For example, Sagarin's methodology encompasses the concept of a "good loss" -- wherein a team that lost a game may move up in the rankings because it did so by a small margin against a vastly superior opponent. This is completely logical within the rankings system, but confuses many observers who do not grasp the overall methodology.

2.2.7. The A beats B, B beats C, C beats A problem

If this problem didn't exist, ranking systems would be much simpler. However, it's quite common, and so most ranking systems make some sort of attempt to deal with it gracefully. Most do this by going beyond the binary (A:1, B:0) and using margin-of-victory (A:72, B:65) in each competition to assist in the ordering. Others factor in home court/field advantage, or difficulty of that particular competition, or use other means to attempt to correctly order A, B, and C in such a circumstance.

This is a substantial problem for ranking systems where only a relatively small number of competitors are involved. In the case of slalom, where the number is now over 1000, it's a major problem.

2.3. Assessment

The best way to assess the relative accuracy of a ranking system is to check it against the original data -- the results. An accurate ranking system should show a high correlation to results.
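One common way to put a number on "correlation to results" is Spearman's rank correlation, which compares the order predicted by a ranking with the finishing order actually observed. The sketch below is only an illustration of that idea, not a measure this paper prescribes; the function name is hypothetical, and it assumes both lists contain the same boats with no ties.

```python
def spearman_rho(predicted_order, observed_order):
    """Spearman rank correlation between two orderings of the same boats.

    Returns 1.0 when the orders agree exactly and -1.0 when one is the
    reverse of the other. Assumes no ties (illustrative simplification).
    """
    n = len(predicted_order)
    if n < 2 or set(predicted_order) != set(observed_order):
        raise ValueError("both orderings must list the same 2+ boats")
    observed_position = {name: i for i, name in enumerate(observed_order)}
    # Sum of squared differences between predicted and observed positions.
    d_squared = sum(
        (i - observed_position[name]) ** 2
        for i, name in enumerate(predicted_order)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


ranking = ["A", "B", "C", "D"]          # order predicted by the rankings
finish = ["A", "C", "B", "D"]           # order observed at a race
print(spearman_rho(ranking, finish))
```

A single coefficient like this is only a first-order check; as discussed next, what it should be computed over depends on what the ranking is designed to represent.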

That's not the end of the matter, though. One of the many questions that can be raised is "Which results?". For example, the Sagarin rankings of NCAA Division 1 basketball teams are designed so as to weight recent results more heavily than older ones; this is done in order to provide a ranking which represents the current strength of each team, not the team's strength throughout the season. This is neither good nor bad: it's just a design choice in the algorithm which needs to be taken into account when assessing the algorithm's accuracy.

Another question is "How should that correlation be measured?" To continue the example above, critics of Sagarin's rankings often point out that they are not especially effective at predicting the results of individual games. They miss the point that the rankings are not designed to be effective at predicting single outcomes, and therefore that critiques based on their failure to do so are unfounded.

The real answer to assessment lies in the design goal that the ranking system is designed to address: it should be measured solely on the basis of how well it meets that goal.

3. Explanation of the current NWSC ranking system

The algorithm used to compute these rankings is the same as that used in 1996, 1997, and 1998. Here's a step-by-step explanation of it; this explanation is close to what's already on the NWSC web site, but this one has been edited to make it a bit more complete, a bit more up-to-date, and hopefully a bit more understandable.

3.1. Results from races come in a variety of formats; in order to prepare them for subsequent steps, each race's data is rewritten into the same format, which looks like this:

Canonical form for race results
Class | Name(s) | Time-1 | Penalty-1 | Total-1 | Time-2 | Penalty-2 | Total-2 | Better-Score | Total-Score

Table 3.1

The fields mean exactly what they mean on a standard slalom scorestrip.

3.2. The classes for each race are translated from the many names that show up in results to a list of canonical racing classes. In other words, the "race classes" are turned into "ranking classes". Here's an example of part (the full table has 526 entries) of the translation table used to do this:

Class name translation table (excerpt)
Race Class | Ranking Class
C-1W | C-1W
C-1W Jr | C-1W
C-1W Junior | C-1W
C-1W expert | C-1W
C-1 | C-1
C-1 (A/B) | C-1
C-1 A/B | C-1
C-1 C/D | C-1
C-1 Cadet | C-1
C-1 Expert | C-1
C-1 Jr | C-1

Table 3.2
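A translation like this amounts to a dictionary lookup. Here is a minimal sketch using only the entries shown in the excerpt; the names `CLASS_TRANSLATION` and `ranking_class` are hypothetical, and returning `None` for an unknown class is an assumption about how unlisted classes might be flagged, not a description of the actual scripts.

```python
# Entries copied from the excerpt of the translation table above.
CLASS_TRANSLATION = {
    "C-1W": "C-1W",
    "C-1W Jr": "C-1W",
    "C-1W Junior": "C-1W",
    "C-1W expert": "C-1W",
    "C-1": "C-1",
    "C-1 (A/B)": "C-1",
    "C-1 A/B": "C-1",
    "C-1 C/D": "C-1",
    "C-1 Cadet": "C-1",
    "C-1 Expert": "C-1",
    "C-1 Jr": "C-1",
}


def ranking_class(race_class):
    """Map a class name as it appears in results to its ranking class.

    Returns None when the name is not in the table (assumed behaviour).
    """
    return CLASS_TRANSLATION.get(race_class.strip())


print(ranking_class("C-1 (A/B)"))
print(ranking_class("C-1W Junior"))
```

Keeping the mapping as explicit data rather than pattern-matching rules means every odd spelling that shows up in results gets one visible, reviewable entry.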

3.3. Results for classes which don't currently get ranked -- for example, open boats, squirt boats, sit-on-tops, etc. -- are dropped.

3.4. Scores with DNS, DNR, or DNF are interpreted numerically, with 999.99 used for every one of those. Mostly, this is just used to gather per-race statistics, because races where someone DNF'd don't count toward their ranking.

3.5. If a race was scored with 2 seconds for touches/50 seconds for misses, nothing happens in this pass. But if it was scored with 5 seconds/50 seconds, the penalties are recalculated to the 2/50 system. This is done via a look-up table and a small algorithm. The impact of mixing results like this is negligible: in 1998 and 1997 I computed rankings both ways and the differences in final rankings were insignificant.

3.6. Every paddler's name is converted to canonical form, which I've hopefully spelled correctly. Here's an example:

Athlete name translation table (excerpt)
Name in Results | Name in Rankings
Carleton Goold | Goold, Carleton
Carleton Gould | Goold, Carleton
Carlton Gould | Goold, Carleton
Goold, C | Goold, Carleton
Goold, Carlton | Goold, Carleton
Gould, Carleton | Goold, Carleton

Table 3.3

The table used to do this has around 5000 entries, and covers every paddler who has competed in a ranked US race for the last several years.

3.7. Every race result is converted into this form:

Class=Name Score Ratio

where "Score" is their combined-run total score for the race, and "Ratio" is the ratio of their score to the best-score-of-the-day. For example, here is part of the results from the Riversport Slalom in 1997:

Combined-run scores and ratio to fastest
Class | Name(s) | Combined Score | Ratio to Best
K-1W | Thomas, Natalie | 471.65 | 1.904
K-1W | Potochny, Evy | 462.75 | 1.868
K-1W | Gelblat, Renee | 531.46 | 2.145
K-1W | Hearn, Cathy | 270.49 | 1.092
K-1W | Weld, Kara | 272.07 | 1.098
K-1W | Beakes, Nancy | 317.22 | 1.280

Table 3.4

In this table, you can see that Cathy Hearn's Ratio is 1.092. It was derived by taking the best combined score of the day (Jason Beakes, 247.76) and dividing Cathy's score (270.49) by it, i.e. 270.49/247.76 = 1.092. In English, this means "Cathy Hearn's score was about 109% of Jason Beakes's."

Boats which did not complete two runs are dropped at this point. (See previous comment about how DNFs don't count toward rankings.)

3.8. The ratio from the previous pass is inverted to give a competitor's race ratio: this number reflects how far off they were from the best-score-of-the-day. (The boat with the best score of the day has a race ratio of 1.000.) Two lookups happen: last year's rank class (A, B, C, D or U for unranked), and membership on the national A team. (The reason for this is that the strength-of-field assignment, which we'll get to later, is based on this.)

Competitor race ratio and rank
Class | Name(s) | Inverse Ratio to Best | Current Rank/Team
K-1W | Thomas, Natalie | 0.525 | C
K-1W | Potochny, Evy | 0.535 | U
K-1W | Gelblat, Renee | 0.466 | C
K-1W | Hearn, Cathy | 0.916 | ATEAM
K-1W | Weld, Kara | 0.911 | ATEAM
K-1W | Beakes, Nancy | 0.781 | B

Table 3.5

Again, using Cathy Hearn as an example, 1/1.092 (from previous table) = 0.916. Or, in English, "Cathy raced about 92% as fast as Jason."
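Steps 3.7 and 3.8 can be reproduced in a few lines. This is a sketch rather than the actual ranking scripts; the function names are hypothetical, and the two scores are the ones quoted above for the 1997 Riversport Slalom.

```python
def ratio_to_best(score: float, best_score: float) -> float:
    """Step 3.7: competitor's combined score over the best score of the day."""
    if score <= 0 or best_score <= 0:
        raise ValueError("scores must be positive")
    return score / best_score


def race_ratio(score: float, best_score: float) -> float:
    """Step 3.8: the inverse of ratio_to_best (1.000 for the fastest boat)."""
    return 1.0 / ratio_to_best(score, best_score)


BEST_OF_DAY = 247.76   # Jason Beakes, best combined score of the day
CATHY_HEARN = 270.49

print(round(ratio_to_best(CATHY_HEARN, BEST_OF_DAY), 3))  # 1.092
print(round(race_ratio(CATHY_HEARN, BEST_OF_DAY), 3))     # 0.916
```

Because lower scores are better in slalom, the inversion in step 3.8 is what turns the ratio into a "bigger is better" number that can later be weighted and averaged.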

In the case of boats which competed in the same race class more than once (e.g. K-1 Masters and K-1) only the better of those two results is used. This is done in order to comply as best as possible with our rules concerning competition in two age classes and to try to level the playing field. (Because, for example, someone who is 41 can take four runs, while someone who is 39 can only take two. It seems that the person taking four already has an advantage, so we shouldn't give them an additional advantage by counting this as two races instead of one.)

3.9. Each race result is weighted by the race weight; the race weight is given by

race weight = (field strength + importance factor) / 20

Equation 3.1

where field strength and importance factor both have maximum values of 10; thus the race weight has a maximum value of 1.000. The table of assigned field strength and importance factors, along with the criteria used to make these assignments, is given in Table 3.13 below. Continuing the example above, and using Riversport's 1997 field strength of 9 and importance factor of 5 (thus giving a race weight of 0.700 by the equation just above):

Weighted race ratios
Class | Name(s) | Weighted Ratio
K-1W | Thomas, Natalie | 0.367
K-1W | Potochny, Evy | 0.374
K-1W | Gelblat, Renee | 0.326
K-1W | Hearn, Cathy | 0.641
K-1W | Weld, Kara | 0.638
K-1W | Beakes, Nancy | 0.547

Table 3.6

To continue the example, Cathy's unweighted ratio is 0.916; the race weight is 0.700, so the weighted ratio is 0.916 * 0.700 = .641.

This number is the competitor's weighted ratio for the race: think of it as "how much credit you get for doing this well at this race against this competition".
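The weighting step can be sketched as follows. The sketch assumes the race weight is (field strength + importance factor) / 20, which is the simplest form consistent with the figures given here: both factors have a maximum of 10, the maximum weight is 1.000, and Riversport's factors of 9 and 5 give 0.700. The function names are hypothetical.

```python
def race_weight(field_strength: int, importance: int) -> float:
    """Race weight, assumed to be (field strength + importance factor) / 20."""
    for factor in (field_strength, importance):
        if not 1 <= factor <= 10:
            raise ValueError("factors are assigned on a 1-10 scale")
    return (field_strength + importance) / 20


def weighted_ratio(ratio: float, field_strength: int, importance: int) -> float:
    """Step 3.9: a competitor's race ratio scaled by the race weight."""
    return ratio * race_weight(field_strength, importance)


# Riversport 1997: field strength 9, importance factor 5.
print(race_weight(9, 5))
print(round(weighted_ratio(0.916, 9, 5), 3))   # Cathy Hearn: 0.641
```

Note that under this form the two factors contribute equally, so a race's assigned importance moves a result exactly as much as the strength of the field that actually showed up.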

3.10. All results from all races are combined. If a paddler has done more than three races, their best (highest) three weighted ratios are selected. These three best results are then averaged to give the competitor's Rank Ratio. That Rank Ratio is then adjusted if they've done fewer than three races: if a paddler has done only two races, they're assessed a 5% penalty; if only one race, a 10% penalty. Using the same example as before:

1997 K-1W rank ratios & best races (excerpt)
Class | Name(s) | Rank Ratio | Best Races
K-1W | Thomas, Natalie | 0.350 | Bellefonte,Riversport
K-1W | Potochny, Evy | 0.366 | Lehigh,Riversport,Bellefonte
K-1W | Gelblat, Renee | 0.369 | Lehigh,Codorus,Farmington
K-1W | Hearn, Cathy | 0.841 | Trials-3,Trials-2,Nationals
K-1W | Weld, Kara | 0.824 | Trials-3,Trials-2,Nationals
K-1W | Beakes, Nancy | 0.654 | Trials-2,Trials-1,NOC-DBH-2

Table 3.7
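Step 3.10 can be sketched as below. The function name is hypothetical, and the sketch assumes the 5% and 10% penalties are applied by multiplying the averaged value by 0.95 or 0.90; the text does not spell out the exact arithmetic.

```python
def rank_ratio(weighted_ratios):
    """Average the best three weighted ratios, penalising short seasons.

    Assumption: the penalty is multiplicative (0.95 for two races,
    0.90 for one race).
    """
    if not weighted_ratios:
        raise ValueError("at least one race result is required")
    best = sorted(weighted_ratios, reverse=True)[:3]
    average = sum(best) / len(best)
    penalty = {1: 0.90, 2: 0.95}.get(len(best), 1.0)
    return average * penalty


print(rank_ratio([0.5, 0.6, 0.7, 0.8]))   # lowest result is ignored
print(rank_ratio([0.4, 0.4]))             # two races: 5% penalty
print(rank_ratio([0.5]))                  # one race: 10% penalty
```

Under that assumption, Natalie Thomas's two-race Rank Ratio of 0.350 implies an unpenalised average of roughly 0.368, in line with her Riversport weighted ratio of 0.367.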

3.11. The results are sorted by rank ratio and separated by racing class. In other words, all K-1's are listed in order from highest rank ratio to lowest; all K-1W's are listed in the same order, and so on for the other classes.

3.12. Within each class, a boat's percentile is computed. For example, in 1997, Cathy Hearn was the top-ranked K-1W; she is thus assigned the 100.00 percentile. All other K-1W's are then assigned a percentile based on the ratio of their Rank Ratio to Cathy's. Continuing the example from above:

1997 K-1W rank ratios & percentiles (excerpt)
Class | Name(s) | Rank Ratio | Percentile
K-1W | Thomas, Natalie | 0.350 | 41.6
K-1W | Potochny, Evy | 0.366 | 43.5
K-1W | Gelblat, Renee | 0.369 | 43.9
K-1W | Hearn, Cathy | 0.841 | 100.0
K-1W | Weld, Kara | 0.824 | 98.0
K-1W | Beakes, Nancy | 0.654 | 77.8

Table 3.8

Taking the last boat as an example, Nancy Beakes's Rank Ratio was 0.654; Cathy Hearn's was 0.841. Thus Nancy Beakes's percentile is 0.654/0.841 = 0.778, i.e. 77.8.
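The percentile step is a simple scaling against the top boat in the class. A minimal sketch (hypothetical function name), using the six Rank Ratios from the excerpt:

```python
def percentiles(rank_ratios):
    """Step 3.12: each boat's Rank Ratio as a percentage of the class leader's."""
    top = max(rank_ratios.values())
    return {name: round(100 * ratio / top, 1)
            for name, ratio in rank_ratios.items()}


k1w_1997 = {
    "Thomas, Natalie": 0.350,
    "Potochny, Evy": 0.366,
    "Gelblat, Renee": 0.369,
    "Hearn, Cathy": 0.841,
    "Weld, Kara": 0.824,
    "Beakes, Nancy": 0.654,
}
for name, pct in percentiles(k1w_1997).items():
    print(name, pct)
```

Note that this "percentile" is a percentage of the leader's Rank Ratio rather than a statistical percentile, so it does not depend on how many boats are in the class.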

3.13. A number of lookup tables are consulted. The first is a table which is used to decide which letter class (A, B, C, D) the boat is in. Here's that table:

Class assignments by percentile
Assigned class | Percentile
"A" Ranked | 85% to 100%
"B" Ranked | 65% to 84%
"C" Ranked | 40% to 64%
"D" Ranked | below 40%

Table 3.9

In other words, a boat whose percentile is 77.8 is assigned to the "B" class.
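The class lookup can be written as a chain of threshold tests. The table gives whole-number bands, so how a value such as 84.5 is treated is not stated; the sketch below assumes each class starts at its lower bound (85, 65, 40), which agrees with the 1997 K-1W list where 85.4 is "A" and 82.2 is "B". The function name is hypothetical.

```python
def letter_class(percentile: float) -> str:
    """Step 3.13: map a percentile to a letter class.

    Assumption: a boat must reach the lower bound of a band to be in it,
    so values between bands (e.g. 84.5) fall into the lower class.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 85:
        return "A"
    if percentile >= 65:
        return "B"
    if percentile >= 40:
        return "C"
    return "D"


print(letter_class(77.8))   # B
print(letter_class(85.4))   # A
```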

Note that the current cutoff for automatic admission to Team Trials is at the 75th percentile.

3.14. Each boat is assigned an ordinal number: the highest-ranked boat in each class is "1", the second-highest is "2", and so on. Ties are handled by assigning both boats the same number and skipping the subsequent one.
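The tie rule described here is what is usually called standard competition ranking ("1, 2, 2, 4"). A minimal sketch follows; the function name is hypothetical, and it assumes two boats are tied only when their Rank Ratios are exactly equal at full precision.

```python
def assign_ordinals(rank_ratios):
    """Step 3.14: number boats 1, 2, 3... by descending Rank Ratio.

    Tied boats share a number and the following number is skipped.
    Assumption: a tie means exactly equal ratios, not equal after rounding.
    """
    ordered = sorted(rank_ratios.items(), key=lambda item: item[1], reverse=True)
    ordinals = {}
    previous_ratio = None
    previous_ordinal = 0
    for position, (name, ratio) in enumerate(ordered, start=1):
        ordinal = previous_ordinal if ratio == previous_ratio else position
        ordinals[name] = ordinal
        previous_ratio, previous_ordinal = ratio, ordinal
    return ordinals


print(assign_ordinals({"W": 0.9, "X": 0.8, "Y": 0.8, "Z": 0.7}))
# {'W': 1, 'X': 2, 'Y': 2, 'Z': 4}
```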

3.15. If the paddler is a citizen of a country other than the US, a notation to that effect is added to their name. The lookup table that I use for this is slowly becoming more accurate, but I wouldn't be surprised to find that I've missed someone.

The presence of non-US paddlers has no impact on rankings, however, since the breakpoints for class assignments as well as the cutoff for automatic admission to Team Trials are assigned on a percentile basis, not on the number of boats. All it really does is provide our guests with an inkling of how they rank among people who have competed here in the last year.

3.16. Where data is available, boats are marked by age group, e.g. "Jr", "Ms", etc. I've not done this with 1999 rankings because of the unreliability of the current data, but numerous athletes have remarked that they find it useful to be able to compare themselves within their age group.

Here's what the final result looked like in 1997:

1997 NWSC K-1W rankings (excerpts)
Rank | Name(s) | Rank Ratio | Percentile | Best Races
A1 | Hearn, Cathy (Sr) | 0.841 | 100.0 | Trials-3,Trials-2,Nationals
A2 | Bennett, Rebecca | 0.832 | 98.9 | Trials-3,Trials-2,Nationals
A3 | Weld, Kara | 0.824 | 98.0 | Trials-3,Trials-2,Nationals
A4 | Altman, Renata (Sr) | 0.792 | 94.2 | Trials-3,Nationals,Trials-2
A5 | Stalheim, Megan (Jr) | 0.781 | 92.9 | Trials-2,Trials-3,Nationals
A6 | Freeburn, Jana | 0.774 | 92.0 | Trials-2,Trials-1
A7 | Hearn, Jennifer | 0.735 | 87.4 | Trials-3,Trials-2,Trials-1
A8 | Larsen, Hannah (Jr) | 0.732 | 87.0 | Trials-3,Trials-2,Jr-Trials-1
A9 | Brown, Amy | 0.729 | 86.7 | Trials-3,Trials-2,SnydersMill
A10 | Jorgensen, Anna (Jr) | 0.729 | 86.7 | Trials-3,Trials-2,Jr-Trials-2
A11 | Mitchell, Anne | 0.718 | 85.4 | Trials-3,Trials-1,Trials-2
B12 | Miller, Aleta (Jr) | 0.691 | 82.2 | Trials-3,Trials-1,Nationals
B13 | Beakes, Nancy | 0.654 | 77.8 | Trials-2,Trials-1,NOC-DBH-2
...
C54 | Gelblat, Renee (Ms) | 0.369 | 43.9 | Lehigh,Codorus,Farmington
C55 | Green, Polly | 0.367 | 43.6 | Jefferson,Animas,W-Trials-Qual
C56 | Potochny, Evy | 0.366 | 43.5 | Lehigh,Riversport,Bellefonte
C57 | Weldon, Amanda (Jr) | 0.363 | 43.2 | Lehigh,Bellefonte,FiddlersElbow
C58 | Wiley, Janet | 0.360 | 42.8 | Animas
C59 | Hoffheimer, Mary | 0.356 | 42.3 | Farmington,Riverfest,Blackwater
C60 | Wiley, Amy | 0.355 | 42.2 | Animas
C61 | Baldwin, Hailey (Jr) | 0.354 | 42.1 | NW-Jr-Oly-Qual,Jr-Trials-2,Jr-Trials-1
C62 | Thomas, Natalie | 0.350 | 41.6 | Bellefonte,Riversport
...

Table 3.10

To summarize what this table says: Evy Potochny was C-ranked; she was the #56 K-1W in the rankings. Her best three races were Lehigh, Riversport and Bellefonte; her Rank Ratio was 0.366 (which compares her to the fastest boat overall) and her percentile was 43.5 (which assesses her performance within the K-1W class).

3.17. In the case of rec boats (plastic, cruiser, etc.), all of the above is repeated *except* that better-of-two instead of combined runs are used. In order to provide an adequate statistical basis for comparison, the rec boats are lumped together with the race boats to crunch through the numbers, then the race boats are dropped out. This ensures that at races where the overwhelming majority of boats are glass (e.g. Mid-America #2) there are enough boats to compare against. (And since the race boats are dropped out of these calculations *before* the percentiles are calculated, people who race rec boats aren't penalized for racing primarily against glass boats.)

Also, because rec boats haven't been previously ranked, I found it necessary to assign guesstimates to a handful (4) of rec boats in order to provide a starting point for computations. I minimized the number of such estimates (because I loathe making up numbers, even when I can do so with a high degree of confidence). It's also worth noting that if my estimates are wrong, the errors thus introduced will diminish with each iteration of rankings. To put it another way: each time the rankings are run, the effect of my initial estimates decreases, so after a few times through, even gross errors will disappear...and hopefully I didn't make any of those. Let me demonstrate:

Here are the four estimates I made for rec boats:

1999 estimated rank ratios for K-1 Rec
Class | Name | Estimated Rank Ratio
K-1 Rec | Beakes, Jason | .950
K-1 Rec | Poindexter, Mark | .685
K-1 Rec | Maxwell, Tyler | .550
K-1 Rec | Collins, Dave | .550

Table 3.11

These were arrived at by comparing performance in glass vs. performance in plastic and were done only to make it possible to compute initial rankings for rec boats. I think my estimates were reasonably close, given that the final rankings for these boats were:

1999 calculated rank ratios for K-1 Rec (excerpts)
Class | Name | Actual Rank Ratio
K-1 Rec | Beakes, Jason | 1.000
K-1 Rec | Poindexter, Mark | .730
K-1 Rec | Maxwell, Tyler | .606
K-1 Rec | Collins, Dave | .643

Table 3.12

3.18. That's it. Please note that although all the calculations were done to several decimal places, that does not mean that rankings are accurate to that degree. For example, the difference between a rank ratio of .453 and .456 falls WELL within the variability of manual timing systems. And boats which fall, say, at percentile 91 and 88, are essentially indistinguishable.

SUMMARY: The overall formula that's used in this system is:

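weighted ratio = (best score of the day / competitor's score) x (field strength + importance factor) / 20

Equation 3.2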

where the field strength and importance are assigned from the table below.

Race Weight Assignment Table
Factor Points | Field Strength (fastest times) | Importance of Race
10 | 4 National "A" Team athletes | Olympic/National Team Trials; U.S. Nationals
9 | 3 National "A" Team athletes | CIWS Finals; Junior Trials; Jr/Sr/Ms Nationals
8 | 2 National "A" Team athletes | CIWS Qualifiers; Junior Olympics
7 | 1 National "A" Team athlete | Team Trials Qualifier/USOF Qualifiers; Mid-America Series
6 | "A" ranked athlete | Divisional Championships; Major Cup Series, Major Double Headers; Junior Olympic Qualifiers
5 | "B" ranked athlete | Other Local/Regional Races; C-D Race Series
4, 3 | "C" ranked athlete | Citizens Races
2, 1 | "D" ranked athlete | Flatwater/Pool/Jiffy Slaloms

Table 3.13

(Any race falling into multiple categories in the table above is assigned the factor from the highest category.)

4. Good things about current system

4.1. Use of fastest boat as metric

This is actually a pretty good idea. As explained above, one of the problems with assessing slalom performance is the absence of a standard across races (since races are of different length, difficulty, etc.). The closest that we can come is to use the fastest boat, which, as it turns out, is usually one of the most consistent boats as well (which means it makes a good "measuring stick" for the rest of the field).

Here's a little bit of data to back that up. This is the result of analyzing the data from the 1999 season. For each boat, I calculated the average of their two runs, then used that to calculate the percent difference (for each run) against that average. For example, the data for my K-1 runs at the Codorus SL is 122.48 (1st) and 128.26 (2nd); the average is 125.37; so the percent difference is 2.3%. I then calculated the average across boats depending on how close to the best time-of-the-day they were. The data shown below came from 65 races -- the ones that I had full scores for both runs.

Performance consistency vs. percent-of-fastest-boat
Race Ratio | # runs used | Average % difference
1.000 (fastest) | 130 | 2.3%
1.000 to 1.111 (90%-100% of fastest) | 798 | 2.6%
1.111 to 1.250 (80%-90% of fastest) | 1496 | 4.7%
1.250 to 1.428 (70%-80% of fastest) | 1616 | 5.9%
1.428 to 1.666 (60%-70% of fastest) | 1004 | 7.7%
1.666 to 2.000 (50%-60% of fastest) | 516 | 9.2%

Table 4.1

What this shows is that faster boats are more consistent boats -- and thus make a better "measuring stick" than slower boats. (Repeating this analysis using different percentage cutoffs yields different numbers, obviously, but the trend remains the same.)
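The per-boat consistency figure used in this analysis can be sketched as follows (hypothetical function name). With two runs, each run lies the same distance from their average, so a single number describes both.

```python
def percent_difference(run1: float, run2: float) -> float:
    """Percent by which each of two runs differs from their average."""
    average = (run1 + run2) / 2
    if average <= 0:
        raise ValueError("run scores must be positive")
    return abs(run1 - average) / average * 100


# The Codorus example quoted above: 122.48 (1st run) and 128.26 (2nd run).
print(round(percent_difference(122.48, 128.26), 1))   # 2.3
```

Averaging this figure over all boats in a band of race ratios gives the "Average % difference" column in the table above.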

4.2. Averaging of data from multiple races

As seen above in step 3.10 of the current rankings system, the numbers from each boat's best three races are averaged. This tends to mitigate the effects of an exceptional performance at one race, which might otherwise exert an undue influence on a boat's rank. There are many open questions here, though:

Unfortunately, there are no easy answers to any of these, but one of the things I'm working on is a set of experiments to see what effect, if any, different answers to those questions have on the rankings thus generated.

It's probably worth noting at this point that using the best N races, whatever they might be, helps alleviate the effect of old results on rapidly developing paddlers. By that what I mean is that when a paddler's results improve greatly during the course of a year, the most recent ones will probably be used to determine their ranking. This is a good property of the ranking system -- it helps in providing some assurance that the rankings reflect current ability. The rare exception to this is a paddler whose ability significantly declines within a year -- and that's usually due to injury.

4.3. Use of as many races as possible

The more data that's used, the more paddlers that are included. That in itself is desirable, because it reflects an inclusive rather than an exclusive viewpoint.

But beyond that, the use of more data means that the number of paddlers affected by the 5%/10% penalty drops, and it also means that there are additional races which can be used as part of each paddler's best three.

5. Problems with the current system

5.1. Problems with importance factor

Let's start by listing the races whose importance factors (half the overall weight) are >= 8. A look at the rankings for any of the past several years will show that these races have a very heavy influence on the rankings of paddlers in the A & B classes. Here are the top-weighted races over the last five years (1996-2000) under the current system:

Races with importance factor >= 8, 1996-2000
Event | #Races/Year | Weight | Restrictions | '96-'00 Locations
Team Trials | 3 [*] | 10 | ICF classes only; qualifiers | TN, WI, WI, TN, TN
Nationals | 1 | 10 | none | TN, WI, IN, WI, CA
Jr/Sr/Ms Nationals | 1 | 9 | age < 18 or age > 30 | IN, ID, IN, IN, IN
Jr Team Trials | 2 | 9 | ICF classes only; age <= 18 | NH, IN, NH, IN, WI
Jr Olympics | 1 | 8 | age <= 18; qualifiers | WI, NC, VA, CO, TX
[*] 1996 Team Trials had only 2 races.

Table 5.1

5.1.1. Gender bias in importance factor

Let's look at the current ranking system's effect on boats by gender by comparing races that are open only to the 4 ICF classes with races that are open to all 7 US classes.

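Races with importance factor >= 8, 1996-2000
Weight | Event | ICF classes only || All classes
 | | 1996 | 1997 | 1998 | 1999 | 2000 | Total || 1996 | 1997 | 1998 | 1999 | 2000 | Total
10 | Sr Team Trials | 2 | 3 | 3 | 3 | 3 | 14 || 0 | 0 | 0 | 0 | 0 | 0
10 | Nationals | 1 | 1 | 1 | 1 | 1 | 5 || 1 | 1 | 1 | 1 | 1 | 5
9 | Jr Team Trials | 2 | 2 | 2 | 2 | 2 | 10 || 0 | 0 | 0 | 0 | 0 | 0
9 | Jr/Sr/Ms Natls | 1 | 1 | 1 | 1 | 1 | 5 || 1 | 1 | 1 | 1 | 1 | 5
8 | Jr Olympics | 1 | 1 | 1 | 1 | 1 | 5 || 1 | 1 | 1 | 1 | 1 | 5
 | Total Races | 7 | 8 | 8 | 8 | 8 | 39 || 3 | 3 | 3 | 3 | 3 | 15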

Table 5.2

This means that, roughly speaking, over the past five years the boats in the classes that the ICF recognizes have had about 2.6 times (39 to 15) as many opportunities to significantly improve their ranking (by participating in races with a high importance factor) as their counterparts in the non-ICF classes.

5.1.2. Geographic bias in importance factor

Here's a table showing the breakdown for each year, for each race, by where it was held. Every entry looks like (number), (state), e.g. "2,TN" means "2 races, Tennessee". The last two lines provide a summary by which half of the US (east/west of the Mississippi) was involved.

Races with importance factor >= 8, 1996-2000, distribution by state
Weight | Name | 1996 | 1997 | 1998 | 1999 | 2000 | Total
10 | Sr Team Trials | 2,TN | 3,WI | 3,WI | 3,TN | 3,TN | 14
10 | Nationals | 1,TN | 1,WI | 1,IN | 1,WI | 1,CA | 5
9 | Jr Team Trials | 2,NH | 2,IN | 2,NH | 2,IN | 2,WI | 10
9 | Jr/Sr/Ms Natls | 1,IN | 1,ID | 1,IN | 1,IN | 1,IN | 5
8 | Jr Olympics | 1,WI | 1,NC | 1,VA | 1,CO | 1,TX | 5
 | Total Races | 7 | 8 | 8 | 8 | 8 | 39
 | Total Races east of Mississippi River | 7 | 7 | 8 | 7 | 6 | 35
 | Total Races west of Mississippi River | 0 | 1 | 0 | 1 | 2 | 4

Table 5.3
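The east/west summary rows can be recomputed directly from the per-state entries. The sketch below transcribes the (number, state) pairs from the table; the names are hypothetical, and the set of western states lists only those that actually appear in the table.

```python
# States in the table that lie west of the Mississippi River.
WEST_OF_MISSISSIPPI = {"ID", "CO", "TX", "CA"}

# (number of races, state) per event, in table order:
# Sr Team Trials, Nationals, Jr Team Trials, Jr/Sr/Ms Natls, Jr Olympics.
RACES_BY_YEAR = {
    1996: [(2, "TN"), (1, "TN"), (2, "NH"), (1, "IN"), (1, "WI")],
    1997: [(3, "WI"), (1, "WI"), (2, "IN"), (1, "ID"), (1, "NC")],
    1998: [(3, "WI"), (1, "IN"), (2, "NH"), (1, "IN"), (1, "VA")],
    1999: [(3, "TN"), (1, "WI"), (2, "IN"), (1, "IN"), (1, "CO")],
    2000: [(3, "TN"), (1, "CA"), (2, "WI"), (1, "IN"), (1, "TX")],
}


def east_west_totals(races_by_year):
    """Return {year: (races east, races west)} of the Mississippi."""
    totals = {}
    for year, entries in races_by_year.items():
        west = sum(n for n, state in entries if state in WEST_OF_MISSISSIPPI)
        east = sum(n for n, _ in entries) - west
        totals[year] = (east, west)
    return totals


totals = east_west_totals(RACES_BY_YEAR)
print(totals)
print(sum(e for e, _ in totals.values()), sum(w for _, w in totals.values()))
```

Summed over the five seasons this gives 35 races east of the river and 4 west of it, matching the last two rows of the table.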

Conclusion: Paddlers living west of the Mississippi will have to make many long trips east to have any chance of improving their ranking. Those who live west of the Rockies have it even worse; and those who live in New England are also at a disadvantage. In fact, it could be argued that paddlers living somewhere in the triangle between western PA, southern WI, and central KY are ideally situated, since they were within a day's driving time of 29 of these races.

It's sometimes asserted that the large imbalance to the east is because most of the "good" slalom paddlers are in the eastern US. It's true that most of the highly-ranked slalom paddlers are in the eastern US, but that's in large part because that's where the races which enable them to be highly ranked are held -- it's not necessarily because they're "good", per se. It's a self-perpetuating system.

To put it another way: western paddlers are not highly ranked as a group, NOT necessarily because they're bad paddlers; but in large part because they have few opportunities to race at events where they can raise their ranking.

5.1.3. Age bias in importance factor

Let's look at the effect of paddler age on their opportunities to improve their ranking; in particular, let's examine the opportunities to compete at races with high importance factors depending on age.

Events open to 17-year-old and 19-year-old C-1's with/without qualifying
Weight | Event | Races | 17-year-old w/o qualifying | 19-year-old w/o qualifying | 17-year-old with qualifying | 19-year-old with qualifying
10 | Nationals | 1 | 1 | 1 | 1 | 1
10 | Sr Team Trials | 3 | - | - | 3 | 3
9 | Jr Team Trials | 2 | 2 | - | 2 | -
9 | Jr/Sr/Ms Natls | 1 | 1 | - | 1 | -
8 | Jr Olympics | 1 | - | - | 1 | -
- | Totals | 8 | 4 | 1 | 8 | 4

Table 5.4

What's this mean? It means that a C-1 paddler who is 17 has somewhere between two and eight times the opportunities of his 19-year-old competitor to boost his ranking -- depending on how good each is. In the case of a strong intermediate 19-year-old C-1 who is good enough to compete at Nationals but not good enough to make Team Trials, this means that this boat has one chance a year to compete in a heavily-weighted race... and if he can't make it to that race because of distance, scheduling or other factors, then he will remain ranked as he is, will not be prequalified to the following year's Sr Team Trials, and the cycle will repeat.
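The "between two and eight times" range follows from the column totals of the table above. A short sketch (hypothetical names; dashes in the table are entered as 0):

```python
# Races available per event, in column order:
# (17 w/o qualifying, 19 w/o qualifying, 17 with qualifying, 19 with qualifying)
OPPORTUNITIES = {
    "Nationals":      (1, 1, 1, 1),
    "Sr Team Trials": (0, 0, 3, 3),
    "Jr Team Trials": (2, 0, 2, 0),
    "Jr/Sr/Ms Natls": (1, 0, 1, 0),
    "Jr Olympics":    (0, 0, 1, 0),
}


def column_totals(table):
    """Sum each of the four eligibility columns across all events."""
    return [sum(column) for column in zip(*table.values())]


age17_without, age19_without, age17_with, age19_with = column_totals(OPPORTUNITIES)
print(age17_without, age19_without, age17_with, age19_with)   # 4 1 8 4
print(age17_with / age19_with)       # both qualify: 2.0
print(age17_without / age19_without) # neither qualifies: 4.0
print(age17_with / age19_without)    # only the 17-year-old qualifies: 8.0
```

The low end of the range is the case where both paddlers qualify for everything open to them; the high end is the case where the 17-year-old qualifies and the 19-year-old does not.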

5.1.4. Obsolescence/vagueness of importance factor

The importance factor table still lists the CIWS qualifiers and finals, even though these races have not existed for several years; similarly, the USOF and USOF qualifiers, which haven't been held since 1995, are also listed.

The importance factor table also does not make it clear which races/series should be considered "major", which "regional", and which "local"; nor does it make it clear how to differentiate these from a "C-D race series". For example, is the Esopus June race a "major double-header" -- because it's held on class III water and is the only doubleheader in the entire northeast; or is it a local/regional race? The statement that the highest of the applicable classifications should be used is of some help here, but does not clear up all the uncertainty.

5.1.5. Biases in importance factor acting in combination

The importance factor biases mentioned above often combine to make it difficult even for talented paddlers to be ranked as they should be. Consider the combined effect of the geographic and age biases on a hypothetical C-1 paddler who is 25, lives in the Pacific Northwest, and is currently ranked around #40-45 -- which would make him a strong intermediate/advanced C-ranked boat. (1999 C-1 rankings #40-45 are Renner, Harris, Baldwin, Denz, McEwan, Criscione).

Consider that during the last five years, this hypothetical paddler has had only three races with importance >= 8 within 1000 miles of his home -- and because of his age, he could only compete in one. What chance has this paddler had to improve his ranking? He may really *be* a C-1 which should be ranked in the 40's; or he may be a C-1 that's a top-20 boat but which simply isn't ranked there due to the rankings system.

There are other combinations of these biases which also work against the fair assessment of paddler ability: try working through that example again with a 17-year-old C-1W, for example. And if you do the math, you'll find that this boat has almost no chance whatsoever to achieve a rank ratio comparable to B-ranked C-1's regardless of how good it is -- because this boat can't attend Sr or Jr Team Trials and thus is excluded from 5 of the most heavily-weighted races in the country.

No ranking system, no matter how well-designed, can address all of these issues. For example, the awarding of bids to major races is beyond its scope. But it can certainly attempt to minimize the effects caused by these external decisions.

5.2. Problems with Field Strength

5.2.1. Field strength problem with inclusion of boats other than fastest

This is the most obvious problem, and affects the first three entries (4 team boats = 10; 3 team boats = 9; 2 team boats = 8) in the table. Since the formula which is used to compare each competitor's performance normalizes it to the best score of the day, the only relevant field strength is the strength of the boat that posted that score. The presence or absence of other boats does not affect the performance of either the competitor or the fastest boat -- at least not in an objectively, quantifiably measurable way.

To put it another way, if a competitor finishes at 115% of Scott Shipley's time, we do not need to and should not consider whether Jason Beakes or Cathy Hearn or Steve Conklin or anyone else showed up at the race -- because they have no effect on either performance. (We could speculate that Scott will paddle faster if he has to race against Jason than, say, against a field comprised entirely of B/C-ranked paddlers. But that's just speculation, and there does not exist a way to confirm or deny this based on measurable data.)

5.2.2. Field strength problem with variance within team

The first four entries in the table of field strength are determined by how many national team athletes are at a race. (I actually use "boats", not "athletes", because it avoids counting C-2's twice.) Let's look at the fourth entry for a moment -- without loss of generality. That entry says that if a single national team athlete competes at a race, the field strength is 7. But not all national team athletes (boats) are equally fast: consider this table of 1999 national team boats, ordered by rank ratio as determined in 1999 rankings:

1999 US National Team Rank Ratios (from 1998 rankings)
| Class | Name(s) | Rank Ratio |
|---|---|---|
| K-1 | Shipley, Scott | 1.000 |
| K-1 | Beakes, Jason | 0.976 |
| K-1 | Parsons, Scott | 0.972 |
| K-1 | Giddens, Eric | 0.964 |
| C-1 | Hearn, David | 0.899 |
| C-1 | Jacobi, Joe | 0.866 |
| C-1 | Michelson, Kevin | 0.856 |
| C-2 | Haller, Lecky/Taylor, Matt | 0.852 |
| C-1 | Conklin, Steve | 0.833 |
| K-1W | Hearn, Cathy | 0.824 |
| C-2 | Hepp, David/McCleskey, Scott | 0.816 |
| K-1W | Bennett, Rebecca | 0.809 |
| K-1W | Leith, Sarah | 0.788 |
| C-2 | Ennis, Chris/Grumbine, John | 0.785 |
| K-1W | Seaver, Mary Marshall | 0.780 |
| C-2 | Long, Chad/Long, Kenneth | 0.776 |

Table 5.5

The problem is that no matter which of these boats competes at a race, the field strength for the race will be set to 7. Yet there's a clear variation in measured performance between them -- in fact, it's roughly 22% easier (if I may use the word "easier" when talking about racing against national team members) to race against the C-2 team of Long/Long than it is to race Shipley in K-1. A paddler who posts a score of 125% of Long/Long's would be expected to have a score of about 161% of Shipley's (1.25 / 0.776): yet with the current field strength assignment, they would find their results weighted the same way in either case.

This analysis can be repeated for the first three lines in the table to generalize the problem: assigning a field strength based on national team membership presumes that all national team boats are equal -- which they're not.

5.2.3. Field strength problem with variance within class

This is a problem similar to the preceding one, except that the issue isn't boats on/not on the national team, it's boats within a class. Consider line 6 of the table, which assigns a field strength of 5 to any race where the fastest boat is B-ranked. Here's a table which shows the highest and lowest B-ranked boats from 1999 in each of the four ICF classes, ordered by rank ratio:

1999 Rank Ratios of highest/lowest B-ranked boats
| Class | Class/Ordinal | Name(s) | Rank Ratio |
|---|---|---|---|
| K-1 | B21 | Dressen, Richard | 0.850 |
| C-1 | B11 | Haller, Lecky | 0.759 |
| K-1W | B10 | Beakes, Nancy | 0.699 |
| C-2 | B5 | Babcock, Frank/Larimer, Jeff | 0.678 |
| K-1 | B56 | Gagne, Patrick | 0.652 |
| C-1 | B32 | Larimer, Jeff | 0.600 |
| C-2 | B12 | Peterman, Will/Winger, Ethan | 0.556 |
| K-1W | B28 | Warner, Heather | 0.542 |

Table 5.6

There's roughly a 36% difference between fastest and slowest; yet if either attends a race, the field strength will be 5.
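The spread hidden inside a single field-strength value can be computed directly from the rank ratios in Tables 5.5 and 5.6. A minimal sketch; the helper name `relative_spread` is hypothetical, and measuring the gap relative to the faster boat is an assumption about how the percentage is meant.

```python
def relative_spread(best: float, worst: float) -> float:
    """Fractional gap between two rank ratios, relative to the better one."""
    return (best - worst) / best


# Field strength 7: fastest and slowest national team boats (Table 5.5).
national_team = relative_spread(1.000, 0.776)
# Field strength 5: fastest and slowest B-ranked boats (Table 5.6).
b_ranked = relative_spread(0.850, 0.542)

print(f"national team: {national_team:.1%}, B-ranked: {b_ranked:.1%}")
```

In both cases a single integer field strength stands in for a range of measured ability that is far from negligible.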

5.2.4. Field strength problem with team/non-team boats

In 1997, Jana Freeburn finished 1st in K-1W on the first day of Sr Team Trials, and 2nd in K-1W on the second day. I believe that by the qualification rules in place at the time, this would have made her the #1 K-1W on the US Team; however, she declined the spot.

This skews the strength-of-field for any 1997 race at which her presence set the field strength (because it would be set to 6, for an A-ranked athlete, not 7, for a national team member). And it skews the strength-of-field for any race at which she was one of several boats determining field strength, for the same reason.

Granted, this is a relatively rare set of circumstances, but it's one that should not have an effect on rankings.

5.2.5. Field strength problem with team boats and yearly competitive calendar

The international calendar varies from year to year, but during significant parts of each season, the national team is out of the US. Any races held during that time period will have a maximum field strength of 6, simply because few (if any) national team members are available to compete in them.

5.2.6. Field strength problem with team boats and geography

Races which are held near places where national team members live occasionally have them as competitors; races in other areas don't. For example, the Penn Cup Riversport race in 1997 had three national team members racing in the expert class, mostly because of its proximity to Washington, DC. However, it's quite rare for a race west of the Rockies to have national team members on hand.

5.3. Problems with importance factor and field strength acting in combination

The problems with both of these combine to cause some races to be overweighted (Nationals, Jr Olympics) and others to be underweighted (Ocoee DBH, Rattlesnake, Animas). The result is a rankings system which enables paddlers to move up in rankings by doing relatively poorly at "important" races that are also attended by national team members -- and keeps paddlers who, by virtue of geography, gender, age or other constraints, perform well at "unimportant" races against strong competition, from doing the same.

This leads to a large number of anomalies in the rankings: paddlers who are ranked ahead of others that they have never come close to beating head-to-head, but who didn't happen to attend a sufficiently highly-weighted race. These anomalies in turn cause the ranking numbers to correlate less well with actual results than if they were not present. They also cause paddler confidence in the accuracy of the rankings to decrease, because in some cases, they're particularly egregious.

6. A recommended approach

In order to address as many of the problems listed above as possible, while simultaneously making evolutionary rather than revolutionary changes, what I propose to do is change the way races are weighted. Specifically, I propose to replace field strength with "rank ratio of athlete with best score of race" and importance factor with "difficulty factor". These changes would revise the current formula to:

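$$\text{race weight} = \frac{\text{rank ratio of athlete with best score of race} + \text{difficulty factor}}{2}$$

Equation 6.1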

(Note that the change in the denominator from 20 to 2 is just to keep things normalized; rank ratio and difficulty factor both go from 0 to 1, so this means that the highest possible weight for a race is 1.0, just as it is with the current formula.)
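As a minimal sketch of the revised weighting, assuming the weight is simply the mean of the two factors (which is what a denominator of 2 implies, and which reproduces the 1999 weights listed in Table 6.5 to within rounding). The function name `race_weight` is hypothetical.

```python
def race_weight(rank_ratio_of_fastest: float, difficulty_factor: float) -> float:
    """Proposed race weight: the mean of two factors that each lie in [0, 1]."""
    for value in (rank_ratio_of_fastest, difficulty_factor):
        if not 0.0 <= value <= 1.0:
            raise ValueError("both factors are expected to lie between 0 and 1")
    return (rank_ratio_of_fastest + difficulty_factor) / 2.0


# Two rows from Table 6.5: (rank ratio of fastest boat, difficulty metric).
nationals = race_weight(1.000, 0.903)
animas = race_weight(0.709, 0.809)
print(f"Nationals {nationals:.4f}, Animas {animas:.4f}")
```

A race can only reach the maximum weight of 1.0 when both the fastest boat's rank ratio and the difficulty factor are 1.0.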

6.1. Why this approach?

Using the rank ratio of the boat with the best score of the race immediately addresses the problems encountered with field strength: it does so by providing a much more accurate metric than simply noting that the fastest boat was "B-ranked". In other words, rather than just setting this to "5" as in the current system, this number would vary between .850 (if Richard Dressen is the fastest boat) and .542 (if Heather Warner is). IF we presume that these reflect actual performance differences -- and part of the goal is to make them do so -- then these provide a fine-grained way of assessing relative performance against a known standard.

Another way of saying this is that while the fastest boat is not an ideal measuring stick for the rest of the field, it is the best one (and perhaps the only one) available.

The difficulty factor is more problematic. A previous approach to this problem tried to use gradient, length, and other physical properties of the course to assess difficulty. The problem with that approach is that it doesn't take into account water flow, how the course is set, weather conditions or any of the myriad other variables that determine how hard a course actually is to paddle fast and clean.

But if these properties -- which have the advantage of being directly measurable, and therefore immune to human bias -- aren't sufficient to determine difficulty, what is? How can we devise a system which reflects what actually happens on courses, e.g. "The course for the second day of the Esopus Doubleheader was substantially easier than the first day."?

6.2. How race results indicate difficulty

The answer, I think, is to use the paddlers themselves to measure the course. In particular, to use the paddlers who perform well and who therefore, as shown above, perform consistently.

To explain: consider a course about which nothing is known. Send four boats down it -- A, B, C, and D-ranked K-1's. If the results look like this:

Example hard race
| Class | Score |
|---|---|
| A | 137.1 |
| B | 188.4 |
| C | 530.2 |
| D | DNF |

Table 6.1

then clearly it's a course of considerable difficulty: enough to make the B-ranked and C-ranked paddlers have slow runs and/or miss a number of gates, and to put a D-ranked paddler in the water.

But if the results look like this:

Example moderate race
| Class | Score |
|---|---|
| A | 137.1 |
| B | 156.2 |
| C | 194.9 |
| D | 311.5 |

Table 6.2

then clearly it's a much easier course: the C-ranked paddler is not all that far off the A-ranked one, percentage-wise, and so on.

6.3. Quantifying the difficulty factor

This method can be generalized for use with almost any field of boats on almost every race course: think of each boat's run as a "probe" which reveals something about the course. Lots of A-ranked boats missing gates? Must be hard. Lots of C-ranked boats turning in clean runs? Must be fairly easy.

The problem is not this concept -- everyone who races is accustomed to looking at race scores and guesstimating the difficulty based on who appears to have had trouble and who didn't. The problem is trying to turn this intuitive concept into a reliable mathematical metric that can be used to weight races.

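The current answer -- and I say current because I'm continuing to research this and look for better answers -- is to use this average:

$$\text{difficulty} = \frac{1}{N}\sum_{i=1}^{N} (\text{race ratio})_i \times (\text{rank ratio})_i$$

Equation 6.2

with the constraint that the scores of the boats used in this computation have to be within 150% of the score of the fastest boat of the race; N is the number of boats that meet that constraint.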

This is a lot easier to understand with an example. So let's take the above two hypothetical four-boat races, and use them to illustrate how this works.

Difficulty calculations-hard race
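| Class | Score | Race Ratio | Rank Ratio | Ratio Product |
|---|---|---|---|---|
| A | 137.1 | 1.000 | .887 | .887 |
| B | 188.4 | 1.374 | .713 | .979 |
| C | 530.2 | 3.867 | .422 | 1.631 |
| D | DNF | - | .274 | - |
| Average (race ratio <= 150%, 2 boats) | | | | .933 |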

Table 6.3

Difficulty calculations-moderate race
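| Class | Score | Race Ratio | Rank Ratio | Ratio Product |
|---|---|---|---|---|
| A | 137.1 | 1.000 | .887 | .887 |
| B | 156.2 | 1.139 | .713 | .812 |
| C | 194.9 | 1.421 | .422 | .599 |
| D | 311.5 | 2.272 | .274 | .622 |
| Average (race ratio <= 150%, 3 boats) | | | | .766 |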

Table 6.4

Granted, this is just a hypothetical example to illustrate how the calculations are done, but it turns out that this actually works quite well for most real races. (Where it doesn't work well is when there are very few boats in the race, or very few boats within 150% of the top boat. This tends to happen at races that are geographically remote or at which a single national team member participates as part of a relatively small and inexperienced field. See below for adjustments.)
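The calculation in Table 6.4 can be sketched in a few lines. The function name, the dictionary layout, and the use of `None` for a DNF are all assumptions made for illustration; the only parts taken from the text are the ratio product and the 150% cutoff.

```python
def difficulty_metric(scores, rank_ratios, cutoff=1.5):
    """Average of (race ratio x rank ratio) over boats within `cutoff` of the best score.

    scores: boat -> race score (lower is better), or None for a DNF.
    rank_ratios: boat -> rank ratio from the previous rankings.
    Returns the average and the number of boats that contributed to it.
    """
    finished = {boat: s for boat, s in scores.items() if s is not None}
    best = min(finished.values())
    products = []
    for boat, score in finished.items():
        race_ratio = score / best
        if race_ratio <= cutoff:  # keep only boats within 150% of the fastest
            products.append(race_ratio * rank_ratios[boat])
    return sum(products) / len(products), len(products)


rank_ratios = {"A": 0.887, "B": 0.713, "C": 0.422, "D": 0.274}
moderate = {"A": 137.1, "B": 156.2, "C": 194.9, "D": 311.5}

metric, boats_used = difficulty_metric(moderate, rank_ratios)
print(f"{metric:.3f} from {boats_used} boats")
```

For the moderate race this keeps boats A, B and C, drops D (which is at 227% of the fastest score), and gives the .766 shown in Table 6.4.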

The reason for the constraint that boats be within 150% of the fastest boat in order to be used to calculate this difficulty metric is that this uses the most consistent boats available at this particular race to assess the difficulty. It's a compromise number: make it too small and too few boats are included to make the average useful; make it too large and too many inconsistently-performing boats are included, clouding the picture of how hard the course really was. ("Does it appear to be difficult because it really was, or did several C-ranked boats just have a really bad day on a III+ river?")

There's one other piece to this: in order to avoid having a race with few participants end up being heavily weighted in terms of difficulty, the difficulty factor is adjusted 5% downward if fewer than 20 boats are used in the average, and 10% downward if fewer than 10 boats are used. This doesn't mean that the race is "easier"; it means that the data sample size is too small to make an accurate estimate, so we choose to err on the conservative side. Note that the more races (and thus paddlers) that are included each time rankings are run, the less of a problem this will be.
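One way to express that small-sample adjustment, assuming "adjusted 5% downward" means multiplying the difficulty factor by 0.95 (and by 0.90 for the 10% case). The function name is hypothetical.

```python
def adjust_for_sample_size(difficulty: float, boats_used: int) -> float:
    """Scale the difficulty factor down when few boats contributed to the average."""
    if boats_used < 10:
        return difficulty * 0.90   # fewer than 10 boats: 10% downward
    if boats_used < 20:
        return difficulty * 0.95   # 10 to 19 boats: 5% downward
    return difficulty              # 20 or more boats: unchanged


for boats in (8, 15, 25):
    print(boats, round(adjust_for_sample_size(0.800, boats), 3))
```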

6.4. The proposed weighting system in practice

Here's what the difficulty factor, rank-ratio-of-fastest-boat, and resulting race weight looked like in 1999:

Race weights for 1999
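| Race | Difficulty Metric | Rank Ratio of Fastest | Race Weight |
|---|---|---|---|
| Trials-1 | 1.000 | 1.000 | 1.000 |
| Trials-3 | 0.986 | 1.000 | 0.993 |
| Ocoee-DBH-1 | 0.969 | 1.000 | 0.984 |
| Ocoee-DBH-2 | 0.962 | 1.000 | 0.981 |
| Trials-2 | 0.956 | 0.957 | 0.956 |
| Nationals | 0.903 | 1.000 | 0.951 |
| Rattlesnake | 0.931 | 0.960 | 0.945 |
| Tariffville | 0.892 | 0.960 | 0.926 |
| NOC | 0.894 | 0.957 | 0.925 |
| Dickerson1 | 0.837 | 0.960 | 0.898 |
| Mid-Am-1 | 0.821 | 0.960 | 0.890 |
| Dickerson2 | 0.821 | 0.950 | 0.885 |
| Jr-Trials-1 | 0.808 | 0.917 | 0.862 |
| Aspen-DBH-1 | 0.787 | 0.918 | 0.852 |
| SnydersMill | 0.813 | 0.888 | 0.850 |
| Mulberry | 0.728 | 0.959 | 0.843 |
| Jr-Trials-2 | 0.766 | 0.917 | 0.841 |
| FIBArk | 0.762 | 0.918 | 0.840 |
| Aspen-DBH-2 | 0.765 | 0.915 | 0.840 |
| Southeasterns-1 | 0.767 | 0.905 | 0.836 |
| Jr-Sr-Ms-Natl | 0.742 | 0.917 | 0.829 |
| Yampa | 0.712 | 0.915 | 0.813 |
| Jr-Olympics | 0.676 | 0.917 | 0.796 |
| PEPCO-1 | 0.646 | 0.927 | 0.786 |
| PEPCO-2 | 0.628 | 0.927 | 0.777 |
| Animas | 0.809 | 0.709 | 0.759 |
| Riverfest | 0.556 | 0.917 | 0.736 |
| Mid-Am-2 | 0.740 | 0.721 | 0.730 |
| BCE-JrOlyQual | 0.561 | 0.892 | 0.726 |
| Missouri-1 | 0.701 | 0.721 | 0.711 |
| Amoskeag | 0.557 | 0.863 | 0.710 |
| Texas-Spring-2 | 0.663 | 0.754 | 0.708 |
| Texas-Fall-2 | 0.633 | 0.754 | 0.693 |
| WACKO-1 | 0.630 [2] | 0.721 | 0.675 |
| WACKO-2 | 0.630 [2] | 0.721 | 0.675 |
| Mascoma | 0.708 | 0.619 | 0.663 |
| BigPiney | 0.572 | 0.754 | 0.663 |
| Texas-Fall-3 | 0.544 | 0.761 | 0.652 |
| Texas-Spring-1 | 0.519 | 0.761 | 0.640 |
| Southeasterns-2 | 0.592 | 0.685 | 0.638 |
| Missouri-2 | 0.592 | 0.685 | 0.638 |
| Loyalsock | 0.573 | 0.699 | 0.636 |
| Texas-Spring-3 | 0.492 | 0.761 | 0.626 |
| Texas-Fall-1 | 0.484 | 0.761 | 0.622 |
| Esopus-DBH-1 | 0.557 | 0.680 | 0.618 |
| Esopus-DBH-2 | 0.555 | 0.680 | 0.617 |
| Bellefonte | 0.527 | 0.699 | 0.613 |
| DogDays | 0.516 | 0.699 | 0.607 |
| WestQuals | 0.569 | 0.604 [1] | 0.586 |
| JuneJamboree | 0.449 | 0.699 | 0.574 |
| Fiddlehead | 0.474 | 0.636 | 0.555 |
| Farmington | 0.424 | 0.680 | 0.552 |
| CoveredBridge | 0.475 | 0.628 | 0.551 |
| Punchbrook | 0.497 | 0.578 | 0.537 |
| Salmon | 0.445 | 0.628 | 0.536 |
| Snoqualmie | 0.557 | 0.495 | 0.526 |
| Gallatin | 0.488 | 0.550 | 0.519 |
| Nooksack | 0.543 | 0.487 | 0.515 |
| TJClassic | 0.397 | 0.604 | 0.500 |
| Mokelumne | 0.397 | 0.604 | 0.500 |
| SalmonlaSac | 0.497 | 0.490 | 0.493 |
| Riversport | 0.475 | 0.450 | 0.462 |
| Blackwater | 0.456 | 0.463 | 0.459 |
| Payette | 0.506 | 0.388 | 0.447 |
| Esopus | 0.482 | 0.407 | 0.444 |
| Lehigh | 0.428 | 0.450 | 0.439 |
| FallCreek | 0.378 | 0.463 | 0.420 |
| FiddlersElbow | 0.430 | 0.407 | 0.418 |
| Codorus | 0.406 | 0.407 | 0.406 |
| SE-JrOlyQual | 0.435 | 0.375 | 0.405 |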

Table 6.5


[1] The West Qualifier was lightly attended, and was won -- narrowly -- by a boat that was unranked at the time. This is an estimate based on the rank ratio of the 2nd-place boat.
[2] Estimated based on better-of-two results because I don't have results for both runs.

For the most part, this aligns rather well with what one might expect: the 5 races on the Ocoee are at the top; races like Trials Qualifiers, Nationals, and a few other hard ones (e.g. Rattlesnake) follow; races of moderate difficulty such as Amoskeag and Nooksack and Punchbrook are in the middle, and easy races like Lehigh and Farmington are near the bottom. Note that all three days of Sr Trials wind up weighted within 4.4% of each other (1.000, 0.993 and 0.956) -- which is quite close to what one might expect for races whose courses are carefully designed to be difficult and consistent, and whose competitors include nearly all current team members.

But there are some things that I'd consider anomalies: the Esopus and Payette races are probably weighted too low, and the races at Bellefonte (Bellefonte, Dog Days, June Jamboree) are weighted too high. However, the good news is that these anomalies aren't as large as the ones generated by the importance-factor/strength-of-field system.

In other words: this isn't perfect; but it's better than what we have. Part of the problem is that these 1999 numbers were calculated using 1998 rank ratios as the starting point, and by all the arguments in section 5 we already know that some of those rank ratios are way off. (For example, most of the paddlers in the west are probably under-ranked; thus races like Nooksack, Snoqualmie, and SalmonlaSac aren't weighted as highly as they ought to be.) Since this is an iterative process, with each year's rank ratios used as the starting point for the next year's calculations, it'll take a cycle or two through to work those problems out. (In other words, each time an estimate is made, the effects of a previously inaccurate estimate, if any, will diminish.)

6.5. Assessing the proposed weighting system's effectiveness

This would be a good point to revisit the critique of the current system outlined in Section 5 and see if the proposed system addresses those points. The issues pertaining to importance factor all disappear because the importance factor isn't used. The issues with field strength also go away, for the same reason.

That's not to say that there aren't issues. For example, replacing the importance factor with a difficulty factor just means that a new set of issues will have to be addressed. Among those are questions such as "Should difficulty account for half the weight of a race?", "What's the most accurate metric for difficulty?", "Is that metric stable when the race is sparsely attended, or when the field is of scattered strength?" The same thing goes for the other part of the race weighting formula, the rank ratio of the fastest boat: "What about once-in-a-lifetime runs by a single boat that skew the race weighting?" and "Since the fastest boat is almost always a K-1, what effect will this have on race weights?" and "Should the strength of the fastest boat account for half the weight of a race?"

Those are not easy questions to answer -- although I think they're much easier to deal with than the issues posed by the current system.

But those questions aside for a moment, does this proposed system result in rankings that are more accurate than the current one? For that matter, how can we measure how accurate the rankings are from either system?

For that, we need to come up with a way of calculating how well estimated performance (the rankings) matches up with measured performance (the results).

6.6. How right is right?

One way to assess how well a ranking system approximates reality is to compute the mean-squared error between (a) actual results and (b) what the ranking system indicates. This isn't necessarily the best way, but it has the advantage of being computationally simple -- and it's certainly good enough to indicate the presence or absence of large errors.

Let's try an example, using the hypothetical results from our four-boat race of moderate difficulty, and rank ratios assigned by some hypothetical ranking system "1":

Mean-squared error computations for moderate race, ranking system "1"
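| Class | Score | Race Ratio | Rank Ratio | Rank Ratio Fastest / Rank Ratio | Error |
|---|---|---|---|---|---|
| A | 137.1 | 1.000 | .887 | 1.000 | (1.000-1.000)^2 = 0.000 |
| B | 156.2 | 1.139 | .713 | 1.244 | (1.139-1.244)^2 = 0.011 |
| C | 194.9 | 1.421 | .422 | 2.102 | (1.421-2.102)^2 = 0.464 |
| D | 311.5 | 2.272 | .274 | 3.237 | (2.272-3.237)^2 = 0.931 |

Mean-squared error (over the three boats other than the fastest) = 0.469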

Table 6.6

Now let's try it for another hypothetical ranking system "2" -- same race, same results, but different rank ratios:

Mean-squared error computations for moderate race, ranking system "2"
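| Class | Score | Race Ratio | Rank Ratio | Rank Ratio Fastest / Rank Ratio | Error |
|---|---|---|---|---|---|
| A | 137.1 | 1.000 | .900 | 1.000 | (1.000-1.000)^2 = 0.000 |
| B | 156.2 | 1.139 | .750 | 1.200 | (1.139-1.200)^2 = 0.004 |
| C | 194.9 | 1.421 | .405 | 2.222 | (1.421-2.222)^2 = 0.642 |
| D | 311.5 | 2.272 | .375 | 2.400 | (2.272-2.400)^2 = 0.016 |

Mean-squared error (over the three boats other than the fastest) = 0.221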

Table 6.7

It's a contrived example -- but it does show that for this particular race ranking system 2 was considerably closer to reality than ranking system 1.

One thing to note about this: ranking system 2 had a poorer estimate of the rank ratio of our hypothetical C-ranked paddler than ranking system 1; however, it had much better estimates for the B- and D-ranked paddlers, enough so that it came out ahead in the overall comparison.
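The single-race comparison can be sketched as follows. The function names are hypothetical, and one detail is an assumption: the mean is taken over the boats other than the fastest (whose error is zero by construction), which is the divisor the tabulated means appear to use.

```python
def predicted_race_ratio(rank_ratio: float, rank_ratio_of_fastest: float) -> float:
    """Race ratio the rankings predict for a boat, relative to the fastest boat."""
    return rank_ratio_of_fastest / rank_ratio


def race_mse(scores, rank_ratios):
    """Mean-squared error between observed and predicted race ratios for one race."""
    fastest = min(scores, key=scores.get)  # lowest score wins
    errors = []
    for boat, score in scores.items():
        if boat == fastest:
            continue  # assumed: the fastest boat is left out of the mean
        observed = score / scores[fastest]
        predicted = predicted_race_ratio(rank_ratios[boat], rank_ratios[fastest])
        errors.append((observed - predicted) ** 2)
    return sum(errors) / len(errors)


moderate = {"A": 137.1, "B": 156.2, "C": 194.9, "D": 311.5}
system_1 = {"A": 0.887, "B": 0.713, "C": 0.422, "D": 0.274}
system_2 = {"A": 0.900, "B": 0.750, "C": 0.405, "D": 0.375}

mse_1 = race_mse(moderate, system_1)
mse_2 = race_mse(moderate, system_2)
print("system 2 closer to the results:", mse_2 < mse_1)
```

Because the code works from unrounded ratios, its values can differ slightly in the last digit from figures tabulated by hand.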

The trick is in making this work for all races, not just one: and the problem with that is that a set of rank ratios which matches up beautifully with the results for one race may do quite poorly when checked against a different one. What this leads us to is:

Ranking system 1 is better than ranking system 2 if the average (over all races) mean-squared error between predicted and observed results is less for 1 than for 2.

It's worth noting that this doesn't say anything about the internal workings of 1 and 2: and that's because the exact algorithms used aren't relevant to this. All it says is that whichever ranking system better matches the original data -- which are the only measurements we really have -- is a better ranking system.
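Extending the comparison from one race to many only needs the results and the two sets of rank ratios, which is the sense in which the internal workings don't matter. A sketch under the same assumptions as the single-race case (hypothetical names; fastest boat excluded from each race's mean), using the two example four-boat races with the DNF left out:

```python
from statistics import mean


def race_mse(scores, rank_ratios):
    """Per-race mean-squared error between observed and predicted race ratios."""
    fastest = min(scores, key=scores.get)
    others = [boat for boat in scores if boat != fastest]
    return mean(
        (scores[b] / scores[fastest] - rank_ratios[fastest] / rank_ratios[b]) ** 2
        for b in others
    )


def average_mse(races, rank_ratios):
    """Average the per-race mean-squared error over every race supplied."""
    return mean(race_mse(scores, rank_ratios) for scores in races.values())


def better_system(races, ratios_1, ratios_2):
    """Return 1 or 2 for whichever set of rank ratios has the lower average MSE."""
    return 1 if average_mse(races, ratios_1) < average_mse(races, ratios_2) else 2


races = {
    "hard": {"A": 137.1, "B": 188.4, "C": 530.2},  # D did not finish
    "moderate": {"A": 137.1, "B": 156.2, "C": 194.9, "D": 311.5},
}
system_1 = {"A": 0.887, "B": 0.713, "C": 0.422, "D": 0.274}
system_2 = {"A": 0.900, "B": 0.750, "C": 0.405, "D": 0.375}

print("better system:", better_system(races, system_1, system_2))
```

How to treat DNFs and unranked boats is left open here; this sketch simply omits them.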

It's also worth observing that NO ranking system will reduce the average mean-squared error to zero. Not only will any occurrence of the A beats B, B beats C, C beats A problem prevent this from happening -- but any time A beats B by any percentage other than the ratio of their rank ratios, the mean-squared error will be nonzero. In practice, this isn't that much of an issue: where the measurements provided by the rankings are critical (for funding, team trials eligibility, etc.) they are almost entirely determined by head-to-head competition at a handful of races -- and that in turn mitigates the effects of this problem. (Data point: 1999 US Senior Team members account for 129 results in this year's overall results. Team Trials, Ocoee DBH, Nationals and NOC combined accounted for 100 of those.)

To restate that last paragraph in another way: every ranking system will show some anomalies, because the nature of the results makes it impossible to eliminate them all. The accuracy of the ranking system does not depend solely on the number of anomalies, but also on their severity.

6.7. How wrong is wrong?

The current NWSC ranking system produces an average mean-squared error (MSE) of .387 (for 1999 races); the proposed system yields .307. One experiment (see Section 7.5) yielded an average MSE of .280.

So at first glance, that means the proposed system is about 21-27% more accurate even though it was seeded with numbers from the current system. It's my expectation that when seeded with numbers generated from itself, it would improve even more.

Here's a look at the top 10 K-1W's, C-1's, and K-1's for 1999 as computed by the current and proposed ranking system. (No slight meant to the C-2's; just trying to illustrate the point without too much more data.)

Comparison of current and proposed ranking systems
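| Class | Current Rank | Current Name(s) | Current Rank Ratio | Current Percentile | Proposed Rank | Proposed Name(s) | Proposed Rank Ratio | Proposed Percentile |
|---|---|---|---|---|---|---|---|---|
| K-1W | A1 | Hearn, Cathy | 0.824 | 100.0 | A1 | Bennett, Rebecca | 0.815 | 100.0 |
| K-1W | A2 | Bennett, Rebecca | 0.809 | 98.2 | A2 | Hearn, Cathy | 0.807 | 99.0 |
| K-1W | A3 | Altman, Renata | 0.799 | 97.0 | A3 | Leith, Sarah | 0.800 | 98.2 |
| K-1W | A4 | Leith, Sarah | 0.788 | 95.6 | A4 | Altman, Renata | 0.788 | 96.7 |
| K-1W | A5 | Seaver, Mary Marshall | 0.780 | 94.7 | A5 | Seaver, Mary Marshall | 0.773 | 94.8 |
| K-1W | A6 | Larsen, Hannah | 0.743 | 90.2 | A6 | Stalheim, Megan | 0.754 | 92.5 |
| K-1W | A7 | Stalheim, Megan | 0.742 | 90.0 | A7 | Miller, Aleta | 0.728 | 89.3 |
| K-1W | A8 | Miller, Aleta | 0.726 | 88.1 | A7 | Larsen, Hannah | 0.728 | 89.3 |
| K-1W | A9 | Jorgensen, Anna | 0.714 | 86.7 | A7 | Beakes, Nancy | 0.728 | 89.3 |
| K-1W | B10 | Beakes, Nancy | 0.699 | 84.8 | A10 | Jorgensen, Anna | 0.719 | 88.2 |
| C-1 | A1 | Hearn, David | 0.899 | 100.0 | A1 | Hearn, David | 0.887 | 100.0 |
| C-1 | A2 | Jacobi, Joe | 0.866 | 96.3 | A2 | Jacobi, Joe | 0.845 | 95.3 |
| C-1 | A3 | Michelson, Kevin | 0.856 | 95.2 | A2 | Conklin, Steve | 0.845 | 95.3 |
| C-1 | A4 | Ennis, Chris | 0.849 | 94.4 | A4 | Michelson, Kevin | 0.839 | 94.6 |
| C-1 | A5 | Conklin, Steve | 0.833 | 92.7 | A5 | Ennis, Chris | 0.833 | 93.9 |
| C-1 | A5 | Bahn, Ryan | 0.833 | 92.7 | A6 | Boyd, Adam | 0.829 | 93.5 |
| C-1 | A7 | Boyd, Adam | 0.817 | 90.9 | A7 | Bahn, Ryan | 0.826 | 93.1 |
| C-1 | A8 | Davis, Samuel | 0.804 | 89.4 | A8 | Sanders, Lee | 0.819 | 92.3 |
| C-1 | A9 | Sanders, Lee | 0.797 | 88.7 | A9 | Davis, Samuel | 0.813 | 91.7 |
| C-1 | A10 | Crane, Austin | 0.782 | 87.0 | A10 | Crane, Austin | 0.789 | 89.0 |
| K-1 | A1 | Shipley, Scott | 1.000 | 100.0 | A1 | Shipley, Scott | 0.992 | 100.0 |
| K-1 | A2 | Beakes, Jason | 0.976 | 97.6 | A2 | Beakes, Jason | 0.958 | 96.6 |
| K-1 | A3 | Parsons, Scott | 0.972 | 97.2 | A3 | Parsons, Scott | 0.949 | 95.7 |
| K-1 | A4 | Giddens, Eric | 0.964 | 96.4 | A4 | Giddens, Eric | 0.944 | 95.2 |
| K-1 | A5 | Geltman, Louis | 0.934 | 93.4 | A5 | Jackson, Eric | 0.936 | 94.4 |
| K-1 | A6 | Heyl, Brett | 0.923 | 92.3 | A6 | Geltman, Louis | 0.928 | 93.5 |
| K-1 | A6 | Jackson, Eric | 0.923 | 92.3 | A7 | Smith, Shaun | 0.916 | 92.3 |
| K-1 | A8 | Braunlich, Kurt | 0.920 | 92.0 | A8 | Kimmet, Nick | 0.915 | 92.2 |
| K-1 | A9 | Nielsen, Corey | 0.912 | 91.2 | A9 | Heyl, Brett | 0.913 | 92.0 |
| K-1 | A10 | Harris, Cody | 0.905 | 90.5 | A10 | Braunlich, Kurt | 0.909 | 91.6 |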

Table 6.8

Most of the numbers match up quite closely between the two systems: one example of a major shift that becomes immediately apparent is Shaun Smith in K-1, who doesn't even make the top 20 under the current system, but ends up in 7th under the proposed one. It's worth looking at this example to understand why such a discrepancy occurs, and why it can be argued that the latter ranking is the more correct one.

Comparison of current and proposed ranking systems
1999 Results for K-1 Shaun Smith
Weighted Race Ratio as shown in Section 3.9
Three best races highlighted
| Race | Race Ratio | Weighted Race Ratio (Current System) | Weighted Race Ratio (Proposed System) |
|---|---|---|---|
| Ocoee-DBH-1 | 1.080 | 0.741 | 0.912 |
| Ocoee-DBH-2 | 1.046 | 0.765 | 0.938 |
| NOC | 1.111 | 0.765 | 0.833 |
| Trials-1 | 1.112 | 0.899 | 0.899 |
| Trials-2 | 1.288 | 0.776 | 0.743 |
| Trials-3 | 2.268 | 0.441 | 0.438 |

Table 6.9

The reason now becomes apparent: Shaun turned in two of his best three performances of the year at the Ocoee DBH, where the only people who beat him were current and former national team members. Under the current system, the weight on Trials is so high and the weight on the Ocoee DBH is so low that his second-best performance of the year -- 8% off Scott Shipley at Ocoee DBH #1 -- doesn't even count as one of his best three races. However, under the proposed system, both Ocoee races are weighted much more heavily, and as a result, Shaun's final rank ratio reflects what he's capable of.
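The proposed-system column can be reproduced from the race weights in Table 6.5. Section 3.9 is not repeated here, so two things in this sketch are assumptions: that the weighted race ratio is the race weight divided by the race ratio (this matches the Proposed System column of Table 6.9 to within rounding), and that the rank ratio is the mean of the three best weighted values (this gives the 0.916 listed for Smith in Table 6.8). Function and variable names are hypothetical.

```python
def weighted_race_ratio(race_weight: float, race_ratio: float) -> float:
    """Assumed form of the weighting: race weight divided by race ratio."""
    return race_weight / race_ratio


def rank_ratio_from_best(weighted, count=3):
    """Mean of the `count` highest weighted race ratios, and the races supplying them."""
    ranked = sorted(weighted.items(), key=lambda item: item[1], reverse=True)
    best = ranked[:count]
    return sum(value for _, value in best) / len(best), [race for race, _ in best]


# race -> (race ratio from Table 6.9, proposed race weight from Table 6.5)
smith_1999 = {
    "Ocoee-DBH-1": (1.080, 0.984),
    "Ocoee-DBH-2": (1.046, 0.981),
    "NOC": (1.111, 0.925),
    "Trials-1": (1.112, 1.000),
    "Trials-2": (1.288, 0.956),
    "Trials-3": (2.268, 0.993),
}

weighted = {
    race: weighted_race_ratio(weight, ratio)
    for race, (ratio, weight) in smith_1999.items()
}
rank_ratio, best_races = rank_ratio_from_best(weighted)
print(f"{rank_ratio:.3f}", best_races)
```

Under these assumptions the three races that count are both Ocoee days and the first day of Trials.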

Similar analyses can be repeated for all the differences between the two ranking systems: it's important to remember though, that neither one is correct, in the sense of being demonstrable fact. Both are estimates, and the assessment of which is more accurate can't be made on the basis of one or a handful of individual rankings, but must be made across the entire set of rankings.

7. Directions for future research

7.1. Difficulty Metric

The difficulty metric proposed in Equation 6.2 was derived empirically -- mostly because there doesn't seem to be a theoretical basis to work from. But it's by no means definitive: perhaps there are other difficulty metrics which would yield better results.

For example, I've already tried different percentage cutoffs; no doubt further investigation can be done there. There have also been suggestions that using raw times or times-with-50's-but-not-touches or some other functions may yield better results. That's possible; the only way to find out is to do the experiments.

7.2. Race Weighting

Similarly, the overall race weighting formula (Equation 6.1) may not represent the best way to balance the difficulty of the race with the strength-of-fastest-boat. One experiment that I've tried was to multiply those factors rather than add them; the result is highly nonlinear behavior in the resulting rankings.
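To make the two alternatives concrete, here is a small comparison on three rows of Table 6.5. The multiplicative form shown is only my reading of "multiply those factors" (a plain product, with no renormalization); the experiment itself may have differed in detail, and the function names are hypothetical.

```python
def additive_weight(difficulty: float, rank_ratio_of_fastest: float) -> float:
    """Mean of the two factors, as in the proposed formula."""
    return (difficulty + rank_ratio_of_fastest) / 2.0


def multiplicative_weight(difficulty: float, rank_ratio_of_fastest: float) -> float:
    """Plain product of the two factors (assumed form of the experiment)."""
    return difficulty * rank_ratio_of_fastest


# race -> (difficulty metric, rank ratio of fastest boat), from Table 6.5
rows = {
    "Nationals": (0.903, 1.000),
    "Animas": (0.809, 0.709),
    "Lehigh": (0.428, 0.450),
}
for race, (difficulty, fastest) in rows.items():
    added = additive_weight(difficulty, fastest)
    multiplied = multiplicative_weight(difficulty, fastest)
    print(f"{race}: additive {added:.3f}, multiplicative {multiplied:.3f}")
```

Because both factors are at most 1, the product is never larger than either factor, so the multiplicative form pushes down races with a weak fastest boat or an easy course much harder than the additive form does.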

7.3. Race Inclusion

Is three the right number of races to use? Or would using four improve the results? What about using only three races, but selecting them by throwing out the best/worst races for each boat?

7.4. Time-weighting

Should older results be devalued? Or does the selection of the best three (or equivalent) automatically handle this because older races will eventually not be used to compute the current ranking?

7.5. Result selection - better-of-two vs. combined

This might seem counter-intuitive at first, and even second glance: but it's not. Because we select the best three races for each boat, any boat that does a sufficient number of races will eventually manage to have two good runs on the same day; that race is likely to be one of their best three, and may even dominate the average.

7.6. Recomputation interval

We already do this: each year's rankings are used to compute the following year's. The problem is that the presence of many fast-developing paddlers skews those results. A paddler who was of high C strength in 1998 may be competing at a high B level in 1999 -- but because that paddler's rank ratio, as used in the race weight, is from 1998, that won't be adequately reflected.

A possible solution to this is to compute and publish rankings more often. Possible times include just after team trials, mid-summer (before nationals), late October (when the season is mostly over), and late winter (before the season starts).

9. Acknowledgements

    A lot of people inside and outside the slalom community have contributed ideas and questions. I'll try to list them all, and hopefully not be remiss by omitting anyone: Jonathan Altman, Karin Baldzer, John Brennan, Bob Campbell, Chris Carter, Chuck Cooper, Lee deWolski, Oliver Fix, Renee Gelblat, Tom Gelder, Bert Hinkley, Peter Kennedy, Dave Kurtz, Keech LeClair, Brian Parsons, Sylvan Poberaj, Joel Reeves, Mike Sloan, Merril Stock, Max Wellhouse.

    Joan Schaech proofread this and sanity-checked the content multiple times for intelligibility. Any remaining errors are mine.

    John Koeppe has made a number of helpful comments about revision 2.0 of this paper, some of which are included in 2.1, and others of which have led to additional study that will be included in future revisions.

    Many, many race organizers have passed along results which have enabled a sizable quantity of data to be amassed and studied. Thanks to their efforts, the 1999 rankings used more data and included more paddlers than ever. Again, I'll try to list them and hope I didn't leave someone out: Charlie Albright, Scott Bowman, Chris Carter, Mark Ciborowski, Linda & Mark Davidson, John Day, Lee deWolski, Wayne Dickert, Steve Exe, Jennie Goldberg, Kirk Havens, Sonny & Amy Hunt, Ray Ingram, Don & Paula Jamison, Ralph Johns, Barbara Kingsborough, Dave Kovar, Dave Kurtz, Ben Kvanli, Keech & Ann LeClair, David Martin, Sean McCarthy, Peggy & Dave Mitchell, Randolph Pierce, Mark Poindexter, Bob Putnam, Bob Ruppel, Marilyn & Wayne Russell, Ted Ryan, Susan Saphire, Walt Sieger, David Sinish, Dave Slover, Merril Stock, John Trujillo, Boo Turner, Tom Vollstedt, Don Walls, Henry Wight, Rick Wright, Andreas Zimmer.


    All contents © copyright
    Rich Kulawiec 1999, 2000. All rights reserved.