Assessing Long Range Rain Forecasts

PAGE MENU

Intro and Forecasts
Harald's Method
Shortcomings of Harald's Method
Solution 1 - Rain Threshold
Solution 2 - Allowing for the ±1 day Leeway
Solution 3 - Total Numbers of Rain and Fine Days
Solution 4 - Rain and Fine Windows
Final Outcome - a tally of my 3 methods
Conclusion

 

Intro and Forecasts

Ken Ring, a long range weather forecaster living in New Zealand, made a long range (LR) forecast of rain dates for Sydney for January 2002 during the bushfire crisis.

I took the liberty of posting his forecast on the aus-wx list - here is the forecast:

    Rain days: 6-7; 13-14; 20; 24-28

Please note that Ken asks that a ±1 day leeway be allowed for all his forecasts.

A meteorologist, Harald Richter, of the BoM, Melbourne, challenged LR forecasting, saying that the results would be no better than a random forecast, and set about setting up a test to prove his point.

He set up 3 random number generators to give 'spurious' forecasts over the same time period as Ken's forecast, dubbing them RNG#1, RNG#2, and RNG#3.

Here are the parameters he used and the 'forecasts' generated as posted on the aus-wx list:

"Harald's IDL random number generator RNG#1 assuming 1/3 rain days 2/3 dry days: (Generates 29 numbers between 1 and 6; take 1,2 to mean "rain", and 3-6 to mean "no rain")"

    " Rain days: 7; 9; 11; 16; 21; 23-24; 28; 30"

"Harald's IDL random number generator RNG#2 assuming 1/4 rain days 3/4 dry days: (Generates 29 numbers between 1 and 8; take 1,2 to mean "rain", and 3-8 to mean "no rain")"

    " Rain days: 6; 9; 13; 19; 21; 24; 26-27; 31"

"Harald's IDL random number generator RNG#3 assuming 1/5 rain days 4/5 dry days: (Generates 29 numbers between 1 and 10; take 1,2 to mean "rain", and 3-10 to mean "no rain")"

    " Rain days: 5; 9; 28; 30"
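For readers without IDL, the procedure Harald describes can be approximated in Python. This is a sketch only: Python's `random` module stands in for IDL's generator, the function name `rng_forecast` and the seed value are arbitrary assumptions, and the output will not match Harald's posted lists.

```python
import random


def rng_forecast(sides, n_days=29, seed=None):
    """Draw n_days integers from 1..sides; 1 or 2 means rain ("R"), anything else no rain ("-")."""
    rng = random.Random(seed)
    return "".join("R" if rng.randint(1, sides) <= 2 else "-" for _ in range(n_days))


# sides = 6, 8 and 10 give rain on roughly 1/3, 1/4 and 1/5 of days respectively.
for name, sides in [("RNG#1", 6), ("RNG#2", 8), ("RNG#3", 10)]:
    print(name, rng_forecast(sides, seed=2002))
```

Each run with the same seed gives the same 29-character string, one character per day.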

A point to note is that Harald's RNGs are not truly random: each has an assumed proportion of rain days to no rain days factored in, so by clever design they have already partly defeated his proposition that random forecasts will do just as well as LR forecasts. What we are really comparing here are carefully crafted semi-random forecasts with an LR forecast.

Note that later on Harald introduced yet another forecast called "Mindless" that was simply no rain during the whole period. As it was looking more likely that there were going to be more no rain days than rain days, this forecast had a reasonable chance of a high score using Harald's evaluation method.

As a further 'control' to help me sort out the at times complex methodology that was unfolding in some of the methods of analysis I was pursuing, I introduced yet another forecast called "Flood" that forecast rain every day, and changed the name of Mindless to "Drowt".


Harald's Method

The observations Harald chose to represent Sydney in the test were YSSY METARS, Sydney Airport to the rest of us - it is quite likely that the Sydney observations Ken used to formulate his forecast were also from this station.

He also set up an evaluation table for a YSSY rain/no rain forecast for Ken and his three random number forecasters.

Here is Harald's post giving his version of the results:

<QUOTE>

Well,

Here it is, the long-awaited score of a simple  hit = (- - or R R) method:

   RNG#1:      18/28
   RNG#2:      16/28
   RNG#3:      15/28
   Mindless:   15/28
   Ken:        13/28

I do realise that the verification methodology can be improved,  but if your
rainfall forecast goal _were_ simply to predict rain/no rain days for the
point location YSSY, this simple method is an acceptable start.  

I find it remarkable that the "inverse" of Ken's forecast (replace 
"-" by "R", and "R" by "-") would have scored better (15/28) than Ken's
actual forecast.  That tells me that _any_ verification scheme that 
has Ken's forecast miraculously come out on top is suspicious.

Regarding "window forecasts",  I could shift Ken's rain windows (using eyeball
technology) up or down in my verification table (below),  and it doesn't
make his forecast look any better.  The forecast simply doesn't
relate well to the obs!

Carl wrote:

> complete while my method still requires another 2 days due to the rainfall
> distribution method and the +-1 day leeway.

My method looks at 5...30 min METARS, requiring no time-weighted 
distributions of 24h rainfall totals.  Time-weighted 24h rain distrinution 
is a temporally smoothing approach and is not optimal in Sydney's summertime 
_convective_ rainfall regime, where rain is more likely to fall in short 
bursts.
 
> Haralds method unashamedly declares Drowt (aka Mindless) with a score of
> 66% as the real winner! 

No, it doesn't. "Mindless" (a.k.a. drought) is the bottom of the pack (apart from Ken).  BTW, my methodology is not capable of feeling shame (or
the lack thereof).  The methodology in itself is mindless.

> The next best is RNG#1 with 62%, then RNG#3 on 59%,
> with Ken coming in 4th at 52%, RNG#2 on 48% and the new comer Flood on 34%.

In terms of percentages, rather than hits/fcst days, my results look like this:

   RNG#1:      64%
   RNG#2:      57%
   RNG#3:      54%
   Mindless:   54%
   Ken:        46%
 
The table from which these scores are derived follows:


  Day  Obs  Ken  RNG#1  RNG#2  RNG#3  Mindless
  --------------------------------------------
  03   -    -    -      -      -         -
  04   -    -    -      -      -         -
  05   -    -    -      -      R         -
  06   R    R    -      R      -         -
  07   R    R    R      -      -         -
  08   -    -    -      -      -         -
  09   R    -    R      R      R         -
  10   -    -    -      -      -         -
  11   -    -    R      -      -         -
  12   -    -    -      -      -         -
  13   -    R    -      R      -         -
  14   -    R    -      -      -         -
  15   R    -    -      -      -         -
  16   R    -    R      -      -         -
  17   R    -    -      -      -         -
  18   R?   -    -      -      -         -
  19   -    -    -      R      -         -
  20   -    R    -      -      -         -
  21   R    -    R      R      -         -
  22   R    -    -      -      -         -
  23   -    -    R      -      -         -
  24   R    R    R      R      -         -
  25   R    R    -      -      -         -
  26   -    R    -      R      -         -
  27   -    R    -      R      -         -
  28   -    R    R      -      R         -
  29   R    -    -      -      -         -
  30   R    -    R      -      R         -
  31   R    -    -      R      -         -

I excluded 18 Jan 2002 due to an ambiguity in the METARS reports.
No matter whether the 18th was/was not a "R" day,  the relative
outcome would not change as all "forecasts" were "-".

 
> I look forward to Harald's post of any conclusions he may wish to draw
> from his method.

Harald wishes to point out the following:  Despite the obvious shortcomings of the 
(- -;R R) verification methodology,  a truly skillfull forecast methodology
should have at least beaten some of the RNGs (if not all).  I see no point to
engage in a more complex verification methodology at this point,  which would
have to take into account temporal AND SPATIAL rainfall distribution patterns
_and_ rainfall amounts.  Science is teeming with "research rabbit trails"
that demand to be pursued in hope of new discoveries.  A researcher must select
a few of the most promising rabbit trails.  Following down every existing
trail is impossible,  and an inadequate selection of trails is irresponsible 
and a waste of time.  

I consider the results of my simple rainfall verification methodology
sufficiently  informative to close off the trail related to Ken's methodology.
The idea in early January was to "give Ken a go" - I have done so.
I have received yet another _indication_ (not proof! -- I know one month + one
location is not enough) that his method is not sufficiently promising for me 
to be pursued.
This 'closure' is especially appropriate if you add the more physically based
arguments regarding the magnitude of the moon's gravitational pull _relative
 to_ the magnitude of all the other factors that influence rainfall.

In complex physical systems where hundreds of physical factors all influence
each other in nonlinear feedbacks,  almost _any_ methodology can survive testing as it can hide in the complexity of the physical system. 


Stay tuned for VIC thunder today,
Harald

<ENDQUOTE>

I have taken the liberty of changing "no rain" to "Fine" for what follows.

The method Harald uses for analysis is simple enough to implement in a spreadsheet or simple computer program by adapting the following formulae (written here in rough BASIC form):

    IF Obs = "R" AND Fcst = "R";
        THEN Rain = 1, Fine = 0;
    ELSE IF Obs = "-" AND Fcst = "-";
        THEN Rain = 0, Fine = 1;
    ELSE Rain = 0, Fine = 0;
    ENDIF;
    
    DayResult = Rain + Fine;

A final result percentage can be determined by:

    Result = 100 * (Total DayResults) / (Total number of Days)
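For anyone who prefers a scripting language to a spreadsheet, the same scoring can be sketched in Python. This is only an illustration: the rain days are transcribed from Harald's table above, the 18th is skipped as he did, and the names `simple_hits`, `OBS_RAIN` and `FORECAST_RAIN` are assumptions made for the sketch.

```python
DAYS = range(3, 32)   # 3-31 January 2002
SKIP = {18}           # excluded by Harald because of ambiguous METAR reports

# Rain days transcribed from Harald's verification table.
OBS_RAIN = {6, 7, 9, 15, 16, 17, 21, 22, 24, 25, 29, 30, 31}
FORECAST_RAIN = {
    "RNG#1":    {7, 9, 11, 16, 21, 23, 24, 28, 30},
    "RNG#2":    {6, 9, 13, 19, 21, 24, 26, 27, 31},
    "RNG#3":    {5, 9, 28, 30},
    "Mindless": set(),
    "Ken":      {6, 7, 13, 14, 20, 24, 25, 26, 27, 28},
}


def simple_hits(obs_rain, fcst_rain):
    """Return (hits, days scored); a hit is a day where forecast and observation agree."""
    scored = [d for d in DAYS if d not in SKIP]
    hits = sum((d in obs_rain) == (d in fcst_rain) for d in scored)
    return hits, len(scored)


for name, fcst_rain in FORECAST_RAIN.items():
    hits, days = simple_hits(OBS_RAIN, fcst_rain)
    print(f"{name:<9} {hits}/{days}  {100 * hits / days:.0f}%")
```

Run as written, this should print the same hits and percentages that Harald posted, from 18/28 (64%) for RNG#1 down to 13/28 (46%) for Ken.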


Shortcomings of Harald's Method

Harald's simple evaluation has some important shortcomings aside from his RNG forecasts not being truly random in the first place:

1. What is the minimum observed rainfall to constitute a Rain day?

2. Ken asks for ±1 day leeway on the timing of his forecasts.

3. There is some merit in getting close to the total number of rain or no rain days even if the timing is a bit off.

4. There is some merit in getting close to the correct number of rain or no rain periods (hereafter called "windows") even if the timing is a bit off.


Solution 1 - Rain Threshold

Shortcoming 1 can be resolved by setting a figure that constitutes an acceptable threshold below which any rain is considered insignificant. I suggest 1 mm as being a good starting point, however the 'best' amount to use here is debatable. A 0.2 mm observation late on the 17th or early on the 18th highlighted this issue - the exact timing is uncertain.
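A minimal sketch of such a threshold in Python, assuming daily rainfall totals in millimetres are available. The function name `classify_day` is hypothetical, and the 1.0 mm default simply follows the suggestion above rather than any standard.

```python
RAIN_THRESHOLD_MM = 1.0  # suggested starting point; the 'best' value is debatable


def classify_day(rain_mm, threshold_mm=RAIN_THRESHOLD_MM):
    """Return "R" for a Rain day, or "-" when the total is below the threshold."""
    return "R" if rain_mm >= threshold_mm else "-"


# The 0.2 mm reading around the 17th/18th falls below the threshold.
print(classify_day(0.2))
```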


Solution 2 - Allowing for the ±1 day Leeway.

Shortcoming 2 can be resolved by the use of simple formulae to allow for the leeway (written here in rough BASIC form):

    IF Fcst0 = "R" AND Obs0 = "R";
        THEN Rain = 1, Fine = 0;
    ELSE IF Fcst0 = "-" AND Obs0 = "-";
        THEN Rain = 0, Fine = 1;
    ELSE IF Fcst0 = "R" AND Obs0 = "-" AND
            ((Fcst-1 = "-" AND Obs-1 = "R") OR (Fcst+1 = "-" AND Obs+1 = "R"));
        THEN Rain = 1, Fine = 0;
    ELSE IF Fcst0 = "-" AND Obs0 = "R" AND
            ((Fcst-1 = "R" AND Obs-1 = "-") OR (Fcst+1 = "R" AND Obs+1 = "-"));
        THEN Rain = 0, Fine = 1;
    ELSE Rain = 0, Fine = 0;
    ENDIF;
    
    Where:
        Fcst-1 = the forecast on the day prior to the day in question.
        Fcst0 = the forecast on the day in question.
        Fcst+1 = the forecast on the day after the day in question.
        Obs-1 = the observation on the day prior to the day in question.
        Obs0 = the observation on the day in question.
        Obs+1 = the observation on the day after the day in question.

Here is the final table with my modifications:

    Day Obs    Fcst1  Fcst2  Fcst3  Fcst4  Fcst5  Fcst6
    Jan        Ken    RNG#1  RNG#2  RNG#3  Drowt  Flood
                                                                                                
    03  -      -      -      -      -      -      R    
    04  -      -      -      -      -      -      R    
    05  -      -      -      -      R      -      R    
    06  R      R      -      R      -      -      R    
    07  R      R      R      -      -      -      R    
    08  -      -      -      -      -      -      R    
    09  -      -      R      R      R      -      R    
    10  -      -      -      -      -      -      R    
    11  -      -      R      -      -      -      R    
    12  -      -      -      -      -      -      R    
    13  -      R      -      R      -      -      R    
    14  -      R      -      -      -      -      R    
    15  R      -      -      -      -      -      R    
    16  R      -      R      -      -      -      R    
    17  R      -      -      -      -      -      R    
    18  -      -      -      -      -      -      R    
    19  -      -      -      R      -      -      R    
    20  -      R      -      -      -      -      R    
    21  R      -      R      R      -      -      R    
    22  R      -      -      -      -      -      R    
    23  -      -      R      -      -      -      R    
    24  R      R      R      R      -      -      R    
    25  R      R      -      -      -      -      R    
    26  -      R      -      R      -      -      R    
    27  -      R      -      R      -      -      R    
    28  -      R      R      -      R      -      R    
    29  R      -      -      -      -      -      R    
    30  R      -      R      -      R      -      R    
    31  R      -      -      R      -      -      R                

    R = day where rain was observed (YSSY METARS) or forecast
    - = day where no rain was observed (YSSY METARS) or forecast

Note: Harald had the 18th Obs as "R?". As Harald did not actually know which day it fell on and the amount was only 0.2 mm, I have made it "-".
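The leeway rule can also be sketched in Python against the modified table. This is one reading of the rule, with hypothetical names (`column`, `leeway_hits`); because the original spreadsheet is not shown, its tallies for the RNG columns need not match the posted ones.

```python
FIRST_DAY, LAST_DAY = 3, 31


def column(rain_days):
    """Build a one-character-per-day string ("R" or "-") for 3-31 January."""
    return "".join("R" if d in rain_days else "-" for d in range(FIRST_DAY, LAST_DAY + 1))


OBS = column({6, 7, 15, 16, 17, 21, 22, 24, 25, 29, 30, 31})  # Obs column of the modified table
KEN = column({6, 7, 13, 14, 20, 24, 25, 26, 27, 28})


def leeway_hits(obs, fcst):
    """Return (rain_hits, fine_hits) allowing one day of leeway either side."""
    hits = {"R": 0, "-": 0}
    n = len(obs)
    for i in range(n):
        f = fcst[i]
        if obs[i] == f:
            hits[f] += 1          # direct hit on the day
            continue
        # Missed on the day: accept the forecast state observed on an adjacent
        # day, provided that adjacent day was not itself forecast that state.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if any(obs[j] == f and fcst[j] != f for j in neighbours):
            hits[f] += 1
    return hits["R"], hits["-"]


print(leeway_hits(OBS, KEN))  # (7, 14)
```

For Ken's column this gives 7 rain hits and 14 fine hits. The two controls behave as expected under this reading: an all-fine column scores 0 and 17, and an all-rain column scores 12 and 0.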

And here is the relevant part of my post to aus-wx giving my results for this method:

<QUOTE>

Given that Harald questioned the validity of my methods in his post whilst
his own methods are clearly questionable, I am going to elaborate where
appropriate, as openness and a fair go are two things I firmly believe in.

As a direct comparison to Haralds method where he does not allow Ken's
leeway, here are the results of the Hits +-1 day method:


    Number of Rain Hits +-1 day
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    12    7     6     4     3     0     12
    41%   58%   50%   33%   25%   0%    100%

Here, Ken comes out a clear winner if we ignore the control "Flood" which
hits on all Obs "R"'s  by design.

The hits are simply determined by using Ken's stated leeway as a rule:
if Fcst = "R" then if the Obs = "R" on the same day it is a hit, and if the
first case misses then an Obs "R" on the previous day or the following day
is a hit provided that the same adjacent Fcst day is not itself "R".

The percentage scores displayed as rounded to nearest whole number are
determined relative to the Obs score, where if Fcst <= total Obs the result
= 100 * (Fcst / Obs) percent, and if Fcst > Obs the result = 100 * (Obs /
Fcst) percent. This deals with the small possibility of any Fcst getting
too many hits from the application of the leeway rule by penalising it.

Given the RNG's tendancy to scatter "R"'s down their column whereas Kens
are in "R" window blocks, the RNG's have a significant statistical
advantage with the application of the leeway rule as all "R" days can
potentially score with leeway if they miss a direct hit, however even with
this advantage they did not come up trumps.

The result of 58% is probably less than what Ken would himself be happy
with, and he has since said that his forecast was somewhat hurried and he
was not aware of how much scrutiny it would get. He has generously offered
to do a Sydney forecast for any month agreed to as a true test of his
abilities.


    Number of Fine Hits +-1 day
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    17    14    17    15    16    17    0
    59%   82%   100%  88%   94%   100%  0%

Here RNG#1 is the winner and all RNG's did better than Ken, once again
ignoring the control "Drowt". Rule same as above substituting "-" for "R".

As noted above, the RNG's have a significant advantage with the application
of the leeway rule, and they have all come in with higher scores than Ken.


    Total Number of Hits +-1 day
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    29    21    23    19    19    17    12
    100%  72%   79%   66%   66%   59%   41%

In the overall result for this method, RNG#1 is a clear winner with Ken
second.

The results of the previous two methods are added after first adjusting
them for the relative proportions of the Rain and Fine days in the Obs.

<ENDQUOTE>
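The percentage rule and the proportional total described in the post can be written out as a worked example. The symbols S, F and O are introduced here for illustration only (F is a forecast's count, O the observed count), and the figures are Ken's from the tables above.

```latex
\begin{align*}
S(F, O) &= 100 \cdot \frac{\min(F, O)}{\max(F, O)} \\[4pt]
S_{\text{rain}} &= 100 \cdot \frac{7}{12} \approx 58\%, \qquad
S_{\text{fine}} = 100 \cdot \frac{14}{17} \approx 82\% \\[4pt]
\text{Total} &= \frac{12}{29}\, S_{\text{rain}} + \frac{17}{29}\, S_{\text{fine}}
  = 100 \cdot \frac{7 + 14}{29} \approx 72\%
\end{align*}
```

When a forecast's hits do not exceed the observed counts, as here, weighting by the observed proportions of Rain and Fine days is the same as dividing total hits by the 29 days.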


Solution 3 - Total Numbers of Rain and Fine Days.

I will let the relevant part of my post to aus-wx speak for itself:

<QUOTE>

I feel any Fcst with a similar number of "R" and "-" days to the Obs
regardless of timing has some merit, so here is method 2 simply comparing
the total number of Fcst "R" or "-" days to the total number of Obs "R" or
"-" days.


    Number of Rain Days
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    12    10    9     9     4     0     29
    41%   83%   75%   75%   33%   0%    41%

Here Ken comes out a clear winner. In spite of clever programming factors
introduced by Harald, none of the RNG's generated sufficient rain days to
beat Ken.


    Number of Fine Days
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    17    19    20    20    25    29    0
    59%   89%   85%   85%   68%   59%   0%

And Ken wins again. Haralds clever programming made all the RNG's come out
with numbers that were too high.


    Total All Days
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    29    29    29    29    29    29    29
    100%  87%   81%   81%   54%   34%   17%

And on proportionally distributed percentages Ken wins again. Perhaps if
Harald had spent more time playing with his non-random RNG's he could have
come up with better numbers.

<ENDQUOTE>
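The counts and percentages in these tables can be reproduced with a short Python sketch. It assumes the modified observation table from Solution 2, the smaller-over-larger percentage rule from the Hits method, and a total weighted by the observed share of Rain and Fine days; the names used are illustrative only.

```python
FIRST_DAY, LAST_DAY = 3, 31
N_DAYS = LAST_DAY - FIRST_DAY + 1  # 29 days

OBS_RAIN = {6, 7, 15, 16, 17, 21, 22, 24, 25, 29, 30, 31}
RAIN_DAYS = {
    "Ken":   {6, 7, 13, 14, 20, 24, 25, 26, 27, 28},
    "RNG#1": {7, 9, 11, 16, 21, 23, 24, 28, 30},
    "RNG#2": {6, 9, 13, 19, 21, 24, 26, 27, 31},
    "RNG#3": {5, 9, 28, 30},
    "Drowt": set(),
    "Flood": set(range(FIRST_DAY, LAST_DAY + 1)),
}


def closeness(a, b):
    """100 * smaller / larger, or 0 if either count is zero."""
    return 100.0 * min(a, b) / max(a, b) if a and b else 0.0


def day_count_scores(rain_days):
    """Return (rain days, fine days, rain %, fine %, weighted total %)."""
    obs_rain, obs_fine = len(OBS_RAIN), N_DAYS - len(OBS_RAIN)
    rain, fine = len(rain_days), N_DAYS - len(rain_days)
    rain_pct, fine_pct = closeness(rain, obs_rain), closeness(fine, obs_fine)
    total_pct = (rain_pct * obs_rain + fine_pct * obs_fine) / N_DAYS
    return rain, fine, round(rain_pct), round(fine_pct), round(total_pct)


for name, days in RAIN_DAYS.items():
    print(name, *day_count_scores(days))
```

For Ken this prints 10 and 19 days with 83%, 89% and 87%, in line with the three tables above.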


Solution 4 - Rain and Fine Windows.

These are simply periods of Rain or Fine days - I will let my post of the results speak for itself:

<QUOTE>

To complete the 3 method set, here are the results of comparing the number
of Rain or Fine periods (windows) regardless of timing. Any method that can
forecast close to the correct number of windows has some merit.

    Number of Rain Windows
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    5     4     8     8     4     0     1
    56%   80%   63%   63%   80%   0%    20%

Here Ken & RNG#3 are equal winners. Of course, as RNG#3 only forecast 4
days of rain over the entire period, it's result here is diminished in
importance.

    Number of Fine Windows
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    4     5     9     8     5     1     0
    44%   80%   44%   50%   80%   25%   0%

And here Ken & RNG#3 are again equal winners. See note above on RNG#3.

    Total Number of Windows
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    9     9     17    16    9     1     1
    100%  80%   54%   57%   80%   11%   11%

And a dead heat again with proportional distribution. See note above on RNG#3.

<ENDQUOTE>
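Counting windows amounts to counting unbroken runs of "R" or "-" in a forecast column. The Python sketch below shows one way to do that for two of the forecasts; the helper names are hypothetical, and since the post does not say how single-day or edge windows were treated, other columns may tally differently from the figures above.

```python
from itertools import groupby

FIRST_DAY, LAST_DAY = 3, 31


def column(rain_days):
    """One character per day for 3-31 January: "R" for rain, "-" for fine."""
    return "".join("R" if d in rain_days else "-" for d in range(FIRST_DAY, LAST_DAY + 1))


def count_windows(col):
    """Return (rain_windows, fine_windows), counting each unbroken run as one window."""
    runs = [kind for kind, _ in groupby(col)]
    return runs.count("R"), runs.count("-")


ken = column({6, 7, 13, 14, 20, 24, 25, 26, 27, 28})
rng1 = column({7, 9, 11, 16, 21, 23, 24, 28, 30})
print(count_windows(ken))   # (4, 5)
print(count_windows(rng1))  # (8, 9)
```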


Final Outcome - a tally of my 3 methods.

I will let my post to aus-wx speak for itself:

<QUOTE>

A final score can be determined by simply adding the final total
percentages of the 3 sections and dividing by 3, which helps iron out any
bias advantages of particular methods:

    Final Average
    Obs   Ken   RNG#1 RNG#2 RNG#3 Drowt Flood
    100%  80%   72%   68%   66%   35%   23%

Here Ken has come out a clear winner taking out the best score when
averaged over the 3 analysis methods.

It should be noted that Ken claims an accuracy of about 70 percent if his
+-1 day leeway is allowed - whilst this sample is too small to draw firm
conclusions, it certainly tends to back up his claim with 72% in the Hits
+-1 day section, 87% in the number of days section, 80% in the number of
windows section, and 80% final average over all 3 methods.

<ENDQUOTE>
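The averaging step is simple enough to check by hand or with a few lines of Python. The sketch below uses the rounded total percentages from the three result tables above; the dictionary name `TOTALS` is an assumption for illustration.

```python
# Total percentages per forecast: (Hits +-1 day, number of days, number of windows).
TOTALS = {
    "Ken":   (72, 87, 80),
    "RNG#1": (79, 81, 54),
    "RNG#2": (66, 81, 57),
    "RNG#3": (66, 54, 80),
    "Drowt": (59, 34, 11),
    "Flood": (41, 17, 11),
}

for name, scores in TOTALS.items():
    print(f"{name:<6} {sum(scores) / len(scores):.1f}%")
```

Because the inputs here are already rounded, a result can land a point away from the posted figure: RNG#1 comes out at 71.3 and RNG#3 at 66.7, against the posted 72% and 66%, which suggests the post averaged unrounded percentages.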


Conclusion.

More work needs to be done on a larger sample of Long Range forecasts to determine the validity of Ken's methods.

If anyone wishes to do their own verifications of Ken's work in his 'home' territory of New Zealand where he is likely to have better accuracy due to familiarity with local conditions, you will find his forecasts for the month ahead for many places in New Zealand on his website at http://www.predictweather.com - click on the "Free Month" link.

Carl Smith.


| Website Feedback: carls@qldnet.com.au |