6/9/98 -trs
                            Y2K? Why not?
                        -----------------------
                  (An analysis of time within Zebra.)

        This document is an attempt to explain the issues and possible
solutions surrounding the Y2K problem within Zebra.  After reading this,
you will know much more than you ever wanted to about the innards of Zebra
and how it goes from one time format to another.  This is important because
there are a couple of design issues we need to reach consensus on before we
can implement a final fix.

I. Everybody's Gotta Have Their Own Damn Time Format

        Internally, Zebra uses three different formats for dealing with
times and dates: ZebTime, UItime, and something I will call "TM time".  The
first two are defined within Zebra (although they are based on existing
standards), the third is part of the C library.

        ZebTime is the format that Zebra uses internally and in the netCDF
files, and is represented as seconds since January 1, 1970, 00:00:00 UTC.
(A ZebTime structure also has a microsecond offset, but let's just ignore
that.)  This is the way C represents time (see the time_t type), and is
easily used by computers and other non-humans.

        "TM time" is based on the "tm" structure in the C library.
Functions such as gmtime() take "seconds since 1/1/1970,00:00:00" and break
it out into components like year, month, day, hours, minutes, seconds,
day of the year, etc., all of which are elements of the tm structure.  TM
time is therefore much easier for humans to use, yet the tm elements are
still numerical and therefore usable by computers with a tad more effort.
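
        For the curious, here is a minimal sketch (plain C library calls,
not Zebra code) of what gmtime() hands back for a given "seconds since
1/1/1970" value.  The function name format_utc is made up for this
illustration.  Note the two offsets that bite people: tm_year counts years
since 1900, and tm_mon and tm_yday both start at zero.

```c
#include <stdio.h>
#include <time.h>

/* Illustration only: break seconds-since-epoch into UTC calendar fields
 * and print them.  tm_year is "years since 1900"; tm_mon and tm_yday are
 * zero-based, hence the +1900 and +1 adjustments. */
int format_utc(time_t secs, char *buf, size_t len)
{
    struct tm *t = gmtime(&secs);

    if (t == NULL)
        return -1;
    return snprintf(buf, len, "%04d-%02d-%02d %02d:%02d:%02d (day %d)",
                    t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                    t->tm_hour, t->tm_min, t->tm_sec, t->tm_yday + 1);
}
```

Calling format_utc() with 946684800 should yield "2000-01-01 00:00:00 (day
1)"; internally, tm_year for that instant is 100, not 0.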

        UItime groups these readable elements of TM time into a date field
(in YYMMDD format) and a time field (in HHMMSS format).  Both these fields
are stored as long integers, and it is by using UItime that Zebra builds
filenames with the familiar ..YYMMDD.HHMMSS.cdf format.  In Zebra, UItime
is generated by first converting ZebTime to TM time, and then building the
two integer fields from the components of the tm structure.

        (UItime is also called "FCC time" in parts of the Zebra code;
whether that's a standard name for the format or just a historical
artifact, I cannot say.)

        Zebra has utilities for going back and forth between these three
formats, as well as building date strings and other manipulations.

II. Two Digits Are Plenty: A Lesson in Hubris

        ZebTime is completely oblivious to the calendar, and is therefore
Y2K-proof.  This is a good thing because it means we won't have to redesign
the way Zebra itself deals with time.  However, ZebTime is unreadable to
most humans, and it is in the conversion to a readable format that Y2K
rears its ugly head.

        The primary culprit is the tm structure in TM time.  The year
element, called "tm_year", is defined as "years since 1900".  This is
perfectly rigorous and should not cause problems to careful programmers,
namely those who don't confuse this with "years mod 100".  Unfortunately, 
Zebra uses TM time to convert to UItime, and it does so like this:

        ui->ds_yymmdd = t->tm_year*10000 + (t->tm_mon+1)*100 + t->tm_mday;

This works fine for the years in which t->tm_year has two digits or fewer.
However, once we hit the year 2000, tm_year becomes 100, and UItime starts
having problems.  So, while the UI date integer for 1/1/1997 is "970101",
1/1/2000 becomes "1000101".
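
        If you want to see this for yourself, the following sketch applies
the same arithmetic as the line quoted above to a struct tm obtained from
gmtime().  The function names are invented for this example and are not
the names used in the Zebra source.

```c
#include <time.h>

/* Same arithmetic as the Zebra line quoted above: tm_year is used as-is,
 * so it contributes "years since 1900" to the date integer. */
long yymmdd_from_tm(const struct tm *t)
{
    return (long)t->tm_year * 10000L + (t->tm_mon + 1) * 100L + t->tm_mday;
}

/* Convenience wrapper: seconds since 1/1/1970 (UTC) to the date integer.
 * Returns -1 if gmtime() cannot convert the value. */
long yymmdd_from_seconds(time_t secs)
{
    struct tm *t = gmtime(&secs);

    return (t != NULL) ? yymmdd_from_tm(t) : -1L;
}
```

For 852076800 (1/1/1997) this gives 970101; for 946598400 (12/31/1999) it
gives 991231; one day later, for 946684800 (1/1/2000), it gives 1000101.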

        Actually, this seven-digit date is _also_ a perfectly rigorous way
to represent time; the problem is that it is ugly and confusing for us poor
humans, and we are the reason we are bothering with UItime in the first
place.  However, this means that if we were to do absolutely nothing to
Zebra for Y2K, I do not believe anything would actually break; we'd just
have ugly filenames and log messages for the day "1-Jan-100" and stuff like
that.

        The actual solution the ARM program is currently lurching towards
is to use all four digits in the year.  This has some complications I will
go into in section IV; however, we already have some four-digit year
conversions implemented, in the filenames put out by the NSA and TWP
versions of Zebra.  Unfortunately...

III.  Let's Use Any Old Four Digits; Nobody Will Ever Notice

        ...the four-digit year patch (patch level 3 to version 4.2) to
Zebra DOES NOT WORK RIGHT.  To wit: it will correctly generate the right
YYYYMMDD in the file format ONLY for the years prior to Y2K.  After Y2K, it
will generate date strings that are even worse than before, because they
won't even be ugly AND rigorous but merely ugly.

        The problem is in how Zebra, with this patch, uses UItime to build
the date string for the file.  The patch INCORRECTLY assumes that the TM
time element tm_year will "roll over" to 00 once we pass the century mark
(i.e. it thinks tm_year is "year mod 100", not "years since 1900").
Therefore, pl3 thinks that ui->ds_yymmdd will suddenly be much _smaller_
once we cross the Y2K boundary: 991231 one day, 101 (as in "000101") the
next.

        Therefore, to build the "YYYYMMDD" part of the file name, pl3
converts ui->ds_yymmdd into a string (using "%06d", so we fill with zeros
to the left), and concatenates a "19" in front if ui->ds_yymmdd > 500000
and a "20" in front if ui->ds_yymmdd < 500000.  After you stare at that for
a while, you will realize that if we did "roll" the years to "00", then
YYMMDD integers for years from 2000 to 2049 will be less than 500000, while
YYMMDD integers for years from 1950 to 1999 will be greater than 500000.

        Of course, since we _don't_ roll the years, YYMMDD will _never_ be
less than 500000, and we will end up with a date string for (e.g.) 1/1/2000
of "191000101".  The "1000101" comes from our discussion in section II; the
"19" in front comes from the patch.

        (Note: this behavior has been confirmed by our tests.)
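
        The rule described above can be reconstructed in a few lines.  To
be clear, this is my paraphrase of the logic, not the text of the patch,
and the function name pl3_style_date is made up.

```c
#include <stdio.h>

/* Reconstruction of the rule described above (not the patch source):
 * print the date integer with "%06ld" and put "19" in front if it is
 * greater than 500000, "20" otherwise. */
int pl3_style_date(long yymmdd, char *buf, size_t len)
{
    const char *century = (yymmdd > 500000L) ? "19" : "20";

    return snprintf(buf, len, "%s%06ld", century, yymmdd);
}
```

Feeding it 970101 gives "19970101", and feeding it the imaginary rolled-over
value 101 gives "20000101", which is what the patch was hoping for.  Feeding
it 1000101, which is what Zebra actually produces for 1/1/2000, gives
"191000101".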

        Needless to say, this is the WRONG way to do it.

IV.  YYYY?  Because Because Because Because

        Okay, smart guy, so what's the RIGHT way to do it?  That depends
upon how we choose to deal with our years, which is something ARM needs to
develop a consensus about.  There are three possibilities that I can see:

        (a) Do nothing, and let "YY" stand for "years since 1900".  (In
which case, dates after Y2K will actually be YYYMMDD).  As indicated
before, this is a rigorous but ugly solution.  One positive is that doing
nothing is a very easy plan to implement (although we will still have to do
something to the NSA and TWP Zebras).

        (b) Continue to use two digit years, and modify the zeb routines
for dealing with UItime so that we get the last two digits of the year
("year mod 100") in the YY part of YYMMDD.  The advantages of doing this
are that everything that expects six digits or fewer in the YYMMDD field
will still work, including the pl3 patch.  In addition, all humans can be
expected to know what 000101 or 1-Jan-00 means in a date context.  However,
this is still a rather inelegant solution: 1/1/2000 becomes 101, while
1/1/2001 becomes 10101, etc.  Furthermore, it means that post-Y2K dates
will come lexically _before_ pre-Y2K dates, which could have complications
for routines that want to sequentially scan down a list of files.  Finally,
in ninety years or so somebody will have to go through this whole thing
again and do what we should have done in the first place, namely:

        (c) Commit fully to YYYY in all our date strings, and modify the
UItime routines to use YYYYMMDD in the date integer field.  The advantages
are obvious: a perfectly rigorous yet readable and elegant representation
of dates.  The disadvantages are equally obvious: it's not backwards
compatible, and we can expect some problems with existing programs,
especially those that call the Zebra UItime routines.
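
        To make options (b) and (c) concrete, here is a sketch of the two
date-integer calculations side by side.  The function names are
hypothetical; the real change would go into the existing UItime routines.

```c
#include <time.h>

/* Option (b): two-digit year, i.e. "year mod 100".  Since tm_year is
 * years since 1900, tm_year % 100 is the last two digits of the year. */
long date_option_b(const struct tm *t)
{
    return (long)(t->tm_year % 100) * 10000L
           + (t->tm_mon + 1) * 100L + t->tm_mday;
}

/* Option (c): full four-digit year, giving a YYYYMMDD integer. */
long date_option_c(const struct tm *t)
{
    return (long)(t->tm_year + 1900) * 10000L
           + (t->tm_mon + 1) * 100L + t->tm_mday;
}
```

For 12/31/1999 and 1/1/2000, option (b) gives 991231 and then 101, so the
later date sorts first.  Option (c) gives 19991231 and then 20000101, which
sort in the order you would expect.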

        Okay, so how big are these problems?  Outside of Zebra itself, it's
likely that programmers will have used UItime routines primarily to convert
ZebTimes into easy-to-read strings for log messages and labeling and such.
The extra two digits might cause memory errors as you overwrite the end of
the buffer for these strings. This is unlikely, since most people just
define such buffers as char buf[100] or something large like that to start
with, but all procedures that use UItime utilities will need to be checked
to be sure.  In addition, procedures that use the Zebra routines to go from
UItime to ZebTime (the reverse direction; for example, to take a
YYMMDD.HHMMSS time on the command line) will need to be modified to either
accept YYYYMMDD or convert YYMMDD input to a four-digit year.
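
        One way such a procedure might accept either form is sketched
below.  This is purely an assumption on my part, not something that exists
in Zebra or BW: the function name is hypothetical, and the choice to treat
two-digit years below 50 as 20YY simply borrows the 500000 cutoff from the
pl3 patch.

```c
/* Hypothetical helper: normalize a date integer to YYYYMMDD.
 *   - eight-digit values are assumed to be YYYYMMDD already;
 *   - values of 500000 and up are treated as "years since 1900"
 *     (YYMMDD for 1950-1999, or the seven-digit YYYMMDD form);
 *   - anything smaller is treated as a rolled-over 20YY date. */
long normalize_to_yyyymmdd(long date)
{
    if (date >= 10000000L)
        return date;
    if (date >= 500000L)
        return date + 19000000L;
    return date + 20000000L;
}
```

With these assumptions, 970101 becomes 19970101, while 101, 1000101, and
20000101 all come out as 20000101.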

        These checks and changes are almost trivial to make for any one
procedure; however, running through _all_ the VAPs and ingests will be a
sizeable job.  Some of this may be simplified by the use of libraries or
the TDB for such time conversions, depending upon implementation details.
For example, a simple change to BW (already implemented, by the way) was
enough to fix the command line issue for the VAPs that use BW.

        Within Zebra, there are a couple of additional changes that need to
be made to reach full YYYY compliance.  Fortunately, most Zebra utilities
themselves use a function called TC_EncodeTime() to convert ZebTime into a
human-readable string for output purposes; a minor change to TC_EncodeTime
will therefore propagate YYYY to most Zebra utilities.  One problem with
this, however, is that TC_EncodeTime requires a user-supplied character
buffer to hold the output date string; the extra two digits may overflow
this buffer if it wasn't big enough.  I have looked through most of the
Zebra code that calls TC_EncodeTime, and the only file that appears to have
this problem is dsdwidget.c.  (The fix is trivial: just make the buffer a
char buf[40] instead of char buf[20].)
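
        To show why two extra digits can matter, here is a stand-in
encoder.  It is NOT TC_EncodeTime, and the "DD-MMM-YYYY,HH:MM:SS" layout is
only an assumed format for the sake of the arithmetic.  Unlike a routine
that writes blindly into the caller's buffer, it uses snprintf() so the
caller can tell when the string did not fit.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a date-string encoder (not TC_EncodeTime).
 * Assumes t holds valid field values.  Returns the number of characters
 * the full string needs, not counting the terminating NUL; a result >= len
 * means the output was truncated. */
int encode_full_date(const struct tm *t, char *buf, size_t len)
{
    static const char *mon[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

    return snprintf(buf, len, "%02d-%s-%04d,%02d:%02d:%02d",
                    t->tm_mday, mon[t->tm_mon], t->tm_year + 1900,
                    t->tm_hour, t->tm_min, t->tm_sec);
}
```

In this assumed format the string is 20 characters long, so it needs 21
bytes with the terminator.  The two-digit-year version would need only 19,
which is how a char buf[20] could have been adequate before and too small
afterwards.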

        I therefore believe that moving to YYYY will not be particularly
painful.  In fact, I know it won't, because we've already done it:

V. Zebra Y2K Testing: A Light In the Darkness

        Dave and I have already implemented option (c) (that's the YYYY
one) on top of Zebra version 4.2.3 on the development system in the EC.
The changes amounted to a couple of modifications in the file TCvt.c (in
src/lib) and the above-mentioned modification to dsdwidget.c (in
src/dsdwidget).  In addition, we had to change the variable MaxFuture in
Appl.c (in src/ds), because Zebra idiot-checks any sample time you give it
to make sure it's not more than an hour ahead of the current system time.
For testing purposes, we set MaxFuture to 86400*3650 seconds (about ten
years), so that we can fake some post-Y2K sample times and write them out.
(Of course, before going into production, we will want to reset MaxFuture
to one hour.)
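
        The idiot-check amounts to something like the sketch below.  This
is a guess at the shape of the test, not the code in Appl.c; the function
name and the idea of passing the limit as a parameter are my own.

```c
#include <time.h>

/* Hypothetical sketch of a "not too far in the future" check.
 * max_future is the allowed lead over the current time, in seconds.
 * Returns 1 if the sample time is acceptable, 0 if it is rejected. */
int sample_time_ok(time_t sample, time_t now, long max_future)
{
    return difftime(sample, now) <= (double)max_future;
}
```

With a limit of 3600 seconds, a 1/1/2000 sample time (946684800) is rejected
when "now" is some time in mid-1998; with a limit of 86400L*3650 it is
accepted.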

        There are patches available to make all of these changes, even the
MaxFuture one.

        The result: it all worked perfectly.  Zebra will read in post-Y2K
data files without any problems; in fact, it will do so now, without any
modifications, since it only cares about ZebTime.  When writing out both
pre- and post-Y2K data, the test Zebra correctly built filenames with the
..YYYYMMDD.HHMMSS.cdf format.  The Zebra utilities dsdwidget and dsdump
both used time strings of the format DD-MMM-YYYY, just like they should.
There are other Zebra utilities that we haven't checked, but since they
write date strings using TC_EncodeTime, there shouldn't be any problem
there, either.

        With the aforementioned modification to BW, no change was
necessary to the test VAP itself, other than relinking it with the new
Zebra libraries.

        There are different versions of Zebra running in various places in
ARM, especially the remote sites.  However, if the patches don't work
directly to bring all our Zebras up to YYYY compliance, the modifications
are not complicated and could be done by hand and tested in under one day's
effort.

VI.  Okay!  Problem Solved.  What's for Lunch?

        Well, it's not quite that simple.  First of all, we need to
establish an ARM consensus that YYYY is our path forward.  There are
probably additional complications in modifying instrument ingest
procedures, since they all deal with time in their own special ways, and
YYYY might not necessarily be the best solution for every ingest.
Furthermore, there are issues with the archive and what we are going to do
about all the data we've gathered under the old YY scheme (do we rename it
or not?  Do we have to handle both YY and YYYY data?).  Other, non-Zebra
procedures will be impacted by a change in file names; for example,
probably every single one of my IDL scripts will break under YYYY.  In
addition, the EC currently cannot handle YYYY data files, as I understand
it.

        However, it is clear that from Zebra's point of view, YYYY is the
best solution, especially since we already have a YYYY version successfully
implemented.