Re: why TASS searches ignored after 1997
Rob Creager noted:
> 2. When searching, I never seem to get hits after 1997. Am I doing
> something wrong?
Nope. I just checked, and the problem was in the software here.
I believe the problem is that I was using a single wildcard pattern to
specify the entire set of files to search:
mailarchive/*/*.html
There is a sub-directory for each month, and lots of messages within
each sub-directory:
mailarchive/1998-01/*.html
mailarchive/1998-02/*.html
mailarchive/1998-03/*.html
etc.
The problem is that the single wildcard matched too many files,
about 6500 of them. That caused an error, which resulted in the search
program ignoring all those files. Tch, tch.
I've modified the search program so that it searches an explicit
list of sub-directories. The pattern for each sub-directory matches
only a few hundred files, so the error doesn't occur. I just tried a
search on "psf", and received 361 documents from all years.
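For anyone curious, the idea of searching one sub-directory at a time
can be sketched roughly as below. This is only an illustration in
Python, not the actual search program: the function name
search_archive, the explicit list of month names, and the
case-insensitive substring match are all assumptions made for the
example.

```python
import glob
import os


def search_archive(root, months, term):
    """Return paths of .html files under root/<month>/ that contain term.

    Hypothetical sketch: 'months' is an explicit list of sub-directory
    names such as "1998-01", so each wildcard covers one month only.
    """
    hits = []
    needle = term.lower()  # assumption: case-insensitive matching
    for month in months:
        # This pattern expands to the files of a single month,
        # instead of one pattern covering the whole archive.
        pattern = os.path.join(root, month, "*.html")
        for path in sorted(glob.glob(pattern)):
            with open(path, errors="replace") as f:
                if needle in f.read().lower():
                    hits.append(path)
    return hits
```

A call such as search_archive("mailarchive", ["1998-01", "1998-02"], "psf")
would return the matching files from just those two months; a month
with no matching files simply contributes nothing to the list.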
The downside of the fixed version is that it takes about 10 minutes
to search through all these files :-( Well, it _is_ 48 Meg of text,
but that's still disappointing ...
Michael Richmond