Mail archiving

Matthew Hawkins matt at mh.dropbear.id.au
Fri Sep 6 04:04:08 EST 2002


Martijn van Oosterhout (kleptog at svana.org) wrote:
> 50,000 messages?!? How long does it take to open it? It doesn't mind holding
> all these messages, it just takes forever to open it.

I'm drawing from a dim memory, but it took 5 minutes and mutt grew to
60MB or so.  Sure, 5 minutes is a long time to wait if you don't have
anything else to do.  I try to keep busy ;)  Speaking of which, it's
3:30 and my coffee mug is empty...

> > > With the default settings, mutt will spend too much time updating the
> > > display and not enough parsing the mailbox.
> 
> It helps, but not much.

Crank it up higher then.  If you set it to 10000 for example, you've got
two things happening with this process - the kernel madly doing I/O, and
mutt building its internal data structures.  From my dim recollection,
when I was opening that large box, mutt's CPU load was spiking between
20% and 90%, which to me indicates that something else was causing a
bottleneck.  (I'd expect mutt to be at a constant high CPU usage if it
was having trouble doing things.)

> > It is an inherent problem with mbox that there's essentially no way to
> > produce a message list without reading the entire file.
> 
> Not true. Netscape when it reads mailboxes produces an index file.
> Subsequent openings produce the message list with no delay. There's some

Congratulations, you've just discovered that O(1) lookups in a database
are faster than sequential reads of the entire dataset.  This
groundbreaking discovery may change the face of computer science forever!
Oh, wait, somebody already discovered this 40 years ago :P

You should work for Mindcraft; your comparison strategy is similar to
theirs in determining your position.

> checks in there to ensure that the index file accurately covers changes in
> the file, but it works fine. Not only that, for me netscape can create an
> index for a mailbox faster than mutt can open it. Don't ask me why.

As far as I know, Netscape mailboxes are stored in a proprietary format.
I haven't used it in a long time though so maybe I'm wrong.  But it goes
without saying that placing markers in your proprietary mailbox format
for easy indexing isn't exactly rocket science.  (It's computer science
... hehehe :)  Perhaps one could use a strategy similar to rsync: if
you have that index file available, you already have some idea of how
many messages are in the mailbox and where they begin and end, and you
can correct with rolling checksums as you go.  *shrugs*  If both
programs are going through open(2) and read(2) then I think it's a safe
assumption that the time taken to obtain the data in the mailbox is
roughly identical for both, and determined solely by a third external
entity - the Linux kernel.
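
For what it's worth, here is a rough Python sketch of the index-file
idea.  It is not how Netscape or mutt actually do it; the function
names and the JSON index layout are made up for illustration.  Instead
of rsync-style rolling checksums it uses a much cheaper check (same
file size, and every recorded offset still begins with "From "), and it
assumes body lines starting with "From " are escaped the usual mbox way.

```python
import json
import os


def build_index(mbox_path):
    """Read the whole mbox once; record the byte offset of each message."""
    offsets = []
    with open(mbox_path, "rb") as f:
        pos = 0
        prev_blank = True  # start of file counts as "after a blank line"
        for line in f:
            # A message starts at a "From " line that follows a blank line.
            if prev_blank and line.startswith(b"From "):
                offsets.append(pos)
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    return {"size": pos, "offsets": offsets}


def index_is_valid(mbox_path, index):
    """Cheap check: size unchanged and each offset still starts a message."""
    if os.path.getsize(mbox_path) != index["size"]:
        return False
    with open(mbox_path, "rb") as f:
        for off in index["offsets"]:
            f.seek(off)
            if f.read(5) != b"From ":
                return False
    return True


def load_offsets(mbox_path, index_path):
    """Return message offsets, reusing the saved index when it still fits."""
    try:
        with open(index_path) as f:
            index = json.load(f)
        if index_is_valid(mbox_path, index):
            return index["offsets"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or damaged index: fall through and rebuild
    index = build_index(mbox_path)
    with open(index_path, "w") as f:
        json.dump(index, f)
    return index["offsets"]
```

The first call pays for the full sequential read; later calls only seek
to the recorded offsets.  Any change in file size forces a full rebuild
here, which is the lazy option; rescanning only the appended tail would
be the obvious next step.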

I could make mutt take 8 hours to read that mailbox if I wanted to, with
various sub-optimal configuration options and perhaps external forces
(maybe nice 19 the process and go play lbreakout2 for a few hours ;-)

Oh, another thing you might like to turn off is line and/or byte counting
in the *_format strings - it involves extra processing.

-- 
Matt