>:The solution I am looking at now is to maintain a file containing
>:the message-ids of articles that have been read. When an article
>:header is pulled in from the server, if the message-id is in this
>:file, that article will be marked as read in the .newsrc file for that
>:server and will not be displayed. This message-id file could be
>:trimmed by size or age of entry to prevent out of control growth.
> The problem here is that this method doesn't scale well. If a user only
> reads a few articles, things will be fine. What if they read a _lot_ of
> messages? Keeping only a few message ids won't help them at all.
The problem isn't as unmanageable as it may seem. We run a news server with
approx. 12,000 groups flowing through it. With a history file of about 120 MB,
we hardly get any duplicates. That works out to roughly 10 KB per group
(120 MB / 12,000), so keeping a similar backlog per group shouldn't be that
big a problem.
Also, the history file format isn't exactly space efficient. For a personal
newsreader, where the needed throughput is much lower, you could save a lot
of space by reducing the number of bytes stored per article.
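To put rough numbers on that, here is a minimal Python sketch, for illustration only. It stores a truncated MD5 digest of each message-id instead of the full ID plus the extra fields a server history file keeps. The 8-byte digest size and the names `compact_id` and `entries_per_budget` are my own assumptions and are not taken from any existing newsreader.

```python
import hashlib

# Assumed entry size: the first 8 bytes of an MD5 digest of the message-id.
DIGEST_BYTES = 8


def compact_id(message_id: str) -> bytes:
    """Return a fixed-size, hypothetical compact key for a message-id."""
    return hashlib.md5(message_id.encode("utf-8")).digest()[:DIGEST_BYTES]


def entries_per_budget(budget_bytes: int, entry_bytes: int = DIGEST_BYTES) -> int:
    """How many compact entries fit in a per-group space budget."""
    return budget_bytes // entry_bytes


print(len(compact_id("<123@news.example.com>")))  # 8 bytes, whatever the ID length
print(entries_per_budget(10 * 1024))              # entries in a 10 KB budget
```

At 8 bytes per entry, a 10 KB per-group budget holds 1280 message-ids. The trade-off is that two different IDs could in principle share a digest, which would hide an unread article. With 64 bits and a few thousand entries per group, that is very unlikely.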
> The best bet might be to a) keep the queues of message ids per newsgroup,
That could increase the amount of data, though, since a crossposted article
would be stored once for each group it appears in.
> b) default to keeping no message ids, but allow a user to specify particular
> newsgroups they wish to track in this fashion, c) use some sort of compression/
> hashing mechanism to reduce the amount of bits needed to keep track of the
> message ids.
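Options a) and b) combine fairly naturally. Below is a minimal sketch that assumes an in-memory structure. The class `ReadTracker` and its parameters are hypothetical. It keeps one bounded queue per group the user chose to track and ignores untracked groups. A crossposted article is stored once in every tracked group it appears in, which is where the extra data mentioned above comes from. Option c) could be layered on top by hashing each ID before storing it.

```python
from collections import deque


class ReadTracker:
    """Hypothetical per-group queues of read message-ids, opt-in per group."""

    def __init__(self, tracked_groups, max_per_group):
        # One bounded FIFO per tracked group; the oldest entry is evicted first.
        self.queues = {g: deque(maxlen=max_per_group) for g in tracked_groups}
        # A parallel set per group gives fast membership checks.
        self.sets = {g: set() for g in tracked_groups}

    def mark_read(self, groups, message_id):
        """Record a read article in every tracked group it was posted to."""
        for g in groups:
            queue = self.queues.get(g)
            if queue is None or message_id in self.sets[g]:
                continue  # untracked group, or already recorded
            if len(queue) == queue.maxlen:
                # deque drops queue[0] on append, so keep the set in step.
                self.sets[g].discard(queue[0])
            queue.append(message_id)
            self.sets[g].add(message_id)

    def seen(self, groups, message_id):
        """True if the article was read in any tracked group it was posted to."""
        return any(message_id in self.sets.get(g, ()) for g in groups)


tracker = ReadTracker(["comp.lang.c", "alt.test"], max_per_group=1000)
tracker.mark_read(["comp.lang.c", "alt.test"], "<1@example.com>")
print(tracker.seen(["alt.test"], "<1@example.com>"))
```

Keeping the queues per group makes trimming by size simple. The cost is the duplicated storage for crossposts.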
Compressing the message-ID fields could of course be worth looking at. Also,
if you don't mind a duplicate getting through now and then, you could make
assumptions about the IDs. For instance, you could try to reduce the hostname
to a few bytes of data (reducing the number of bits stored for each byte,
shortening top-level domains to one byte, or removing it altogether, etc.).
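Here is a rough sketch of the hostname-shortening idea. It assumes message-ids of the form `<local@host>`. The one-byte TLD codes are made up for illustration, and `shrink_message_id` is a hypothetical helper, not part of any newsreader.

```python
# Made-up one-byte codes for a few common top-level domains (assumption).
TLD_CODES = {"com": "\x01", "net": "\x02", "org": "\x03", "edu": "\x04"}


def shrink_message_id(message_id: str, drop_host: bool = False) -> str:
    """Shorten a <local@host> message-id for storage in a read-ID file."""
    body = message_id.strip("<>")
    local, sep, host = body.rpartition("@")
    if not sep:
        return body  # not of the form local@host; keep it unchanged
    if drop_host:
        return local  # lossy: only the local part is kept
    name, dot, tld = host.rpartition(".")
    code = TLD_CODES.get(tld.lower())
    if dot and code is not None:
        host = name + code  # replace ".com" etc. with a single byte
    return local + "@" + host


print(repr(shrink_message_id("<123@news.example.com>")))
print(shrink_message_id("<123@news.example.com>", drop_host=True))
```

Swapping the TLD for a code is reversible as long as those bytes never occur in real hostnames. Dropping the host is lossy, because two different articles with the same local part would then produce the same key.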
But I don't think it's worth spending much time on it. I've got 65
newsgroups in my newsrc file. At an average of 10 KB per group, I'll use
650 KB to avoid dupes, and 650 KB of disk space isn't exactly a huge amount
anymore.
But I second the suggestion about letting the user specify which groups to
keep message id data for.
--