Cannot download all headers from news.hitnews.eu

dl marshall
Posts: 8
Joined: Sun Aug 19, 2007 1:01 pm
Location: .nl

Cannot download all headers from news.hitnews.eu

Post by dl marshall »

Hello,

I read good things about UE at binaries4all.nl, so late yesterday evening I downloaded the trial version.
But first I set up a multi-boot for Windows Vista Ultimate 64-bit, which I had bought that afternoon.
Then I installed UE.
Followed by registering with this payserver: http://www.hitnews.eu/english/ for a 7-day free trial account (with full access).

Being a complete noob to Usenet, I cannot determine whether I'm doing something wrong or whether UE is at fault.

Working:
downloading NZBs (even old ones, more than 2 months old)
adding newsservers (paid and free)
downloading the newsgroup lists from these newsservers
downloading headers from all these newsgroups.

Problem:
when it comes to the really big newsgroups like a.b.boneless, I cannot download all headers.

I can only download headers from the 16th of August up to and including today, the 19th of August.
The payserver claims 100+ days retention.
For instance, I do a search in the UE indexing service.
I see something in the search results that's interesting to use as a test download; it's in a.b.boneless and dated 24 July 2007. That's 26 days ago.
I see the NZB and the files.
I download them all easily from the UE indexing search results.

To test the payserver retention and the UE header download speed and resource usage, I delete all the downloaded files.
I want to download the same files from the retrieved headers of a.b.boneless.
I add a.b.boneless, retrieve all the newsgroups, purge the ones with zero articles, change the settings of a.b.boneless as per the screenshot, and try to retrieve all its headers.
But I never get to see headers older than the 16th of August, so I never get to see that set of files from 26 days back.

I assume, being able to download NZBs which are over 2 months old, that the retention of the payserver is OK. (Correct me if this is a wrong assumption.)
Being a noob I try to find the mistake at my end, but I'm hitting a brick wall.
Maybe one of you knows what I'm doing wrong?

Screenshot at 99% completion of downloading all headers in a.b.boneless (see my UE newsgroup settings):
http://img382.imageshack.us/img382/5071/uenotallheaders01croppeof3.jpg

Screenshot after completion of downloading all headers (shows the number of articles in a.b.boneless):
http://img382.imageshack.us/img382/8632/uenotallheaders02aftercin4.jpg
marshall, aka marshal, ranger.
Assists in controlling and policing the downloaders. Ensures that the downloaders are adhering to the download rules, and encourages a reasonable pace of downloading. Authorised to eject slow downloaders from the net.
alex
Posts: 4516
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

edit menu->properties->newsgroups

The default for new installs (I've just checked it) is a retention of 60 days or 8M headers, meaning it only downloads the last 8M headers; it shows them all, but on a subsequent newsgroup load it reduces them to 60 days (see the reference to the "let all incoming headers through" option below).

If you want to download all headers and you know the server retention is, say, 100 days, change the default to 100 days and in "headers per server" change it to "all" (click the "headers per server" dropdown control or type it in).

The only thing: don't change "retention" to "natural", since then it will accumulate headers indefinitely, even after they have expired, and maybe that is not what you want.

If you leave retention at 60 days and change headers to "all", it will still download all headers, because "edit menu->properties->general, retention, let all incoming headers through" is checked by default.
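Roughly, the way these options interact can be sketched like this (just an illustration in Python, not the actual implementation; the names and the newest-first list are made up for the example):

Code:
# Toy model of the three options described above -- NOT UE's code.
def download_headers(available, headers_per_server, let_all_through, retention_days, age_days):
    """Headers fetched in one download pass; `available` is newest-first."""
    if headers_per_server == "all":
        fetched = list(available)
    else:
        fetched = list(available)[:headers_per_server]   # only the newest N headers
    if not let_all_through:
        # without "let all incoming headers through", incoming headers older
        # than the retention window are dropped right away
        fetched = [h for h in fetched if age_days(h) <= retention_days]
    return fetched

def load_newsgroup(stored, retention_days, natural, age_days):
    """What survives a subsequent newsgroup load."""
    if natural:                                           # "natural": accumulate indefinitely
        return list(stored)
    return [h for h in stored if age_days(h) <= retention_days]

# defaults for a new install: headers_per_server=8_000_000, retention_days=60,
# let_all_through=True, natural=False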

So those are the options related to retention that might be interesting for you.

When you increase retention it will ask whether you want to redownload headers for the affected newsgroups; answer "yes", but even if you answer "no" you can still achieve the same through Workspace, leftmost tab, Newsgroups entry->context menu->advanced->reset header ranges.

Those are the options I can think of right now; I'm surprised you've heard about headers :)

Downloading headers from newsgroups is usually implemented in a limited way (with respect to newsgroup size), since most clients were written/laid down before the Usenet volume explosion (the "newest" clients sometimes drop direct header download completely), so in UE you can load much higher numbers of headers than usual. In practice (a long time ago, during the test stage) I was loading a newsgroup containing 150M headers (someone had set retention to "natural"), and it can probably take more than that.

There are some more options which may be tweaked to optimize dealing with huge newsgroups.
dl marshall
Posts: 8
Joined: Sun Aug 19, 2007 1:01 pm
Location: .nl

Post by dl marshall »

Hi Alex,

Could you please look at the first screenshot I posted?
The settings with which I'm downloading the headers are in the screenshot.
And they are exactly as you suggest:
100 days retention, All headers per server, and Load Mode not set to Natural.

alex wrote:Those are the options I can think of right now; I'm surprised you've heard about headers :)
I RTFM :)
But I do not claim to understand it all.
alex wrote:Downloading headers from newsgroups is usually implemented in a limited way (with respect to newsgroup size), since most clients were written/laid down before the Usenet volume explosion (the "newest" clients sometimes drop direct header download completely), so in UE you can load much higher numbers of headers than usual.
That's why I'm trialing it right now, baby.
marshall, aka marshal, ranger.
Assists in controlling and policing the downloaders. Ensures that the downloaders are adhering to the download rules, and encourages a reasonable pace of downloading. Authorised to eject slow downloaders from the net.
alex
Posts: 4516
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

I didn't check your screenshots (I'm working on something else at the same time and it is late, I can barely keep my eyes open :) ), wrongly believing the server was ok (currently there is good, standard, easy-to-run server software which does provide normal support for huge groups); if the server were ok, the settings would be the only explanation.

4M per group (in the newsgroup list screenshot) is a very small number; I rechecked with ngroups and the UNS server, and the number should be around 125-150M headers for boneless at 100-day retention.

4M means about 3%, or about 3 days of header retention in boneless, which is consistent with August 16.
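As a rough back-of-envelope check (in Python; the 125-150M figure is the estimate above, 130M is just a midpoint picked for the arithmetic):

Code:
headers_seen     = 4_000_000        # what hitnews returns for a.b.boneless
headers_expected = 130_000_000      # rough midpoint of the 125-150M estimate for 100 days
retention_days   = 100

fraction = headers_seen / headers_expected     # ~0.03 -> about 3%
days     = fraction * retention_days           # ~3 days
print(f"{fraction:.1%} of the expected headers, roughly {days:.1f} days")
# August 19 minus ~3 days is around August 16, which matches what you see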

Probably they keep bodies longer than headers; you can check whether they keep 100 days in bodies, or, if they don't have enough storage, it may be less (you mention they have at least 26 days in bodies; you may limit searches to the boneless group to try to locate older bodies). Those downloads go through the server-side message-id database, which is not the same as their per-group header database.

In short, they have some problems with providing all headers and maybe all bodies in large groups, so what they claim is, at the least, not totally true.

I think they are using some old server software (the welcome message doesn't say anything about the server) and/or they don't have enough storage capacity (hardware).

Or they are trying to save on header bandwidth; you may try to contact them and ask why they don't return all headers.
dl marshall
Posts: 8
Joined: Sun Aug 19, 2007 1:01 pm
Location: .nl

Post by dl marshall »

Hi Alex,

My apologies about the time difference, I didn't realise it.
Bloody internet.

I didn't realise that payservers could/would hang on to bodies longer than headers. Learned something again.
And I understand your calculation about the 3% equaling 3 days.

And I noticed that the bandwidth drops to about 450 KB/s when downloading headers.
When downloading bodies it is always around 1250 KB/s.

I will contact them (refer them to this topic) and ask why they don't return all headers.

If it is as you explain, and it probably is, I will look further.
Because to me it looks like they are not a good match for UE.

I won't bug you any further.

Thanks,
Robert
marshall, aka marshal, ranger.
Assists in controlling and policing the downloaders. Ensures that the downloaders are adhering to the download rules, and encourages a reasonable pace of downloading. Authorised to eject slow downloaders from the net.
dl marshall
Posts: 8
Joined: Sun Aug 19, 2007 1:01 pm
Location: .nl

Post by dl marshall »

I received various explanations (in succession) from Hitnews support.

First explanation (Sunday evening):
A lot of posting is going on in a.b.boneless.
The problem occurred because Newsleecher (?) has a 1 million header maximum.
That I could download older news with NZBs.

I replied by saying that a.b.boneless is therefore a good group to perform tests on, and that I use UE, not Newsleecher.
I also made him aware of the fact that if he had taken the time to just skim through this topic (I referred him to it), let alone read it, he wouldn't have given those replies.
And I politely requested that he terminate the free trial account.

Second explanation in response (Sunday evening):
Apologies for missing the topic.
They keep all headers and don't delete headers to save bandwidth, as suggested in this topic.
Beyond 60 days they currently have some problems with the completeness of posts. They were still investigating.
But theoretically I should be able to download all headers up to 60 days.
He would ask technical staff for an explanation.

I replied this evening saying that 100 real days of retention would have been exceptional. And that I would wait for his explanation.

Third explanation (almost immediately):
The problem is with the software they are using (Diablo).
It's using a 32-bit database,
which translates into a maximum storage of 4 million headers per group.
He admits that for a group like a.b.boneless that is not much history.

He goes on to explain that, with a few exceptions, all news providers use Diablo and that the problem therefore exists everywhere.
He concludes that it's not a problem with the reader client.
And explains that a project to migrate to 64-bit is underway, but that it will not be finished within a month.
(Why one month? I do not understand where this comes from.)
He finishes by telling me that downloading across the whole retention period through the use of NZBs is possible.

I have to admit that he tried.
But I found the 4 million header limit a bit unbelievable.
So I checked the Diablo documentation, and there is a dhistory table (the indexing (hash) of all message-ids in a group):
Ability to set the dhistory hash table size in the diload command.
The default is 4 million entries, equivalent to the '-h 4m' option to diload.
Each hash table entry is 4 bytes, so 4 million entries results in a 16MByte hash table.
The hash table size must be a power of 2, so the next logical step is -h 8m (32 MBytes) or -h 16m (64 MBytes).
If your news box has a lot of memory, changing your biweekly.atrim script (see the adm directory for a sample) to generate a larger hash table will greatly reduce the load on the /news partition.
I might not have completely understood the Diablo docs, but I can't be bothered to dig any further.
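For what it's worth, the arithmetic from that snippet works out like this (my own quick check in Python, nothing from Diablo itself; I'm assuming the 'm' in '-h 4m' means 2^20 entries, which fits the power-of-two requirement):

Code:
BYTES_PER_ENTRY = 4                              # per the doc snippet above
MIB = 1024 * 1024

# "-h 4m", "-h 8m", "-h 16m": the table size must be a power of two
for label, entries in (("4m", 4 * 2**20), ("8m", 8 * 2**20), ("16m", 16 * 2**20)):
    size = entries * BYTES_PER_ENTRY / MIB
    print(f"-h {label}: {entries:,} entries -> {size:.0f} MByte hash table")
# prints 16, 32 and 64 MBytes, matching the docs

# For comparison, a genuinely 32-bit index could address 2**32 entries:
print(f"2**32 = {2**32:,}  (about 4.3 billion, not the 4 million the support rep quoted)")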

I'm going to check out some other news providers.
marshall, aka marshal, ranger.
Assists in controlling and policing the downloaders. Ensures that the downloaders are adhering to the download rules, and encourages a reasonable pace of downloading. Authorised to eject slow downloaders from the net.