Unwanted multiple downloads of same attachment
Posted: Wed Apr 23, 2008 10:25 pm
by jdr1700
Have not noticed this behaviour until yesterday.
I use search to find subject material, select the posts with the wanted attachments, hit CTRL-D, choose a download directory, and answer "yes" to "delete headers & bodies....."
Yesterday I ran into a group of posts where each single post selected for download showed up seven times in the task manager and was subsequently downloaded seven times. Today I'm seeing it again, except only two times for each post. Not all posts behave this badly, though.
What's causing this and how do I make it stop?
Posted: Wed Apr 23, 2008 10:43 pm
by alex
If someone reposted a single entry multiple times, the search engine shows it as one entry, but when you import it, all related posts are imported as well. In the case of partials those entries are merged; in the case of single posts they are not.
In principle I could keep only the most recent post with the same subject/author; for now it indexes everything. That would be a kind of content filtering, which is bad in general. It could be added on the newsreader side as well; the question is whether it is worth adding an option.
In short, all of those are different posts. If it were the same post, there is no way it could be shown as multiple different entries. Try opening the article bodies: you should see different message-ids and maybe different posting times.
Entries that are replies (subject starts with Re:) are perfectly legitimate in this regard; those are just different replies in the same thread of discussion by the same poster. If they are not replies, it may also happen in text-oriented groups, e.g. with some kind of newsletter.
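To picture the difference between the search view and the import view, here is a minimal sketch in Python. It is not UE's actual code: the `Article` record, the sample addresses, and the choice of `(subject, author)` as the grouping key are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Article(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    message_id: str
    subject: str
    author: str
    group: str


def search_entries(articles):
    """Collapse articles into one search entry per (subject, author)."""
    entries = defaultdict(list)
    for art in articles:
        entries[(art.subject, art.author)].append(art)
    return dict(entries)


# Two separate posts of the same single-part attachment:
# same subject and author, but different message-ids.
articles = [
    Article("<a1@example.invalid>", "photo.jpg (1/1)",
            "poster@example.invalid", "alt.binaries.test"),
    Article("<a2@example.invalid>", "photo.jpg (1/1)",
            "poster@example.invalid", "alt.binaries.misc"),
]

entries = search_entries(articles)

# Importing an entry brings in every article behind it.
imported = [art for arts in entries.values() for art in arts]

print(len(entries), "search entry,", len(imported), "imported articles")
```

With this grouping the search shows one entry, while the import yields two articles that each get their own download task.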
Posted: Wed Apr 23, 2008 10:54 pm
by jdr1700
If I pick and search specifically for only one of the offending message subjects, I see that it only shows up in the database one time, but it still downloads twice.
Posted: Wed Apr 23, 2008 10:59 pm
by jdr1700
And in the task manager I see that the messages have the same date and time.
In the import window it shows the message two times, with each instance showing the same date, time, newsgroup, author, etc.
Posted: Wed Apr 23, 2008 11:18 pm
by alex
So it is the case I described.
In the message-id column you'll see different message-ids.
You can even reproduce it if you post with UE: create a short post to e.g. alt.binaries.test, click "post", and after you see "success" click "post" again. Repeat the sequence several times.
After a short time, search for your post. You will see a single entry, but when you import it you'll see it repeated as many times as you pressed the "post" button.
Posted: Wed Apr 23, 2008 11:26 pm
by jdr1700
ok - it was posted to two different groups at different times.
Why doesn't the database show these as two different entries if there are two different message-id numbers, given that they are single-part attachments posted to different groups?
Posted: Wed Apr 23, 2008 11:36 pm
by alex
In newsreaders single posts are not merged, maybe because replies may legitimately come from the same poster with the same subject.
Search, on the other hand, aggregates posts from all groups into one entry even if it is a single post.
There is an option, edit menu->delete probable duplicates, to address this issue. It will leave only one post, but (as it is implemented now) it will only work if articles are not queued for download or have article body downloaded; it was intended for text groups.
Basically it is a question for the poster: instead of posting two posts separately he could have just crossposted it (same message-id, but appearing in different groups), and then you would see only a single entry. He chose to post several times instead of once, and that is just reflected in the search results.
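The difference between a crosspost and two separate posts is visible in the article headers. The sketch below uses Python's standard `email` module to read the `Message-ID` and `Newsgroups` headers; the sample articles and the helper name `id_and_groups` are made up for illustration.

```python
from email import message_from_string

# One crossposted article: a single message-id listed in two groups.
CROSSPOST = """\
Message-ID: <x1@example.invalid>
Newsgroups: alt.binaries.test,alt.binaries.misc
Subject: photo.jpg (1/1)

body
"""

# The same content posted twice: each post gets its own message-id.
SEPARATE = [
    """\
Message-ID: <s1@example.invalid>
Newsgroups: alt.binaries.test
Subject: photo.jpg (1/1)

body
""",
    """\
Message-ID: <s2@example.invalid>
Newsgroups: alt.binaries.misc
Subject: photo.jpg (1/1)

body
""",
]


def id_and_groups(raw):
    """Return (message-id, list of newsgroups) for a raw article."""
    msg = message_from_string(raw)
    groups = [g.strip() for g in msg["Newsgroups"].split(",")]
    return msg["Message-ID"], groups


cross_id, cross_groups = id_and_groups(CROSSPOST)
separate_ids = {id_and_groups(raw)[0] for raw in SEPARATE}

print(cross_id, cross_groups)   # one id, two groups
print(sorted(separate_ids))     # two distinct ids
```

A reader that identifies articles by message-id sees the crosspost as one article and the separate posts as two, which is why the latter get imported twice.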
Posted: Fri Apr 25, 2008 8:43 am
by alex
Another solution could be to mark only one article for download when there are duplicates among single articles at the search import header stage; the question is whether it is needed frequently.
If several identical articles are marked for download, they are few and small (they cannot be big), and if the save duplicates mode in properties->save is set to overwrite, they are just saved as one file in the end.
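As a rough idea of what "mark only one article for download" could look like at the header import stage, here is a sketch in Python. Nothing here reflects how UE is implemented: the `Header` record, the `lines` size field, and the `(subject, author, lines)` duplicate key are all assumptions. Replies are passed through untouched, since repeated "Re:" subjects from one poster are legitimate.

```python
from typing import NamedTuple


class Header(NamedTuple):
    # Hypothetical header record; field names are illustrative only.
    message_id: str
    subject: str
    author: str
    lines: int


def mark_for_download(headers):
    """Return message-ids to download, one per set of probable duplicates."""
    seen = set()
    marked = []
    for h in headers:
        if h.subject.lower().startswith("re:"):
            # Replies may share subject and author; never drop them.
            marked.append(h.message_id)
            continue
        key = (h.subject, h.author, h.lines)
        if key not in seen:
            # First article with this key wins; later ones are skipped.
            seen.add(key)
            marked.append(h.message_id)
    return marked


headers = [
    Header("<a1@example.invalid>", "photo.jpg (1/1)", "poster@example.invalid", 120),
    Header("<a2@example.invalid>", "photo.jpg (1/1)", "poster@example.invalid", 120),
    Header("<r1@example.invalid>", "Re: photo.jpg", "poster@example.invalid", 5),
    Header("<r2@example.invalid>", "Re: photo.jpg", "poster@example.invalid", 5),
]

marked = mark_for_download(headers)
print(marked)
```

Here the second copy of the attachment is skipped while both replies stay marked. A subject/author match is only a heuristic, so a real option would probably need to be something the user can switch off.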