Collections aggregate on the wrong name sometimes

dominiquefortin
Posts: 17
Joined: Tue May 17, 2005 5:52 pm

Post by dominiquefortin »

Here is an example:

(-) Supernatural - "Supernatural-116 Shadow.ws.hdtv-lol.svcd.snoconv"
 +--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par2" yEnc (1/1)"
 +--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par01.rar" yEnc (*/21)"
 +--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par02.rar" yEnc (*/21)"
 +--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par03.rar" yEnc (*/21)"
         ...
We should see "...-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv" as the aggregate name.


P.S.: This is a very, very, very useful feature. Thank you.
alex
Posts: 4543
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

It is just a heuristic algorithm that makes a guess. By the way, it cannot be the same algorithm as "select related headers", since that would be a performance disaster. As it is, you can see that the performance impact is negligible, and opening large newsgroups is even several times faster.

If there were an agreed convention and all subjects were well-formed, you would blame the poster.

What could be done: if, e.g., you downloaded 116 and deleted all related headers (marked them as deleted), the collection would take the subject of the first undeleted header (i.e. 117). Right now it takes the name of the topmost file entry even if that entry has the deleted flag. Also, in principle the collection subject could be more informative (e.g. by enumerating the several different file names that precede the extension).
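
The "first undeleted header" idea can be sketched roughly as follows. This is only an illustration, not Usenet Explorer's code: the `Header` class, its `deleted` flag, and the fallback to the topmost header when everything is deleted are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Header:
    """Hypothetical stand-in for one header inside a collection."""
    subject: str
    deleted: bool = False  # assumed "marked as deleted" flag


def collection_subject(headers: List[Header]) -> Optional[str]:
    """Return the subject of the first undeleted header.

    Assumption: if every header is marked as deleted, fall back to the
    topmost entry; an empty collection yields None.
    """
    for header in headers:
        if not header.deleted:
            return header.subject
    return headers[0].subject if headers else None


headers = [
    Header("Supernatural-116 Shadow.ws.hdtv-lol.svcd.snoconv", deleted=True),
    Header("Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par2"),
]
chosen = collection_subject(headers)
```

It is a single pass that stops at the first undeleted entry, so the extra cost per collection should stay small.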

But that doesn't mean the algorithm cannot be adjusted. What has to be taken into account is that, given the relative anarchy of post subjects, we cannot chase a few posts if parsing them differently would break too many other posts.
dominiquefortin
Posts: 17
Joined: Tue May 17, 2005 5:52 pm

Post by dominiquefortin »

alex wrote: it is just a heuristic algorithm ... as you can see the performance impact is negligible ...
I agree performance is important.

alex wrote: if there would be an agreement and all subjects were well-formed you would blame the poster.
It would certainly make your life as a programmer easier, but I don't believe anything can be done about that. :?

alex wrote: what could be done if e.g. you downloaded 116 and deleted all related headers (marked as deleted) it would take the subject of the first undeleted header (i.e. 117), ...
Choosing the first undeleted header as the aggregate name would make it semantically significant. That's the way to go.
alex
Posts: 4543
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

In the next version I'll just derive the subject from the last file; we'll see if that is enough.
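
As a rough illustration of "derive the subject from the last file", here is a minimal sketch. It assumes the file name is the first double-quoted part of the subject and that trailing pieces such as `.par2`, `.par01.rar` or `.rar` can be stripped. Both regular expressions and both function names are hypothetical, not the program's actual rules.

```python
import re
from typing import List, Optional

# Assumption: the file name is the first double-quoted part of the subject.
QUOTED_NAME = re.compile(r'"([^"]+)"')
# Assumption: these trailing parts are archive/parity suffixes to strip.
SUFFIX = re.compile(r'(\.(par2|vol\d+\+\d+|part?\d+|rar|r\d+))+$', re.IGNORECASE)


def base_name(subject: str) -> Optional[str]:
    """Extract the quoted file name and drop archive/parity suffixes."""
    match = QUOTED_NAME.search(subject)
    if match is None:
        return None
    return SUFFIX.sub("", match.group(1))


def subject_from_last_file(subjects: List[str]) -> Optional[str]:
    """Walk the collection from the bottom and use the last parsable file."""
    for subject in reversed(subjects):
        name = base_name(subject)
        if name is not None:
            return name
    return None


subjects = [
    'Supernatural - "Supernatural-116 Shadow.ws.hdtv-lol.svcd.snoconv"',
    'Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par2" yEnc (1/1)"',
    'Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par01.rar" yEnc (*/21)"',
]
aggregate = subject_from_last_file(subjects)
```

For the example from the first post, this yields the `Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv` name that was asked for. Subjects that do not follow the quoted-name pattern are simply skipped.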
dominiquefortin
Posts: 17
Joined: Tue May 17, 2005 5:52 pm

Wrong date for collections

Post by dominiquefortin »

The date for the collection should also be taken from the last header, so that sorting by date gives something consistent with what is listed when we expand the collection.
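
To make the request concrete, here is a small sketch of a collection whose date is taken from its last header. The `Header` and `Collection` classes and the `sort_by_date` helper are made up for the example and say nothing about how the program stores headers internally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Header:
    """Hypothetical header record: subject plus posting date."""
    subject: str
    posted: datetime


@dataclass
class Collection:
    """Hypothetical collection; headers are assumed non-empty, in display order."""
    headers: List[Header]

    @property
    def date(self) -> datetime:
        # Use the date of the last header, as requested above.
        return self.headers[-1].posted


def sort_by_date(collections: List[Collection]) -> List[Collection]:
    """Sort collections by the date of their last header, oldest first."""
    return sorted(collections, key=lambda c: c.date)


older = Collection([Header("a.par2", datetime(2006, 3, 1)),
                    Header("a.rar", datetime(2006, 3, 2))])
newer = Collection([Header("b.par2", datetime(2006, 2, 1)),
                    Header("b.rar", datetime(2006, 3, 5))])
ordered = sort_by_date([newer, older])
```

Here `newer` sorts after `older` even though its first header is older, because only the last header's date is used.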

Current version: 1.2.8
alex
Posts: 4543
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

The second part of the subject is only listed for reference. In principle I could use the latest date or any other one; I'm not sure it is very critical right now.

The difference is that more sophisticated methods can be used to form the collection subject, e.g. choosing the most suitable file, but setting the collection date to the date of that best file may be a performance issue.