Collections aggregate on the wrong name sometimes
Posted: Sat Apr 01, 2006 3:19 am
by dominiquefortin
Here is an example:
(-) Supernatural - "Supernatural-116 Shadow.ws.hdtv-lol.svcd.snoconv"
+--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par2" yEnc (1/1)"
+--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par01.rar" yEnc (*/21)"
+--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par02.rar" yEnc (*/21)"
+--(*) Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par03.rar" yEnc (*/21)"
...
We should see "...-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv" as the aggregate name.
P.S.: This is a very, very, very useful feature. Thank you.
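To make the expected result concrete, here is a minimal Python sketch of deriving an aggregate name from a single header's subject. It is not the newsreader's actual algorithm. It assumes the file name is the first double-quoted string in the subject and strips only a few common PAR2/RAR volume suffixes. The `base_name` helper and its regular expressions are hypothetical.

```python
import re

# Hypothetical helper, not the newsreader's actual code: the file name is
# assumed to be the first double-quoted string in the subject line.
QUOTED = re.compile(r'"([^"]+)"')

# Assumed set of archive/recovery suffixes to strip. Real posts use many more
# conventions, which is why the thread calls this a heuristic.
SUFFIX = re.compile(
    r'\.(?:(?:vol\d+[+-]\d+\.)?par2'  # name.par2, name.vol00+01.par2
    r'|par\d+\.rar'                   # name.par01.rar (as in the listing above)
    r'|part\d+\.rar'                  # name.part01.rar
    r'|rar|r\d+)$',                   # name.rar, name.r00
    re.IGNORECASE,
)


def base_name(subject: str) -> str:
    """Return the quoted file name with one trailing volume/extension suffix removed."""
    match = QUOTED.search(subject)
    name = match.group(1) if match else subject.strip()
    return SUFFIX.sub("", name)


if __name__ == "__main__":
    subjects = [
        'Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par2" yEnc (1/1)"',
        'Supernatural - "Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv.par01.rar" yEnc (*/21)"',
    ]
    for s in subjects:
        print(base_name(s))
```

Under these assumptions, both the `.par2` and the `.par01.rar` headers of the 117 post reduce to `Supernatural-117 HellHouse.ws.hdtv.xvid-xor.svcd.snoconv`.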
Posted: Sat Apr 01, 2006 3:45 am
by alex
it is just a heuristic algorithm which gives a guess. btw, it cannot be the same algorithm as "select related headers", since that would be a performance disaster; as you can see the performance impact is negligible now, and opening large newsgroups is even several times faster.
if there were an agreed format that all subjects had to follow, you could blame the poster.
one thing that could be done: if e.g. you downloaded 116 and deleted all related headers (marked as deleted), the collection would take the subject of the first undeleted header (i.e. 117). right now it takes the name of the topmost file entry even if that entry has the deleted flag. also, in principle the collection subject could be more informative (e.g. enumerating several different file names without their extensions).
but that doesn't mean the algorithm cannot be adjusted. what has to be taken into account is that, given the relative anarchy in post subjects, we cannot cater to a few posts if parsing them differently would break too many other posts.
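The "first undeleted header" rule described above could look roughly like this minimal Python sketch. The `Header` class, its `deleted` flag and `collection_subject` are hypothetical names for illustration, and falling back to the topmost entry when every header is deleted is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Header:
    subject: str
    deleted: bool = False  # hypothetical "marked as deleted" flag


def collection_subject(headers: Sequence[Header]) -> Optional[str]:
    """Pick the collection subject from the first header not marked as deleted.

    If every header is deleted, fall back to the topmost entry, which is what
    the current version is described as doing in all cases.
    """
    if not headers:
        return None
    for header in headers:
        if not header.deleted:
            return header.subject
    return headers[0].subject
```

With the 116 headers marked as deleted and the 117 headers untouched, the collection would be named after 117.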
Posted: Sat Apr 01, 2006 1:41 pm
by dominiquefortin
it is just a heuristic algorithm ... as you can see the performance impact is negligible ...
I agree performance is important.
if there were an agreed format that all subjects had to follow, you could blame the poster.
It would certainly make your life as a programmer easier, but I don't believe anything can be done about that.
one thing that could be done: if e.g. you downloaded 116 and deleted all related headers (marked as deleted), the collection would take the subject of the first undeleted header (i.e. 117), ...
Choosing the first undeleted header as the aggregate name would make the name semantically meaningful.
That's the way to go.
Posted: Sat Apr 01, 2006 2:50 pm
by alex
in the next version i'll just derive the subject from the last file; we'll see if that is enough.
Wrong date for collections
Posted: Fri Apr 07, 2006 7:23 pm
by dominiquefortin
The date of the collection should also come from the last header, so that sorting by date gives results consistent with what is listed when we expand the collection.
current version: 1.2.8
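A minimal Python sketch of this suggestion: take both the subject and the date from the same (last) header, so that sorting collections by date matches what the expanded view shows. `Header`, `Collection` and `sort_by_date` are hypothetical, and "last" is assumed to mean the last entry in listing order rather than the newest post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Header:
    subject: str
    date: datetime


@dataclass
class Collection:
    # Assumed non-empty and in the same order as shown when the collection is expanded.
    headers: List[Header]

    @property
    def representative(self) -> Header:
        # "Last header" is taken to mean the last entry in listing order.
        return self.headers[-1]

    @property
    def subject(self) -> str:
        return self.representative.subject

    @property
    def date(self) -> datetime:
        # Same header as the subject, so date sorting agrees with the expanded view.
        return self.representative.date


def sort_by_date(collections: List[Collection]) -> List[Collection]:
    return sorted(collections, key=lambda c: c.date)
```

Because subject and date come from one header, each collection needs only a single lookup to supply both.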
Posted: Fri Apr 07, 2006 8:07 pm
by alex
the second part of the subject is only listed for reference; in principle i could put the latest or whatever date, but i'm not sure it is very critical right now.
the difference is that more sophisticated methods can be used for forming the collection subject, e.g. choosing the best-suited file, but setting the collection date to the date of that best file may be a performance issue.