I've been using the same boolean wildmat for par detection for a long time now, but it's no longer working. I noticed in the dev forum that, in 1.7, Alex made some changes to the syntax. But, I'm having trouble adjusting for it.
The original working string was this:
(\.[pP][0-9][0-9]|vol*.par2)&^(\.[pP]art[0-9]*\.[^pP])&^(\(0/)&^(\(00/)&^(\(000/)
Essentially, it starts with a broad search, which gets a lot of false positives, and then removes the false positives with all those &^() sections.
I tried replacing [] with {}, as per that post, but that did not help. I know I can turn off case sensitivity and replace [pP] with p, but I never bothered since "if it ain't broke, don't fix it".
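For readers who don't know the wildmat syntax, the include-then-exclude logic of that original string can be sketched in Python. This is only an approximation using regular expressions, not how Usenet Explorer evaluates the filter. It assumes each term matches anywhere in the subject, that * matches any run of characters, and that [..] is a character set as in the pre-1.7 syntax. The names INCLUDE, EXCLUDES and looks_like_par are made up for the illustration.

```python
import re

# Broad search: ".pNN" style names or "vol...par2" volumes (assumed substring match).
INCLUDE = re.compile(r"\.[pP][0-9][0-9]|vol.*\.par2")

# Each &^(...) section of the wildmat becomes one exclusion here.
EXCLUDES = [
    re.compile(p)
    for p in (r"\.[pP]art[0-9].*\.[^pP]", r"\(0/", r"\(00/", r"\(000/")
]


def looks_like_par(subject):
    """Return True if the subject passes the broad search and no exclusion."""
    if not INCLUDE.search(subject):
        return False
    return not any(rx.search(subject) for rx in EXCLUDES)


if __name__ == "__main__":
    for s in ("backup.vol003+04.par2 (1/5)",
              "backup.vol003+04.par2 (0/5)",
              "backup.par2 (1/1)"):
        print(s, looks_like_par(s))
```

Under these assumptions a volume file posted as part (1/5) passes, the same subject with a (0/5) part counter is removed, and the index .par2 is never picked up by the broad search in the first place.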
It seems like it starts to fail when I have ()|(). For example, this very simple (useless) testing string gives no results:
(vol*.par2)|(par)
while the two individual components work fine by themselves.
This string used to work perfectly (well it didn't include the index .par2, but that's how I wanted it). Any suggestions? I haven't come up with anything that even comes close to working. I suspect there's just something about the new syntax I'm not grasping.
Thanks,
Greg
Edit: I should mention that I use this via the Filter Editor/Filter Droplist.
Boolean Wildmat problem
Re: Boolean Wildmat problem
Greg_G wrote: It seems like it starts to fail when I have ()|(). For example, this very simple (useless) testing string gives no results: (vol*.par2)|(par) while the two individual components work fine by themselves.
I've noticed myself that the OR operator appears to be a bit flaky, even with simple queries. I mainly use mine for the search service, but I've been getting hit-and-miss results since Alex changed the format slightly (using brackets or braces also seems to make no difference to my results).
in this particular example \. should be replaced with . since . is not a special character (see the list of changes http://www.netwu.com/ue/UE.txt).
but i also see now expressions with parentheses were affected, since in addition i tried to discern cases when ) is special character or not and i introduced a bug in one line of code.
i've changed it here and it is in effect for the indexing server, but as to usenet explorer itself until the next release you can use this version which i've just compiled:
http://www.netwu.com/ue/ue1701.zip
so your pattern should be then:
(.{pP}{0-9}{0-9}|vol*.par2)&^(.{pP}art{0-9}*\.{^pP})&^(\(0/)&^(\(00/)&^(\(000/)
but additionally, as to this particular pattern {pP} may be replaced with {p} and {^pP} with {^p} since in Usenet Explorer search patterns are case insensitive, so it also may be replaced with:
(.p{0-9}{0-9}|vol*.par2)&^(.part{0-9}*\.{^p})&^(\(0/)&^(\(00/)&^(\(000/)
also in principle parentheses may be omitted in most places:
(.p{0-9}{0-9}|vol*.par2)&^.part{0-9}*\.{^p}&^\(0/&^\(00/&^\(000/
if you see any problems with the version above let me know (unlikely since the main work was adding the whole word special character and i checked the more important code very thoroughly before the 1.7 release), but before all pay attention to those changes listed in UE.txt:
[] were replaced with {} since square brackets are too common in real world subjects
" special character added to help in matching whole words
\ only turns off the special meaning of the special characters and not every character i.e.
\x - if x is a boolean wildmat special character, i.e. one of ? * { } " ( ) ^ & | \, it turns off the special meaning of x and matches it directly; otherwise \ is not interpreted as a special character. it is not special inside curly brackets.
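The backslash rule above can be illustrated with a small Python tokenizer. This is a sketch of the rule as described in this post, not the actual Usenet Explorer code. The function name tokenize, the token labels, and the treatment of everything between { and } as plain set members are assumptions made for the example.

```python
# Special characters as listed in the rule above.
SPECIAL = set('?*{}"()^&|\\')


def tokenize(pattern):
    """Label each character as 'special', 'literal' or 'set' (inside {...})."""
    tokens = []
    in_braces = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if in_braces:
            # Inside curly brackets the backslash is not special.
            if ch == "}":
                in_braces = False
                tokens.append(("special", ch))
            else:
                tokens.append(("set", ch))
            i += 1
        elif ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in SPECIAL:
            # \x with x special: x is matched directly.
            tokens.append(("literal", pattern[i + 1]))
            i += 2
        elif ch == "\\":
            # \ before a non-special character is just a backslash.
            tokens.append(("literal", ch))
            i += 1
        elif ch in SPECIAL:
            in_braces = ch == "{"
            tokens.append(("special", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize(r"\(0/"))
    print(tokenize(r"\."))
```

With this reading, \(0/ yields a literal ( followed by 0 and /, while \. yields a literal backslash followed by a dot, which is why a \. carried over from an older pattern has to be written as a plain . in 1.7.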
As always, Alex, you're the best. Soon as I get time (a day or two...as I'm in the middle of a project for a client) I'll give that build a check and see how things work out. I'll post back here once I have.
And if I run into problems I'll be sure to check the UE.txt first.
Thanks so much,
Greg
Edit: I tested it out and it's working perfectly. Thanks again!
i found myself another glitch in " whole word special character implementation.
if we look for "red"*"flower" it is natural to have "red flower" matches included as well but in v1.7 it tries literally to match both " " surrounding the star to one or more spaces as in the description instead of the intuitive match.
i changed the implementation on the server, for the client it will be available in the next version.
i was thinking to change \ back to always work as a special character but maybe i'll leave it as it is in the meantime, since it appears that copying subject to search is needed more frequently than the need to use \ as special character and remember which characters are special to follow it.
alex wrote: if we look for "red"*"flower" it is natural to have "red flower" matches included as well but in v1.7 it tries literally to match both " " surrounding the star to one or more spaces as in the description instead of the intuitive match.
Aye, in my opinion "red flower" should definitely trigger for "red"*"flower".
alex wrote: i was thinking to change \ back to always work as a special character but maybe i'll leave it as it is in the meantime, since it appears that copying subject to search is needed more frequently than the need to use \ as special character and remember which characters are special to follow it.
As long as \\ (double backslash) acts as a normal backslash, I personally consider this a more traditional implementation and would agree with the change back. But since I'm used to things like C/C++ strings and regex, it seems natural to me. To someone used to basic wildmats I suppose it's not. Tough call.
Edit: Another thing to think about with the second issue (and I'm sure you've already considered this) is along the same lines as your change from [] to {}. While I tend to like sticking with old standards like [], I have to admit switching to {} was brilliant since I can now paste file names containing square brackets into the quick finder, which is so very common. But I admit I don't see the backslash very often in subjects and (obviously) never in file names. But you have the stats on character usage and perhaps the backslash is used more than I've noticed. Again it's a tough call, I find myself ambivalent about it.