pure negation / stop words search limitations

slotboxed
Posts: 57
Joined: Sun Nov 09, 2003 3:49 am


Post by slotboxed »

I am finding that some patterns cannot be searched for.

I want to find all posts in a certain newsgroup (which has a lot of German posts) that don't include the word "german", but

^"german" search failed

Also, I get too many results when searching for some words, so I tried
"searchword"&".par2" but that yields the same results, so then I tried, by themselves:

".par2" search failed

".rar" search failed

".zip" works

Is this kind of failure due to there being too many results for a given search term?
alex
Posts: 4514
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

Yes, those kinds of searches are not supported, and in principle they are very problematic given what is currently possible. Using the index for unspecific downloads is expensive.

Also try to imagine how a "^german" search pattern could be implemented. The only practical way is to exhaust all possibilities, so in the worst case the search server would need to compare, say, 100 million records against the search pattern.

Words that are too frequent behave like stop words (there are too many matches). They can also need excessive CPU time, since they may lead to a kind of linear search as well.

If linear search worked for indexing, you would see a lot of search engines that simply add all headers to a database and then search by applying the pattern to every record. In practice, with current CPU speeds versus the number of records, that would be too slow.
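To illustrate the difference, here is a minimal sketch in Python. It is not the actual search server code; the inverted index, the function names and the sample subjects are all assumptions made up for the example. It only shows why a positive term can be answered from one index entry while a pure negation has to consider every record.

```python
from collections import defaultdict


def build_index(subjects):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, subject in enumerate(subjects):
        for word in subject.lower().split():
            index[word].add(rec_id)
    return index


def search_term(index, term):
    """Positive term: only one index entry (posting set) is read."""
    return set(index.get(term.lower(), ()))


def search_pure_negation(index, term, total_records):
    """Pure negation: every record id has to be considered (linear work)."""
    excluded = index.get(term.lower(), set())
    return {rec_id for rec_id in range(total_records) if rec_id not in excluded}


# Hypothetical sample data, for illustration only.
subjects = [
    "family guy s01e01 german",
    "family guy s01e02",
    "some show german",
    "another show",
]
index = build_index(subjects)

print(search_term(index, "guy"))
print(search_pure_negation(index, "german", len(subjects)))
```

With four records the difference does not matter, but the loop in search_pure_negation grows with the total number of records, not with the number of matches. A very frequent word has a similar effect, because its index entry alone covers a large share of all records.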

For such matches it is best to use a Usenet provider that supports compressed headers. Then you can download the headers in the newsgroups and use the quick filter to look for what you need.
dengle
Posts: 274
Joined: Mon Jun 30, 2003 2:37 pm

Post by dengle »

I'm not sure whether you added the quotes only for the forum post, but don't use them in the actual search.

For example, if you're looking for episodes of Family Guy, a good search could be:

Family?Guy&^german

or

Family&Guy&^german

EDIT: Disregard, as I just reread your post. I defer to Alex's comments :-)
alex
Posts: 4514
Joined: Thu Feb 27, 2003 5:57 pm

Re: pure negation / stop words search limitations

Post by alex »

I've "improved" the behaviour a bit.

Now, if the subject pattern is well defined, the author pattern can be anything, and vice versa.

This is equivalent to running the search without the stop word or negation and then applying the quick filter in UE. Now it can be done on the server side.
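As a rough sketch of that two-step idea (again in Python, and again not the real server code): the "&" and "^" operators are taken from the examples earlier in this thread, while the function names, the record layout and the plain substring matching are assumptions for illustration. Step 1 stands in for the index lookup on the positive terms; step 2 plays the role of the quick filter.

```python
def parse_pattern(pattern):
    """Split a pattern like 'a&b&^c' into (required terms, negated terms)."""
    required, negated = [], []
    for part in pattern.split("&"):
        part = part.strip().lower()
        if not part:
            continue
        if part.startswith("^"):
            negated.append(part[1:])
        else:
            required.append(part)
    return required, negated


def search(records, subject_pattern="", author_pattern=""):
    """Narrow by positive terms first, then filter out the negated terms."""
    subj_req, subj_neg = parse_pattern(subject_pattern)
    auth_req, auth_neg = parse_pattern(author_pattern)
    if not subj_req and not auth_req:
        # Neither field is "well defined": this would be a pure negation.
        raise ValueError("at least one field needs a positive term")

    # Step 1: stand-in for the index lookup, using positive terms only.
    candidates = [
        r for r in records
        if all(t in r["subject"].lower() for t in subj_req)
        and all(t in r["author"].lower() for t in auth_req)
    ]
    # Step 2: the "quick filter", dropping candidates that hit a negated term.
    return [
        r for r in candidates
        if not any(t in r["subject"].lower() for t in subj_neg)
        and not any(t in r["author"].lower() for t in auth_neg)
    ]


# Hypothetical records, for illustration only.
records = [
    {"subject": "Family Guy S01E01 German", "author": "poster_a"},
    {"subject": "Family Guy S01E02", "author": "poster_b"},
    {"subject": "Family Guy S01E03", "author": "poster_a"},
]

print(search(records, "family&guy&^german"))
print(search(records, "family&guy", "^poster_a"))
```

In this sketch the author pattern can be a pure negation because the subject pattern already narrows the candidates, which mirrors the "well defined subject, anything for author" rule. A search such as "^german" on its own is rejected, since nothing narrows the candidate set.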