Indexing basics

03 Oct 23: One server has been relocated. For the time being it runs on Starlink and has no public IP address, so I wrote some separate tunnel code to handle this. While that server is engaged, retention is a uniform 5600+ days (back to around 24 July 2008). In the unlikely case that I have to disengage it because of a problem, retention is 1200-3800 days depending on the newsgroup. If you experience any issues, please let me know.

29 Nov 23: A few NZB indexing websites mass-post floods of encrypted and obfuscated posts, using Usenet servers as private storage from which their members download. These posts are deliberately made so that users must go through the posting website, so the content situation is fairly chaotic, and once such a website disappears the encrypted posts just waste Usenet providers' disk space. If you can't find something specific, please write to alexbirj at gmail dot com and tell me exactly what you can't find, so I can check how those posts could be retained. Non-obfuscated posts shouldn't be affected at all; let me know if you notice anything missing.

29 Nov 23: The search protocol had to be upgraded to lift the server-side 3-byte (16M+) limit on the number of sets per instance, so version 5.8.3 or later is needed to access the search. Otherwise there should be no difference.
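For context on the "3 byte (16M+)" figure, the arithmetic is sketched below: an unsigned 3-byte field can hold at most 2^24 - 1 values. Treating the field as an unsigned integer is an assumption; the post does not describe how the protocol encodes it.

```python
def max_unsigned(num_bytes: int) -> int:
    """Largest value an unsigned integer of num_bytes bytes can represent."""
    return (1 << (8 * num_bytes)) - 1

if __name__ == "__main__":
    three_byte_max = max_unsigned(3)
    print(three_byte_max)  # 16777215, i.e. 2**24 - 1 ("16M+")

    # One more set no longer fits in 3 bytes, hence the need for a wider field.
    try:
        (three_byte_max + 1).to_bytes(3, "big")
    except OverflowError as exc:
        print("does not fit in 3 bytes:", exc)
```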

Post by alex

Gaining access:

First of all, to make the search work you need to allow UE to connect to the indexing server. The indexing server is shown in properties->general->import/search->server. The server may use two ports: the port shown to the right of the server name is for registered users (a genuine serial is shown in help menu->about), and the next port up is for trial users (e.g. if the port shown is 2020, trial users connect on port 2021).
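The port rule above can be sketched as a small helper, together with a plain TCP check you might use to see whether the chosen port is reachable from your machine. This is a rough illustration, not part of UE: the function names are hypothetical, and a successful TCP connect only shows that the port is open, not that the search protocol works.

```python
import socket

def search_port(shown_port: int, registered: bool) -> int:
    """Port to open for the search server, following the rule in the post:
    registered users use the port shown next to the server name,
    trial users use the port after it."""
    return shown_port if registered else shown_port + 1

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False

if __name__ == "__main__":
    print(search_port(2020, registered=True))   # 2020
    print(search_port(2020, registered=False))  # 2021
    # Substitute the server name from properties->general->import/search->server:
    # print(can_reach("<server name>", search_port(2020, registered=False)))
```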

Search patterns:

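I'll use double quotes "..." below to mark patterns; don't type the quotes around the actual filter patterns (since v1.6 double quotes inside a pattern have their own meaning, see the update at the end of this post).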

Search patterns, group filters and group kill filters are boolean wildmats (see the help notes: they are similar to wildcards with boolean expressions, where & means AND, | means OR and ^ means NOT).

Among other things, this means a space is a legal character in a pattern (even though you cannot search for a space alone).

So if you search for "star wars", you are searching for that exact phrase.

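If you search for "magic & freedom", the spaces are part of the terms, so the subject must contain "magic " (with a trailing space) and " freedom" (with a leading space). Don't put spaces around the operators unless you mean them; if you don't expect those spaces, just enter "magic&freedom".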
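To make the rules above concrete, here is a minimal evaluator for a simplified boolean wildmat. It is a sketch under stated assumptions, not UE's matcher: each term is treated as a literal, case-insensitive substring (no `*`, `?` or `{...}` ranges), `&` is assumed to bind tighter than `|`, `^` negates the following term or group, parentheses group, and a backslash makes the next character literal. Spaces stay part of the terms, which is why "magic & freedom" and "magic&freedom" behave differently.

```python
def matches(pattern: str, text: str) -> bool:
    """Evaluate a simplified boolean wildmat against text (case-insensitive)."""
    pos = 0
    hay = text.lower()

    def parse_or() -> bool:  # lowest precedence: a|b
        nonlocal pos
        result = parse_and()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so pos advances past the operand
            result = result or rhs
        return result

    def parse_and() -> bool:  # a&b
        nonlocal pos
        result = parse_not()
        while pos < len(pattern) and pattern[pos] == "&":
            pos += 1
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not() -> bool:  # ^a
        nonlocal pos
        if pos < len(pattern) and pattern[pos] == "^":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:  # (group) or a literal term
        nonlocal pos
        if pos < len(pattern) and pattern[pos] == "(":
            pos += 1
            result = parse_or()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        term = []
        while pos < len(pattern) and pattern[pos] not in "&|^()":
            if pattern[pos] == "\\" and pos + 1 < len(pattern):
                pos += 1  # backslash: take the next character literally
            term.append(pattern[pos])
            pos += 1
        if not term:
            raise ValueError(f"empty term at position {pos}")
        return "".join(term).lower() in hay

    result = parse_or()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return result

if __name__ == "__main__":
    print(matches("magic&freedom", "Freedom and Magic"))    # True
    print(matches("magic & freedom", "Freedom and Magic"))  # False: no "magic " in the text
```

In this sketch an unescaped parenthesis in the middle of a term is reported as a syntax error rather than silently returning nothing.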

If you want to search for a whole word (especially useful for short words that may occur inside other words), you can use a wildmat character range exclusion (^ means the following range(s) of characters are excluded; see the wildmat description in the help notes). It also works if the word is at the beginning or end of the string, for example:

{^a-z}bach{^a-z}

Similarly, to search for an exact number:

{^0-9}777{^0-9}

Or, to search for a separate word (not adjacent to any letter or digit):

{^a-z0-9}777{^a-z0-9}
{^a-z0-9}bach{^a-z0-9}
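The range-exclusion examples above can be modelled with Python regular expressions, as sketched below. The translator is hypothetical and only handles literal text plus `{...}` and `{^...}` ranges. Two assumptions are labelled in the code: matching is case-insensitive, and an excluded range may also match the start or end of the string, which is how the post describes `{^a-z}bach{^a-z}` working at either edge.

```python
import re

def brace_wildmat_to_regex(pattern: str) -> str:
    """Translate literal text with {...} / {^...} character ranges into a regex.

    Assumption: an excluded range {^...} may also match the start or end of
    the string, so {^a-z}bach{^a-z} finds "Bach" at either edge.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "{":
            end = pattern.index("}", i)  # raises ValueError if the range is unclosed
            body = pattern[i + 1:end]
            if body.startswith("^"):
                # excluded range: any other character, or a string boundary
                out.append("(?:[^" + body[1:] + "]|^|$)")
            else:
                out.append("[" + body + "]")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)

def brace_search(pattern: str, text: str) -> bool:
    """Search anywhere in text; case-insensitivity is an assumption."""
    return re.search(brace_wildmat_to_regex(pattern), text, re.IGNORECASE) is not None
```

With these assumptions, `{^a-z}bach{^a-z}` matches "J.S. Bach" but not "Offenbach" or "Bachelor", and it still matches "bach2" because a digit is not in a-z; only the `{^a-z0-9}` form rejects "bach2".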

If you define a group filter, it is not necessary to enter the exact newsgroup name: e.g. instead of "alt.binaries.boneless" you can just enter "boneless". You can enter the full group name as well if you like, but unlike in other indexing services you can also define a full-blown wildmat pattern such as "picture&^erotic".

If you are not interested in certain newsgroups and don't want to see any posts from them, define a group kill filter in properties->general->search service. The kill filter specifies which newsgroups to drop: e.g. to exclude adult newsgroups, just enter something like "sex|erotic".
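How a group filter and a group kill filter combine can be sketched as follows. The two patterns from the text are hand-translated into Python predicates (case-insensitive substring checks, an assumption), and a post is kept only if its newsgroup passes the group filter and is not killed. The function names and sample posts are hypothetical; this illustrates the idea, not UE's implementation.

```python
from typing import Callable, Iterable, List, Tuple

Post = Tuple[str, str]  # (newsgroup, subject)

def group_filter(group: str) -> bool:
    """Hand translation of the group filter "picture&^erotic"."""
    g = group.lower()
    return "picture" in g and "erotic" not in g

def kill_filter(group: str) -> bool:
    """Hand translation of the group kill filter "sex|erotic"."""
    g = group.lower()
    return "sex" in g or "erotic" in g

def visible(posts: Iterable[Post],
            group_ok: Callable[[str], bool],
            killed: Callable[[str], bool]) -> List[Post]:
    """Keep posts whose newsgroup passes the group filter and is not killed."""
    return [p for p in posts if group_ok(p[0]) and not killed(p[0])]

if __name__ == "__main__":
    posts = [
        ("alt.binaries.pictures.nature", "sunset.jpg"),
        ("alt.binaries.pictures.erotica", "photo.jpg"),
        ("alt.binaries.boneless", "archive.rar"),
    ]
    print(visible(posts, group_filter, kill_filter))
    # A partial name is enough: "boneless" matches alt.binaries.boneless.
    print(visible(posts, lambda g: "boneless" in g.lower(), kill_filter))
```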

Round parentheses are special characters in boolean wildmats, so if your pattern contains them as plain characters, precede each one with a backslash. For example, searching for "WW164 VERY early Sunday morning Leaders (text)" returns no results, but searching for "WW164 VERY early Sunday morning Leaders \(text\)" finds what you need. The same applies to the other special characters: the boolean & | ^ and the original wildmat special characters such as square brackets.
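When copying a subject into the search box, the escaping can be done mechanically, as in the sketch below. The exact set of special characters is an assumption: the boolean operators and parentheses named above, the `{ }` ranges and double quotes mentioned in the v1.6 update, the wildmat `*` and `?`, and the backslash itself. Square brackets are left out on the assumption that they stopped being special once `{ }` replaced them.

```python
# Assumed set of characters that need a backslash in a UE search pattern.
SPECIAL = set('&|^(){}*?"\\')

def escape_wildmat(text: str) -> str:
    """Backslash-escape characters that would otherwise be read as wildmat syntax."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

if __name__ == "__main__":
    print(escape_wildmat("WW164 VERY early Sunday morning Leaders (text)"))
```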

I'll try to update this post if more questions arise.

---------------

Update (since around v1.6):

I added double quotes to wildmats (see http://www.netwu.com/ue/patterns.htm).

I also replaced [ ] with { } in wildmats, so many more subjects can be searched for by copy/paste without any modification; the examples above already reflect this.

So to search for a whole word or phrase, you can simply enclose the pattern in double quotes, like:

"bach"

"cpe bach"