Connections not timing out

Posted: Sun Jul 15, 2007 6:11 pm
by Greg_G
Hello,

I'm not sure at what version this started to occur, because it's fairly rare. But occasionally my connection to my ISP will cut out and sometimes require a power cycle of the ADSL "modem". In the past, when this would happen, UE would have long ago timed out the original connection and would be getting GetHostByName failures subsequently (as expected). However, recently, this behavior has changed. Now it just gets stuck at "Connecting" and never times out. When this happens I just restart UE and all is well. But it does not right itself. This happened again yesterday a little after noon and (I didn't notice) it was stuck "Connecting" since then (>24 hours) even though the actual ISP downtime was very short.

Some relevant info:
- Read/Write Timeouts are set to 60 seconds.
- I am using an SSL connection exclusively. I switched to SSL just after the first version of UE to natively support it came out (I had been awaiting it :p) and that's approximately when this started happening, so it might be related. But it's hard to tell since the ISP issue only happens perhaps once a month.
- I just tried powering off my modem. I left it off for 3 minutes (I can't do longer, need to keep my internet servers up). During that time all running jobs (in this case 2 article tasks and 2 header tasks) froze at their current percentage and did not time out. However, unlike the ISP outages, once I restored connectivity, they resumed within the next minute. So obviously powering off the modem doesn't have the same effect and can't be used for testing this issue.

Honestly this is not a terribly big deal to me. But I thought I should report it in case others had experienced it. If no one else is experiencing this, I probably wouldn't worry about it.

Thanks,
Greg

Posted: Mon Jul 16, 2007 12:53 am
by alex
if you end up with a dns error, you are talking about the "gethostbyname" function.

that function doesn't have a timeout.

there is an asynchronous, microsoft-specific gethostbyname call (different function name). i used it at some point (in newspro i think), but back then it led to frequent crashes, maybe caused by firewalls, i don't remember exactly, and the asynchronous version is not exactly pretty to use.

in principle it would be possible to abandon the gethostbyname call, but that would leave a hanging thread (the same already happens now if you cancel the task and the call doesn't return).

on the other hand, a non-returning call is not normal; it points to some problem with a driver. unless you use some ISP-provided driver to connect to the internet (which i guess is not the case), i think it may well be your firewall, a socket-aware antivirus or the like.

if you have a third-party program in mind as a suspect for the non-returning socket call and you can reproduce the problem (e.g. by unplugging the modem?), try uninstalling the firewall or antivirus temporarily and check whether the problem still occurs. if not, it is a faulty third-party driver.

when i search google groups for something like "gethostname hangs windows" i don't find anything consistent except for very old articles, so most likely it is something on your computer, but please check this guess first. technically i can very easily set any timeout for the call, with the reservation made above: if i replace the line of code "for ( ; ; )" with "for (int i=0;i<6000;i++)", the call will be abandoned after about 10 minutes. right now it waits for gethostbyname to exit, since it is supposed to.
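
A minimal C++ sketch of the bounded wait alex describes, for readers who want to see its shape. This is not UE's code: the helper name `run_with_bounded_wait`, the 100 ms poll interval (inferred from 6000 iterations covering about 10 minutes) and the use of `std::thread` are all assumptions. As alex notes, abandoning the call leaves the worker thread hanging until the call returns, which the detached thread below reproduces.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical helper, not UE's actual code.
// Runs blocking_call (e.g. a wrapper around gethostbyname) on a worker thread
// and polls for completion at most max_polls times.
// Returns true if the call finished in time, false if it was abandoned.
bool run_with_bounded_wait(std::function<void()> blocking_call,
                           int max_polls,
                           std::chrono::milliseconds poll_interval) {
    assert(max_polls > 0);
    // Shared flag outlives this function so an abandoned worker can still set it.
    auto done = std::make_shared<std::atomic<bool>>(false);

    std::thread([done, blocking_call] {
        blocking_call();
        done->store(true);
    }).detach();  // detached: if abandoned, the thread lingers until the call returns

    // "for ( ; ; )" would wait forever; the bounded loop gives up eventually.
    for (int i = 0; i < max_polls; ++i) {
        if (done->load()) return true;
        std::this_thread::sleep_for(poll_interval);
    }
    return done->load();
}
```

With `max_polls = 6000` and a 100 ms interval the wait is capped at roughly 10 minutes, matching the figure in the post.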

Posted: Mon Jul 16, 2007 2:13 am
by Greg_G
Alex,

I think you misunderstand. I mentioned GetHostByName as what it used to do when things would time out properly. The various connections to the NSP would time out and then would continuously retry. Since there was no internet connection at that point and therefore no access to a DNS server, it would fail at the GetHostByName(). That was the proper behavior. Then if internet connectivity returned, UE would resume its downloading properly because it was retrying all that time.

What is happening NOW is that UE hangs when "Connecting" (when this particular issue happens) and never "gives up" even if connectivity returns. I have to restart UE and then all is well. Aborting the current task solves it as well.

Greg

Posted: Mon Jul 16, 2007 3:03 am
by alex
Actually I understand, so I made use of the information you provided about the period when it was working.

My question was: by "Connecting", do you mean the connection is busy with dns resolution (so that whenever it took longer to time out than the read timeout set in properties->tasks, the error was always a dns error), or do you mean you also got read timeout errors after longer than the setting? When it shows "Connecting" it may mean either "resolve dns" or "connect".

For gethostbyname (dns resolution) there is simply no timeout setting; the timeout only applies to read/write operations. Connect also doesn't have a timeout setting. As to connect, did you ever see "Connect error"?

In short, when the problem happened in the past, what error messages did you see? These are all the possible errors related to a "Connecting" status timeout:

"Cannot find server or DNS error" (UE timeout setting is not relevant)

"Connect error" (UE timeout setting is not relevant)

Did you ever see the second error? Did you see running tasks hang forever in a status other than "Connecting"? If you saw only the first error, "Cannot find server or DNS error", back when dropped connections were handled properly, something could be done about it, but you should probably first ensure it is not your third-party firewall that does this, since I didn't find any conclusive references that such a problem would exist without some problem with a third-party driver.

To check that it is not a third-party firewall (if you have one), and if you can reproduce the problem, e.g. by switching off the modem, I asked you to uninstall it temporarily and check whether the problem happens again without the third-party interference.

What you are describing is a winsock system call which takes forever when it shouldn't. It is difficult to imagine that this is the normal way for the system socket layer to operate, so most likely something is interfering. Most system functions are presumed to finish in finite time; when the time may be significant they are put into a separate thread, as in the connect/resolve case, so the program stays responsive and you can cancel them, but the times are still guaranteed to be reasonably finite on a normally working system.

Still, as to dns resolution, I can make it abandon the operation e.g. after 10 minutes, as I mentioned.

I have this feeling I'm repeating my previous message, is it more clear now? :)

Try to quote and answer my questions so I have a clearer idea of what you are talking about, in case I missed something.

Posted: Mon Jul 16, 2007 6:05 am
by Greg_G
Yes I understand you much better this time, Alex, thanks. :) It's late so I'll post replies tomorrow. (Hopefully, anyway...that client I mentioned in the icon post keeps dropping more work on me...but I'm not complaining, it's a good thing!)

Posted: Mon Jul 16, 2007 5:18 pm
by Greg_G
alex wrote: My question was: by "Connecting", do you mean the connection is busy with dns resolution (so that whenever it took longer to time out than the read timeout set in properties->tasks, the error was always a dns error), or do you mean you also got read timeout errors after longer than the setting? When it shows "Connecting" it may mean either "resolve dns" or "connect".
I mean the former. In UE's Task Manager it would show "Connecting" but never time out (in this last occurrence it stayed that way for about 24 hours for all four allowed tasks).
alex wrote: For gethostbyname (dns resolution) there is simply no timeout setting; the timeout only applies to read/write operations. Connect also doesn't have a timeout setting. As to connect, did you ever see "Connect error"?
Well before this problem, if connectivity was down, UE would try to connect and eventually it would give up and report a GetHostByName error. Whenever I saw this I always knew I needed to reboot my modem.
alex wrote: In short, when the problem happened in the past, what error messages did you see? These are all the possible errors related to a "Connecting" status timeout:
"Cannot find server or DNS error" (UE timeout setting is not relevant)
"Connect error" (UE timeout setting is not relevant)
Honestly, I can't remember what I used to get. I just remembered it saying GetHostByName error (since I've programmed with winsock on many occasions I understood its meaning for me: that the ISP's DNS server that my personal DNS server forwards requests to is unreachable) and so I knew to power cycle the modem.
alex wrote: Did you ever see the second error?
When things worked right, I can't remember, sorry. But no I never see that now.
alex wrote: Did you see running tasks hang forever in a status other than "Connecting"?
Yes. I didn't mention it in my original post because I couldn't remember "for sure". I don't like reporting symptoms that I can't say with 100% certainty are true and as of yesterday I thought I remembered them freezing in another state but was not sure. However, when I checked today, two of my tasks are in a frozen state in the middle of a download. One is stuck at 12% progress and the other at 38% progress. I'm fairly sure this has happened before (in the previous occurrence I believe all four allowed tasks were stuck at a percentage progress). From looking at the time stamp on the files that would have been saved around the time these froze, it appears these two have been frozen since 5:30 pm EST yesterday (so about 19 hours...obviously I don't check the system very often :p).
alex wrote: you should probably first ensure it is not your third-party firewall that does this, since I didn't find any conclusive references that such a problem would exist without some problem with a third-party driver.
This is obviously a possibility and since no one else has come forward claiming this problem I certainly have to take it very seriously. Unfortunately disabling (or uninstalling which is sometimes necessary in these situations) is out of the question as there is only one firewall feature-rich enough for me to use now that Tiny PF was bought up and destroyed by CA (I now use Jetico, which is admittedly in beta right now, although I've used Jetico for a while now and this problem only started occurring a month ago, but I realize that's pretty poor exonerating evidence). So I may just have to live with this issue. I certainly understand how it is a very likely culprit.
alex wrote: What you are describing is a winsock system call which takes forever when it shouldn't. It is difficult to imagine that this is the normal way for the system socket layer to operate, so most likely something is interfering. Most system functions are presumed to finish in finite time; when the time may be significant they are put into a separate thread, as in the connect/resolve case, so the program stays responsive and you can cancel them, but the times are still guaranteed to be reasonably finite on a normally working system.
Understood. You are saying, I believe, that the timeouts for these operations are handled by the system entirely and therefore out of your hands. That would certainly point the finger at the firewall, since I'm using a stock winsock dll.
alex wrote: Still, as to dns resolution, I can make it abandon the operation e.g. after 10 minutes, as I mentioned.
Well, since this is also happening mid-download, I suppose the issue isn't just with DNS resolution. (I know you didn't have that information when you suggested this.) So likely this is unnecessary. Although I guess if there were a way to auto-"kill" a connection in UE that has been completely idle for X amount of time (like I can by manually untasking a job when it's stuck), that'd allow me to work around this problem. I hate untasking a job because then I have to go find it again to re-queue it. So usually I just restart UE.
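
The auto-"kill" idea Greg raises here can be sketched as a small idle tracker. This is purely illustrative and not a UE feature: the `IdleTracker` name, its methods and the idea of calling `touch()` whenever bytes arrive are all assumptions.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical idle tracker; not part of UE.
class IdleTracker {
public:
    using Clock = std::chrono::steady_clock;

    explicit IdleTracker(std::chrono::seconds limit,
                         Clock::time_point now = Clock::now())
        : limit_(limit), last_activity_(now) {
        assert(limit.count() > 0);
    }

    // Call whenever the connection makes progress (bytes read or written).
    void touch(Clock::time_point now = Clock::now()) { last_activity_ = now; }

    // True once the connection has been idle for longer than the limit.
    bool expired(Clock::time_point now = Clock::now()) const {
        return now - last_activity_ > limit_;
    }

private:
    std::chrono::seconds limit_;
    Clock::time_point last_activity_;
};
```

A task loop could check `expired()` periodically and requeue the job automatically, instead of it being untasked by hand.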

Thanks for talking me through this, Alex. Unless someone else runs into this problem in UE I think we can chalk this up to third-party interference.

Greg

Posted: Fri Jul 20, 2007 12:13 am
by alex
yes, downloads frozen in the middle are symptomatic, since for the read operation (select for read) the timeout is explicitly defined in the socket function call.

it is a firewall bug then.

if it is so easily reproducible, why not try contacting the firewall's support? maybe they already know about the issue and can tell you when it will be fixed.

a system (or third-party code which is part of the system, like buggy drivers) behaving in unpredictable ways is somewhat difficult to handle; i do handle some crashes, such as crashes inside winsock calls.
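
For reference, the explicit read timeout alex mentions at the top of this post usually looks something like the sketch below. It uses POSIX sockets to stay short; Winsock's `select` and `recv` are close equivalents. The function name `read_with_timeout` and its return codes are assumptions, not UE's code.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper. Returns the number of bytes read (> 0), 0 if the peer
// closed the connection, -1 on error, or -2 if nothing arrived in time.
ssize_t read_with_timeout(int fd, char* buf, size_t len, int timeout_sec) {
    assert(timeout_sec >= 0);
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    timeval tv{};
    tv.tv_sec = timeout_sec;

    // select() blocks until fd is readable or the timeout elapses.
    int ready = select(fd + 1, &read_set, nullptr, nullptr, &tv);
    if (ready == 0) return -2;  // timed out
    if (ready < 0) return -1;   // select() failed
    return recv(fd, buf, len, 0);
}
```

Under this pattern a silent connection returns the timeout code after `timeout_sec`, which is why alex treats a task frozen mid-download as evidence that the call itself is not returning.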

Posted: Sat Jul 21, 2007 10:48 pm
by Greg_G
Understood. Thanks Alex. It's actually quite hard to reproduce (it happened I think twice in two months and for no obvious reason). Simulating a cutout of connectivity does not trigger it. I've only seen the problem in UE but that's unsurprising as UE does significantly more downloading than any other app I use, so odds greatly favor a system-wide problem showing up in UE. But I believe it's understandable why I sought help here first.

I agree with your assessment. As the firewall is in very active development right now, I'll wait a bit until it stabilizes some and see if the issue continues to happen.

As I said in the original post, I was just curious if other users had experienced this, which if they had, would help point a finger. It's really not a big deal for me at all. I don't download that much and when I do I'm not in a hurry so even if it gets stuck for a day or two, I'll cope just fine. :)

Thanks again,
Greg

Posted: Mon Nov 05, 2007 2:27 am
by Greg_G
alex,

I've been continuing to look at this issue over the past months and I'm not certain any more that the blame should be on the firewall. You see, if I switch to the non-SSL server this stops happening entirely. It may "freeze" from time to time in the same way, but times out properly and never gets permanently stuck (until a restart of UE or cancelling the task that's stuck). But when I switch back to the SSL server, within a day all my allowed tasks are stuck, usually one at a time until they're all that way.

I think this might be related or identical to the issue you and Josef K are discussing in this thread. If so, then it looks like you are already on the case. But I figured the information I had already provided on my symptoms might be of some use now that we have this added info about it only happening with SSL.

I suspect the problem's root is UNS, but as long as the connection eventually times out then all will be well.

Thanks as always,
Greg

Posted: Mon Nov 05, 2007 3:57 am
by alex
check the other thread when it moves above this one; i may post a version there with a connect timeout to check whether it resolves this problem.

another way would be to lodge a complaint with UNS, but a connect timeout would be useful anyway, so why not add it.
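
One common way to add a connect timeout is a non-blocking `connect` followed by `select` for writability. The sketch below illustrates that pattern with POSIX sockets; it is not the change alex made, the name `connect_with_timeout` is an assumption, and a Winsock version would use `ioctlsocket` and `WSAGetLastError` in place of `fcntl` and `errno`.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper. Returns a connected socket, or -1 on failure or timeout.
int connect_with_timeout(const char* ipv4, unsigned short port, int timeout_sec) {
    assert(timeout_sec >= 0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) { close(fd); return -1; }

    // Non-blocking mode makes connect() return immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) { close(fd); return -1; }

    int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    if (rc < 0 && errno != EINPROGRESS) { close(fd); return -1; }

    if (rc < 0) {
        // Handshake in progress: wait until the socket is writable or time runs out.
        fd_set write_set;
        FD_ZERO(&write_set);
        FD_SET(fd, &write_set);
        timeval tv{};
        tv.tv_sec = timeout_sec;
        if (select(fd + 1, nullptr, &write_set, nullptr, &tv) <= 0) { close(fd); return -1; }

        // Writable does not mean connected; SO_ERROR holds the real outcome.
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
            close(fd);
            return -1;
        }
    }

    fcntl(fd, F_SETFL, flags);  // restore blocking mode for later reads/writes
    return fd;
}
```

The host name still has to be resolved before this step, so a timeout of this kind covers only the "connect" half of the "Connecting" status, not DNS resolution.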

Posted: Mon Nov 05, 2007 6:34 am
by Greg_G
Will do, and thank you.