Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 10:17 am
by John Adams
Jab, need your advice. I need this block of code to terminate the executable, since currently the RunLoops = false is not actually doing any good.

Code:

void CatchSignal(int sig_num) {
	// In rare cases this can be called after the log system is shut down causing a deadlock or crash
	// when the world shuts down, if this happens again comment out the LogWrite and uncomment the printf
	if(last_signal != sig_num)
		LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
		//printf("Got signal %i", sig_num);
	else {
		LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
		Sleep(10);
	}
	last_signal = sig_num;
	RunLoops = false;
}
Can I just put an exit(0) here or something, to force the world to exit so my persist script will restart it? For some reason, after the last few commits, the Linux server crashes daily, then fills the drive with Got Signal 11 crash logs.

For now, I will put EQ2 DB Project into Debug (gdb) and try and see what's crashing, because it appears to be every time someone logs in and does something.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 1:19 pm
by Jabantiz
I will try looking into this later tonight, do you know around what revision this started acting up?

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 3:22 pm
by John Adams
Foof might know better, since it's him it usually crashes on... and because he cannot access SSH, it just spins away all night :) I would say at least a week ago or so. It could also be related to the new zones being built (more /reloading going on than usual).

Thoughts, foofski?

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 3:48 pm
by thefoof
John Adams wrote: Foof might know better, since it's him it usually crashes on... and because he cannot access SSH, it just spins away all night :) I would say at least a week ago or so. It could also be related to the new zones being built (more /reloading going on than usual).

Thoughts, foofski?
The last freeze was on a reload for sure, so that could be it, but it's pretty rare. I was doing reloads all night the night before with no issues.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 3:51 pm
by Jabantiz
If it is signal 11, then it is a memory issue (an invalid memory access):

Code:

#define SIGSEGV         11      /* segment violation */
MSDN wrote: SIGSEGV
Illegal storage access. The default action terminates the calling program.
I will go over recent commits looking for what could cause this. If it is happening during a specific /reload, that would help narrow my search.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 5:28 pm
by Jabantiz
I had a signal 11 crash when I typed exit into the console. No clue if it is the same signal 11 reported here, but I committed code to Dev SVN to fix it.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 5:57 pm
by John Adams
Linux doesn't have a console for commands, only Windows. These happen to me when I am /reloading usually.

Dev is in debug right now, so if it crashes again, it's down til I check it out.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 03, 2013 6:01 pm
by Jabantiz
I should clarify: the signal 11 was a result of the server shutting down. I normally just hit the close button, so I usually bypassed this; actually typing exit to shut the server down properly resulted in the crash. It is not directly related to the console commands, I just found it using console commands.

Re: Got Signal 11 (Linux)

Posted: Sun Dec 15, 2013 5:26 pm
by John Adams
I am no Linux expert, but this doesn't make any sense to me. The Linux server, sitting in debug, with zero connections and not even an attempt to connect since I started it, just crashed. Sitting there.
09:15:11 D Thread : Starting console command thread...
09:15:11 I Console : Type 'help' or '?' and press enter for menu options.
09:15:11 D Thread : Starting autoinit loginserver thread...
09:15:11 I World : Connected to LoginServer: eq2emulator.net: 9100
09:15:11 D Login : Looking for Login Zone Updates...
09:15:11 D Login : Looking for Login Appearance Updates...
09:30:12 D Login : Looking for Login Appearance Updates...
.
.
.
14:30:18 D Login : Looking for Login Appearance Updates...
14:45:18 D Login : Looking for Login Appearance Updates...
15:00:18 D Login : Looking for Login Appearance Updates...
15:15:12 D Login : Looking for Login Zone Updates...
15:15:18 D Login : Looking for Login Appearance Updates...
15:30:18 D Login : Looking for Login Appearance Updates...
15:45:18 D Login : Looking for Login Appearance Updates...
16:00:19 D Login : Looking for Login Appearance Updates...
16:15:19 D Login : Looking for Login Appearance Updates...

Program received signal SIGTERM, Terminated.
0xb7fe2430 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fe2430 in __kernel_vsyscall ()
Cannot access memory at address 0xbffff62c
(gdb)
And the "bt" shows nothing. Far as I can tell, this is bad. Maybe my linux server is dying.

Re: Got Signal 11 (Linux)

Posted: Sun Dec 15, 2013 7:19 pm
by Jabantiz
From MSDN
SIGTERM

Termination request
The signal function enables a process to choose one of several ways to handle an interrupt signal from the operating system.
So the operating system sent a termination request. I have no clue how this could happen with our code, but I have never used Linux.
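
On Linux, SIGTERM is normally sent by another process (for example a `kill` command or a supervisor script) rather than raised by a fault in the program itself. One way to find out who sent it, sketched here under the assumption of a POSIX system and with hypothetical names (OnSigterm, InstallSigtermTracer), is to install the handler with SA_SIGINFO and record the sender's PID:

```cpp
#include <csignal>
#include <signal.h>     // sigaction, siginfo_t (POSIX)
#include <sys/types.h>  // pid_t
#include <unistd.h>     // getpid, kill

// Written by the handler, read by normal code afterwards.
volatile sig_atomic_t g_term_seen = 0;
volatile sig_atomic_t g_term_sender_pid = -1;

// Three-argument handler form enabled by SA_SIGINFO. si_pid is meaningful
// when the signal was sent by a process, e.g. via kill().
extern "C" void OnSigterm(int /*sig_num*/, siginfo_t* info, void* /*ctx*/) {
    if (info != nullptr) {
        g_term_sender_pid = static_cast<sig_atomic_t>(info->si_pid);
    }
    g_term_seen = 1;
}

// Installs the tracing handler for SIGTERM. Returns true on success.
bool InstallSigtermTracer() {
    struct sigaction sa = {};
    sa.sa_sigaction = OnSigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGTERM, &sa, nullptr) == 0;
}
```

The main loop could then log g_term_sender_pid once g_term_seen is set, and the PID can be matched against the process list to see which script or user issued the request.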

Re: Got Signal 11 (Linux)

Posted: Mon Dec 16, 2013 7:24 am
by John Adams
You keep repeating the same thing, what SIGTERM is. While that is a genuine unknown for us all, the initial request was to ask if it's possible to force an exit from within the same place we keep catching the signal.

Code:

    void CatchSignal(int sig_num) {
       // In rare cases this can be called after the log system is shut down causing a deadlock or crash
       // when the world shuts down, if this happens again comment out the LogWrite and uncomment the printf
       if(last_signal != sig_num)
          LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
          //printf("Got signal %i", sig_num);
       else {
          LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
          Sleep(10);
       }
       last_signal = sig_num;
       RunLoops = false;
    }
It's clear we're inside this function to print the spam, so why can't I just put an exit(0) here or something? Is that not how to quit a C++ prog? Or, are you saying that because it's already quitting in a bad way, we cannot control it?

My goal is to exit completely so the persist_server script can restart things.

Re: Got Signal 11 (Linux)

Posted: Mon Dec 16, 2013 1:03 pm
by Jabantiz
You can call std::exit(error_code), but the main reason I would not want to do that is that it will not release memory allocated at runtime.

I am also not repeating myself: the first issue was SIGSEGV, which gave a clue to what the issue is; the second is SIGTERM, and I have no clue how the server got that.

If you want to use exit, go ahead, just be aware of the possible issues with using it.
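
One hedged way to read that trade-off: the operating system reclaims a process's memory however it exits, but std::exit() does not unwind the stack, so destructors of local objects (and any saving or flushing they do) never run. A minimal sketch of the alternative, where the handler only clears a flag and the main loop returns normally, is below. All names are hypothetical, and the loop raises its own SIGTERM purely to keep the example self-contained.

```cpp
#include <csignal>

// Hypothetical stand-in for the project's RunLoops flag. sig_atomic_t is the
// type the C++ standard guarantees can be written safely from a handler.
volatile std::sig_atomic_t g_run_loops = 1;

// Counts destructor calls so the cleanup is observable in this demo.
int g_cleanups_run = 0;

extern "C" void RequestShutdown(int /*sig_num*/) {
    g_run_loops = 0;  // the only work done inside the handler
}

// Stand-in for state that must be torn down (zones, DB connections, ...).
struct WorldResources {
    ~WorldResources() { ++g_cleanups_run; }
};

// Runs until the flag is cleared and returns the number of ticks executed.
// simulate_at_tick exists only for the demo: it raises SIGTERM from inside
// the loop in place of an external `kill`.
int RunWorld(int simulate_at_tick) {
    WorldResources resources;
    int ticks = 0;
    while (g_run_loops) {
        ++ticks;
        if (ticks == simulate_at_tick) std::raise(SIGTERM);
    }
    return ticks;  // resources is destroyed here, unlike with std::exit()
}
```

Registering the handler with std::signal(SIGTERM, RequestShutdown) before calling RunWorld() completes the picture. This pattern suits SIGTERM or SIGINT; after a SIGSEGV the process state cannot be trusted, so carrying on with the loop is not an option there.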

Re: Got Signal 11 (Linux)

Posted: Tue Dec 17, 2013 9:23 pm
by thefoof
I'm going to set up my own Linux box and see if I can get Linux a little more stable, just an FYI.

Re: Got Signal 11 (Linux)

Posted: Tue Dec 17, 2013 9:52 pm
by John Adams
Jabantiz wrote: I am also not repeating myself: the first issue was SIGSEGV, which gave a clue to what the issue is; the second is SIGTERM, and I have no clue how the server got that.
Meh, SigSegv, SigTerm, blah... it's all Chinese to me.

I guess I am still unsure how exiting a binary completely could cause memory issues, unless of course you mean it will jack up the actual machine's memory? We don't want that.

Foof, it's not that Linux isn't stable, it's that changes are made and not tested on Linux - ever - except for my server, which has been crashing pretty consistently (probably the years-old /reload issues we've never identified). If you can get a box running, awesome. Steal any of my scripts you want to use, imo they make running a Linux server pretty easy.

If you're wimpy, there's always Gnome ~laughs and points~

Re: Got Signal 11 (Linux)

Posted: Tue Dec 17, 2013 10:21 pm
by thefoof
John Adams wrote: If you're wimpy, there's always Gnome ~laughs and points~
I am indeed wimpy and just using desktop Ubuntu, hah, but it's just for testing anyway. And yeah, I just wanna try and catch some Linux crashes and see what's going on, and also have my own box to test compiles on :P.

I think it's working though, world is compiling now.