Got Signal 11 (Linux)

Old bugs stored here for reference.
John Adams
Retired
Posts: 9684
Joined: Thu Jul 26, 2007 6:27 am
EQ2Emu Server: EQ2Emulator Test Center
Characters: John
Location: Arizona


Post by John Adams » Tue Dec 03, 2013 10:17 am

Jab, need your advice. I need this block of code to terminate the executable, since currently the RunLoops = false is not actually doing any good.

Code:

void CatchSignal(int sig_num) {
	// In rare cases this can be called after the log system is shut down, causing a deadlock or crash
	// when the world shuts down. If this happens again, comment out the LogWrite and uncomment the printf.
	if (last_signal != sig_num) {
		LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
		//printf("Got signal %i", sig_num);
	} else {
		// Same signal as last time: log it again, then pause briefly
		LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
		Sleep(10);
	}
	last_signal = sig_num;
	RunLoops = false;	// ask the main loops to stop
}

Can I just put an exit(0) here or something, to force the world to exit so my persist script will restart it? For some reason, after the last few commits the Linux server crashes daily, then fills the drive with "Got signal 11" crash logs.

For now, I will put EQ2 DB Project into Debug (gdb) and try to see what's crashing, because it appears to happen every time someone logs in and does something.

Jabantiz
Lead Developer
Posts: 2912
Joined: Wed Jul 25, 2007 2:52 pm
Location: California

Re: Got Signal 11 (Linux)

Post by Jabantiz » Tue Dec 03, 2013 1:19 pm

I will try looking into this later tonight. Do you know around what revision this started acting up?

Re: Got Signal 11 (Linux)

Post by John Adams » Tue Dec 03, 2013 3:22 pm

Foof might know better, since it's him it usually crashes on... and because he cannot access SSH, it just spins away all night :) I would say at least a week ago or so. It could also be related to the new zones being built (more /reloading going on than usual).

Thoughts, foofski?

thefoof
Retired
Posts: 630
Joined: Wed Nov 07, 2012 7:36 pm
Location: Florida

Re: Got Signal 11 (Linux)

Post by thefoof » Tue Dec 03, 2013 3:48 pm

John Adams wrote: Foof might know better, since it's him it usually crashes on... and because he cannot access SSH, it just spins away all night :) I would say at least a week ago or so. It could also be related to the new zones being built (more /reloading going on than usual).

Thoughts, foofski?
The last freeze was on a reload for sure, so that could be it, but it's pretty rare. I was doing reloads all night the night before with no issues.

Re: Got Signal 11 (Linux)

Post by Jabantiz » Tue Dec 03, 2013 3:51 pm

If it is signal 11, then it is a memory access issue:

Code:

#define SIGSEGV         11      /* segment violation */
MSDN wrote: SIGSEGV
Illegal storage access. The default action terminates the calling program.

I will go over recent commits looking for what could cause this. If it is happening during a specific /reload, that would help narrow my search.
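
Since CatchSignal is installed for this signal, the default terminate action quoted above no longer applies. On Linux, returning from a handler for a genuine SIGSEGV usually re-runs the faulting instruction, which would be consistent with an endless stream of "Got signal 11" lines. Below is a minimal sketch of one common way around that, assuming a POSIX system: print a short message with write(), restore the default action, and re-raise the signal so the process really terminates. FatalSignalHandler and InstallFatalSignalHandler are hypothetical names, not part of the EQ2Emu source.

```cpp
#include <signal.h>   // sigaction, signal, raise (POSIX)
#include <unistd.h>   // write, STDERR_FILENO

// Hypothetical handler for fatal signals such as SIGSEGV.
// It only uses calls that POSIX lists as async-signal-safe.
void FatalSignalHandler(int sig_num) {
    static const char msg[] = "Got fatal signal, terminating\n";
    // Best-effort message; nothing useful can be done if write() fails.
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;

    // Put the default action back and send the signal again.
    // With the default action restored, the process terminates
    // instead of re-entering this handler.
    signal(sig_num, SIG_DFL);
    raise(sig_num);
}

// Installs the handler for SIGSEGV; returns true on success.
bool InstallFatalSignalHandler() {
    struct sigaction sa {};
    sa.sa_handler = FatalSignalHandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

With this approach the process ends with the original signal as its termination status, so a core dump (if enabled) and any supervising restart script would still see a crash rather than a clean exit.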

Re: Got Signal 11 (Linux)

Post by Jabantiz » Tue Dec 03, 2013 5:28 pm

I had a signal 11 crash when I typed exit into the console. No clue if it is the same signal 11 reported here, but I committed code to Dev SVN to fix it.

Re: Got Signal 11 (Linux)

Post by John Adams » Tue Dec 03, 2013 5:57 pm

Linux doesn't have a console for commands, only Windows. These happen to me when I am /reloading usually.

Dev is in debug right now, so if it crashes again, it's down til I check it out.

Re: Got Signal 11 (Linux)

Post by Jabantiz » Tue Dec 03, 2013 6:01 pm

I should clarify: the signal 11 was a result of the server shutting down. I normally just hit the close button, so I usually bypassed this; actually typing exit to shut the server down properly resulted in the crash. It is not directly related to the console commands, I just found it using console commands.

Re: Got Signal 11 (Linux)

Post by John Adams » Sun Dec 15, 2013 5:26 pm

I am no Linux expert, but this doesn't make any sense to me. The Linux server, sitting in debug, with zero connections and not even an attempt to connect since I started it, just crashed. Sitting there.
09:15:11 D Thread : Starting console command thread...
09:15:11 I Console : Type 'help' or '?' and press enter for menu options.
09:15:11 D Thread : Starting autoinit loginserver thread...
09:15:11 I World : Connected to LoginServer: eq2emulator.net: 9100
09:15:11 D Login : Looking for Login Zone Updates...
09:15:11 D Login : Looking for Login Appearance Updates...
09:30:12 D Login : Looking for Login Appearance Updates...
.
.
.
4:30:18 D Login : Looking for Login Appearance Updates...
14:45:18 D Login : Looking for Login Appearance Updates...
15:00:18 D Login : Looking for Login Appearance Updates...
15:15:12 D Login : Looking for Login Zone Updates...
15:15:18 D Login : Looking for Login Appearance Updates...
15:30:18 D Login : Looking for Login Appearance Updates...
15:45:18 D Login : Looking for Login Appearance Updates...
16:00:19 D Login : Looking for Login Appearance Updates...
16:15:19 D Login : Looking for Login Appearance Updates...

Program received signal SIGTERM, Terminated.
0xb7fe2430 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fe2430 in __kernel_vsyscall ()
Cannot access memory at address 0xbffff62c
(gdb)
And the "bt" shows nothing. Far as I can tell, this is bad. Maybe my linux server is dying.

Re: Got Signal 11 (Linux)

Post by Jabantiz » Sun Dec 15, 2013 7:19 pm

MSDN wrote: SIGTERM
Termination request. The signal function enables a process to choose one of several ways to handle an interrupt signal from the operating system.

So the process received a termination request. I have no clue how this could happen with our code, but I have never used Linux.
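
One way to narrow down where a SIGTERM came from, sketched here under the assumption of a POSIX system: install the handler with sigaction and the SA_SIGINFO flag, which passes a siginfo_t whose si_pid field holds the sending process ID when the signal was sent by another process (for example via kill). The names OnSigTerm, InstallSigTermHandler, g_term_requested, and g_term_sender_pid are hypothetical and not taken from the server code.

```cpp
#include <signal.h>      // sigaction, siginfo_t, SA_SIGINFO (POSIX)
#include <sys/types.h>   // pid_t

// Written by the handler, read later from normal code (e.g. the main loop).
static volatile sig_atomic_t g_term_requested = 0;
static volatile sig_atomic_t g_term_sender_pid = 0;

// Hypothetical SIGTERM handler: records who sent the signal and sets a flag.
// No logging here; the main loop can report the PID once it sees the flag.
void OnSigTerm(int, siginfo_t* info, void*) {
    if (info != nullptr) {
        // si_pid is meaningful when the signal was sent by a process.
        g_term_sender_pid = static_cast<sig_atomic_t>(info->si_pid);
    }
    g_term_requested = 1;
}

// Installs the handler for SIGTERM; returns true on success.
bool InstallSigTermHandler() {
    struct sigaction sa {};
    sa.sa_sigaction = OnSigTerm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGTERM, &sa, nullptr) == 0;
}
```

Once the flag is set, normal code can log g_term_sender_pid and look that PID up (for instance with ps) to see whether a script, a shell, or something else asked the server to stop.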

Re: Got Signal 11 (Linux)

Post by John Adams » Mon Dec 16, 2013 7:24 am

You keep repeating the same thing, what SIGTERM is. While that is a genuine unknown for us all, the initial request was to ask if it's possible to force an exit from within the same place we keep catching the signal.

Code:

    void CatchSignal(int sig_num) {
       // In rare cases this can be called after the log system is shut down, causing a deadlock or crash
       // when the world shuts down. If this happens again, comment out the LogWrite and uncomment the printf.
       if (last_signal != sig_num) {
          LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
          //printf("Got signal %i", sig_num);
       } else {
          // Same signal as last time: log it again, then pause briefly
          LogWrite(WORLD__WARNING, 0, "World", "Got signal %i", sig_num);
          Sleep(10);
       }
       last_signal = sig_num;
       RunLoops = false;   // ask the main loops to stop
    }

It's clear we're inside this function to print the spam, so why can't I just put an exit(0) here or something? Is that not how to quit a C++ prog? Or are you saying that because it's already quitting in a bad way, we cannot control it?

My goal is to exit completely so the persist_server script can restart things.

Re: Got Signal 11 (Linux)

Post by Jabantiz » Mon Dec 16, 2013 1:03 pm

std::exit(error_code) would do it, but the main reason I would not want to do that is that it skips our normal shutdown path, so memory allocated at runtime is never released by our own cleanup code.

I am also not repeating myself: the first issue was SIGSEGV, which gave a clue to what the issue is; the second is SIGTERM, and I have no clue how the server got it.

If you want to use exit, go ahead. Just be aware of the possible issues with using it.
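
For forcing an exit from inside the handler, a hedged sketch follows. It assumes a POSIX system and uses _exit rather than std::exit, because std::exit runs atexit handlers and static destructors and is not on the POSIX list of async-signal-safe functions. CatchSignalSketch, g_run_loops, and g_signal_count are hypothetical stand-ins for CatchSignal, RunLoops, and last_signal; the real flag types in the server may differ.

```cpp
#include <signal.h>   // sig_atomic_t
#include <unistd.h>   // _exit

// Stand-in for the server's RunLoops flag (assumed here to be a plain global).
static volatile sig_atomic_t g_run_loops = 1;
// Counts how many times the handler has been entered.
static volatile sig_atomic_t g_signal_count = 0;

// Hypothetical variant of CatchSignal:
//  - first signal: ask the main loops to stop, as the original does;
//  - any later signal: the loops evidently did not stop, so leave at once.
void CatchSignalSketch(int sig_num) {
    (void)sig_num;
    g_run_loops = 0;
    g_signal_count = g_signal_count + 1;
    if (g_signal_count > 1) {
        // Nonzero status so a restart script can tell this was not a clean
        // shutdown (whether the script checks the status is an assumption).
        _exit(1);
    }
}
```

Because _exit skips the program's own cleanup, buffered log output may be lost and destructors do not run, but the operating system still reclaims the process's memory when it ends. For a real SIGSEGV the main loop may never get the chance to see the flag, in which case the second delivery of the signal is what ends the process.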

Re: Got Signal 11 (Linux)

Post by thefoof » Tue Dec 17, 2013 9:23 pm

I'm going to set up my own Linux box and see if I can get Linux a little more stable, just an FYI.

Re: Got Signal 11 (Linux)

Post by John Adams » Tue Dec 17, 2013 9:52 pm

Jabantiz wrote: I am also not repeating myself: the first issue was SIGSEGV, which gave a clue to what the issue is; the second is SIGTERM, and I have no clue how the server got it.
Meh, SigSegv, SigTerm, blah... it's all Chinese to me.

I guess I am still unsure how exiting a binary completely could cause memory issues, unless of course you mean it will jack up the actual machine's memory? We don't want that.

Foof, it's not that Linux isn't stable, it's that changes are made and not tested on Linux - ever - except for my server, which has been crashing pretty consistently (probably the years-old /reload issues we've never identified). If you can get a box running, awesome. Steal any of my scripts you want to use; imo they make running a Linux server pretty easy.

If you're wimpy, there's always Gnome ~laughs and points~

Re: Got Signal 11 (Linux)

Post by thefoof » Tue Dec 17, 2013 10:21 pm

John Adams wrote: If you're wimpy, there's always Gnome ~laughs and points~
I am indeed wimpy and just using a desktop Ubuntu, hah, but it's just for testing anyway. And yeah, I just want to try to catch some Linux crashes and see what's going on, and also have my own box to test compiles on :P

I think it's working though, world is compiling now
