Linux server crashing w/spawns

Posted: Tue Sep 09, 2008 9:27 pm
by John Adams
I am looking into why my Linux server (TessEQ2Dev) now crashes abruptly whenever I try to zone in, starting with dev build 425. If I back up to 423, the server runs fine. This is on Fedora Core 8 (Linux).
The backtrace reads:

Code:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00b84690 in raise () from /lib/libc.so.6
#2  0x00b85f91 in abort () from /lib/libc.so.6
#3  0x00bbc9eb in __libc_message () from /lib/libc.so.6
#4  0x00bc4ac1 in _int_free () from /lib/libc.so.6
#5  0x00bc80f0 in free () from /lib/libc.so.6
#6  0x05d4f6f1 in operator delete () from /usr/lib/libstdc++.so.6
#7  0x080e8f16 in std::_Deque_base<EQProtocolPacket*, std::allocator<EQProtocolPacket*> >::_M_destroy_nodes (this=0xb6465278, __nstart=0x9d95224,
    __nfinish=0x9d95228)
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:94
#8  0x080e9024 in ~_Deque_base (this=0xb6465278)
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_deque.h:445
#9  0x080e81db in EQStream::Write (this=0x9da3e18, eq_fd=7)
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_deque.h:725
#10 0x081292c7 in EQStreamFactory::WriterLoop (this=0x81d4ae0)
    at ../common/EQStreamFactory.cpp:374
#11 0x0812940f in EQStreamFactoryWriterLoop (eqfs=0x0)
    at ../common/EQStreamFactory.cpp:59
#12 0x00ced50b in start_thread () from /lib/libpthread.so.0
#13 0x00c2eb2e in clone () from /lib/libc.so.6
The big error blast:

Code:

39414 New client from ip: 192.168.1.1 port: 2327
*** glibc detected *** ./eq2world: free(): invalid next size (normal): 0x13e556c8 ***
======= Backtrace: =========
/lib/libc.so.6[0xbc4ac1]
/lib/libc.so.6(cfree+0x90)[0xbc80f0]
/usr/lib/libstdc++.so.6(_ZdlPv+0x21)[0x5d4f6f1]
./eq2world(_ZNSt11_Deque_baseIP16EQProtocolPacketSaIS1_EE16_M_destroy_nodesEPPS1_S5_+0x1e)[0x80e8f16]
./eq2world(_ZNSt11_Deque_baseIP16EQProtocolPacketSaIS1_EED2Ev+0x28)[0x80e9024]
./eq2world(_ZN8EQStream5WriteEi+0x5b3)[0x80e81db]
./eq2world(_ZN15EQStreamFactory10WriterLoopEv+0x2ab)[0x81292c7]
./eq2world(_Z25EQStreamFactoryWriterLoopPv+0x15)[0x812940f]
/lib/libpthread.so.0[0xced50b]
/lib/libc.so.6(clone+0x5e)[0xc2eb2e]
======= Memory map: ========
00110000-00111000 r-xp 00110000 00:00 0          [vdso]
00111000-0011b000 r-xp 00000000 fd:00 2091116    /lib/libnss_files-2.7.so
0011b000-0011c000 r--p 00009000 fd:00 2091116    /lib/libnss_files-2.7.so
0011c000-0011d000 rw-p 0000a000 fd:00 2091116    /lib/libnss_files-2.7.so
0011e000-0023f000 r-xp 00000000 fd:00 4085446    /usr/lib/mysql/libmysqlclient.so.15.0.0
0023f000-00281000 rw-p 00120000 fd:00 4085446    /usr/lib/mysql/libmysqlclient.so.15.0.0
00281000-00282000 rw-p 00281000 00:00 0
00282000-002ab000 r-xp 00000000 fd:00 3574656    /usr/lib/liblua-5.1.so
002ab000-002ac000 rw-p 00028000 fd:00 3574656    /usr/lib/liblua-5.1.so
002ac000-002c5000 r-xp 00000000 fd:00 2091351    /lib/libselinux.so.1
002c5000-002c7000 rw-p 00018000 fd:00 2091351    /lib/libselinux.so.1
002cb000-002e0000 r-xp 00000000 fd:00 2091205    /lib/libnsl-2.7.so
002e0000-002e1000 r--p 00014000 fd:00 2091205    /lib/libnsl-2.7.so
002e1000-002e2000 rw-p 00015000 fd:00 2091205    /lib/libnsl-2.7.so
002e2000-002e4000 rw-p 002e2000 00:00 0
002e4000-002e8000 r-xp 00000000 fd:00 2091114    /lib/libnss_dns-2.7.so
002e8000-002e9000 r--p 00003000 fd:00 2091114    /lib/libnss_dns-2.7.so
002e9000-002ea000 rw-p 00004000 fd:00 2091114    /lib/libnss_dns-2.7.so
00754000-00764000 r-xp 00000000 fd:00 2091203    /lib/libresolv-2.7.so
00764000-00765000 r--p 00010000 fd:00 2091203    /lib/libresolv-2.7.so
00765000-00766000 rw-p 00011000 fd:00 2091203    /lib/libresolv-2.7.so
00766000-00768000 rw-p 00766000 00:00 0
0081d000-0081f000 r-xp 00000000 fd:00 2091366    /lib/libcom_err.so.2.1
0081f000-00820000 rw-p 00001000 fd:00 2091366    /lib/libcom_err.so.2.1
0082e000-00836000 r-xp 00000000 fd:00 3574658    /usr/lib/libkrb5support.so.0.1
00836000-00837000 rw-p 00007000 fd:00 3574658    /usr/lib/libkrb5support.so.0.1
0083f000-00841000 r-xp 00000000 fd:00 2091362    /lib/libkeyutils-1.2.so
00841000-00842000 rw-p 00001000 fd:00 2091362    /lib/libkeyutils-1.2.so
0087c000-00999000 r-xp 00000000 fd:00 2091369    /lib/libcrypto.so.0.9.8b
00999000-009ab000 rw-p 0011d000 fd:00 2091369    /lib/libcrypto.so.0.9.8b
009ab000-009af000 rw-p 009ab000 00:00 0
009b1000-009bc000 r-xp 00000000 fd:00 2091338    /lib/libgcc_s-4.1.2-20070925.so.1
009bc000-009bd000 rw-p 0000a000 fd:00 2091338    /lib/libgcc_s-4.1.2-20070925.so.1
009bf000-009e4000 r-xp 00000000 fd:00 3574660    /usr/lib/libk5crypto.so.3.1
009e4000-009e5000 rw-p 00025000 fd:00 3574660    /usr/lib/libk5crypto.so.3.1
009e7000-00a28000 r-xp 00000000 fd:00 2091370    /lib/libssl.so.0.9.8b
00a28000-00a2c000 rw-p 00040000 fd:00 2091370    /lib/libssl.so.0.9.8b
00a2e000-00abe000 r-xp 00000000 fd:00 3574662    /usr/lib/libkrb5.so.3.3
00abe000-00ac1000 rw-p 0008f000 fd:00 3574662    /usr/lib/libkrb5.so.3.3
00ac3000-00af0000 r-xp 00000000 fd:00 3574667    /usr/lib/libgssapi_krb5.so.2.2
00af0000-00af1000 rw-p 0002d000 fd:00 3574667    /usr/lib/libgssapi_krb5.so.2.2
00b3c000-00b57000 r-xp 00000000 fd:00 2091142    /lib/ld-2.7.so
00b57000-00b58000 r--p 0001a000 fd:00 2091142    /lib/ld-2.7.so
00b58000-00b59000 rw-p 0001b000 fd:00 2091142    /lib/ld-2.7.so
00b5b000-00cae000 r-xp 00000000 fd:00 2091144    /lib/libc-2.7.so
00cae000-00cb0000 r--p 00153000 fd:00 2091144    /lib/libc-2.7.so
00cb0000-00cb1000 rw-p 00155000 fd:00 2091144    /lib/libc-2.7.so
00cb1000-00cb4000 rw-p 00cb1000 00:00 0
00cb6000-00cb9000 r-xp Aborted
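
The glibc backtrace above lists mangled C++ symbols (for example `_ZN8EQStream5WriteEi`), which are harder to read than the gdb frames. As a minimal sketch, assuming a GCC or Clang toolchain that provides `<cxxabi.h>`, they can be turned back into readable names with `abi::__cxa_demangle`. The helper name `demangle` is made up for this illustration and is not part of the server code.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Passing `_ZN8EQStream5WriteEi` should give `EQStream::Write(int)`, which matches frame #9 of the gdb backtrace. The `c++filt` command-line tool does the same job for a whole pasted log.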

Posted: Sat Sep 13, 2008 6:00 pm
by John Adams
On this: with a 0-spawn database, the client gets stuck at various stages trying to enter the world. In the Linux console, I see the zone start up, but that's it. I have checked the zone name vs. current_zone and they seem to be OK.
I will add a spawn to this DB to see if the current DEV build still crashes.

Posted: Sat Sep 13, 2008 6:02 pm
by LethalEncounter
I'll compile and test it on Linux and see if I can reproduce this.

Posted: Sat Sep 13, 2008 6:03 pm
by John Adams
Here's what's happening. I think the LS (or my local client cache?) is sending "castleview" as my current_zone, even if I FIX it by typing "Castleview" and log in.
"Loading new Zone 'castleview'"
So we're back to that thing heh. Is LS feeding me my last known zone? If it is, maybe it shouldn't?

Posted: Sat Sep 13, 2008 6:11 pm
by John Adams
Thanks for looking into it LE. I hadn't tried Linux in many moons... didn't mean to create more work :)
Scratch my brilliant findings. While there is something odd and wrong about my current_zone being updated mysteriously, it is not the cause of the failure to connect. Even a brand new toon gets locked up "Receiving zone info".
Thanks again

Posted: Sat Sep 13, 2008 6:13 pm
by LethalEncounter
When is it supposed to crash? I just logged into the world just fine. And no, the login server doesn't have anything to do with the zone that you are in on the server. World updates Login to let it know which zone the character is in, but it is one way. The Login server never sends the information back to World.

Posted: Sat Sep 13, 2008 6:26 pm
by John Adams
Not to be contradictory, but how is my character's "current_zone" field getting updated to a value that does not even exist in the zones table? Unless the previous fix to the zones.name field was to force the string to lower-case, store it from World, and then do all comparisons in lower case?
If that's the case, then this is broken on a case-sensitive OS like Linux, because "castleview" is not the same as "Castleview". But I am grasping at straws.
I just deleted my entire dev DB (empty) and downloaded all structures new again, then created a new ratonga for Freeport. I attempt to log in, and get halted at "Receiving zone info". If this comes down to something I mis-configured, I quit. You'll never hear from me again. :D :D :D
Also, when in this state, I CTRL-C to shut down the world on Linux and it halts (does not unload) after "Shutting down zone <whatever I was trying to access>". I have to kill -9 eq2world.
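
For context on the CTRL-C hang, here is a minimal sketch of the usual flag-based shutdown pattern. It is not the actual eq2world handler, and the names `g_shutdown_requested`, `on_signal`, `install_shutdown_handler` and `run_until_shutdown` are all made up for illustration. The handler only sets a flag, and every loop has to keep checking that flag. A loop that blocks and never rechecks it would look like the hang described here, though that is only one possible cause.

```cpp
#include <csignal>

// Set by the signal handler, polled by the worker loops.
volatile std::sig_atomic_t g_shutdown_requested = 0;

// Keep the handler trivial: only set the flag.
extern "C" void on_signal(int) {
    g_shutdown_requested = 1;
}

// Route SIGINT (signal 2, what CTRL-C sends) to the handler.
void install_shutdown_handler() {
    std::signal(SIGINT, on_signal);
}

// Hypothetical worker loop: one unit of work per iteration, and it
// stops as soon as shutdown has been requested.
// Returns how many iterations actually ran.
int run_until_shutdown(int max_iterations) {
    int iterations = 0;
    while (!g_shutdown_requested && iterations < max_iterations) {
        ++iterations;  // placeholder for real work
    }
    return iterations;
}
```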

Posted: Sat Sep 13, 2008 6:36 pm
by LethalEncounter
Yes, the zone name is saved as lowercase to the database and it is not case sensitive when it is comparing the zone names in the code, so why does it matter? Could you unlock your dev server so I can try it? I'm not sure what is up with it, but it definitely should be able to close if you control-c it.
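
As a rough illustration of the lower-case comparison described here (a sketch only, not the actual server code; `to_lower` and `same_zone_name` are made-up names), both strings are lowered before comparing, so "Castleview" and "castleview" count as the same zone regardless of the OS:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lower-cased copy of s (ASCII-oriented; uses std::tolower).
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical helper: compares two zone names without regard to case.
bool same_zone_name(const std::string& a, const std::string& b) {
    return to_lower(a) == to_lower(b);
}
```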

Posted: Sat Sep 13, 2008 6:41 pm
by John Adams
Sorry, unlocking it is what I was doing when I came back to cry about not being able to CTRL-C.
And you are right, if the current_zone is forced to lower, and the zones.name is forced to lower before compare, it does not matter. :)
Still, something changed my "Castleview" to "castleview" before my eyes. But, we'll debate that after you see what may be going on.
This is Fedora Core 8, x86, if it matters.

Posted: Sat Sep 13, 2008 7:03 pm
by LethalEncounter
OK, does the server do anything after the client disconnects? The server seems to send a few things and then stops sending anything. I'm looking at your db now and testing some things out. I'll get back to you hopefully soon.

Posted: Sat Sep 13, 2008 7:12 pm
by John Adams
The server does appear to continue to allow new connections, i.e., new character creations or zone load attempts, but they all fail at the same point (from my perspective). Once a client disconnects, within a minute or so you see "Removing connection".
But you cannot stop the server with CTRL-C: it shows "Got signal 2" and then just stays at Shutting Down Zone "Neriak" (in your case). I have to kill the process to stop the server.
I just updated to dev 430 and am trying again.
Nice, 430 is letting me in further. Now I am halted at Loading UI resources.

Posted: Sat Sep 13, 2008 7:13 pm
by LethalEncounter
Could you grab the latest from SVN and run it? I added a couple of debug statements as well as a *possible* fix.

Posted: Sat Sep 13, 2008 7:20 pm
by John Adams
431 is up. I saw one of your attempts get in and get past a bunch of debug output, then you froze up. The second attempt froze up immediately, it looks like.
Here's what I am seeing:

Code:

AddAuth: 4 Key: 1221358979
9966 New client from ip: xxx.xxx.xxx.xxx port: 3958
ZoneAuth: Access Key, 1221358979, Character Name, Lethal, Account ID, xxx, TimeStamp, 1221358979
Loading new Zone 'neriak'
Loaded 0 NPC(s), 0 Object(s), 0 Widget(s), 0 Sign(s).
Sending MOTD
Sending Macros
Loading Factions
Loading Quests
Sending Journal
Sending Faction Update
Sending Command List
Sending Language List
Sending Zone Info
OP_ChatFiltersMsg Received 0x013d
   0: 0D 00 FF FF FF FF FF 01 - 00 00 00 80 FF FF 0B 29  | ...............)
OP_ChatFiltersMsg Received 0x013d
   0: 0D 00 FF FF FF FF FF FF - FF FF FF FF FF FF 0B 19  | ................
OP_ChatFiltersMsg Received 0x013d
   0: 0D 00 FF FF FF FF FF FF - FF FF FF FF FF FF 0B 49  | ...............I
OP_Unknown Received 0x0026
   0: 01 00 00 00 FA 43                                  | .....C
Removing connection
AddAuth: 4 Key: 1221359062
92243 New client from ip: xxx.xxx.xxx.xxx port: 3962
ZoneAuth: Access Key, 1221359062, Character Name, Lethal, Account ID, xxx, TimeStamp, 1221359062
Removing connection

Posted: Sat Sep 13, 2008 7:22 pm
by LethalEncounter
Could you PM me the info in the console window?

Posted: Sat Sep 13, 2008 7:25 pm
by John Adams
Heh, stealth edit :)