Combat Revamp Discussion

EQ2Emulator Development forum.

Moderator: Team Members

Jabantiz
Lead Developer
Posts: 2912
Joined: Wed Jul 25, 2007 2:52 pm
Location: California

Post by Jabantiz » Fri Aug 03, 2012 4:41 pm

This post is mainly me putting ideas out for myself. If you agree, disagree, or have any opinion at all, let me know; I'm trying to squash this damn bug.

When we get a zone deadlock, the client can run around the zone with no problem, but any commands the client tries don't get processed. When clients in other zones send commands that can affect the first client (mainly chat; I hadn't thought of trying others, like /invite, until now), the first client still receives these messages. This leads me to believe that Client::HandlePacket() is never called. It is only ever called from Client::Process(), which is called from the zone server's main thread. If Client::Process() fails on the zone thread, the client should get disconnected, or if an exception is thrown we should get the "Exception caught when in ZoneServer::ClientProcess()" message (which we don't).

I think the issue is here somewhere, not sure what though. I am still unable to duplicate the issue even with the DB from server pack 1.3. Putting logging in this function will be spammy as hell also; we could try moving it to its own thread, but I'm not sure that would have much of a benefit.
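
One low-noise way to check the hypothesis that the client-processing loop has stopped running, without logging every call, is a heartbeat that the loop updates and a separate thread inspects. The sketch below is only an illustration, assuming a C++11 compiler; `LoopHeartbeat`, `Beat()` and `IsStalled()` are hypothetical names and are not part of the EQ2Emulator source.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper, not part of the EQ2Emulator source.
// The watched loop calls Beat() once per pass; another thread calls
// IsStalled() to see whether the loop has stopped making progress.
class LoopHeartbeat {
public:
    using Clock = std::chrono::steady_clock;

    LoopHeartbeat() : last_beat_(Now()) {}

    // Call from the watched loop (for example once per client-process pass).
    void Beat() { last_beat_.store(Now(), std::memory_order_relaxed); }

    // Call from a different thread. Returns true if no Beat() has been
    // recorded for longer than max_gap.
    bool IsStalled(std::chrono::milliseconds max_gap) const {
        const std::int64_t gap =
            Now() - last_beat_.load(std::memory_order_relaxed);
        return gap > max_gap.count();
    }

private:
    // Milliseconds on a monotonic clock, so wall-clock changes are ignored.
    static std::int64_t Now() {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   Clock::now().time_since_epoch())
            .count();
    }

    std::atomic<std::int64_t> last_beat_;
};
```

The idea would be to call `Beat()` at the top of each pass and have another thread write a single log line when `IsStalled()` turns true, which keeps the output to one message per stall rather than one per call.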

John Adams
Retired
Posts: 9684
Joined: Thu Jul 26, 2007 6:27 am
EQ2Emu Server: EQ2Emulator Test Center
Characters: John
Location: Arizona

Post by John Adams » Fri Aug 03, 2012 5:40 pm

Jabantiz wrote: putting logging in this function will be spammy as hell also, we could try moving it to its own thread but not sure that would have much of a benefit.
When I first added TRACE logging, I had it at every function enter/exit for a few modules. Due to the *Process() functions constantly spinning, that got to be a nightmare in about 5 seconds.

Logs were writing so fast, they were totally overwriting the middle of each other... which is why I asked Scatman to queue the log output. He said he would, but hasn't gotten to it yet.

Second suggestion, put in your spammy logging... leave them DISABLED by default, and use log_config.xml to turn them on to write ONLY to the file (Logs = 1) and you might get something out of it.

Eventually I will put all TRACE logging back in but not until I have queued logging.

Post by Jabantiz » Fri Aug 03, 2012 9:11 pm

By moving ZoneServer::ClientProcess() to its own thread it seems to have stopped the zone from freezing/deadlocking. I have been on EQ2TC for 3 hours now doing random crap in the same zone and it is all still running good. The issues I noticed while fighting with /invul 1 seem to be gone as well.

Edit: 6 hours and everything is still working; hopefully this fixed the issue.

Post by John Adams » Sun Aug 05, 2012 1:37 pm

Sitting in EQ2TC now, barely 20 mins, 6 players, 1 in Ruins, 5 in QC, and QC is locked up. Ruins is still functioning.

My setup is:
5 players in QC
2 of them are in a group together (Guardian and Templar)
3 more just standing around QC
1 remaining player (Ranger) is in Ruins

All I did was rebuild the Templar's hotbar with spells she has, then try casting Aegolism on myself, to no effect (E Spell: SpellProcess::ProcessSpell Unable to find any spell targets for spell 'Aegolism'.). In trying to research why this spell wasn't landing even on myself, I changed some spell parameters (target_type) and did 1 /reload spells (only 1). I could not get Aegolism to land on self, so I added Guardian to Templar's group. Now, the group has 2 members (only).

Note: changing spells info was moot, since the spell data is on Dev, and I am on EQ2TC - oops!

I cast Aegolism again, this time target_type = Group AE, but still got the same message. No targets. And that's where I went to code, and left chars standing in zone.

While I was walking through GetSpellTargets(), commenting code so I understood what each convoluted "if" does, I thought I'd move my toons to Ruins to stand with the orcs fighting. This is when I noticed /zone did nothing on all my toons (except Ranger, who is in Ruins). /who all no longer works for QC chars, but does for the Ruins char. Zero feedback from World that anything is wrong.

Only 1 client was doing anything, btw. The Templar, casting spells. Everyone else was just standing around. Hope this info helps. I will attempt to reproduce this same deadlock.


Edit: I was compiled in Release x64 btw.

Post by John Adams » Sun Aug 05, 2012 2:34 pm

And of course, now that I am in Debug_x64 mode, I cannot reproduce the issue. Sigh.

Post by John Adams » Fri Aug 10, 2012 11:22 am

Okay, let's talk stability. I am doing back-flips over how much better Combat functions now. Everything related to combat timers and loops seems to be 100x better than it was a month ago, so I think whatever was done was definitely a plus in that respect. The reason I am a BIG FAN of removing Mutex stuff for vector<> or whatever is that a year+ ago, Scatman already suggested we do this; we just never had time to... so I think Jabantiz is on the right track.

However, the zone exceptions and World crashes take some of the "shine" off these positive efforts. I just need an assessment of a) whether these new crashes are even related to Combat at all (we changed a few things), b) whether it is possible to regain stability quickly, or c) whether to put Mutex back in and fix it, hoping we understand it better now.

Thoughts?
John Adams
EQ2Emulator - Project Ghost
"Everything should work now, except the stuff that doesn't" ~Xinux

Post by Jabantiz » Fri Aug 10, 2012 2:43 pm

The current zone exception that has me stumped is related to the LD code, and that is the only one I know of. There is the deadlock issue (what you posted above), but that was around before and doesn't happen nearly as often now; it seems random and I haven't been able to track it down. What other zone exceptions are happening that I have forgotten or am not aware of?

Post by John Adams » Fri Aug 10, 2012 3:24 pm

Well, because the code does not offer any suggestions about what it threw an error on, I cannot say. I know LD sometimes has issues (inconsistently), and when I /zone or /camp and get a crash, all I know is that is what I did to make it happen. I've had issues with multiple clients doing things at once, to each other or to NPCs, kinda all over the map. Mostly they have been discussed. Unless we take out the Try/Catch and attempt to trace the crashes, I don't see how we'll ever know what is really causing it.

I can try this on EQ2TC, but the player population has all but withered away. Up to us to crash it, I suppose.

Post by John Adams » Sat Aug 18, 2012 4:18 pm

Chasing down Alfa's lag report, I can definitely see what he's talking about. Here's the process CPU utilization with my character just standing in Timorous Deep, with most logging disabled -
[Attachment: cpu.jpg]
I cannot say if this is the new changes to Combat, or something else that crept in as we're adding things. Since Jabantiz has run out of time to spend on this right now, I will take a stab at it. First, I'm going to revert to before Jab did any work, just to see if the MutexList/Map code he removed behaved the same (I'm almost sure it did). If so, then it's definitely not the new combat system changes, and I'll have to rely on Scatman to help me troubleshoot.

This is pretty much a show-stopper for us. There is no sense adding 1 more feature to this mess we have already until these problems are worked out. Hopefully, soon. I'd like to fix this and release it as 0.7.1 and start our next cycle.


Edit: Hah, well I guess that answers that question. Here's the process CPU utilization running the exact same test in "Release" (non debug) mode -
[Attachment: cpu-release.jpg]
I knew DEBUGging was slower, but man, I didn't think it was THAT horribly slow. Unfortunately, I cannot debug crashes in Release, so you're going to have to suffer with Lag for now ;)

Post by John Adams » Sat Aug 25, 2012 9:17 am

Well, here's something interesting. Another crash, in CombatProcess().

Stack:

>	EQ2World__Debug.exe!ZoneServer::CombatProcess()  Line 884 + 0x18 bytes	C++
 	EQ2World__Debug.exe!CombatLoop(void * tmp)  Line 4309	C++
 	EQ2World__Debug.exe!_callthreadstart()  Line 259 + 0xf bytes	C
 	EQ2World__Debug.exe!_threadstart(void * ptd)  Line 243	C
 	kernel32.dll!77e6482f() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
Code:

bool ZoneServer::CombatProcess() {
	bool ret = true;
	MutexList<Spawn*>::iterator itr = spawn_list.begin();
	while (itr.Next()) {
		if (itr->value->IsEntity())  // <=== crash reported here
			if (!combat->Process((Entity*)itr->value)) {
				ret = false;
				break;
			}
	}
	return ret;
}
The interesting info is in the last few lines of the World log. Note the # of quests? That might be a bad printf value. Also note that the client going LD is "Null", which is likely our crash this time.
10:49:32 D LUA: Quest: Running Off the Grobin Scouts, function: QuestComplete
10:49:32 D LUA: Done!
10:49:32 D Client: Send Quest Journal...
10:49:32 D Client: Send Quest Journal...
10:49:41 E Command: Error in COMMAND_ACCEPT_REWARD. No pending quest or collection reward was found (unknown=0).
10:50:00 D Client: Found 903850984 pending quests for char_id: 135
10:50:00 D LUA: Quest: Grobin Trouble at the Pond, function: Accepted
10:50:00 D LUA: Done!
10:50:00 D Client: Send Quest Journal...
10:50:01 D Client: Found 903850984 active quests for char_id: 178
10:50:25 D LUA: Found LUA Spell Script: 'Spells/Scout/Tracking.lua'
10:51:44 D Zone: Client is disconnecting in ZoneServer::ClientProcess (camping = false)
10:51:44 D Zone: Sending login equipment appearance updates...
10:51:44 D Zone: Calling clients.Remove(client)...
10:51:44 D Zone: Removing client 'Null' (178) due to LD/Exit...
10:51:44 I Zone: Scheduling client 'Null' for removal.

10:51:44 D Player: Toggling Character OFFLINE!
10:51:44 D CClient: Client Disconnect...
10:51:44 D Zone: Starting zone shutdown timers...
10:51:58 D World: Removing connection...

As soon as my house guests leave (Mon), I will start my attempt to root out this problem by reverting combat system or LD code changes.
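
If the crash comes from a spawn being removed while CombatProcess() is still walking the list (an assumption; nothing in the log confirms it), one common pattern is to queue removals and apply them only after the pass finishes. The sketch below uses hypothetical stand-ins (`Spawn`, `SpawnRegistry`, `ScheduleRemoval`, `ProcessAll`) rather than the real MutexList or ZoneServer code, and assumes a single thread calls `ProcessAll()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the real types; only the pattern matters.
struct Spawn {
    int id;
    bool is_entity;
};

class SpawnRegistry {
public:
    void Add(Spawn* spawn) {
        std::lock_guard<std::mutex> lock(mutex_);
        spawns_.push_back(spawn);
    }

    // Safe to call from any thread, including while a pass is running:
    // the pointer is only queued here, never erased or deleted.
    void ScheduleRemoval(Spawn* spawn) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_removal_.push_back(spawn);
    }

    // Runs one pass over a snapshot, then applies the queued removals.
    // Returns the removed spawns so the caller can delete them once the
    // pass that might still be using them has finished.
    template <typename Fn>
    std::vector<Spawn*> ProcessAll(Fn process) {
        std::vector<Spawn*> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = spawns_;
        }
        for (Spawn* spawn : snapshot) {
            if (spawn->is_entity) process(spawn);
        }
        std::vector<Spawn*> removed;
        std::lock_guard<std::mutex> lock(mutex_);
        removed.swap(pending_removal_);
        for (Spawn* spawn : removed) {
            spawns_.erase(std::remove(spawns_.begin(), spawns_.end(), spawn),
                          spawns_.end());
        }
        return removed;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return spawns_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<Spawn*> spawns_;
    std::vector<Spawn*> pending_removal_;
};
```

In this arrangement the LD/exit path would call `ScheduleRemoval()` instead of erasing the client directly, so a pointer handed to the combat step stays valid for the rest of that pass.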

Post by Jabantiz » Wed Aug 29, 2012 5:55 pm

I had an idea that was easy enough to implement that I could do it and run basic tests with my extremely limited time right now. From the tests that I did, it worked a lot better than I had hoped, though my tests were very limited. What I did was move client process back to the main zone thread (I forgot the reason why I moved it to its own thread, and it seemed to just cause new problems) and merge the combat thread with the spawn thread.

This also may have fixed the LD zone crash; I was not able to reproduce that crash. Right now ZoneServer has 2 threads running all the time: the main zone thread and the spawn thread. The spawn thread may be able to be improved, though, as it currently calls several functions that each loop through the entire spawn list. It would probably be better to loop through the spawn list once and do everything that is needed instead of looping through it 5+ times. I would do this myself, but I don't have the time to go over all the functions in ZoneServer::SpawnProcess().

I am committing this code now, but I would really appreciate it if someone could do more extensive tests on it.
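
The single-pass idea can be sketched as follows. This is an illustration only, assuming C++11: `SpawnStub`, `SpawnTask` and `ProcessSpawnsOnce` are hypothetical names, and the real work done inside ZoneServer::SpawnProcess() is not shown in this thread.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the real Spawn class.
struct SpawnStub {
    int updates = 0;
};

// Each task is one piece of per-spawn work that would otherwise have
// its own full loop over the spawn list.
using SpawnTask = std::function<void(SpawnStub&)>;

// Walks the list once and runs every task on each spawn, instead of
// walking the whole list once per task.
inline void ProcessSpawnsOnce(std::vector<SpawnStub*>& spawns,
                              const std::vector<SpawnTask>& tasks) {
    for (SpawnStub* spawn : spawns) {
        if (spawn == nullptr) continue;  // skip empty slots
        for (const SpawnTask& task : tasks) task(*spawn);
    }
}
```

One thing to check before merging loops this way is ordering: the separate loops finish task A for every spawn before starting task B, while the single pass interleaves them per spawn, which matters if any task depends on another having completed for the whole list.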

Post by John Adams » Fri Aug 31, 2012 8:19 am

Good work. Sounds positive. I will put this on EQ2TC right now, let it run and we'll see the results. You can ignore your PM if you think this resolves it.

Post by Jabantiz » Fri Aug 31, 2012 7:27 pm

I did some tests on EQ2TC, mostly LD related, but I did do some combat. I did get the duplicate packet message, but only when I logged another char into a different zone (logged into Ruins while the other char was sitting in Ant). None of my LD attempts, single client and multi client, caused a zone exception. As far as combat goes, there was some lag but nothing too terribly bad. I did notice spell icon shading was reversed for some reason, though (shaded when ready to cast, unshaded when not ready). I didn't notice that on my server but will check in a bit. Overall I think this last change was a step in the right direction.

Post by Jabantiz » Sat Sep 01, 2012 5:05 pm

Ignore the previous post, EQ2TC wasn't running the latest code when I tested.

My new test was 3 clients: 2 sitting in Antonica, 1 in Queen's Colony. I did combat with the one in Queen's Colony for a while before I LD'd one of the clients in Antonica, then continued killing stuff while I waited for the LD timer to expire; there was no zone crash. I continued to kill stuff for a little while, then I LD'd both clients and waited; again, no zone crashes. Combat was smooth, with no lag whatsoever. However, the icon shading was wacky at best. I don't have that issue on my server, but my server runs in debug; I will test it in release in a little bit. There were also no Duplicate packet messages this time.

Overall this may be the best solution so far, it will still require some testing though.

EDIT: Getting strange behavior with shaded icons on a 64-bit release compile, though not as bad as on EQ2TC. I didn't notice this in debug, and my compiler crashes when I try to do a 32-bit release. This could either be an issue with combat (spells not getting the right status) or with my passive spells changes. Ran out of time today to track it down though.
Last edited by Jabantiz on Sat Sep 01, 2012 6:24 pm, edited 1 time in total.
Reason: results of local test

Post by John Adams » Sun Sep 02, 2012 12:13 am

EQ2TC should have had the latest code as of 8.31.2012 (note the date on the Debug compile). I run EQ2TC in Debug, else I cannot trap the crashes. I see it is now in Release, which - as far as I know - will not show the Dupe Packet or Future Packet stuff; pretty sure that is only if the code is Debug, not Release. At least that's how I've seen it function the last 5 years. I could be wrong.

I will put Debug back up on EQ2TC, and it is an x86 machine as well. Dev is x64, but Linux. Not sure if the chipset should have anything to do with odd spell shaders, could just be a coincidence. Soon as a few other players get online, we can see if the zone exceptions happen again. Btw, there was no "crash" really, just the zone exceptions - and the zone never shut down.

I shouldn't say "never", because you engineers will cling to that ;) I do mean, most of the exceptions I've seen lately have simply been Exception caught in so and so zone. When I post a call stack, that is a world crash - which has been more frequent lately than in the last 2 years. But, we'll get it figured out. Reverting the code didn't seem to help one bit... so I am completely confused.
