"Raw Data Ethics" battle ;)
Posted: Thu Sep 08, 2011 7:30 am
I am waging a war within myself over the semantics of "Raw Data"; maybe you can help.
Currently, PacketParser will identify a spawn based on various unique properties: name, class, race, gender, and, more recently, individual appearances (eye color, hair type, etc.). This makes for a LOT of raw spawns that previously did not exist. For example, back in 2008, before LE added the Appearances comparisons for me, we would parse maybe 30,000 spawns; now, because each eye shape tweak is considered unique, we will parse over 60,000.
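To make it concrete, the uniqueness check boils down to something like this (a rough C++ sketch; the struct and field names are just my shorthand here, not the actual Parser code):

[code]
// Illustrative only: the kind of property-by-property comparison described
// above. Field names (eye_type, hair_type, etc.) are placeholders.
#include <string>
#include <tuple>

struct SpawnAppearance {
    int eye_color;
    int hair_type;
    int eye_type; // the per-spawn tweak that doubled the raw count
};

struct Spawn {
    std::string name;
    int class_id;
    int race;
    int gender;
    SpawnAppearance appearance;
};

// Two spawns are "the same" only if every compared property matches.
// Once appearance fields joined the comparison, spawns differing by a
// single eye tweak each became their own raw record.
bool IsSameSpawn(const Spawn& a, const Spawn& b) {
    return std::tie(a.name, a.class_id, a.race, a.gender,
                    a.appearance.eye_color, a.appearance.hair_type,
                    a.appearance.eye_type)
        == std::tie(b.name, b.class_id, b.race, b.gender,
                    b.appearance.eye_color, b.appearance.hair_type,
                    b.appearance.eye_type);
}
[/code]

Every appearance field that joins that comparison multiplies the number of "unique" raw spawns, which is exactly why the count doubled.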
What I have done with the Parser currently is modify it so that, while it is detecting the uniqueness of a spawn, instead of inserting a NEW spawn for an eye shape tweak, it flags the spawn's "randomize eye_type" signed values and saves those to the raw_spawn_info record instead... thus cutting down tremendously on the number of Raw Spawns to -populate.
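The merge logic is roughly this (again a sketch; RawSpawnInfo, FLAG_RANDOMIZE_EYE_TYPE, and the min/max fields are illustrative names, not the real schema):

[code]
// Sketch of the new behavior: when a parsed spawn differs from an existing
// record ONLY in eye_type, widen a signed randomize range on the existing
// raw_spawn_info row instead of inserting a whole new spawn.
#include <cstdint>

const uint32_t FLAG_RANDOMIZE_EYE_TYPE = 1 << 0; // hypothetical flag bit

struct RawSpawnInfo {
    uint32_t randomize_flags = 0;
    int8_t eye_type_min = 0; // signed values, as described above
    int8_t eye_type_max = 0;
};

// Called when the dedup pass finds a spawn matching an existing record
// on everything except eye_type.
void MergeEyeTweak(RawSpawnInfo& existing, int8_t new_eye_type) {
    if (!(existing.randomize_flags & FLAG_RANDOMIZE_EYE_TYPE)) {
        existing.randomize_flags |= FLAG_RANDOMIZE_EYE_TYPE;
        existing.eye_type_min = existing.eye_type_max = new_eye_type;
    } else {
        if (new_eye_type < existing.eye_type_min) existing.eye_type_min = new_eye_type;
        if (new_eye_type > existing.eye_type_max) existing.eye_type_max = new_eye_type;
    }
    // The widened record is saved back to raw_spawn_info, so -populate
    // inserts one merged spawn instead of N near-identical variants.
}
[/code]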
However, in looking over the new dataset, while it looks exciting and clean, I am now struggling with the original premise of "raw data": there will never again be a source of TRULY raw spawn info to refer to.
So my internal debate, which I bring to you to help me solve (quickly), is: do I really give a shit? This will cut down on hours, weeks, maybe even months of effort to consolidate spawns in our database. Running -populate will put spawns into the dev database already merged, randomized, and ready to pop without any additional effort. Should any data be questionable, we would merely need to re-parse the log that spawn came from using an older parser (or I will add -params to allow it with the current one).
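For the fallback, I'm picturing something like this (only -populate exists today; the second switch is hypothetical until I actually add it):

[code]
# normal path: merged, randomized spawns straight into the dev database
PacketParser -populate <zone_log>

# audit path: re-parse one log with appearance merging disabled, so every
# eye/hair variant comes out as its own raw spawn again
PacketParser -populate -no_appearance_merge <zone_log>
[/code]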
I think you all have helped me with my dilemma immensely! Thank you! hahaha... no, seriously, if you have any opinions, shout 'em out before I get too much further into this.
Thanks