ubotuCurrent time in Etc/UTC: November 17 2007, 10:21:13 - Next meeting: Desktop Team Development in 5 days10:21
persiaOops.  Forgot to update the fridge.  Next meeting: Stacktracing (in 30 minutes).10:30
=== jenda_ is now known as jenda
pochupersia: ping, you ready? :)10:56
persiapochu: Yep.  The fridge didn't get updated :(  Starting in ~200 seconds...10:56
pochuyeah, although LaserJock said he updated it...10:57
ubotuCurrent time in Etc/UTC: November 17 2007, 10:58:27 - Next meeting: Desktop Team Development in 5 days10:58
persiaOK.  Welcome the session on reading stacktraces.11:00
persiaThe value of reading stacktraces is that one can more easily track down the crash bugs, and find out where in the code the crash happens, and why.11:02
chantrahi there11:02
persiaHmm.  Maybe I should wait a couple more minutes.11:03
geserHi persia11:03
persiahi geser11:03
persiaWelcome to the session on reading stacktraces (for real, this time) :)11:05
persiaThe value of reading stacktraces is that one can more easily track down the crash bugs, and find out where in the code the crash happens, and why.11:05
persiaUbuntu ships with an automatic crash reporting system called apport, which files lots and lots of bugs on launchpad, and attaches stack traces showing where the crash happened.11:05
persiaThe goal of today's session is for everyone to be able to read these stacktraces, and identify the type of problem that is happening.11:05
persiaA stacktrace consists of a listing of the stack at the time of the crash.  Each frame indicates one layer deeper towards the call that crashes, so the top of a stack trace is where the problem occurs, and the bottom of a stacktrace is where the thread starts.11:07
persiaFor examples, we'll be reviewing stack traces in lincity-ng, sunclock, and vlc.  If you're planning on following closely, you might want to download the source for those packages now.11:08
geserpersia: gutsy or hardy?11:09
persiaAs we go through the bugs, please feel free to interrupt and ask questions at any time.  Further, I'll be looking to all of you to help figure out the issue with each frame, and help move forward.11:09
persiageser: Generally I find that the latest source still exposes the problem.  The specific reports are feisty for lincity-ng and sunclock, and gutsy for vlc.11:10
persiaTo expand upon that, it can be easier to understand a specific issue when looking at the source that matches the reported error, but it is often more interesting to look at the current source, as if the issue is to be fixed, the current source is where the fix is required.11:12
persiaOur first bug is #89554 in lincity-ng (https://bugs.launchpad.net/ubuntu/+source/lincity-ng/+bug/89554).11:13
persiaThis is a fairly standard apport report, with a normal level of detail as to how to reproduce "I was just playing the game and it crashed.".  We'll be looking at the stacktrace to turn this information into something useful, and try to provide hints towards a possible solution.11:15
persiaThe first place we'll look is the "Stacktrace.txt (retraced)" in the third comment.11:16
persiaEach section starts with a frame number (e.g. #0, #1, #2, etc.) and lists the function called, the location of that function in the source, and some of the variable values.11:17
persiaSo, looking quickly at the function names, we see "do_power_line" at #0, "do_time_step" at #1, "execute_timestep" at #2, and "Game::run" at #3.  Would anyone like to guess what the program was doing when it crashed?11:18
txwikingersomething with a powerline:)11:19
persiatxwikinger: Yep.  The package description for lincity describes it as a City simulator game.11:20
persiaAny guesses for what the game is doing in a more general sense?11:20
Kmospersia: designing the main window ?11:21
geserit was computing the next simulation step when it crashed11:21
Kmosbefore go into the main()11:21
persiaKmos: I'm not sure how we'd guess that from the function names.11:21
persiageser: That sounds about right.11:21
Kmospersia: maybe from the variables and at #7 it says Start11:22
persiaMy thinking would be that the game is running (#3), and some time is passing (#2), so it does something (#1), and that something includes powerlines (#0).11:22
persiaKmos: Remember that a stacktrace is backwards.  #0 is the place where it crashed, and #7 is when the process started.11:22
Kmospersia: ah ok thx11:23
persiaWhen looking at stack traces, I usually like to start a little further back than #0, to develop a better understanding of what is happening.  Is anyone still waiting for a source download?11:23
persiaOK.  Let's start with #2: execute_timestep.  The goal is not to understand perfectly, but only to get some idea of the naming conventions, code conventions, and variables used in the program.  Please open src/lincity-ng/MainLincity.cpp in the lincity-ng source, and move to line 66.11:25
persiaYou'll see that do_time_step (actually on line 68 for current code) is called with no arguments, and doesn't appear to be in any conditional structures.11:27
persiaFurther, the general function runs every 10 seconds or so, does the update, and redraws the map.  We don't need a higher level of detail here: we're just building context.11:28
persiaDoes anyone have any questions about execute_timestamp?11:28
persiaErr..  execute_timestep11:28
persiaOK.  So the next function on our list is do_time_step from #1, in src/lincity/simulate.cpp.11:30
persiaDoes everyone see that?11:31
RainCTyes, but stacktrace says line153 but do_time_step  goes from 58 to 8711:31
persiaRainCT: Right.  This is a case of the code changing since the stack trace was taken.  Any guess on which line we really want to look at for the current source?11:32
RainCTuh, there's indeed a do_power_line call in 157 but it's in a different function, simulate_mappoints11:34
txwikingerI have the do_power_line call in lin 153 in teh function simulate_mappoints which is called in line 8211:35
persiaSo, 79 & 80 are good guesses, because that mentions "power", but 157 is correct, because simulate_mappoints is called in line 82.11:35
persiatxwikinger: Perfect (and faster than I)11:36
persiaSo, we'll now try to figure how what is happening by reviewing both do_time_step and simulate_mappoints.11:36
robdi1so did the optimizer get rid of the call at 82? otherwise it should be on the stack also?11:37
geserlincity-ng 1.0.3 and 1.1.1 differ only in line numbers11:37
persiarobdi1: That's likely: I don't actually understand optimizers enough to be sure.11:38
persiaSo, do_time_step initialises some things, and calls simulate_mappoints.11:38
txwikingerI think the function simulate_mappoints is only called this one time, so the optimizer just moves the code in there11:38
geserrobdi1: probably as simulate_mappoints is only called on line 8211:38
persiasimulate_mappoints reviews each of the map points, and calls a huge switch statement, but doesn't appear to do much processing.  As a result, we won't learn a lot here, but we do want to understand the nature of the arguments to do_power_line.  Would anyone like to guess as to what the arguments should be?11:40
txwikingerthe position on the map11:40
persiatxwikinger: Sounds sane.  How about the data type?11:40
persiaRainCT: Right.  You can see that from lines 116 and 118 in the current source.11:41
persiaSo, back to the stack trace, we see that x=1 and y=41 in #0, so we have a pretty good idea what do_power_line is supposed to do.  Let's look at that next.11:42
RainCTis the :22 the line where the function starts or where the app crashed?11:43
persiaRainCT: That's where the app crashed.11:43
pochuis -> a pointer? (me doesn't know c++)11:44
geserpochu: the part before -> is the pointer (here to a struct)11:45
persiapochu: I think it's a reference to a struct, but I'm not entirely sure.11:45
geserthe struct is described in src/lincity/power.h11:46
persiaIn this case, we're not sure what it is, but we'll want to look at the type.  You can see that it checks "grid", so we'll look through the include files to find it (or listen to geser, who is fast) :)11:47
geserpersia: as we are after a segfault, should we look instead on what MP_INFO does?11:50
persiaSo, the struct defines a "powered", which is a "short", and so should be allowed so that's not the problem.  next is to figure out what the MP_INFO call is returning.11:50
persiageser: Yes.  I just need a faster keyboard :)11:51
robdi1power.h:19 shows grid is of type Grid which is typedef'd to the grid_struct on 1511:51
persiaSo, again, we'll look through the headers to find what MP_INFO means.  Anyone want to suggest a file and line number?11:51
persiarobdi1: Exactly.11:51
gesersrc/lincity/engglobs.h, line 2911:52
chantrawhich define the struct map_struct11:54
persiaDoes everyone see that now?  The relevant structure is defined starting on line 14.  Any guesses on why it might segfault when called with x=1, y=41?11:55
chantrawith Map_Point_Info info[WORLD_SIDE_LEN][WORLD_SIDE_LEN] where WORLD_SIDE_LEN is 100 (lin-city.h line 227)11:56
persiachantra: Good find: it's probably not an out-of-bounds issue.  Anyone else?11:56
chantrawell, long time since I program c, but it sounds line it should point to it instead of using . ?11:56
albert23because it expects to get shorts but is called with int?11:56
persiaalbert23: That might be it for larger values, but 41 should be safe to convert to short.11:57
geseralbert23: where does it expects short? [...][...] is a two-dimensional array11:59
chantrapersia: ma.info returns a Map_Point_Info11:59
persiaSo, at this point, we can say "lincity-ng may crash when testing if certain map elements are powered".  I think we'd have to read a lot more of the source to solve the problem, but we now have a somewhat more informative report than "I was just playing the game and it crashed.".  Would anyone like to volunteer to update the bug?11:59
* pochu can11:59
pochuAlthough I'm not sure I've followed the final steps11:59
chantrais Map_Point_Info an int ....11:59
txwikingercool.. we haven't even played it :D12:00
pochuwoops, his terminal crashes, let's debug it! :)12:00
persiatxwikinger: Right.  This isn't the "reproduce the bug & fix it" session, only the stacktrace session :)12:00
* RainCT got lost here «Map_Point_Info info[WORLD_SIDE_LEN][WORLD_SIDE_LEN];»12:01
geserpersia: it would be nice to have to core file to lookup that value of MAP_INFO(x,y)12:01
persiaRainCT: That's OK.  It takes a deeper understanding of the code to fix it, the goal here is to understand the bug well enough to document it.12:01
persiageser: Yes.  talk to pitti: core files are deleted to save attachment space.12:02
* tman__ typed the wrong ctrl-alt .... page-down is too close to backspace :D12:02
=== tman__ is now known as chantra
persiaSo, does anyone have any questions about the process of finding out what happened from the stacktrace before we look at the next one?12:02
txwikingerSo, what info would you now put into the bug report?12:03
geserRainCT: MAP_INFO(x,y) is map.info[x][y] (line 29), map is a Map (line 24) and Map.info is in line 2012:03
persiatxwikinger: You could report that the problem is with updating the power grid, and maybe a comment saying that it seems related to MAP_INFO.  The idea is to provide enough information that someone familiar with the code might recognise it, rather than it just being another apport bug where the user reported "it crashed"12:05
geserRainCT: and info is a two-dimensional array of Map_Point_Info (described somewhere else (../lin-city.h))12:05
persiatxwikinger: If you're planning to code to fix it, then of course, you want to know more, but it's a good first step to identify the type of problem.12:05
txwikingertrue.. so the summary should also be more specific12:06
persiatxwikinger: Right.12:07
txwikingerok.. I get it now :)12:07
persiaSo, that's about it for the first bug.  I've a couple more, for those who have more time (it takes about an hour for each, in my experience).  The next bug I selected was #147252 (https://bugs.launchpad.net/ubuntu/+source/vlc/+bug/147252)12:08
RainCTgeser: ah ok, thanks. was not sure if info was a multidim or something strange :P12:08
persiaWould anyone like to make a guess at a good place to start looking at this trace (there are 80 frames)?12:10
robdi1which apport  comment are you looking at...there are 312:10
RainCTStacktrace.txt (retraced) (76.6 KiB, text/plain), or?12:11
persiarobdi1: I usually start with "Stacktrace.txt (retraced)".  ThreadStacktrace.txt is usually only interesting when there is a thread collision, and StacktraceSource.txt is too confusing for me: I'd rather look at the actual source.12:12
robdi1persia: thanks12:12
chantrapersia:  #0 and up we see something is wrong in filechooser12:12
chantrasomewhere around 312:13
persiachantra: Right, but the bug is reported against vlc.  Let's back up a bit more, and find something vlc related.  It may be that the bug is reported against the wrong package, but maybe it's a vlc issue.12:13
chantraat least, we could avoid looking at internal gsignal in the first place12:14
persiachantra: That's the other reason we'll go look for vlc :)12:14
chantra:D , last reference to vlc seems to be around #35 then12:15
geser#84 is vlc related everything else is wx or gtk/glib12:15
chantra#35 0xb581302c in wxvlc::OpenDialog::OnFileBrowse () from /usr/lib/vlc/gui/libwxwidgets_plugin.so12:16
persia#84 is the last call to vlc directly, but #35 uses libvlc, which also interests us.12:16
chantrawhen the dialog is open12:16
persiaOn the other hand, 35 frames is a long way from the problem.  Let's walk through a few, and see if we can figure out what is happening: this is a good candidate for not being a vlc bug.12:17
persiaSo, #34 through #32 are the WX calls to wrap the GTK.12:18
persiaGoing back further, we get to #22, where we're emitting a gtk signal, and can pretty safely say this is a deeper problem.12:19
geserin frame #1 one sees that path is NULL (iirc strcmp doesn't like NULL pointers) I could check if really NULL is passed to strcmp and check where the NULL comes from12:20
persiageser: We could, but we're looking at feisty VLC, and feisty gtk.  In this case, I'd suggest we look for a closed bug in gtk that has a matching stacktrace for the first few frames.12:21
geserpersia: DistroRelease: Ubuntu 7.10 (from the bug)12:22
persiageser: Right.  My mistake.  Thanks.12:23
geserand the package version is from gutsy (but not the final one)12:23
robdi1and since it was entered on 9/30 was this the released version or a pre-release version?12:24
persiarobdi1: It would have been a prerelease, but it may not have been the version distributed on 9/30 (depending on when the user updated).12:24
persiaSo, as I don't want to dig deeply in GTK, and I'm not seeing a lot of questions, I'll move forward.  I'd like to ask someone to volunteer to update the bug to point to the right package, and follow the GTK crash bug title model, and note that it crashes in "shortcut_find_position".12:30
persiaThe next bug I selected was #109743 (https://bugs.launchpad.net/ubuntu/+source/sunclock/+bug/109743).  Anyone want to suggest a starting point here?12:32
robdi1I'll give it a shot. Where would i find info on the GTL crash bug title model?12:32
robdi1GTK...my keyboard is still sleepy12:32
persiarobdi1: I generally guess based on the list of all bugs for the package.  In this case https://bugs.launchpad.net/ubuntu/+source/gtk+2.0/+bugs12:32
persiarobdi1: More expicitly, it's not likely to actually be a problem with strcmp(), but rather something in the file chooser, and it should be looked at by the gtk+2.0 maintainers, rather than the vlc maintainers.12:34
albert23persia: for sunclock I would start looking at the null pointer, Context = (struct Sundata *) 0x012:38
persiaalbert23: That sounds like a good place.  Let's step back a bit and look at showZoomHint first, to get some context to understand why we don't have "Context".12:39
robdi1don't have to go back, in draw_button(), when variable is initialized, it is set to NULL, and then used as paramter function getWinParams()12:41
robdi1it checks for null after calling getWinParams, so appears to be an output parameter12:42
persiarobdi1: Good description.  Any ideas what getWinParams is doing?12:43
persia(or rather, is intended to do)?12:46
txwikingerwell..nicely documented source :)12:46
robdi1setting Context to point to function ZoomCaller?12:46
persiarobdi1: Or more generally, setting Context to point to the appropriate function for the update, but yes.12:47
persiaNow the interesting bit is that the crash is reported on line 576, where for a zoom, we'd expect to see somewhere in line 579-585.12:49
persiaLooking at the package history, we can see that this is the same file as when it crashed, so we're not just confused from version skew.12:49
geserhow does it pass the if check with win=0?12:50
persiageser: That's an interesting question.  I'd suggest we'd like to find information on what "Option" and "Zoom" are.12:51
RainCTextern Window Root, Menu, Filesel, Zoom, Option, Urban;       in line 4112:52
robdi1this line looks suspicious to me 576:    *y = OptionGeom.height-2*OptionCaller->gdata->menustrip - 1; do they want (OptionGeom.height-2)*OptionCaller which is a pointer or do they want12:52
persiaRainCT: Good find.  So, can a "Window" evaluate to "0"?12:53
robdi1the latter is what they get with C precidence12:53
geserdid feisty already had a gcc with stackprotector?12:55
txwikingerwhy is the variable Zoom in line 1496 not set?12:55
persiaI suspect the latter makes more sense: (The height of the options geometry) - (twice the width of the menustrip) - 1.12:55
txwikingerthis is what populates into win12:55
RainCTOption = 0;   line 209212:56
persiatxwikinger: It's defined globally: see line 41.12:56
geserin #4 win was set to Zoom =77594695, in #1 Zoom is 012:58
albert23Window is defined in sunclock.c, line 34412:58
persiageser: Right.  If we take Zoom to be a pointer to something, it apparently got lost somewhere along the way.12:59
geserif the memory got corrupted than the strack trace may also be false13:00
geserpersia: do you know if feisty's gcc had alread stack protector enabled (the last time sunclock got compiled)?13:01
persiageser: Right.  The best we can do here is walk the path from #4 to #1 and see if we can figure out why it got lost.13:01
persiageser: I don't.13:01
robdi1in #2, hint is a char*, but doesn't look like it's pointing to a string13:04
persiaSo, another mystery to add is that line 1492 of widgets.c sets move_pos to -1, and line 1496 (#2) is only supposed to execute if (move_pos>=0).13:05
geserrobdi1: it wasn't initialized yet (widgets.c:1493)13:06
geserpersia: static int move_pos13:06
robdi1yup, just saw that it gets strcpy'd in 150613:06
persiaSo, in cases like this, where the stack trace seems to make some sense, but the code doesn't, I usually just set the stacktrace aside, and look for another bug (as I've nothing to add).  Based on you all avidly hunting the problem, I suspect that the format of the stacktrace isn't really the issue anymore :)13:11
txwikingerthanks  a lot persia... this was very interesting13:21
persiatxwikinger: I'm happy to share: I think stacktraces shouldn't be mysterious, and we've a lot that could be useful bugs with a little looking.13:22
txwikingerindeed. Now I will even look at them :)13:23
persiatxwikinger: Don't get discouraged if you run into another sunclock.  Those are hard, and confusing.  Some of them are either just misplaced (like the vlc one), or fairly easy to trace (like the lincity-ng one).13:24
txwikingerWell... without looking you don't know what it is :)13:27
robdi1persia: if you're still around, could you kindly see if I updated bug 147252 correctly? It is the first time I've updated a bug...13:46
persiarobdi1: Generally I recommend changing the "Package" for the current task, rather than adding a new task, unless you know there are two things that need doing.13:47
robdi1persia: thanks. what would I need to do to make that happen, take VLC off?13:47
persiarobdi1: Other than that, it looks fine.  Thanks for updating it.13:47
robdi1persia: no problem. also, thanks for doing this...it was worth getting up early for...been a long time since I walked stack traces...it was fun.13:48
persiarobdi1: I don't think there is a way to delete a task once it exists.  To change the associated package for a task, click on one of the upside-down eject symbols, and you can select an alternate package.13:48
robdi1persia: ahh. thanks.13:49
=== mrigns_ is now known as mrigns

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!