=== doko_ is now known as doko === psivaa is now known as psivaa-brb === psivaa-brb is now known as psivaa === greyback is now known as greyback|away === Pici` is now known as Pici === wickedpuppy2 is now known as wickedpuppy === greyback|away is now known as greyback [15:03] * barry waves [15:03] * sithlord48 waves [15:03] yo === bhuey_ is now known as bhuey [15:04] meeting? [15:04] o/ [15:05] looks like neither slangasek nor cjwatson is around =) [15:05] o/ [15:05] we could all just skive off..... =) [15:05] cjwatson is out this afternoon. [15:05] damnit, I prepped my list for once! [15:05] partaay y'all! [15:06] slangasek: was up really late last night [15:07] I'm starting anyways [15:07] #startmeeting [15:07] Meeting started Thu Jul 31 15:07:43 2014 UTC. The chair is xnox. Information about MeetBot at http://wiki.ubuntu.com/meetingology. [15:07] Available commands: action commands idea info link nick [15:07] don't know what the procedure is... [15:07] #topic Lightning round === meetingology changed the topic of #ubuntu-meeting to: Lightning round [15:07] go ? [15:08] Last week [15:08] -changed direction and abandoned the security update to 7u55-2.4.7 [15:08] -merged debian package 7u65-2.5.1 to ubuntu utopic, merged differences into this package [15:08] -backported it to 12.04, 14.04 and utopic, this went well except for 12.04 [15:08] This week [15:08] -posted test differences between various openjdk versions for regression analysis [15:08] -struggling to deal with 'quotes' in a call to 'configure' in the build/ directory and wondering why the build system is reverting Matthias's patches [15:08] -continued to debug the above [15:08] -getting familiar with debian version 3 packaging [15:08] done [15:08] ... [15:08] $ shuf -e jodh robru sil2100 xnox barry bmurray doko stgraber [15:08] xnox [15:08] barry [15:08] jodh [15:08] robru [15:08] sil2100 [15:08] bmurray [15:08] doko [15:08] stgraber [15:08] not as fancy. [15:08] ..
[15:09] whoops [15:09] * usb-creator - formatting fixes uploaded [15:09] * mega-transition helping out tie up loose ends [15:09] * llvm 3.5 - fixed powerpc build (committed upstream) and fixed [15:09] ppc64el packaging (accepted in debian). Will switch default to 3.5 [15:09] after alpha2 freeze block is lifted. That will temporarily add [15:09] 3.5 & 3.4 to main, until next mesa update. [15:09] * jodh wonders who 'not as fancy' is :) [15:09] * uploaded ubuntukylin-meta, sitting in NEW, but proper further seed [15:09] setup is still needed. [15:09] * exit interview, and HRy things. [15:09] * working on porting more lazr.* things to python3 [15:09] * sent rebased apt-ftpmaster generate contents only patch to mvo on [15:09] BTS. (To enable it to be used, a further patch to cronscripts will be needed) [15:09] TODO: [15:09] * complete more systemd/startpar tasks [15:09] * complete lazr.*/launchpadlib ports [15:09] .. [15:09] barry: =) [15:09] jodh: oh just a comment. as in it's not a one liner as it's usually pasted around here. [15:09] xnox: yeah, I'm just being silly. [15:09] phone: system-image 2.3.1. smoke test crash investigation (LP: #1349478). discussing jani's branches. other stuff. [15:09] Launchpad bug 1349478 in Ubuntu system image "/usr/sbin/system-image-dbus:sqlite3.OperationalError:_check_for_update:emit_signal:UpdateAvailableStatus:__init__:__enter__:_cursor" [High,In progress] https://launchpad.net/bugs/1349478 [15:09] debuntu: zope stack (all squared away now!). gunicorn py3 package support (debian bug #756057). testing pyvenv/virtualenv from trusty PPA. [15:09] Debian bug 756057 in gunicorn "gunicorn: Support Python 3" [Normal,Open] http://bugs.debian.org/756057 [15:09] jodh: and it didn't shuffle much, did it?! [15:10] other: dealing w/various hardware issues. other discussions on various other topics. [15:10] .. [15:10] * upstart [15:10] jodh: =) [15:10] *** upstart 1.13.1-0ubuntu2 upload to fix cgmanager tests.
[15:10] * system-image [15:10] * Spent most of the week learning about how system-image builds work [15:10] (ongoing :) [15:10] * Working with mvo, cjwatson and stgraber on setting up new images. [15:10] .. [15:11] barry: gunicorn python3 \o/ [15:11] the all prepared robru is next =) [15:11] xnox: yep, have github pull req pending for debian maintainer. should hopefully land soonish [15:11] * landings, landings, landings, landings! [15:11] * fixed up qtcompositor landing in ubuntu-system-settings [15:11] * learned all about Jasmine unittesting framework for JS -- very nice! [15:11] * wrote vast amounts of unit test code for both CI Train Silo Dashboard and NFSS Web UI [15:11] * attended GUADEC -- saw many talks and took lots of notes [15:11] * Several CI Vanguard shifts [15:11] * stopped some bits of CI Train from hard-coding spreadsheet column indexes, allowing greater [15:11] flexibility to change the spreadsheet in the future (in preparation for RTM) [15:11] * vast simplification of CI Train spreadsheet, dropping all stupid "silo tabs" and streamlining the testing:done? setting into the pending tab, which will make the spreadsheet much easier [15:11] to navigate when RTM doubles the number of silos we have. [15:11] * various minor pre-RTM preparatory fixes & changes in CI Train [15:11] * 99 commits against the CI Train silo dashboard page. yikes! (many many many small tweaks and [15:11] iterations, and increases in test coverage). [15:12] awesome! [15:12] robru: sil2100: has ci-train been tested / dry-run against rtm on dogfood? [15:12] xnox: doing it all the time ;) [15:12] sil2100: awesome. [15:13] sil2100: you are next! =) [15:13] * slangasek waves belatedly [15:13] xnox: just got blocked by some firewall blockings, but now I'm unblocked again, webops helped out [15:13] xnox, sil2100 has taken charge of the ci train core. I'm mostly working on the periphery, like the dashboard and queuebot.
[15:13] #chair slangasek [15:13] Current chairs: slangasek xnox [15:13] o/ [15:13] - Landing team work, landing e-mails, landing coordination - standard stuff [15:13] - Pushing on promotion and TRAINCON-0 issues [15:13] - CI Train maintenance and features: [15:13] * Decoupling prod from preprod, now for testing preprod custom branches can be used [15:13] * Small enablements related to the modified spreadsheet [15:13] * Continuing work on enabling other distributions (ubuntu-rtm) [15:13] - More firewall holes needed in our prodstack instance [15:13] - More complex silo configuration handling [15:13] - Many many corner cases where ubuntu was still considered instead of selected distro [15:13] * Test landing of indicator-location to ubuntu-rtm (in progress) [15:13] * Prepared branch with 'retry failed jobs' [15:13] * Add additional twin upload projects [15:13] * No time to finish work on auto-merge-clean [15:13] - Work on defining TRAINCON-0 formal rules [15:14] - Packaging advice for some upstream developers [15:14] - Documenting the TRAINCON-0 incident [15:14] - Playing around with some hardware [15:14] done [15:14] this covers about 2 weeks for me since I was on holiday last Thursday [15:14] updated errors for assets.ubuntu.com to r486 [15:14] testing of errors frontend change to filter on pkg_arch [15:14] submitted RT to have errors frontends updated and errors_static_url modified [15:14] set up logrotation for the daisy, errors frontends and pushed to the daisy and errors charms [15:14] updated daisy retracer to keep core dumps when retracing and save crash files in certain situations [15:14] submitted rt to have retracers updated to r504 [15:14] updated RT 72977 regarding errors log rotation [15:14] submitted RT 73492 regarding updating the daisy frontends to r498 [15:14] updated daisy bucket code to pass architecture to bucketversionscount [15:15] pushed bzr changes for daisy to increment arch counters [15:15] created some armhf retrace success/failure graphs in
graphite [15:15] submitted RT to add more retracers for errors [15:15] pinged a webop to run import-user-packages cron job (they said it worked but still nothing in the ColumnFamily) [15:15] manually ran import_user_packages to the temp DSE ring database [15:15] research into recoverable problem bucket grouping upstart and url-dispatcher issues [15:15] discovered and reported apport bug 1349579 [15:15] bug 1349579 in apport (Ubuntu) "whoopsie-upload-all uses an incorrect assumption regarding what to upload" [Undecided,Fix released] https://launchpad.net/bugs/1349579 [15:15] submitted merge proposal fixing apport bug 1329520 [15:15] bug 1329520 in apport (Ubuntu) "whoopsie-upload-all crashes while processing crash file" [High,Fix released] https://launchpad.net/bugs/1329520 [15:15] investigation into SystemImageInfo not appearing in apport .crash files on the phone [15:15] uploaded new version of apport to utopic which will properly gather SystemImageInfo [15:15] uploaded new version of whoopsie to utopic that will send SystemImageInfo to errors [15:15] tested whoopsie bug 1320988 regarding online / offline connectivity [15:15] bug 1320988 in whoopsie (Ubuntu) "whoopsie did not become on-line after connecting to wifi" [High,Confirmed] https://launchpad.net/bugs/1320988 [15:15] investigation into whoopsie bug 1340604 [15:15] bug 1340604 in whoopsie (Ubuntu) "[phone] crash files are only uploaded on boot when not running in the foreground" [Undecided,New] https://launchpad.net/bugs/1340604 [15:15] reported apport bug 1347009 regarding retraced crashes missing stacktrace [15:15] bug 1347009 in Daisy "apport-retrace occassionally creates a retraced report without a stacktrace" [Low,Triaged] https://launchpad.net/bugs/1347009 [15:15] research into duplicates for ubuntu-release-upgrader bug 1347721 [15:15] bug 1347721 in apt (Ubuntu Trusty) "Saucy -> Trusty upgrade failed: procps fails to configure" [High,Triaged] https://launchpad.net/bugs/1347721 [15:15] ✔ done
- GCC default set to 4.9 [15:15] - update of cross toolchains [15:15] - openjdk-7 mentoring and fixes [15:15] - clean up component mismatches, dep-waits, ftbfs in main, three days of nagging and fixing [15:15] - packaging review of some third party software [15:15] - updated tightvnc, updated tigervnc (at least it built for arm64 and ppc64el) [15:16] - twisted transition (ubuntu-sso-client still unfixed) [15:16] - arm64 toolchain discussion [15:16] (done) [15:17] Was on vacation last week. [15:17] E-mail and IRC catchup on Monday mostly. [15:17] Helped with some code reviews, discussions, ... wrt ubuntu core system-image. [15:17] Got an initial system-image published for ubuntu-core. [15:17] Discussed partitioning plan for new touch devices. [15:17] Fixed some LXC CI issues. [15:17] Recorded a video on running GUI apps inside LXC: https://www.youtube.com/watch?v=QYsj9LEqxXk [15:17] Poked some more at running Unity8 inside LXC. [15:17] Some more LXC-related discussions (conference planning, ...) [15:17] Discussed some of our NetworkManager patches. [15:17] Fixed a couple of configuration issues with the ISO tracker related to alpha-2, 14.04.1 and 12.04.5. [15:17] (DONE) [15:18] slangasek: your turn =) and take over chairing =) [15:18] hmm :) [15:20] * working with bhuey on prepping openjdk security update [15:20] * discussions around the 'init' package and systemd-sysv to unblock new images using systemd from the start [15:20] * tracking crash retracing success on the phones; working with bdmurray et al. to get any blocking issues fixed in advance of RTM [15:20] * performance reviews [15:20] * helped move forward some HWE SRUs related to a server engagement [15:20] * helped with getting out of TRAINCON-0 on Monday [15:20] * next week: joining a cloud team sprint (as is Colin), so expect limited availability [15:20] * the week after: on vacation [15:21] bdmurray: on the errors.u.c side, when do you expect we'll have the per-image view available? 
[15:22] slangasek: what was the final answer for the counter? [15:22] bdmurray: ah, in scrollback, let's iron that out after the meeting? [15:23] slangasek: okay, it shouldn't take too long to add [15:23] any other questions about status? [15:24] [TOPIC] Upstart cgroup support === meetingology changed the topic of #ubuntu-meeting to: Upstart cgroup support [15:24] forgot to do an in-depth topic for last week's meeting... remembered this week :) [15:25] Thanks Steve. Everyone sitting comfortably? [15:25] so jodh will talk a bit about the work he did to implement cgroup support in upstart [15:25] Today, I'm going to give a brief [1] talk about cgroup support in upstart [15:25] and some of the challenges we faced. This may go some way towards explaining [15:25] the seemingly never-ending upstart async branch updates I've given in [15:25] the past in this meeting :-) [15:25] = Intro = [15:25] As of version 1.13, Upstart supports cgroups. By "support", I mean "has [15:25] the ability to place job processes into one or more cgroups" (for [15:25] service resource control). It does _not_ mean that Upstart uses cgroups [15:25] to mop up the mess if a service ends badly (process supervision). [15:25] = The cgroup Stanza = [15:25] After thrashing out the design with stgraber, slangasek and hallyn [15:25] (http://upstart.ubuntu.com/wiki/Cgroup), we added support to parse a new [15:25] "cgroup" stanza that a job can specify. The final syntax is extremely [15:25] clean (and praise goes to stgraber for realising how simple we could make [15:25] it!). Here's a summary of the behaviour: [15:25] - If not specified, the job processes are not placed into (any new) [15:25] cgroups. [15:26] - If a job specifies a cgroup stanza, that job cannot legitimately start [15:26] until the cgroup manager itself is running.
To handle this, we added a new [15:26] initctl command ("notify-cgroup-manager-address") which the [15:26] cgmanager.conf job itself calls in post-start to notify upstart where [15:26] to find cgmanager :-) [15:26] - If specified, *all* job processes are put into the specified cgroup(s). [15:26] - If specified as "cgroup <controller>" ("cgroup cpuset" for example), [15:26] Upstart will add the job to a job-specific cgroup whose name will be [15:26] "$UPSTART_JOB-$UPSTART_INSTANCE". [15:26] - If specified as "cgroup cpuset foo 12", Upstart will place the job [15:26] processes into the implicit job-specific cpuset cgroup and set [15:26] "foo=12" in that group. [15:26] - If specified as "cgroup cpuset hello foo 12", Upstart will place the [15:26] job into the cpuset cgroup called "hello" and set "foo=12" in that [15:26] group. If "hello" does not exist, it will be created. This allows [15:26] multiple different jobs to enter the same cgroup if desired. [15:26] - The cgroup name ("hello" in the example above) can also contain variables: [15:26] "cgroup cpuset db/$foo/$bar-$baz". [15:26] - You can also get at the cgroup that Upstart would create on behalf of [15:26] the job using the magic $UPSTART_CGROUP variable (note that this is [15:26] _not_ an environment variable and is only valid within a cgroup [15:26] stanza). For example: "cgroup cpuset db/$UPSTART_CGROUP". [15:26] So far so good. [15:26] Since there is already an excellent cgroup manager available, and since [15:26] we try to avoid adding extra complexity to PID 1, we opted to avoid [15:26] * slangasek waits for the footnote to resolve [15:26] re-inventing the wheel by having cgmanager(8) handle the actual cgroup [15:26] operations. So, when Upstart starts a job that specified a cgroup [15:26] stanza, it needs to do the following: [15:26] - Connect to cgmanager. [15:26] - Ask cgmanager to create the cgroup(s). [15:27] - Ask cgmanager to move the specified process(es) into a cgroup.
[15:27] - Ask cgmanager to apply a particular setting to a cgroup. [15:27] However, there's a problem with the above. What if cgmanager hung? [15:27] We'll come back to this, but first I need to explain how Upstart spawns [15:27] a job. [15:27] = Async spawning = [15:27] == Historical synchronous spawning == [15:27] Upstart used to do the following when wishing to start a new job process: [15:27] 1) Create a pipe. [15:27] 2) fork itself. [15:27] 3) Have the child do all necessary setup such as dropping privileges, [15:27] closing fds, switching apparmor profiles, etc. Then: [15:27] - If the child finished its setup successfully it simply exec'd the [15:27] relevant program (specified in the job .conf file for the job process [15:27] in question). [15:27] - But if the setup failed, the child wrote a status message back up to [15:27] the parent (PID 1) explaining what went wrong, and then exited. [15:27] 4) All the time the child was doing setup, PID 1 was doing a _blocking [15:27] read_ on its end of the pipe. This implies that no operation in the [15:27] child setup phase could block, since if it could block, it would also [15:27] block PID 1, and thus DoS the system. [15:27] As a result, we couldn't call cgmanager from PID 1 directly, since that [15:27] could lead to a DoS, but we couldn't call it from the child _either_. [15:27] == Brave New World == [15:27] The solution was to change the way in which Upstart spawns. In the new [15:27] world, Upstart still creates the pipe but doesn't do a blocking read; it [15:27] just adds the fd for the read end of the pipe to a queue and waits for [15:27] some notification from the child. So we now have asynchronous child [15:27] spawning. To achieve this, we had to increase the number of states the [15:27] job can be in, since there is now a distinction between: [15:28] - "the job process has been spawned successfully" [15:28] - "the job process is _being_ spawned" (but we don't know what the outcome is yet).
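A minimal sketch of the asynchronous pipe-and-fork handshake described above, written in Python rather than Upstart's C for brevity. It assumes (conventional, but not stated in the talk) that the write end of the pipe is close-on-exec, so a successful exec shows up in the parent as EOF with no data, while a failed setup shows up as an error message. `spawn_async`, `poll_spawn` and the `setup` callback are hypothetical names, not Upstart functions.

```python
import os
import select
import sys


def spawn_async(argv, setup=None):
    """Fork a child that runs `setup` and then execs `argv`; never blocks.

    Returns (pid, read_fd). The caller adds read_fd to its event loop and
    checks it later with poll_spawn().
    """
    # Since Python 3.4, os.pipe() fds are close-on-exec, so a successful
    # exec closes the write end and the parent sees EOF with no data.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(read_fd)
        try:
            if setup is not None:
                setup()  # e.g. drop privileges, ask a cgroup manager to move us
            os.execvp(argv[0], argv)
        except BaseException as exc:
            # Setup or exec failed: tell the parent why, then exit.
            os.write(write_fd, f"{type(exc).__name__}: {exc}".encode())
        finally:
            os._exit(127)  # only reached if exec did not happen
    os.close(write_fd)  # the parent only reads
    return pid, read_fd


def poll_spawn(read_fd, timeout=0.0):
    """Check a pending spawn, waiting at most `timeout` seconds.

    Returns None while the child is still doing setup, "" once it has
    exec'd (EOF), or the child's error message. Assumes the message is
    shorter than PIPE_BUF, so it arrives from a single atomic write.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return None  # still spawning; the job stays in its "spawning" state
    data = os.read(read_fd, 4096)
    os.close(read_fd)
    return data.decode()


if __name__ == "__main__":
    pid, fd = spawn_async([sys.executable, "-c", "pass"])
    result = None
    while result is None:  # a real init would service other events here
        result = poll_spawn(fd, timeout=0.1)
    os.waitpid(pid, 0)
    print("spawned OK" if result == "" else f"spawn failed: {result}")
```

Because the parent only ever calls select() with a bounded timeout, a child stuck in setup (for example, waiting on a hung cgroup manager) leaves the spawn pending rather than blocking the process that plays PID 1.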
[15:28] Here is the old state transition diagram: [15:28] http://people.canonical.com/~jhunt/upstart/upstart-states-old.png [15:28] And here's the new one: [15:28] http://people.canonical.com/~jhunt/upstart/upstart-states-new.png [15:28] However, that only solved half the problem - since Upstart was now [15:28] spawning asynchronously, the order in which child [15:28] notifications could arrive became non-deterministic: either of the [15:28] following could happen "first": [15:28] - child exits and Upstart is notified via waitid(). [15:28] - child pipe closes or has data written to it and Upstart is notified by select(). [15:28] This needed careful handling since *both* those operations could update [15:28] the job state, but we didn't want the state to be "double-bumped". [15:28] In summary, adding the new async spawning feature was quite a challenge, [15:28] with xnox and me gaining a few grey hairs in the process (his don't show! [15:28] Errm... ;-) [15:28] = Test Suite = [15:28] The new states and the new async nature of spawning meant that a [15:28] large chunk of the (large!) set of Upstart tests suddenly broke. Hence, [15:28] it took lots of careful reviewing of both the code and all the required [15:28] test changes to land this new feature. [15:28] = Stateful re-exec = [15:28] This needed updating to handle the new cgroup stanza data. But we also [15:28] needed to consider scenarios like this: [15:28] - PID 1 starts a job process asynchronously. [15:28] - child takes "a long time" to set up. [15:28] - PID 1 is restarted. [15:28] Post-re-exec, PID 1 needs to keep track of the outstanding child [15:28] setup operations it is (asynchronously) waiting on. To handle this, we [15:29] added a new JobProcessData object to store the transitory child setup [15:29] meta-data (which gets discarded once the child has either died or [15:29] responded down the pipe).
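The "double-bump" hazard can be illustrated with a small state model. This is a hypothetical Python sketch, not Upstart's implementation: the state names (`spawning`, `running`, `failed`, `exited`) are simplified stand-ins for the real diagram, and the class only records which of the two notifications (pipe via select(), exit via waitid()) has arrived, so that each real transition is applied exactly once whichever arrives first.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

TERMINAL = ("failed", "exited")


@dataclass
class SpawningProcess:
    """Hypothetical model of one job process whose spawn is still in flight."""
    state: str = "spawning"
    pipe_done: bool = False             # select() has reported EOF or data
    setup_error: Optional[str] = None   # message the child wrote, if any
    exit_status: Optional[int] = None   # set when waitid() reports the child
    transitions: List[Tuple[str, str]] = field(default_factory=list)

    def _bump(self, new_state: str) -> None:
        # A transition to the current state is a no-op, so re-running
        # _update() can never "double-bump" the job.
        if new_state != self.state:
            self.transitions.append((self.state, new_state))
            self.state = new_state

    def _update(self) -> None:
        if self.state in TERMINAL:
            return
        if not self.pipe_done:
            # An early exit alone is ambiguous: the child may have failed
            # setup, or exec'd and then exited. Wait for the pipe.
            return
        if self.setup_error is not None:
            self._bump("failed")
            return
        self._bump("running")
        if self.exit_status is not None:
            self._bump("exited")

    def on_pipe(self, message: str) -> None:
        """select() fired: "" means EOF after a successful exec."""
        if self.pipe_done:
            return  # duplicate notification
        self.pipe_done = True
        self.setup_error = message or None
        self._update()

    def on_exit(self, status: int) -> None:
        """waitid() reported that the child has exited."""
        if self.exit_status is not None:
            return  # duplicate notification
        self.exit_status = status
        self._update()


if __name__ == "__main__":
    a = SpawningProcess(); a.on_pipe(""); a.on_exit(0)
    b = SpawningProcess(); b.on_exit(0); b.on_pipe("")
    print(a.transitions == b.transitions, a.transitions)
```

Buffering an early exit until the pipe reports is one way to make the outcome independent of arrival order. The fields held while the spawn is in flight are also the kind of transitory per-child data that would need to survive a stateful re-exec.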
[15:29] = Cgroup Operations = [15:29] With the advent of async spawning, the child now makes all necessary [15:29] calls to cgmanager, with PID 1 being completely immune to any issues [15:29] that that may entail. In fact, aside from storing the parsed cgroup [15:29] stanza data, all PID 1 does is store the address of cgmanager! [15:29] = Conclusion = [15:29] The final result is an extremely clean and safe design. By introducing [15:29] async spawning we were also able to make Upstart fully immune to the [15:29] child blocking PID 1 (there used to be a couple of areas that [15:29] theoretically could cause issues on a misconfigured system). [15:29] --- [15:29] [1] - FTR, I'm using ev's definition of 'brief' :-) [15:29] * jodh grumbles over whitespace damage.... [15:29] A non-garbled version: http://paste.ubuntu.com/7915372/ [15:29] heh [15:30] jodh: so, http://people.canonical.com/~jhunt/upstart/upstart-states-new.png is the state diagram for what's implemented now in 1.13? [15:31] slangasek: yep - I need to refresh the cookbook with that. [15:32] (also prettified the graph in graphviz for vertical top/down layout) [15:32] vs previously "optimal" graph [15:32] xnox: yeah, that's much improved, thanks! :) [15:32] still just the single error path, though; couldn't you have made it more complicated? ;) [15:33] jodh: very interesting! is there a timeout after which the child is just considered hung, and does upstart do anything about that state? [15:33] barry: no - no timeout. [15:34] barry: it just gets stuck in e.g. "starting/pre-starting" state. [15:34] barry: if the child hangs, the state will reflect that if you run 'initctl status $job'. [15:34] (or some such, can't remember exact names of the spawning state" [15:34] ) [15:34] gotcha [15:35] * slangasek nods [15:35] if the job never gets around to starting, it's not init's job to fix it ;) [15:36] jodh: do we have people using the cgroup support in anger yet?
[15:36] barry: If a job does hang though, you may get something useful in cgmanager's log (assuming you've set cgmanager_opts= in /etc/init/cgmanager.conf). [15:37] slangasek: I don't think so actually. I checked on a recent touch image and I can't see any evidence of it being used yet. We need to poke ted! :) [15:37] jodh: has ted been poked about this yet? [15:37] if not, then yes, yes you do ;) [15:38] slangasek: not by me directly. I added him to https://code.launchpad.net/~jamesodhunt/ubuntu/utopic/cgmanager/enable-upstart-cgroup-support/+merge/227209 so he should be aware of it. [15:38] slangasek: i'll poke ted about it. [15:39] xnox: ok, thanks [15:39] let's all poke ted. err... [15:40] any other questions about cgroups in upstart?
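To make the cgroup stanza rules from the talk concrete, here is a hypothetical Python sketch that turns one stanza into the create/move/set requests the child would send to cgmanager (the initial "connect to cgmanager" step is omitted). The token-count rules (no arguments, NAME, KEY VALUE, NAME KEY VALUE) are inferred from jodh's examples. The `$VAR` expansion helper, the handling of unknown variables, and the tuple format of the operations are assumptions, not Upstart's or cgmanager's actual API.

```python
import re
from typing import Dict, List, Tuple

_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(text: str, env: Dict[str, str]) -> str:
    """Expand $VAR / ${VAR}; unknown variables are left untouched (an assumption)."""
    return _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), m.group(0)), text)


def plan_cgroup_ops(stanza: str, env: Dict[str, str]) -> List[Tuple[str, ...]]:
    """Map one 'cgroup CONTROLLER [NAME] [KEY VALUE]' stanza to cgmanager requests."""
    tokens = stanza.split()
    if len(tokens) < 2 or len(tokens) > 5 or tokens[0] != "cgroup":
        raise ValueError(f"bad cgroup stanza: {stanza!r}")
    controller, args = tokens[1], tokens[2:]
    # Implicit job-specific group, also visible to the stanza as $UPSTART_CGROUP.
    implicit = f"{env['UPSTART_JOB']}-{env.get('UPSTART_INSTANCE', '')}"
    if len(args) in (1, 3):       # an explicit NAME comes first
        name = expand(args[0], {**env, "UPSTART_CGROUP": implicit})
        setting = args[1:]
    else:                         # no NAME: use the implicit group
        name = implicit
        setting = args
    # "create" is assumed to be harmless if the group already exists,
    # which is what lets several jobs share one named cgroup.
    ops: List[Tuple[str, ...]] = [("create", controller, name), ("move", controller, name)]
    if setting:                   # exactly two tokens: KEY VALUE
        key, value = setting
        ops.append(("set", controller, name, key, value))
    return ops


if __name__ == "__main__":
    env = {"UPSTART_JOB": "mysql", "UPSTART_INSTANCE": "main"}
    for stanza in ("cgroup cpuset", "cgroup cpuset foo 12", "cgroup cpuset db/$UPSTART_CGROUP"):
        print(stanza, "->", plan_cgroup_ops(stanza, env))
```

Keeping this as a pure function of the parsed stanza and the job environment mirrors the division of labour described in the talk: PID 1 only stores the parsed data, and the spawned child is the one that actually talks to cgmanager.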
[15:45] anything else to discuss on this fine summer day? [15:45] I mostly complained to the reporter when he started nagging me as the current patch pilot to do something which needed discussion with the SRU team. Now that the SRU team has clearly been informed of it, someone should just sponsor a debdiff and be done with it. [15:46] * slangasek nods [15:47] #endmeeting === meetingology changed the topic of #ubuntu-meeting to: Ubuntu Meeting Grounds | Calendar/Scheduled meetings: http://fridge.ubuntu.com/calendar | Logs: https://wiki.ubuntu.com/MeetingLogs | Meetingology documentation: https://wiki.ubuntu.com/meetingology [15:47] Meeting ended Thu Jul 31 15:47:40 2014 UTC. [15:47] Minutes: http://ubottu.com/meetingology/logs/ubuntu-meeting/2014/ubuntu-meeting.2014-07-31-15.07.moin.txt [15:47] thanks, everyone! [15:47] thanks! === psivaa is now known as psivaa-bbl === psivaa-bbl is now known as psivaa [16:58] I wonder how many people have "repaired" their own brakes instead of getting a mechanic, then ended up careening off a cliff [16:58] Wrong channel, sorry :) === DalekSec_ is now known as DalekSec