[02:09] === trainguards: IMAGE 277 building (started: 20141011 02:10) === [03:09] === trainguards: RTM IMAGE 98 building (started: 20141011 03:10) === [04:24] === trainguards: RTM IMAGE 98 DONE (finished: 20141011 04:25) === [04:24] === changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/98.changes === [09:51] latest image on krillin has been sat for over an hour at bq screen and shows "offline" in adb... [09:58] popey, see ML [09:58] popey, i assume it is the "fix" for https://bugs.launchpad.net/mir/+bug/1370866 [09:58] Ubuntu bug 1370866 in mir (Ubuntu RTM) "Overly strict libmirplatform* dependencies are blocking CI" [High,Fix released] [09:58] we end up with the -mesa package installed ... i bet that forces a wrong alternative [10:01] ugh [10:08] thanks ogra_ [10:11] ogra_: how do you get krillin into recovery? [10:11] nvm, got it eventually [10:15] popey, the adb emergency shell should kick in as well [10:16] now I can't flash it [10:16] krillin not found on server https://system-image.ubuntu.com channel ubuntu-touch/ubuntu-rtm/14.09-proposed [10:16] theoretically you should just need to wait til lightdm stops respawnin [10:16] ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed --revision=97 is what i did [10:16] are krillin images still on some hidden server? [10:16] no [10:16] try with --device [10:17] that got it [10:17] thanks [10:31] went back to 97 and it's still showing offline [10:31] with a session up, a password set and developer mode enabled ? [10:32] was sat at bq logo for an age again [10:32] (i think ubuntu-device-flash unsets it) [10:32] seems to have moved on now [10:32] its online now [10:32] ah [10:32] yeah, no adb on bq logo anymore [10:32] it only starts after lightdm nowadays [10:32] * popey goes to drink tea and stop playing with computers [10:33] thanks for the help ogra_ ! 
[10:33] well, i wonder what to do to not leave people out there in the dark [10:34] (for the whole weekend) [10:35] * ogra_ wanta a "one button rollback" for landed silos ! [10:35] *wants [11:01] I don't think there is anything to add.. [11:01] https://www.youtube.com/watch?v=Fr0A7TofowE [11:01] ogra_: is there anything I can help with? [11:02] have we removed the broken revision from the image servers? [11:02] (not that I have the power to do that) [11:02] Wellark, nope, i'm working on a build system hack to force the alternative ... that should get us back to life [11:02] ogra_: ok. thanks for your hard work! [11:03] np, i dont want to leaver the people out there in the dark for the whole weekend :) [11:03] this is probably the worst problem we have faced with our images so far [11:03] yes. I totally understand that [11:03] and if there is anything I can help just ping [11:03] I'll try to watch irc from time to time [11:04] ogra_: only rtm affected? [11:04] not n4? [11:04] heh, there is rtm for n4 too [11:04] i think all images are affected [11:05] n4 affected as well [11:05] right. [11:06] I did a really stupid thing by using -bootstrap while using ubuntu-device-flash to recover from the issue and now I lost all my data :/ [11:06] * nik90 switches to rtm stable image #4 for a while [11:10] nik90: had a lot of personal data? [11:11] Wellark: I was dogfooding the phone as my primary phone for the past 3 weeks...most of them recoverable but local app data lost [11:11] not a lot tl;dr [11:11] nik90: ;( [11:12] maybe we should add a confirmation step to ubuntu-device-flash if you declare --wipe or --bootstrap [11:13] as it's so easy to just add them in a rush.. [11:13] my device was stuck at the google logo and adb devices didn't list it, so I had to go into the recovery mode and use --bootstrap argument to fix the situation. but only later on I realised what I had done [11:13] yeah that would help [11:14] nik90: would you be able to file the bug? 
[11:14] Wellark: yes, on it [11:14] my wife is already giving me bad looks [11:15] nik90: thanks! [11:15] nik90, how long did you wait for adb ... the emergency shell will kick in after 5-10 min (once lightdm timed out trying to respawn) [11:15] (happens fine here) [11:16] indeed that only works for OTA when developer mode was already working before (or if you flashed with --developer-mode and --password) [11:16] ogra_: well the first time I noticed it was stuck in the google logo, I waited for about 20-30 mins,...then panicked and rebooted the phone ..at which point I didn't wait too long [11:17] hmm [11:17] and you had dev mode enabled and working before the OTA ? [11:17] yup I do clock app dev using qtc on device everyday [11:17] (i mean ... it is moot, i'm supposed to remove the emergency shell anyway soon, then you wont get in at all anymore if the session doesnt start) [11:18] (though i posted a workaround to get adb to the ML in my second mail) [11:18] (that will always work) [11:19] ah, do you mind posting it to ML, might come in handy in the future [11:21] (though i posted a workaround to get adb to the ML in my second mail) [11:21] read closer ;) [11:21] oh :) [11:21] I thought you said you were going to post it, eh my bad [11:21] english :P [11:23] Wellark: bug 1380055 [11:23] bug 1380055 in phablet-tools (Ubuntu) "Image flash arguments like -wipe and -bootstrap should ask for data wipe confirmation" [Undecided,New] https://launchpad.net/bugs/1380055 [11:24] err what ? [11:24] if you define -wipe you pretty clearly state that you want to wipe your data [11:24] ogra_: my bad again, I should remove -wipe from the bug report [11:25] not sure about bootstrap ... 
but people that know what that word means should surely expect it to wipe as well [11:25] ogra_: well it still wouldn't be a bad idea to ask for a confirmation..kinda like when you do shift-delete [11:25] or shutdown the comp [11:26] well, that would need a lot of bigger changes [11:26] since you also need an override for the question then [11:26] to not break all landings and smoke tests [11:26] oh [11:26] (where u-d-f is widely used automated) [11:26] I hadn't thought of the repercussions [11:27] nik90: thanks1 [12:12] ogra_: i upgraded and my phone doesnt boot anymore (krilin ubuntu-rtm) ... anything bad happened? [12:12] (hello! :) [12:12] asac, yep [12:13] see the ML [12:13] ogra_: cant we pull the binaries? [12:13] i added a hack to livecd-rootfsto work around that issue ... but nobody from the rellease team seems to be around [12:14] asac, to big ... forcing the right alternative is easier for now (and rolling back would need the release team the same way) [12:14] ogra_: i mean, pull the binaries from system-imamge server [12:14] so folks dont hit this that havent yet [12:14] asac, the last time i did that everything fell apart ... [12:15] (copying an older image to be a newer one ... we cant just "pull" we need a newer image that replaces) [12:15] ogra_: texted langasek [12:15] slangasek: ^ [12:15] could we then take the image server offline? [12:15] bit early [12:15] IS can do it? [12:16] ogra_: when and how can we recover? [12:16] they have emergency contanct 24/7 [12:16] asac, you should have texted cjwatson if anyone at this time [12:16] asac, as i said, a livecd-rootfs fix is in unapproved [12:16] ogra_: also texted him [12:16] it might stay there for a while [12:16] i need some release team person to let it through [12:16] now [12:17] ogra_: any way i can recover my phone? [12:17] asac, i put all my forensic findings on the ML in case you want to see whats wrong [12:17] i need it kind of [12:17] asac, did you have dev mode enabled ? 
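Note on the 11:12-11:26 exchange: the confirmation step requested in bug 1380055, together with the override that automated landings and smoke tests would need, could look roughly like the wrapper below. This is only a sketch and not how ubuntu-device-flash behaves today; `confirm_flash`, `needs_confirmation` and the `UDF_ASSUME_YES` variable are hypothetical names invented for illustration.

```shell
#!/bin/sh
# Sketch only: ask before running a flash that wipes user data.
# UDF_ASSUME_YES is a hypothetical override for automated use (CI, smoke tests).

# Succeeds if any argument is a destructive flag (both spellings from the log).
needs_confirmation() {
    for arg in "$@"; do
        case "$arg" in
            --wipe|--bootstrap|-wipe|-bootstrap) return 0 ;;
        esac
    done
    return 1
}

# Returns 0 if the flash may proceed, 1 if the user declined or gave no answer.
confirm_flash() {
    needs_confirmation "$@" || return 0
    [ "${UDF_ASSUME_YES:-0}" = "1" ] && return 0
    printf 'This will erase all user data on the device. Continue? [y/N] ' >&2
    read -r answer || return 1
    case "$answer" in
        y|Y|yes|YES) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller would use it as `confirm_flash "$@" && ubuntu-device-flash "$@"`, while automated jobs would export `UDF_ASSUME_YES=1` so they never block on the prompt.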
[12:17] before the last OTA [12:18] then you should get an emergency adb shell after lightdm gave up respawning (5-10min) [12:18] yeah [12:18] go in and do the following: [12:18] mount -o remount,rw / [12:18] I wonder if there was rtm bug for doing those metapackage changes at this hour. well, what's done is done. [12:18] (err sudo indeed) [12:19] Mirv, well, there was a bug, there is even a fix ... and the issue was also seeen during silo testing [12:20] ogra_: that bug #1370866 (fix for which caused this apparently) by itself was not a rtm bug, but it was reportedly blocking CI some way [12:20] asac, then: sudo apt-get purge libmirplatform3driver-mesa libmirclient8driver-mesa [12:20] bug 1370866 in mir (Ubuntu RTM) "Overly strict libmirplatform* dependencies are blocking CI" [High,Fix released] https://launchpad.net/bugs/1370866 [12:20] asac, and then: sudo mount -o remount,ro / [12:20] exit the shell and adb reboot [12:20] that should get you the session back [12:21] Mirv, right, that was the initial bug ... see what i sent to the ML [12:21] anyway, getting that livecd-rootfs in would be enough. I just wonder if it could be automated that a) each MP is related to a bug report, b) the bug report has rtm tag [12:21] there is a followup bug that was filed after silo testing [12:21] that silo should never have landed without the second bugfix [12:22] right, I read that [12:22] oh yeah, and Mir has failed testing by QA [12:23] ogra_: cant we do something without release team? [12:23] so we can roll a new image? [12:24] asac, not really [12:24] ogra_: ah, but this is the original, already abandoned Mir 0.8.0 release that was stuck in -proposed for a reason. somehow it ended up to release pocket last night. [12:24] ogra_: have you pinged everyone in the release team? 
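Note on the 12:18-12:20 instructions: the manual recovery given there (remount read-write, purge the two -mesa packages, remount read-only) can be collected into one function to run inside the emergency adb shell. The package names and commands are the ones quoted in the log; the `run` helper and the `DRY_RUN` switch are additions made here for illustration.

```shell
#!/bin/sh
# Sketch of the manual recovery from the log, for use inside the
# emergency adb shell on an affected device.
# DRY_RUN=1 only prints the commands (hypothetical switch, not from the log).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_mir_alternative() {
    # Make the root filesystem writable.
    run sudo mount -o remount,rw / || return 1
    # Remove the -mesa driver packages named in the log.
    run sudo apt-get purge libmirplatform3driver-mesa libmirclient8driver-mesa || return 1
    # Put the root filesystem back to read-only.
    run sudo mount -o remount,ro / || return 1
}
```

After `recover_mir_alternative` succeeds, exit the shell and run `adb reboot` from the host, as described in the log.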
[12:24] asac, everythign will get stuck in unapproved [12:24] asac, no, i have pinged #ubuntu-release [12:24] ogra_: please ping them all directly [12:24] maybe someone is around [12:24] they might not watch channel, but highlight might do [12:25] asac, i suspect most of them are on planes right now ... to duesseldorf [12:25] asac: we can pull down the image server [12:25] ogra_: well, worth a try anyway [12:25] Wellark, nope, we cant [12:25] ogra_: sure we can [12:25] we can make the index files RO [12:25] err 000 [12:25] Wellark, only IS can [12:26] yes is should be reachable [12:26] ogra_: yes, and IS has 27/4 emergency contact [12:26] there's a built and tested but not QA sign:d off Mir 0.8.0 retry release in a silo (and has already landed in utopic) [12:27] anyway the phones must survive on an outage of the image servers [12:27] and the broken images are being OTA'ed as we speak [12:27] but that doesn't include the bug #1378995 fix either so the livecd-rootfs fix is needed anyhow [12:27] bug 1378995 in Mir ""citrain upgrade-device" fails to upgrade mir properly (Mesa driver installed instead of Android makes the phone unbootable)." [Critical,In progress] https://launchpad.net/bugs/1378995 [12:27] any of these steps will take time [12:28] lets simply see that my fix can land and we should be fine [12:28] ogra_: but we cant [12:28] asac: your call [12:28] asac, no matter what we do, all actions we can take will take a similar amount of time [12:28] I say let's bring down the image servers until we have a fix [12:29] ogra_: making image server unavailable would be super quick [12:29] temporarily [12:29] IS can make that happen in minutes [12:29] well, your call [12:29] what will happen? 
[12:29] no idea [12:29] we have never done it [12:29] and i personally find that insane [12:29] asac: taking the servers down will make sure the OTA updates stop fowing [12:29] well, but who knows how our phone will reacht [12:29] well, we anyway have to test if the phones can survive image server outtages [12:30] do we want to test that after we ship? [12:30] nothing should happen [12:30] no, we have to do a big post mortem about this for sure [12:30] as it's nothing else than a network outtage [12:30] or if something happens we have to catch that before we ship [12:30] asac, well, as i said, all forensic work is documented in the mail thread [12:30] nah [12:30] tahts not it [12:31] far bigger issues [12:31] the issue is that a silo landed that should never have been approved ... [12:31] well, that's a problem for later [12:31] for which an issue was known and even a fix exists already [12:31] right now we have bricking OTA updates being server from the image servers [12:32] something that might happen after we ship as well, so this is also a valid case for testing what happens if we have to bring down the image servers for whatever reason [12:32] asac, further i want a meeting at the sprint about one click rollbacks ... we should keep some kind of landings DB (keeping the packages file of a silo in there) so we can have easy rollbacks [12:33] as ogra said: we need release-team to roll out new images [12:33] and we can't reach them [12:33] we can however reach IS [12:33] ok called IS [12:34] left a message [12:34] ogra_: do you know how the mir ended in release pocket? it did fail testing, the landing was abandoned from CI Train. I didn't catch why it was kept in -proposed in the first place instead of deleting last week. [12:34] asac: emergency number did not answer? [12:34] Mirv, no idea, CI train issue ? [12:34] Wellark: its a voice box that pages the guy on duty [12:34] asac: ah, ok [12:34] not sure how long it will take [12:34] asac: you also pinged at #is ? 
[12:35] you guys are aware that we have no chance to get fixes to the people that have the issue (to at least OTA via adb) ? [12:35] if you take down the server [12:36] ogra_: as soon as we have fixed images ready we online the servers again [12:36] and they can fix by ubuntu-device-flash [12:36] going hacking around with rootfs is not a solution [12:37] nobody said that anyone should hack around [12:37] people that dont have the issue will automatically get the next image ... people that have the issue can use system-image-cli via emergency adb shell [12:37] ogra_: people who don' have the image might download it while we wait for the fix to land in the server [12:38] * ogra_ wonders ... [12:38] let me try something [12:38] taking the server down makes sure nobody else gets the boroken image [12:38] ogra_: it was published on Thursday but stuck in -proposed. I see actually on the landing chart it was QA sign-off:d originally, and that's why it probably got published. [12:38] those who already got it are screwed already [12:38] ogra_: can you join #is too? [12:38] we need to stop spreading the bricking images until the fix is in the servers [12:40] Wellark: well, thats what we are doing [12:40] asac: yes. and IMO that's the right solution [12:43] * ogra_ tries a direct dput to rtm instead [12:44] that should get us at least a working rtm image again [12:44] ha, that worked [12:44] https://lists.ubuntu.com/archives/rtm-14.09-changes/2014-October/000681.html [12:46] so within 2-3h we should be back in business [12:46] ogra_: cant you join #is? [12:46] ogra_: i need info which servers produce images [12:46] asac, i have no idea ... [12:47] ogra_: sure? [12:47] asac, they get synced from nusakan to the http server ... i dont know what that hattp server is [12:47] ogra_: you dont know which server produces the system images? 
[12:47] that i know [12:47] ogra_: i need to know who produces them [12:47] nusakan produces them, buit that wont help you [12:48] since you definitely dont want to stop nusakan [12:48] but tear down the http machine that provides them [12:48] thats not why i asked it [12:48] whatever is system-image.u.c [12:48] yes, but i want nusakan to still be able to talk to that server [12:48] maybe it uses info from that server [12:48] https://lists.ubuntu.com/archives/rtm-14.09-changes/2014-October/000681.html [12:48] i dont want to risk that we cant produce images anymore [12:49] err [12:49] sigh [12:49] 91.189.88.35 [12:49] so we still allow that server to talk to it [12:49] thats the IP [12:49] the name is good enough [12:49] thanks [12:52] ogra_: why do we touch livecd? [12:52] why not just backout this mir stuff [12:52] i am scared about touching image bits in a hurry [12:53] asac, because it can always happen that some broken dep pulls in the mesa packages .. the livecd-rootfs fix makes sure that in such cases alway the android alternative for the driver is used ... [12:54] asac, backing out that mir stuff will be a day of work i guess, its a ton of packages [12:54] i was aiming for a quicker fix (and for one that fixes the issue once and for all for the future too) [12:55] well, lets hope [12:55] that this doesnt cause the next fire [12:55] ogra_: thanks! [12:56] asac, if it does i'll put it out :) [12:56] the fix in livecd-rootfs is needed in any case [12:56] (we have the samee for hybris already) [12:57] ogra_: ok, so systme-image is 403 now [12:57] from everywhere but nusakan [12:57] ok [12:57] in csae nusakan needs something [12:57] well, nusakan uses ssh to copy [12:57] no need for http [12:57] ogra_: can you check the image before we make system-image available again? 
[12:57] doesnt matter [12:57] not easily [12:57] its safer [12:58] they might look at index.json or something [12:58] who knows [12:58] ok too much scrollback [12:58] better dont make it unavailable [12:58] what's current status? [12:58] cjwatson: hah :) [12:58] cjwatson, lol [12:58] cjwatson: we have taken system-image down ... make it 403 [12:58] I see somebody accepted my livecd-rootfs upload from this morning [12:58] cjwatson, i dput the livecd-rootfs fix to 14.09-proposed ... setting the alternatives to android [12:58] cjwatson: someone from releas team is helping approving livecd-rootfs, please double check that this change is safe if you can!! [12:58] thanks so much! [12:58] asac: I uploaded it [12:59] when I saw image build failures this morning [12:59] ogra_: is that identical to my upload? [12:59] cjwatson, we ended up with -mesa and -android packages in the image ... the build picking the wrong alternative [12:59] cjwatson, niope [12:59] paste me the diff? [12:59] cjwatson: current issue is that image bricks phones... not fails to build [12:59] cjwatson, i added to the libhybris alternative forcing snippet [12:59] asac: I definitely saw build failures this morning [12:59] cjwatson, http://bazaar.launchpad.net/~ubuntu-core-dev/livecd-rootfs/trunk/revision/979 [12:59] cjwatson: ok i will step out for a bit, check ogras upload [12:59] ogra_: oh I see [13:00] cjwatson, i'll need to clean that up next week to remove the 3/8 in the package names [13:00] might need some thought later but doesn't seem obviously wrong [13:00] right [13:00] in any case that should make us bootable again [13:00] so is that in utopic too? [13:00] yes [13:00] i did dput to rtm before it got out of unapproved in utopic though [13:01] ah you uploaded it separately [13:01] ok [13:01] now waiting for rmadison :) [13:01] doesn't look like anything for me to do now, your patch looks good to me [13:01] cjwatson: do you know how we could copy the previous image on top? 
[13:02] i think that would even be better because folks might have downloaded in background [13:02] no [13:02] and in this way they get something new [13:02] no doubt it's possible but me messing around with the server with no experience is not the path of wisdom [13:02] ok. i am sure we need that [13:02] as a simple feature documented for such cases :) [13:02] but lets make a post-mortem i guess after [13:02] unless Stéphane happens to be around, right now a new build will likely be faster [13:03] it's clearly possible if nothing else by editing the json, but I don't want to make anything worse [13:03] as i said above ... last time i tried to copy an older image on top hell broke lose [13:04] we need stgraber for that magic [13:04] and we definitely need system-image training at the sprint [13:05] (and if you would just edit the json the deltas might all be wrong, that wouldnt work anyway) [13:09] do we have progressive image release implemented but not enabled, or something? like first to 1% of users, then 5%, 10% etc. similar to SRU updates [13:09] i dont think we have that implemented [13:10] sounds like a good sprint topic too [13:10] yes [13:12] Mirv: heh, I have a to-do item from asac to talk to the landing team about that ;-) [13:13] ok, I'm afraid I have to go, I had other plans for today [13:14] ok, one bug found. https://bugs.launchpad.net/ubuntu/+source/phablet-tools/+bug/1380079 [13:14] Ubuntu bug 1380079 in phablet-tools (Ubuntu) "can't use ubuntu-device-flash offline even for cached images" [Undecided,New] [13:17] how are we going to smoketest the fix given that system-image is refusing access? [13:17] cjwatson: can't we flash them manually with phablet-flash? 
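Note on the 13:09 question about progressive releases (1% of users, then 5%, 10%, ...): one common way to do this is to map every device to a stable bucket and offer the image only to buckets below the current percentage. The sketch below is an assumption about how that could work, not something system-image implements according to this log; `rollout_bucket` and `in_rollout` are hypothetical names, and any stable device identifier would do as input.

```shell
#!/bin/sh
# Sketch only: percentage-based staged rollout.

# Map a device identifier to a stable bucket in 0..99 using a CRC checksum.
rollout_bucket() {
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $((crc % 100))
}

# Succeeds if the device falls inside the current rollout percentage.
# $1 = device identifier, $2 = percentage (0..100)
in_rollout() {
    [ "$(rollout_bucket "$1")" -lt "$2" ]
}
```

Because the bucket depends only on the identifier, raising the percentage from 1 to 5 to 10 keeps every device that was already included and only adds new ones.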
[13:17] well you still have to fetch it somehow [13:17] or fix ubuntu-device-flash and download the images directly to the local cache [13:18] cjwatson: IS an get us a copy of the images ones they are built and in the servers [13:18] hmm [13:18] true, ogra_ can grab it directly too [13:18] as the machinery is still able to push new images [13:18] i cant get into nusakan [13:18] we allowed nusakan to access the image servers [13:18] ah, just took long [13:18] Wellark: pointlessly [13:19] ogra_: are you familiar with ubuntu-device-flash? [13:19] Wellark: the block was only http; nusakan does not push stuff over http :) [13:19] Wellark, only as a user ... i dont touch go code [13:19] (or didnt touch go code yet at least) [13:19] could someone come up with a quick patch we can apply locally so that we can download the new images straight under .cache as udf would do from the servers and let it flash them [13:20] and i also dont have an idea how u-d-f would allow me to flash something not from a system-image server [13:20] I can tako on it as well [13:20] i doubt thats possible [13:20] it is [13:20] honestly, it would probably be easier to lift the http block for five minutes [13:20] well, that too. [13:20] (beyond flashing your own device tarball ... which isnt what we want) [13:20] for this time [13:20] cjwatson, ++ [13:20] cjwatson, ogra_: device-flash supports custom servers [13:20] i triggered an rtm build ... bot should announce in a few [13:20] next time we could push the fix to a custom channel [13:20] so we ask IS to point the root somehwere [13:20] or something [13:21] Wellark, yes, but you still need a working server setup first [13:21] ogra_: we have that [13:21] it's just blocked. [13:21] I generally think it is unwise to invent process mid-crisis [13:21] is can change the root [13:21] cjwatson, yeah [13:21] simpler to go back to what we had for a short while [13:21] ok. agreed. 
[13:22] cjwatson: Wellark: ogra_ whatever fix you want would require a copy of channels.json to be available and the device json as well [13:23] well, and all the related filesystem structure i guess [13:23] like I say that sounds like a bad thing to try to do on the fly [13:23] yep [13:23] esp on a weekend [13:23] +1 [13:23] lets just switch on http for a test [13:23] sergiusens: for the future, could udf cache the last succesfull .json files so we could flash the cached images offline? [13:23] we need to fire drill [13:24] and have somthing for crisis management written down though [13:24] ok, really off, SMS my mobile if you need me again. I've turned up the volume so I should hear it more promptly this time [13:24] (well, lets get an image first :) ) [13:24] cjwatson, thanks for showing up !! [13:24] Wellark: yes, but as I said, nothing will solve your immediate problem [13:24] sergiusens: that's the "for the future" part ;) [13:24] Wellark: immediate future? [13:24] :-) [13:24] we should definitely have some sprint topics out of this issue [13:25] as all of the files are already in my local .cache, [13:25] I just can't flash them [13:25] ogra_: disaster recovery? I guess that would involve someone from IS for sure [13:25] 1) easier rollbacks 2) system-image training 3) staged release process [13:25] we lacked in all three bits today [13:25] 4) critical servers are isolated so we can take them down [13:25] ogra_: it is already staged, problem is, we are all on -proposed ;-) [13:25] without hosing everything [13:26] sergiusens, well, rool out staging that is ... 
only roll out to 1000 people, then to 10000, then 100000, then the rest [13:26] *roll [13:26] ogra_: the canary stuff [13:26] right [13:26] could someone write an update to the ML [13:27] ogra_: still, this doesn't happen if everyone would run the stable channel [13:27] I don't think I'm the right person for that [13:27] that way we dont need to do insane things like tearing down servers [13:27] asac: --^? [13:27] Wellark, about what ? [13:27] we'll announce the fix once it is there [13:27] people should know by now that there is an issue and it is being worked on [13:28] ogra_: well, just saying that the image servers are down.. and will be restored when new image is ready. [13:28] (if they followed the thread) [13:28] as now udf is totally unusable [13:28] for any channel [13:29] Wellark, well, its a ML ... just write something if you feel like :) [13:35] ogra_: I already said I might be the best person, but I will write the email then :) [13:36] === trainguards: RTM IMAGE 99 building (started: 20141011 13:36) === [13:38] ogra_: i sent mail [13:38] Wellark: ^^ [13:38] asac, ok [13:42] asac: oh, you beat me to it. [13:42] I also hit "send" already [13:42] anything else that needs doing or are we waiting for image? [13:42] oh, seems we have the same contents :) [13:43] although your topic is more catching [13:43] asac, image is building ... we need http re-enabled then to test OTA [13:43] (i have my production device still on yesterdays image and can easily test then) [13:43] ogra_: can you check the new image offline somehow first? [13:44] let's just make sure all of the images are there before enabling.. [13:44] before reenabling OTA? [13:44] not easily [13:44] well, as I said, udf support --server [13:44] and i think we also want to make usre OTA works, no ? [13:44] well, you can login, unpack, check that the alternative isthere? 
[13:44] sure, but only after double checkig that the fix did what it was supposed to do [13:44] if thats possible [13:44] Wellark, it is saturday ... i surely wont set up a server now here [13:44] it would be easy for IS to relocate the image server to test.touchimages.ubuntu.com or something [13:44] we can enable http for 5 min coordinated with IS [13:44] ogra_: it's just apache configuration [13:44] they are virtual servers [13:45] takes like 3 lines of apache configuration [13:45] Wellark, seriously, your comments arent helping [13:45] ogra_: asac asked, I replied [13:45] i didnt ask [13:45] i wanted to know if ogra can look at the image before it goes out [13:45] or someone [13:45] anyway [13:45] enabling http for the time it takes to download the OTA should be enough to verify [13:45] 16:43 < asac> ogra_: can you check the new image offline somehow first? [13:46] i'll know if it boots within 1min after it flashed [13:46] Wellark, and i said i cant easily [13:48] there is a way. I'm just concerned if the image is broken [13:48] we do have a way of testing them without enabling OTA on the existing devices [13:48] asac: your call. [13:49] sergiusens: is --server working? [13:49] has it been tested? 
[13:50] (just thinking of the accident report future work section here= [13:52] also we need a graphical disaster recovery tool to protect against bricked devices on incident like this after we ship [13:53] as end users having to get their phones to service centers for flashing will cost everyone too much [13:53] small price to pay on a graphical flasher [14:01] oh, hmm, the bot wont announce if the image is done (since it uses the json file to check that) [14:02] * ogra_ wonders what else in the infrastructure might now have fallen apart due to the server being 403 [14:03] ogra_: as long as the image builder is able to build the images we are fine [14:03] IS can copy the files over, if it goes to that [14:03] Wellark, except that changelog generation scripts and other stuff might be completely broken and will need a bunch of work next week [14:04] ogra_: yes. we need buch of work next week. [14:04] ogra_: can you try to write or tell me how folks will need to recover from bricked state? [14:04] right, to recover from the mess tearing down the server causes [14:04] would like to prep that mail [14:05] asac, yes, once we know it is fixed [14:05] lets have an image first [14:05] ogra_: well, would like to prep the mail :) [14:05] ogra_: there was no other choice [14:05] if ou dont know yet its fine [14:05] but thought we are waiting for image to be done so maybe we have a bit time :) [14:05] well, theoretically just: adb shell system-image-cli [14:05] only if developer mode right? [14:05] Wellark, there were other choices we simply didnt take [14:06] guess we need soomething better :/ [14:06] ubuntu-deice-flash? 
[14:06] we simply can't ship bricked images [14:06] nothing is worse [14:06] asac, well, without dev mode there os only --bootstrap flashing from fastboot [14:06] which wipes the device though [14:06] and --bootstap will nuke all the dogfooders data [14:06] thats awful [14:06] yes, thats on purpose [14:07] its not good enough [14:07] --bootstrap is for getting you a freshly formatted device ... [14:07] which gets us back to: _we simply can't ship bricked images_ [14:07] * ogra_ goes afl for the next hour til the image is done [14:07] the decision to take the servers down was the right one [14:07] as we had no contact for the release team [14:08] and no ETA when we get fixed images done [14:09] ogra_: why does normal ubuntu-device-flash not work? [14:09] that should for official images imo [14:11] it does in recovery indeed [14:11] cool [14:13] rootfs is done ... [14:13] * ogra_ waits for import-images to pick it up on the machine [14:15] ah, it started ... [14:15] another 30-45min and we should be good [14:22] asac: so, just to verify? we will push an image without even a smoketest to the public servers and make it available for OTA clients? [14:29] err [14:29] http://people.canonical.com/~ogra/touch-image-stats/rtm/20141011.1.changes [14:30] cjwatson, was that your change ? http://people.canonical.com/~ogra/touch-image-stats/rtm/20141011.1.changes ... it now just dropped the packages completely [14:30] thats not right [14:31] oh, it *is* right, sorry [14:31] * ogra_ checks the manifest to make sure [14:31] ogra_: how long before we want to turn on http? [14:31] for quick try? [14:31] 20 min? [14:31] asac, gimme a few mins to verify evereything is as it should [14:31] ogra_: ok will be back in 10 minutes and then call IS [14:32] sounds good [14:32] if you give me a go. will tell them to stay around while you test the OTA [14:33] asac, ok, ready whenever you are ... 
the package changes are fine [14:43] asac, btw, i'm not sure how exactly the "there is a new image" notification system works, but it could well be that we trashed it too by setting the server to 403 (not sure it has ever been tested with that case) [14:43] instead of taking it down the respective subdirs should perhaps been made readonly instead [14:44] ogra_: is all already on system-image? [14:44] no idea, i can only see nuaskan :) [14:44] ah ok [14:44] but it is all where it should be [14:44] (on that machine) [14:45] so i assume it is fine on s-i as well [14:46] ok i called [14:54] === trainguards: RTM IMAGE 99 DONE (finished: 20141011 14:55) === [14:54] === changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/99.changes === [14:58] ogra_: so utopic cannot build? [14:59] and utopic has this bustage too? [14:59] not sure it does [14:59] but it definitely had build issues tonight ... [14:59] let me trigger a build [14:59] we should have checked that. i thought we had it fixed everywhere now [14:59] * asac sighs [14:59] but well [15:00] fired up ... but dont hold your breath [15:00] point is that we might now let more utopic folks run into death while we fix that image build [15:00] so we should have brougth both up [15:00] https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/utopic/ubuntu-touch/ [15:00] (thats the utopic build) [15:01] ogra_: can you check if the current image has the mir landing? [15:02] Setting up libmirclientplatform-android:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ... [15:02] update-alternatives: using /usr/lib/arm-linux-gnueabihf/mir/clientplatform/android/ld.so.conf to provide /etc/ld.so.conf.d/arm-linux-gnueabihf_mirclientplatform.conf (arm-linux-gnueabihf_mirclientplatform_conf) in auto mode [15:02] Setting up libmirclient8:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ... [15:02] Setting up libmirplatformgraphics-android:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ... 
[15:02] update-alternatives: using /usr/lib/arm-linux-gnueabihf/mir/platformgraphics/android/ld.so.conf to provide /etc/ld.so.conf.d/arm-linux-gnueabihf_mirplatformgraphics.conf (arm-linux-gnueabihf_mirplatformgraphics_conf) in auto mode [15:02] Setting up libmirserver25:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ... [15:02] should be fine actually [15:02] ok, then i dont think we should stress ourselves out [15:02] that the image doesnt build [15:02] (thats from the last successful build, no -mesa alternatives) [15:02] if it doesnt [15:03] well, my fix will in any case force the right alternative in utopic too [15:04] so we should be a) safe with whatever is available ... and b) be safe for future images (if they build at all now) [15:06] ogra_: can you scribble recovery instructions? Just do the ones with recovery mode... i dont want to add multiple options in the main mail [15:08] reboot to recovery with the right key combo for your device (they differ by device) and run: ubuntu-device-flash --channel ubuntu-touch/ubuntu-rtm/14.09-proposed --device=$your-device [15:08] that should be all thats needed ... [15:08] if you can still adb shell: "adb shell system-image-cli -v" will be enough [15:09] ogra_: do we have documented how to boot into recovery for krillin and mako? [15:09] not sure, perhaps on the install page [15:09] but i dont think we cover krillin anywhere public [15:10] thats fine [15:10] just mako [15:10] let me try my mako [15:10] where is it? [15:10] http://developer.ubuntu.com/start/ubuntu-for-devices/installing-ubuntu-for-devices/ [15:10] ogra_: please validate that the above works [15:10] and i will send announce with that info [15:11] ogra_: there is no info how to boot into recovery for mako [15:11] Power the device off with the Power button. 
[15:11] Reboot into the bootloader by pressing the correct physical button combination for your device type as shown here: https://source.android.com/source/building-devices.html#booting-into-fastboot-mode [15:11] bah [15:11] thats not helpful [15:11] we want recovery, not fastboot [15:12] ack [15:12] our documentation isnt really good imo [15:12] no clear sections etc. [15:12] not even a toc [15:12] well, the wiki had a super detailed TOC [15:12] also like two page warnings and disclaimers [15:12] before starting [15:13] wiki was also super hard to follow [15:13] but thats gone when it was moved away from wiki to official docs [15:13] right, but it wasnt really better [15:13] at least for me [15:13] asac, on mako it is vol-dn and power [15:14] anyway, i will not explain how to do that [15:14] guess the mailthread will explain it [15:14] ok [15:14] ogra_: lets validate that the recovery approach works and then i send it [15:14] thanks! [15:14] * asac will be back in 10 minutes [15:15] sigh [15:15] on krillin it is really complex [15:15] vol-dn + power ... then select recovery with vol-up and press power again [15:15] hah [15:16] great [15:16] my mako already had the broken image downloaded and now goes to flash automatically to it [15:17] oh, no, that was utopic anyway [15:18] but that means i can verify utopic now :) [15:18] great, utopic verified, not affected [15:20] hmm [15:20] my rtm krillin doesnt have a location indicator anymore [15:21] location works though [15:21] i wonder if thats a design thing [15:21] ugh [15:21] and the new "pinned in launcher" thingie is butt ugly :( [15:24] ogra_: we dont need instructions for krillin in mail [15:24] i will bounce them to PES folks for support [15:24] k [15:24] so: [15:24] vol-dn while powering on [15:25] select recovery and press power [15:25] once in recovery run: [15:25] ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed [15:25] thats all [15:25] ogra_: did you validate from bricked state?
[15:25] just doing [15:25] no --bootstrap required? [15:25] but downloading takes me 20min [15:25] Wellark, not for upgrading [15:26] good. people wont lose their data [15:26] --bootstrap is obviously for bootstrapping ;) [15:26] (which implies wiping) [15:27] ogra_: whats the fixed image number on mako rtm? [15:27] well, udf expects the device to be in certain states for different options [15:27] 83 [15:28] so as long as udf can flash from recovery without --bootstrap it's fine [15:28] asac, i need to mow my lawn before it gets dark ... can i leave you with this (will be back in 30min/1h) [15:29] ogra_: yes all good [15:29] if not i call [15:31] oki [15:31] my download still runs, should be done in 15 [15:31] i'll probably drop by in between [15:35] ogra_: ok so i have to wait until you're back? [15:35] to confirm that your instructions work? [15:35] * asac waits [15:54] Wellark: bootstrap is like "flash_all" from android [15:58] ogra_: so do we know how the mir silo got published? [16:01] asac: brendan sent an email to the list as a reply to "Don't update to RTM build" [16:01] sergiusens: what does he say? [16:02] asac: That explains a lot then - yes this silo failed to install using the citrain tool (dist-upgrade). We were instructed by the lander that apt-get installing each package was the method they used and the 'right' way. I guess this was bad advice and we should have paid attention to what citrain did. We've been caught out by issues like this several times and I think this really brings to the fore the need to have a conversation about [16:02] consistent installation steps for silos so that these kinds of oversights happen less often (preferably not at all). I opened this bug last week: https://bugs.launchpad.net/ubuntu/+source/phablet-tools/+bug/1378245. We can use it as a container for suggestions about how to make the output of the tool most accurately reflect what ends up in the image. 
[16:02] Ubuntu bug 1378245 in phablet-tools (Ubuntu) "citrain could use a more accurate way to upgrade from silos" [High,In progress] [16:02] asac, Mirv sent a summary [16:02] to the first ML thread [16:04] ogra_: can you try to summarize the summary? [16:04] i am sure its super detailed :) [16:04] * ogra_ finds it most funny that the breakage was known by upstream and a fix existed already [16:06] asac, there was a silo that was tested by upstream, an issue was found and seemingly ignored (the bug i just mentioned) ... then it was handed to QA with the instructions sergiusens talked about ... QA followed them and signed off [16:06] ogra_: which breakage? the "does not boot"? [16:06] yes [16:06] they noticed the wrong dependencies [16:06] i wrote detailed mails about each of these bits in the "Don't update to RTM build" thread [16:07] i am really getting flooded with details [16:07] hence those details i cannot digest really [16:07] the wrong dependencies in turn caused that during rootfs build the mesa alternative for the mir driver was used [16:07] i really rely on very precise super tight summaries [16:07] asac: ogra_ well at least we can agree it was signed off [16:07] at least makes my life super harder [16:07] yes [16:07] err easier [16:08] asac, "wrong deps in Mir that were ignored caused the wrong driver to be used in the rootfs, upstream testing instructions worked around the breakage so that QA signed off" [16:09] asac, thats your one line summary :) [16:10] ogra_: cool. that was pretty good. were those special instructions just for that landing? e.g. 
didnt they use the normal wiki instructions? [16:10] well done summary :) [16:10] you have to ask brendand or davmor2 for details on the testing plan [16:10] i think there were specific instructions in the spreadsheet for this silo [16:11] (as i understood it) [16:12] heh [16:12] ok thanks [16:12] guess its a first that someone has special install instructions for a silo [16:13] seems easy to fix by just making it clear that special install instructions are never allowed :P [16:13] * ogra_ has another 150m² to mow now :) [16:13] (after cigarette break) [16:13] ogra_: did you validate the recovery procedure? [16:13] still waiting for an ack [16:13] one sec [16:13] would really like to go off and prep for travel [16:13] * ogra_ needs to go to the office for that [16:13] shops are almost closed [16:14] asac, all fine [16:14] device is up and running [16:14] ogra_: ok one more time: [16:14] 1. connect your phone to your ubuntu desktop/laptop 2. boot phone into recovery (on mako: 1. volup + power, 2. select recovery and press power) 3. on desktop/laptop run ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed [16:14] ogra_: is that correct? [16:14] yep [16:15] thats what i did [16:15] ogra@anubis:~/Devel/seeds/ubuntu-touch.utopic$ adb shell system-image-cli -i|grep "version version" [16:15] version version: 83 [16:15] hahaha [16:15] on mako you should end up with version 83 then [16:16] it seems people are busy here [16:16] quite [16:16] let me see, mir + update alternatives? [16:16] * ogra_ lost his whole day to that :P [16:16] ogra_: ok sent mail [16:16] have fun [16:16] rsalveti, how did you guess ! [16:16] yeah, had that yesterday when updating with apt-get [16:16] i will leave support of follow up questions to the folks now [16:16] have to really run [16:16] but then went to bed thinking I was some sort of crazy [16:16] yeah, safe travels [16:17] * ogra_ thinks asac needs to move to a sane city ... 
in Kassel the shops only close at 10pm on saturdays [16:17] Hamburg ... provinz ... [16:18] anyway, back to lawn mowing so i get at least one thing done [16:20] ogra_: did we revert mir in the end? [16:21] let me check changes [16:21] oh, the package got renamed [16:21] rsalveti: nope, ogra got this in https://launchpad.net/ubuntu/+source/livecd-rootfs/2.254 [16:21] did someone test this before landing? [16:22] not pointing fingers, just trying to understand what happened [16:22] rsalveti: read the mailing list threads. QA signed it off originally because they were instructed to do a manual workaround (won't happen again...), but I'm not sure what's the story behind the mir stuck in -proposed and then two days later migrating to -release [16:23] ie this problematic Mir was published on Thu already [16:24] oh, right [16:24] yeah, I remember we had a few issues with this landing [16:24] but well, guess we're all fine now [16:24] let me flash latest [16:25] maybe the proposed migration happened after meta was updated [16:33] everything seems fine now again, aside from damage caused to dogfooders of course [16:33] guess ci lab as well [16:59] === trainguards: IMAGE 277 DONE (finished: 20141011 17:00) === [16:59] === changelog: http://people.canonical.com/~ogra/touch-image-stats/277.changes === [17:29] has there been any breakage reported yet on the taking down of the image server? [17:29] other than the bug I filed against ubuntu-device-flash [18:13] ogra_, asac: from the latest mails, I understand that a fixed image has been published; is there anything else needed this morning?
[18:14] Mirv: it was stuck in 14.09-proposed for a while because it required ubuntu-touch-meta to be updated before it could migrate (due to dependencies on -android alternative packages from there, which had been renamed), and I therefore wanted to get the mir landing into utopic so that I could keep ubuntu-touch-meta in sync [18:14] we finished that off last night [18:15] but it appears nobody considered that livecd-rootfs changes might be needed [18:15] I suspect the ultimate fix for this is per-silo image building [18:15] we could actually have done that even with today's technology, though it requires some setup [18:17] Wellark: I think it's harsh to say that you had no contact for the release team; I responded in under an hour from being SMSed, which OK may not be a stupendous SLA but I don't think it's bad for somebody not on duty on a Saturday [18:18] ogra_: right, I see your livecd-rootfs fix, I figured that's the approach you'd take to un-break it... I think that's fine for a quick fix, but I really don't like forcing things with update-alternatives. I think we should instead fail the image build whenever this happens. [18:18] Wellark: and a community member took action 26 minutes after being asked on #ubuntu-release [18:18] slangasek: agreed [18:23] cjwatson: at the time of the decision, we didn't have a contact [18:23] and there was no guarantee to have a contact on Saturday [18:24] that was _not_ to blame the release-team not working on Saturday [18:25] but there was no reliable ETA when we might get hold of the team to do the image publishing [18:25] and we have IS for emergencies 24/7 [18:25] right and I think that's the way it should be, but I'd like you to be a bit more careful about how you phrase it :) [18:26] (also, fixing ubuntu-rtm did not involve the release team) [18:28] cjwatson: I don't understand what you are referring to. Where should I have been "more careful" in my phrasing?
[18:31] + the utopic images were broken as well (as far as we knew), so it's irrelevant for the decision making if ubuntu-rtm does not involve the release team [18:32] most of our dogfooders are using utopic-proposed images on n4 [18:39] slangasek, its all fine for now ... will need some cleanup work on monday though [18:42] slangasek, my biggest obstacle here is that we dont have an easy "one click rollback" mechanism for either ... (images as well as silos) and i think we should have a planning session for this at the sprint ... [18:43] anyway ... back into my evening ... i spent enough time online today :) [19:24] Wellark: I thought it sounded as though you thought you (plural) ought to have had a contact in the release team [19:26] Wellark: My understanding is that the utopic images failed to build rather than breaking in this particular way. I did upload a fix for the build failure, but my fix didn't land independently, it was superseded by ogra_'s upload and so there was no image build with just that. Is that incorrect? [19:27] Wellark: The cdimage logs indicate that there was no successful utopic build between 20141010 (log timestamped 2014-10-10 02:57 UTC, so before the mir landing) and 20141011.1 (log timestamped 2014-10-11 15:55 UTC, so after the fixes) [19:28] So I don't think utopic-proposed can have been broken for users at any point