/srv/irclogs.ubuntu.com/2014/10/11/#ubuntu-ci-eng.txt

imgbot=== trainguards: IMAGE 277 building (started: 20141011 02:10) ===02:09
imgbot=== trainguards: RTM IMAGE 98 building (started: 20141011 03:10) ===03:09
imgbot=== trainguards: RTM IMAGE 98 DONE (finished: 20141011 04:25) ===04:24
imgbot=== changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/98.changes ===04:24
popeylatest image on krillin has been sat for over an hour at bq screen and shows "offline" in adb...09:51
ogra_popey, see ML09:58
ogra_popey, i assume it is the "fix" for https://bugs.launchpad.net/mir/+bug/137086609:58
ubot5Ubuntu bug 1370866 in mir (Ubuntu RTM) "Overly strict libmirplatform* dependencies are blocking CI" [High,Fix released]09:58
ogra_we end up with the -mesa package installed ... i bet that forces a wrong alternative09:58
popeyugh10:01
popeythanks ogra_10:08
popeyogra_: how do you get krillin into recovery?10:11
popeynvm, got it eventually10:11
ogra_popey, the adb emergency shell should kick in as well10:15
popeynow I can't flash it10:16
popeykrillin not found on server https://system-image.ubuntu.com channel ubuntu-touch/ubuntu-rtm/14.09-proposed10:16
ogra_theoretically you should just need to wait til lightdm stops respawning10:16
popey ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed --revision=97   is what i did10:16
popeyare krillin images still on some hidden server?10:16
ogra_no10:16
ogra_try with --device10:16
popeythat got it10:17
popeythanks10:17
popeywent back to 97 and it's still showing offline10:31
ogra_with a session up, a password set and developer mode enabled ?10:31
popeywas sat at bq logo for an age again10:32
ogra_(i think ubuntu-device-flash unsets it)10:32
popeyseems to have moved on now10:32
popeyits online now10:32
ogra_ah10:32
ogra_yeah, no adb on bq logo anymore10:32
ogra_it only starts after lightdm nowadays10:32
* popey goes to drink tea and stop playing with computers10:32
popeythanks for the help ogra_ !10:33
ogra_well, i wonder what to do to not leave people out there in the dark10:33
ogra_(for the whole weekend)10:34
* ogra_ wanta a "one button rollback" for landed silos !10:35
ogra_*wants10:35
WellarkI don't think there is anything to add..11:01
Wellarkhttps://www.youtube.com/watch?v=Fr0A7TofowE11:01
Wellarkogra_: is there anything I can help with?11:01
Wellarkhave we removed the broken revision from the image servers?11:02
Wellark(not that I have the power to do that)11:02
ogra_Wellark, nope, i'm working on a build system hack to force the alternative ... that should get us back to life11:02
Wellarkogra_: ok. thanks for your hard work!11:02
ogra_np, i dont want to leave the people out there in the dark for the whole weekend :)11:03
Wellarkthis is probably the worst problem we have faced with our images so far11:03
Wellarkyes. I totally understand that11:03
Wellarkand if there is anything I can help just ping11:03
WellarkI'll try to watch irc from time to time11:03
Wellarkogra_: only rtm affected?11:04
Wellarknot n4?11:04
ogra_heh, there is rtm for n4 too11:04
ogra_i think all images are affected11:04
nik90n4 affected as well11:05
Wellarkright.11:05
nik90I did a really stupid thing by using -bootstrap while using ubuntu-device-flash to recover from the issue and now I lost all my data :/11:06
* nik90 switches to rtm stable image #4 for a while11:06
Wellarknik90: had a lot of personal data?11:10
nik90Wellark: I was dogfooding the phone as my primary phone for the past 3 weeks...most of them recoverable but local app data lost11:11
nik90not a lot tl;dr11:11
Wellarknik90: ;(11:11
Wellarkmaybe we should add a confirmation step to ubuntu-device-flash if you declare --wipe or --bootstrap11:12
Wellarkas it's so easy to just add them in a rush..11:13
nik90my device was stuck at the google logo and adb devices didn't list it, so I had to go into the recovery mode and use --bootstrap argument to fix the situation. but only later on I realised what I had done11:13
nik90yeah that would help11:13
Wellarknik90: would you be able to file the bug?11:14
nik90Wellark: yes, on it11:14
Wellarkmy wife is already giving me bad looks11:14
Wellarknik90: thanks!11:15
ogra_nik90, how long did you wait for adb ... the emergency shell will kick in after 5-10 min (once lightdm timed out trying to respawn)11:15
ogra_(happens fine here)11:15
ogra_indeed that only works for OTA when developer mode was already working before (or if you flashed with --developer-mode and --password)11:16
nik90ogra_: well the first time I noticed it was stuck in the google logo, I waited for about 20-30 mins,...then panicked and rebooted the phone ..at which point I didn't wait too long11:16
ogra_hmm11:17
ogra_and you had dev mode enabled and working before the OTA ?11:17
nik90yup I do clock app dev using qtc on device everyday11:17
ogra_(i mean ... it is moot, i'm supposed to remove the emergency shell anyway soon, then you wont get in at all anymore if the session doesnt start)11:17
ogra_(though i posted a workaround to get adb to the ML in my second mail)11:18
ogra_(that will always work)11:18
nik90ah, do you mind posting it to ML, might come in handy in the future11:19
ogra_<ogra_> (though i posted a workaround to get adb to the ML in my second mail)11:21
ogra_read closer ;)11:21
nik90oh :)11:21
nik90I thought you said you were going to post it, eh my bad11:21
nik90english :P11:21
nik90Wellark: bug 138005511:23
ubot5bug 1380055 in phablet-tools (Ubuntu) "Image flash arguments like -wipe and -bootstrap should ask for data wipe confirmation" [Undecided,New] https://launchpad.net/bugs/138005511:23
ogra_err what ?11:24
ogra_if you define -wipe you pretty clearly state that you want to wipe your data11:24
nik90ogra_: my bad again, I should remove -wipe from the bug report11:24
ogra_not sure about bootstrap ... but people that know what that word means should surely expect it to wipe as well11:25
nik90ogra_: well it still wouldn't be a bad idea to ask for a confirmation..kinda like when you do shift-delete11:25
nik90or shutdown the comp11:25
ogra_well, that would need a lot of bigger changes11:26
ogra_since you also need an override for the question then11:26
ogra_to not break all landings and smoke tests11:26
nik90oh11:26
ogra_(where u-d-f is widely used automated)11:26
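The trade-off discussed above (a confirmation prompt for destructive flags, plus an override so automated landings and smoke tests keep working) can be sketched roughly as follows. This is not ubuntu-device-flash code (that tool is written in Go); the `--yes` override flag and the function names are hypothetical, and only `--wipe` and `--bootstrap` come from the conversation.

```python
# Hypothetical sketch: ask before data-destroying flags, unless an override is given.
DESTRUCTIVE_FLAGS = {"--wipe", "--bootstrap"}  # flags named in the discussion
OVERRIDE_FLAG = "--yes"                        # assumed name, not a real u-d-f option


def destructive_flags(args):
    """Return the destructive flags present in args, sorted for stable output."""
    return sorted(DESTRUCTIVE_FLAGS.intersection(args))


def should_proceed(args, ask=input):
    """Decide whether flashing may continue.

    Non-destructive invocations always proceed. Destructive ones proceed
    without a prompt when the override flag is present (the automation
    case), otherwise only after an explicit "y" or "yes" from the user.
    """
    found = destructive_flags(args)
    if not found:
        return True
    if OVERRIDE_FLAG in args:
        return True
    answer = ask("%s will erase user data. Continue? [y/N] " % ", ".join(found))
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" on anything but an explicit yes is the cautious choice here; automated callers would have to opt in by passing the override.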
nik90I hadn't thought of the repercussions11:26
Wellarknik90: thanks!11:27
asacogra_: i upgraded and my phone doesnt boot anymore (krillin ubuntu-rtm) ... anything bad happened?12:12
asac(hello! :)12:12
ogra_asac, yep12:12
ogra_see the ML12:13
asacogra_: cant we pull the binaries?12:13
ogra_i added a hack to livecd-rootfs to work around that issue ... but nobody from the release team seems to be around12:13
ogra_asac, too big ... forcing the right alternative is easier for now (and rolling back would need the release team the same way)12:14
asacogra_: i mean, pull the binaries from system-image server12:14
asacso folks dont hit this that havent yet12:14
ogra_asac, the last time i did that everything fell apart ...12:14
ogra_(copying an older image to be a newer one ... we cant just "pull" we need a newer image that replaces)12:15
asacogra_: texted langasek12:15
asacslangasek: ^12:15
Wellarkcould we then take the image server offline?12:15
ogra_bit early12:15
WellarkIS can do it?12:15
asacogra_: when and how can we recover?12:16
Wellarkthey have emergency contact 24/712:16
ogra_asac, you should have texted cjwatson if anyone at this time12:16
ogra_asac, as i said, a livecd-rootfs fix is in unapproved12:16
asacogra_: also texted him12:16
Wellarkit might stay there for a while12:16
ogra_i need some release team person to let it through12:16
asacnow12:16
asacogra_: any way i can recover my phone?12:17
ogra_asac, i put all my forensic findings on the ML in case you want to see whats wrong12:17
asaci need it kind of12:17
ogra_asac, did you have dev mode enabled ?12:17
ogra_before the last OTA12:17
ogra_then you should get an emergency adb shell after lightdm gave up respawning (5-10min)12:18
asacyeah12:18
ogra_go in and do the following:12:18
ogra_mount -o remount,rw /12:18
MirvI wonder if there was rtm bug for doing those metapackage changes at this hour. well, what's done is done.12:18
ogra_(err sudo indeed)12:18
ogra_Mirv, well, there was a bug, there is even a fix ... and the issue was also seen during silo testing12:19
Mirvogra_: that bug #1370866 (fix for which caused this apparently) by itself was not a rtm bug, but it was reportedly blocking CI some way12:20
ogra_asac, then: sudo apt-get purge libmirplatform3driver-mesa libmirclient8driver-mesa12:20
ubot5bug 1370866 in mir (Ubuntu RTM) "Overly strict libmirplatform* dependencies are blocking CI" [High,Fix released] https://launchpad.net/bugs/137086612:20
ogra_asac, and then: sudo mount -o remount,ro /12:20
ogra_exit the shell and adb reboot12:20
ogra_that should get you the session back12:20
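The recovery steps ogra_ gives above are interleaved with other conversation, so a small sketch that collects them in order may help. The commands are copied from the log; the helper name and the comment lines are illustrative only, and nothing here is executed against a device.

```python
# Commands quoted from the log, in the order ogra_ gave them.
RECOVERY_STEPS = [
    "sudo mount -o remount,rw /",
    "sudo apt-get purge libmirplatform3driver-mesa libmirclient8driver-mesa",
    "sudo mount -o remount,ro /",
]


def recovery_script(steps=RECOVERY_STEPS):
    """Return the recovery procedure as printable text (hypothetical helper).

    The steps are meant to be typed into the emergency adb shell, which
    per the log only appears once lightdm gives up respawning (5-10 min)
    and only if developer mode was enabled before the OTA.
    """
    lines = ["# inside the emergency adb shell:"]
    lines.extend(steps)
    lines.append("exit")
    lines.append("# back on the host:")
    lines.append("adb reboot")
    return "\n".join(lines)


if __name__ == "__main__":
    print(recovery_script())
```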
ogra_Mirv, right, that was the initial bug ... see what i sent to the ML12:21
Mirvanyway, getting that livecd-rootfs in would be enough. I just wonder if it could be automated that a) each MP is related to a bug report, b) the bug report has rtm tag12:21
ogra_there is a followup bug that was filed after silo testing12:21
ogra_that silo should never have landed without the second bugfix12:21
Mirvright, I read that12:22
Mirvoh yeah, and Mir has failed testing by QA12:22
asacogra_: cant we do something without release team?12:23
asacso we can roll a new image?12:23
ogra_asac, not really12:24
Mirvogra_: ah, but this is the original, already abandoned Mir 0.8.0 release that was stuck in -proposed for a reason. somehow it ended up to release pocket last night.12:24
asacogra_: have you pinged everyone in the release team?12:24
ogra_asac, everything will get stuck in unapproved12:24
ogra_asac, no, i have pinged #ubuntu-release12:24
asacogra_: please ping them all directly12:24
asacmaybe someone is around12:24
asacthey might not watch channel, but highlight might do12:24
ogra_asac, i suspect most of them are on planes right now ... to duesseldorf12:25
Wellarkasac: we can pull down the image server12:25
asacogra_: well, worth a try anyway12:25
ogra_Wellark, nope, we cant12:25
Wellarkogra_: sure we can12:25
asacwe can make the index files RO12:25
asacerr 00012:25
ogra_Wellark, only IS can12:25
asacyes is should be reachable12:26
Wellarkogra_: yes, and IS has 24/7 emergency contact12:26
Mirvthere's a built and tested but not QA sign:d off Mir 0.8.0 retry release in a silo (and has already landed in utopic)12:26
Wellarkanyway the phones must survive on an outage of the image servers12:27
Wellarkand the broken images are being OTA'ed as we speak12:27
Mirvbut that doesn't include the bug #1378995 fix either so the livecd-rootfs fix is needed anyhow12:27
ubot5bug 1378995 in Mir ""citrain upgrade-device" fails to upgrade mir properly (Mesa driver installed instead of Android makes the phone unbootable)." [Critical,In progress] https://launchpad.net/bugs/137899512:27
ogra_any of these steps will take time12:27
ogra_lets simply see that my fix can land and we should be fine12:28
asacogra_: but we cant12:28
Wellarkasac: your call12:28
ogra_asac, no matter what we do, all actions we can take will take a similar amount of time12:28
WellarkI say let's bring down the image servers until we have a fix12:28
asacogra_: making image server unavailable would be super quick12:29
asactemporarily12:29
WellarkIS can make that happen in minutes12:29
ogra_well, your call12:29
asacwhat will happen?12:29
ogra_no idea12:29
ogra_we have never done it12:29
ogra_and i personally find that insane12:29
Wellarkasac: taking the servers down will make sure the OTA updates stop flowing12:29
asacwell, but who knows how our phone will react12:29
Wellarkwell, we anyway have to test if the phones can survive image server outages12:29
Wellarkdo we want to test that after we ship?12:30
Wellarknothing should happen12:30
asacno, we have to do a big post mortem about this for sure12:30
Wellarkas it's nothing else than a network outage12:30
Wellarkor if something happens we have to catch that before we ship12:30
ogra_asac, well, as i said, all forensic work is documented in the mail thread12:30
asacnah12:30
asacthats not it12:30
asacfar bigger issues12:31
ogra_the issue is that a silo landed that should never have been approved ...12:31
Wellarkwell, that's a problem for later12:31
ogra_for which an issue was known and even a fix exists already12:31
Wellarkright now we have bricking OTA updates being served from the image servers12:31
Wellarksomething that might happen after we ship as well, so this is also a valid case for testing what happens if we have to bring down the image servers for whatever reason12:32
ogra_asac, further i want a meeting at the sprint about one click rollbacks ... we should keep some kind of landings DB (keeping the packages file of a silo in there) so we can have easy rollbacks12:32
Wellarkas ogra said: we need release-team to roll out new images12:33
Wellarkand we can't reach them12:33
Wellarkwe can however reach IS12:33
asacok called IS12:33
asacleft a message12:34
Mirvogra_: do you know how the mir ended in release pocket? it did fail testing, the landing was abandoned from CI Train. I didn't catch why it was kept in -proposed in the first place instead of deleting last week.12:34
Wellarkasac: emergency number did not answer?12:34
ogra_Mirv, no idea, CI train issue ?12:34
asacWellark: its a voice box that pages the guy on duty12:34
Wellarkasac: ah, ok12:34
asacnot sure how long it will take12:34
Wellarkasac: you also pinged at #is ?12:34
ogra_you guys are aware that we have no chance to get fixes to the people that have the issue (to at least OTA via adb) ?12:35
ogra_if you take down the server12:35
Wellarkogra_: as soon as we have fixed images ready we online the servers again12:36
Wellarkand they can fix by ubuntu-device-flash12:36
Wellarkgoing hacking around with rootfs is not a solution12:36
ogra_nobody said that anyone should hack around12:37
ogra_people that dont have the issue will automatically get the next image ... people that have the issue can use system-image-cli via emergency adb shell12:37
Wellarkogra_: people who don't have the image might download it while we wait for the fix to land in the server12:37
* ogra_ wonders ... 12:38
ogra_let me try something12:38
Wellarktaking the server down makes sure nobody else gets the broken image12:38
Mirvogra_: it was published on Thursday but stuck in -proposed. I see actually on the landing chart it was QA sign-off:d originally, and that's why it probably got published.12:38
Wellarkthose who already got it are screwed already12:38
asacogra_: can you join #is too?12:38
Wellarkwe need to stop spreading the bricking images until the fix is in the servers12:38
asacWellark: well, thats what we are doing12:40
Wellarkasac: yes. and IMO that's the right solution12:40
* ogra_ tries a direct dput to rtm instead12:43
ogra_that should get us at least a working rtm image again12:44
ogra_ha, that worked12:44
ogra_https://lists.ubuntu.com/archives/rtm-14.09-changes/2014-October/000681.html12:44
ogra_so within 2-3h we should be back in business12:46
asacogra_: cant you join #is?12:46
asacogra_: i need info which servers produce images12:46
ogra_asac, i have no idea ...12:46
asacogra_: sure?12:47
ogra_asac, they get synced from nusakan to the http server ... i dont know what that http server is12:47
asacogra_: you dont know which server produces the system images?12:47
asacthat i know12:47
asacogra_: i need to know who produces them12:47
ogra_nusakan produces them, but that wont help you12:47
ogra_since you definitely dont want to stop nusakan12:48
ogra_but tear down the http machine that provides them12:48
asacthats not why i asked it12:48
ogra_whatever is system-image.u.c12:48
asacyes, but i want nusakan to still be able to talk to that server12:48
asacmaybe it uses info from that server12:48
ogra_https://lists.ubuntu.com/archives/rtm-14.09-changes/2014-October/000681.html12:48
asaci dont want to risk that we cant produce images anymore12:48
ogra_err12:49
ogra_sigh12:49
ogra_91.189.88.3512:49
asacso we still allow that server to talk to it12:49
ogra_thats the IP12:49
asacthe name is good enough12:49
asacthanks12:49
asacogra_: why do we touch livecd?12:52
asacwhy not just backout this mir stuff12:52
asaci am scared about touching image bits in a hurry12:52
ogra_asac, because it can always happen that some broken dep pulls in the mesa packages .. the livecd-rootfs fix makes sure that in such cases always the android alternative for the driver is used ...12:53
ogra_asac, backing out that mir stuff will be a day of work i guess, its a ton of packages12:54
ogra_i was aiming for a quicker fix (and for one that fixes the issue once and for all for the future too)12:54
asacwell, lets hope12:55
asacthat this doesnt cause the next fire12:55
asacogra_: thanks!12:55
ogra_asac, if it does i'll put it out :)12:56
ogra_the fix in livecd-rootfs is needed in any case12:56
ogra_(we have the same for hybris already)12:56
asacogra_: ok, so system-image is 403 now12:57
asacfrom everywhere but nusakan12:57
ogra_ok12:57
asacin case nusakan needs something12:57
ogra_well, nusakan uses ssh to copy12:57
ogra_no need for http12:57
asacogra_: can you check the image before we make system-image available again?12:57
asacdoesnt matter12:57
ogra_not easily12:57
asacits safer12:57
asacthey might look at index.json or something12:58
asacwho knows12:58
cjwatsonok too much scrollback12:58
asacbetter dont make it unavailable12:58
cjwatsonwhat's current status?12:58
asaccjwatson: hah :)12:58
ogra_cjwatson, lol12:58
asaccjwatson: we have taken system-image down ... make it 40312:58
cjwatsonI see somebody accepted my livecd-rootfs upload from this morning12:58
ogra_cjwatson, i dput the livecd-rootfs fix to 14.09-proposed ... setting the alternatives to android12:58
asaccjwatson: someone from release team is helping approving livecd-rootfs, please double check that this change is safe if you can!!12:58
asacthanks so much!12:58
cjwatsonasac: I uploaded it12:58
cjwatsonwhen I saw image build failures this morning12:59
cjwatsonogra_: is that identical to my upload?12:59
ogra_cjwatson, we ended up with -mesa and -android packages in the image ... the build picking the wrong alternative12:59
ogra_cjwatson, nope12:59
cjwatsonpaste me the diff?12:59
asaccjwatson: current issue is that image bricks phones... not fails to build12:59
ogra_cjwatson, i added to the libhybris alternative forcing snippet12:59
cjwatsonasac: I definitely saw build failures this morning12:59
ogra_cjwatson, http://bazaar.launchpad.net/~ubuntu-core-dev/livecd-rootfs/trunk/revision/97912:59
asaccjwatson: ok i will step out for a bit, check ogras upload12:59
cjwatsonogra_: oh I see12:59
ogra_cjwatson, i'll need to clean that up next week to remove the 3/8 in the package names13:00
cjwatsonmight need some thought later but doesn't seem obviously wrong13:00
cjwatsonright13:00
ogra_in any case that should make us bootable again13:00
cjwatsonso is that in utopic too?13:00
ogra_yes13:00
ogra_i did dput to rtm before it got out of unapproved in utopic though13:00
cjwatsonah you uploaded it separately13:01
cjwatsonok13:01
ogra_now waiting for rmadison :)13:01
cjwatsondoesn't look like anything for me to do now, your patch looks good to me13:01
asaccjwatson: do you know how we could copy the previous image on top?13:01
asaci think that would even be better because folks might have downloaded in background13:02
cjwatsonno13:02
asacand in this way they get something new13:02
cjwatsonno doubt it's possible but me messing around with the server with no experience is not the path of wisdom13:02
asacok. i am sure we need that13:02
asacas a simple feature documented for such cases :)13:02
asacbut lets make a post-mortem i guess after13:02
cjwatsonunless Stéphane happens to be around, right now a new build will likely be faster13:02
cjwatsonit's clearly possible if nothing else by editing the json, but I don't want to make anything worse13:03
ogra_as i said above ... last time i tried to copy an older image on top hell broke loose13:03
ogra_we need stgraber for that magic13:04
ogra_and we definitely need system-image training at the sprint13:04
ogra_(and if you would just edit the json the deltas might all be wrong, that wouldnt work anyway)13:05
Mirvdo we have progressive image release implemented but not enabled, or something? like first to 1% of users, then 5%, 10% etc. similar to SRU updates13:09
ogra_ i dont think we have that implemented13:09
ogra_sounds like a good sprint topic too13:10
Mirvyes13:10
cjwatsonMirv: heh, I have a to-do item from asac to talk to the landing team about that ;-)13:12
cjwatsonok, I'm afraid I have to go, I had other plans for today13:13
Wellarkok, one bug found. https://bugs.launchpad.net/ubuntu/+source/phablet-tools/+bug/138007913:14
ubot5Ubuntu bug 1380079 in phablet-tools (Ubuntu) "can't use ubuntu-device-flash offline even for cached images" [Undecided,New]13:14
cjwatsonhow are we going to smoketest the fix given that system-image is refusing access?13:17
Wellarkcjwatson: can't we flash them manually with phablet-flash?13:17
cjwatsonwell you still have to fetch it somehow13:17
Wellarkor fix ubuntu-device-flash and download the images directly to the local cache13:17
Wellarkcjwatson: IS can get us a copy of the images once they are built and on the servers13:18
ogra_hmm13:18
cjwatsontrue, ogra_ can grab it directly too13:18
Wellarkas the machinery is still able to push new images13:18
ogra_i cant get into nusakan13:18
Wellarkwe allowed nusakan to access the image servers13:18
ogra_ah, just took long13:18
cjwatsonWellark: pointlessly13:18
Wellarkogra_: are you familiar with ubuntu-device-flash?13:19
cjwatsonWellark: the block was only http; nusakan does not push stuff over http :)13:19
ogra_Wellark, only as a user ... i dont touch go code13:19
ogra_(or didnt touch go code yet at least)13:19
Wellarkcould someone come up with a quick patch we can apply locally so that we can download the new images straight under .cache as udf would do from the servers and let it flash them13:19
ogra_and i also dont have an idea how u-d-f would allow me to flash something not from a system-image server13:20
WellarkI can take it on as well13:20
ogra_i doubt thats possible13:20
Wellarkit is13:20
cjwatsonhonestly, it would probably be easier to lift the http block for five minutes13:20
Wellarkwell, that too.13:20
ogra_(beyond flashing your own device tarball ... which isnt what we want)13:20
cjwatsonfor this time13:20
ogra_cjwatson, ++13:20
Wellarkcjwatson, ogra_: device-flash supports custom servers13:20
ogra_i triggered an rtm build ... bot should announce in a few13:20
cjwatsonnext time we could push the fix to a custom channel13:20
Wellarkso we ask IS to point the root somewhere13:20
cjwatsonor something13:20
ogra_Wellark, yes, but you still need a working server setup first13:21
Wellarkogra_: we have that13:21
Wellarkit's just blocked.13:21
cjwatsonI generally think it is unwise to invent process mid-crisis13:21
Wellarkis can change the root13:21
ogra_cjwatson, yeah13:21
cjwatsonsimpler to go back to what we had for a short while13:21
Wellarkok. agreed.13:21
sergiusenscjwatson: Wellark: ogra_ whatever fix you want would require a copy of channels.json to be available and the device json as well13:22
ogra_well, and all the related filesystem structure i guess13:23
cjwatsonlike I say that sounds like a bad thing to try to do on the fly13:23
ogra_yep13:23
cjwatsonesp on a weekend13:23
sergiusens+113:23
ogra_lets just switch on http for a test13:23
Wellarksergiusens: for the future, could udf cache the last successful .json files so we could flash the cached images offline?13:23
sergiusenswe need to fire drill13:23
sergiusensand have something for crisis management written down though13:24
cjwatsonok, really off, SMS my mobile if you need me again.  I've turned up the volume so I should hear it more promptly this time13:24
ogra_(well, lets get an image first :) )13:24
ogra_cjwatson, thanks for showing up !!13:24
sergiusensWellark: yes, but as I said, nothing will solve your immediate problem13:24
Wellarksergiusens: that's the "for the future" part ;)13:24
sergiusensWellark: immediate future?13:24
sergiusens:-)13:24
ogra_we should definitely have some sprint topics out of this issue13:24
Wellarkas all of the files are already in my local .cache,13:25
WellarkI just can't flash them13:25
sergiusensogra_: disaster recovery? I guess that would involve someone from IS for sure13:25
ogra_1) easier rollbacks 2) system-image training 3) staged release process13:25
ogra_we lacked in all three bits today13:25
Wellark4) critical servers are isolated so we can take them down13:25
sergiusensogra_: it is already staged, problem is, we are all on -proposed ;-)13:25
Wellarkwithout hosing everything13:25
ogra_sergiusens, well, rool out staging that is ... only roll out to 1000 people, then to 10000, then 100000, then the rest13:26
ogra_*roll13:26
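The staged rollout ogra_ describes (1000 devices, then 10000, then 100000, then everyone) could look something like the sketch below. It is purely illustrative: system-image had no such feature at the time according to the log, and the function names, the per-device identifier and the `population` estimate are all assumptions. Hashing the device id together with the image revision gives each device a stable bucket, so a device offered the image in an early stage stays eligible in later ones.

```python
import hashlib

# Cumulative device counts per stage, as quoted in the discussion.
# Any stage index past the end of this list means "roll out to everyone".
STAGES = [1000, 10000, 100000]


def bucket(device_id, image_rev, population):
    """Map a device to a stable pseudo-random slot in [0, population)."""
    key = ("%s:%d" % (device_id, image_rev)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % population


def is_offered(device_id, image_rev, stage, population):
    """Return True if this device should be offered image_rev at this stage.

    population is an assumed estimate of the number of devices on the
    channel; the stage sizes are only met approximately because devices
    are bucketed by hash rather than counted.
    """
    if stage >= len(STAGES):
        return True
    return bucket(device_id, image_rev, population) < STAGES[stage]
```

Including the image revision in the hash means a different set of devices acts as the canary group for each image, so the same users are not always the first to hit a bad build.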
sergiusensogra_: the canary stuff13:26
ogra_right13:26
Wellarkcould someone write an update to the ML13:26
sergiusensogra_: still, this doesn't happen if everyone would run the stable channel13:27
WellarkI don't think I'm the right person for that13:27
ogra_that way we dont need to do insane things like tearing down servers13:27
Wellarkasac: --^?13:27
ogra_Wellark, about what ?13:27
ogra_we'll announce the fix once it is there13:27
ogra_people should know by now that there is an issue and it is being worked on13:27
Wellarkogra_: well, just saying that the image servers are down.. and will be restored when new image is ready.13:28
ogra_(if they followed the thread)13:28
Wellarkas now udf is totally unusable13:28
Wellarkfor any channel13:28
ogra_Wellark, well, its a ML ... just write something if you feel like :)13:29
Wellarkogra_: I already said I might not be the best person, but I will write the email then :)13:35
imgbot=== trainguards: RTM IMAGE 99 building (started: 20141011 13:36) ===13:36
asacogra_: i sent mail13:38
asacWellark: ^^13:38
ogra_asac, ok13:38
Wellarkasac: oh, you beat me to it.13:42
WellarkI also hit "send" already13:42
asacanything else that needs doing or are we waiting for image?13:42
Wellarkoh, seems we have the same contents :)13:42
Wellarkalthough your topic is more catching13:43
ogra_asac, image is building ... we need http re-enabled then to test OTA13:43
ogra_(i have my production device still on yesterdays image and can easily test then)13:43
asacogra_: can you check the new image offline somehow first?13:43
Wellarklet's just make sure all of the images are there before enabling..13:44
asacbefore reenabling OTA?13:44
ogra_not easily13:44
Wellarkwell, as I said, udf supports --server13:44
ogra_and i think we also want to make sure OTA works, no ?13:44
asacwell, you can login, unpack, check that the alternative is there?13:44
asacsure, but only after double checking that the fix did what it was supposed to do13:44
asacif thats possible13:44
ogra_Wellark, it is saturday ... i surely wont set up a server now here13:44
Wellarkit would be easy for IS to relocate the image server to test.touchimages.ubuntu.com or something13:44
ogra_we can enable http for 5 min coordinated with IS13:44
Wellarkogra_: it's just apache configuration13:44
Wellarkthey are virtual servers13:44
Wellarktakes like 3 lines of apache configuration13:45
ogra_Wellark, seriously, your comments arent helping13:45
Wellarkogra_: asac asked, I replied13:45
asaci didnt ask13:45
asaci wanted to know if ogra can look at the image before it goes out13:45
asacor someone13:45
asacanyway13:45
ogra_enabling http for the time it takes to download the OTA should be enough to verify13:45
Wellark16:43 < asac> ogra_: can you check the new image offline somehow first?13:45
ogra_i'll know if it boots within 1min after it flashed13:46
ogra_Wellark, and i said i cant easily13:46
Wellarkthere is a way. I'm just concerned if the image is broken13:48
Wellarkwe do have a way of testing them without enabling OTA on the existing devices13:48
Wellarkasac: your call.13:48
Wellarksergiusens: is --server working?13:49
Wellarkhas it been tested?13:49
Wellark(just thinking of the accident report future work section here)13:50
Wellarkalso we need a graphical disaster recovery tool to protect against bricked devices on incident like this after we ship13:52
Wellarkas end users having to get their phones to service centers for flashing will cost everyone too much13:53
Wellarksmall price to pay on a graphical flasher13:53
ogra_oh, hmm, the bot wont announce if the image is done (since it uses the json file to check that)14:01
* ogra_ wonders what else in the infrastructure might now have fallen apart due to the server being 40314:02
Wellarkogra_: as long as the image builder is able to build the images we are fine14:03
WellarkIS can copy the files over, if it goes to that14:03
ogra_Wellark, except that changelog generation scripts and other stuff might be completely broken and will need a bunch of work next week14:03
Wellarkogra_: yes. we need a bunch of work next week.14:04
asacogra_: can you try to write or tell me how folks will need to recover from bricked state?14:04
ogra_right, to recover from the mess tearing down the server causes14:04
asacwould like to prep that mail14:04
ogra_asac, yes, once we know it is fixed14:05
ogra_lets have an image first14:05
asacogra_: well, would like to prep the mail :)14:05
Wellarkogra_: there was no other choice14:05
asacif you dont know yet its fine14:05
asacbut thought we are waiting for image to be done so maybe we have a bit time :)14:05
ogra_well, theoretically just: adb shell system-image-cli14:05
asaconly if developer mode right?14:05
ogra_Wellark, there were other choices we simply didnt take14:05
asacguess we need something better :/14:06
asacubuntu-device-flash?14:06
Wellarkwe simply can't ship bricked images14:06
Wellarknothing is worse14:06
ogra_asac, well, without dev mode there is only --bootstrap flashing from fastboot14:06
ogra_which wipes the device though14:06
Wellarkand --bootstrap will nuke all the dogfooders' data14:06
asacthats awful14:06
ogra_yes, thats on purpose14:06
asacits not good enough14:07
ogra_--bootstrap is for getting you a freshly formatted device ...14:07
Wellarkwhich gets us back to: _we simply can't ship bricked images_14:07
* ogra_ goes afl for the next hour til the image is done 14:07
Wellarkthe decision to take the servers down was the right one14:07
Wellarkas we had no contact for the release team14:07
Wellarkand no ETA when we get fixed images done14:08
asacogra_: why does normal ubuntu-device-flash not work?14:09
asacthat should for official images imo14:09
ogra_it does in recovery indeed14:11
asaccool14:11
ogra_rootfs is done ...14:13
* ogra_ waits for import-images to pick it up on the machine 14:13
ogra_ah, it started ...14:15
ogra_another 30-45min and we should be good14:15
Wellarkasac: so, just to verify? we will push an image without even a smoketest to the public servers and make it available for OTA clients?14:22
ogra_err14:29
ogra_http://people.canonical.com/~ogra/touch-image-stats/rtm/20141011.1.changes14:29
ogra_cjwatson, was that your change ? http://people.canonical.com/~ogra/touch-image-stats/rtm/20141011.1.changes ... it now just dropped the packages completely14:30
ogra_thats not right14:30
ogra_oh, it *is* right, sorry14:31
* ogra_ checks the manifest to make sure14:31
asacogra_: how long before we want to turn on http?14:31
asacfor quick try?14:31
asac20 min?14:31
ogra_asac, gimme a few mins to verify everything is as it should14:31
asacogra_: ok will be back in 10 minutes and then call IS14:31
ogra_sounds good14:32
asacif you give me a go. will tell them to stay around while you test the OTA14:32
ogra_asac, ok, ready whenever you are ... the package changes are fine14:33
ogra_asac, btw, i'm not sure how exactly the "there is a new image" notification system works, but it could well be that we trashed it too by setting the server to 403 (not sure it has ever been tested with that case)14:43
ogra_instead of taking it down the respective subdirs should perhaps have been made readonly instead14:43
asacogra_: is all already on system-image?14:44
ogra_no idea, i can only see nuaskan :)14:44
asacah ok14:44
ogra_but it is all where it should be14:44
ogra_(on that machine)14:44
ogra_so i assume it is fine on s-i as well14:45
asacok i called14:46
imgbot=== trainguards: RTM IMAGE 99 DONE (finished: 20141011 14:55) ===14:54
imgbot=== changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/99.changes ===14:54
asacogra_: so utopic cannot build?14:58
asacand utopic has this bustage too?14:59
ogra_not sure it does14:59
ogra_but it definitely had build issues tonight ...14:59
ogra_let me trigger a build14:59
asacwe should have checked that. i thought we had it fixed everywhere now14:59
* asac sighs14:59
asacbut well14:59
ogra_fired up ... but dont hold your breath15:00
asacpoint is that we might now let more utopic folks run into death while we fix that image build15:00
asacso we should have brought both up15:00
ogra_https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/utopic/ubuntu-touch/15:00
ogra_(thats the utopic build)15:00
asacogra_: can you check if the current image has the mir landing?15:01
ogra_Setting up libmirclientplatform-android:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ...15:02
ogra_update-alternatives: using /usr/lib/arm-linux-gnueabihf/mir/clientplatform/android/ld.so.conf to provide /etc/ld.so.conf.d/arm-linux-gnueabihf_mirclientplatform.conf (arm-linux-gnueabihf_mirclientplatform_conf) in auto mode15:02
ogra_Setting up libmirclient8:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ...15:02
ogra_Setting up libmirplatformgraphics-android:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ...15:02
ogra_update-alternatives: using /usr/lib/arm-linux-gnueabihf/mir/platformgraphics/android/ld.so.conf to provide /etc/ld.so.conf.d/arm-linux-gnueabihf_mirplatformgraphics.conf (arm-linux-gnueabihf_mirplatformgraphics_conf) in auto mode15:02
ogra_Setting up libmirserver25:armhf (0.7.3+14.10.20140918.1-0ubuntu1) ...15:02
ogra_should be fine actually15:02
asacok, then i dont think we should stress ourselves out15:02
asacthat the image doesnt build15:02
ogra_(thats from the last successful build, no -mesa alternatives)15:02
asacif it doesnt15:02
ogra_well, my fix will in any case force the right alternative in utopic too15:03
ogra_so we should be a) safe with whatever is available ... and b) be safe for future images (if they build at all now)15:04
asacogra_: can you scribble recovery instructions? Just do the ones with recovery mode... i dont want to add multiple options in the main mail15:06
ogra_reboot to recovery with the right key combo for your device (they differ by device) and run: ubuntu-device-flash --channel ubuntu-touch/ubuntu-rtm/14.09-proposed --device=$your-device15:08
ogra_that should be all thats needed ...15:08
ogra_if you can still adb shell: "adb shell system-image-cli -v"  will be enough15:08
asacogra_: do we have documented how to boot into recovery for krillin and mako?15:09
ogra_not sure, perhaps on the install page15:09
ogra_but i dont think we cover krillin anywhere public15:09
asacthats fine15:10
asacjust mako15:10
ogra_let me try my mako15:10
asacwhere is it?15:10
ogra_http://developer.ubuntu.com/start/ubuntu-for-devices/installing-ubuntu-for-devices/15:10
asacogra_: please validate that the above works15:10
asacand i will send announce with that info15:10
asacogra_: there is no info how to boot into recovery for mako15:11
ogra_Power the device off with the Power button.15:11
ogra_Reboot into the bootloader by pressing the correct physical button combination for your device type as shown here: https://source.android.com/source/building-devices.html#booting-into-fastboot-mode15:11
ogra_bah15:11
ogra_thats not helpful15:11
ogra_we want recovery, not fastboot15:11
asacack15:12
asacour documentation isnt really good imo15:12
asacno clear sections etc.15:12
asacnot even a toc15:12
ogra_well, the wiki had a super detailed TOC15:12
asacalso like two page warnings and disclaimers15:12
asacbefore starting15:12
asacwiki was also super hard to follow15:13
ogra_but thats gone when it was moved away from wiki to official docs15:13
asacright, but it wasnt really better15:13
asacat least for me15:13
ogra_asac, on mako it is vol-dn and power15:13
asacanyway, i will not explain how to do that15:14
asacguess the mailthread will explain it15:14
asacok15:14
asacogra_: lets validate that the recovery approach works and then i send it15:14
asacthanks!15:14
* asac will be back in 10 minutes15:14
ogra_sigh15:15
ogra_on krillin it is really complex15:15
ogra_vol-dn + power ... then select recovery with vol-up and press power again15:15
ogra_hah15:15
ogra_great15:16
ogra_my mako already had the broken image downloaded and now goes to flash automatically to it15:16
ogra_oh, no, that was utopic anyway15:17
ogra_but that means i can verify utopic now :)15:18
ogra_great, utopic verified, not affected15:18
ogra_hmm15:20
ogra_my rtm krillin doesnt have a location indicator anymore15:20
ogra_location works though15:21
ogra_i wonder if thats a design thing15:21
ogra_ugh15:21
ogra_and the new "pinned in launcher" thingie is butt ugly :(15:21
asacogra_: we dont need instructions for krillin in mail15:24
asaci will bounce them to PES folks for support15:24
ogra_k15:24
ogra_so:15:24
ogra_vol-dn while powering on15:24
ogra_select recovery and press power15:25
ogra_once in recovery run:15:25
ogra_ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed15:25
ogra_thats all15:25
asacogra_: did you validate from bricked state?15:25
ogra_just doing15:25
Wellarkno --bootstrap required?15:25
ogra_but downloading takes me 20min15:25
ogra_Wellark, not for upgrading15:25
Wellarkgood. people wont lose their data15:26
ogra_--bootstrap is obviously for bootstrapping ;)15:26
ogra_(which implies wiping)15:26
asacogra_: whats the fixed image number on mako rtm?15:27
Wellarkwell, udf expects the device to be in certain states for different options15:27
ogra_8315:27
Wellarkso as long as udf can flash from recovery without --bootstrap it's fine15:28
ogra_asac, i need to mow my lawn before it gets dark ... can i leave you with this (will be back in 30min/1h)15:28
asacogra_: yes all good15:29
asacif not i call15:29
ogra_oki15:31
ogra_my download still runs, should be done in 1515:31
ogra_i'll probably drop by inbetween15:31
asacogra_: ok so i have to wait until you're back?15:35
asacto confirm that your instructions work?15:35
* asac waits15:35
sergiusensWellark: bootstrap is like "flash_all" from android15:54
asacogra_: so do we know how the mir silo got published?15:58
sergiusensasac: brendan sent an email to the list as a reply to "Don't update to RTM build"16:01
asacsergiusens: what does he say?16:01
sergiusensasac: That explains a lot then - yes this silo failed to install using the citrain tool (dist-upgrade). We were instructed by the lander that apt-get installing each package was the method they used and the 'right' way. I guess this was bad advice and we should have payed attention to what citrain did. We've been caught out by issues like this several times and I think this really brings to the fore the need to have a conversation about16:02
sergiusensconsistent installation steps for silos so that these kinds of oversights happen less often (preferably not at all). I opened this bug last week: https://bugs.launchpad.net/ubuntu/+source/phablet-tools/+bug/1378245. We can use it as a container for suggestions about how to make the output of the tool most accurately reflect what ends up in the image.16:02
ubot5Ubuntu bug 1378245 in phablet-tools (Ubuntu) "citrain could use a more accurate way to upgrade from silos" [High,In progress]16:02
ogra_asac, Mirv sent a summary16:02
ogra_to the first ML thread16:02
asacogra_: can you try to summarize the summary?16:04
asaci am sure its super detailed :)16:04
* ogra_ finds it most funny that the breakage was known by upstream and a fix existed already 16:04
ogra_asac, there was a silo that was tested by upstream, an issue was found and seemingly ignored (the bug i just mentioned) ... then it was handed to QA with the instructions sergiusens talked about ... QA followed them and signed off16:06
asacogra_: which breakage? the "does not boot"?16:06
ogra_yes16:06
ogra_they noticed the wrong dependencies16:06
ogra_i wrote detailed mails about each of these bits in the "Don't update to RTM build" thread16:06
asaci am really getting flooded with details16:07
asachence those details i cannot digest really16:07
ogra_the wrong dependencies in turn caused that during rootfs build the mesa alternative for the mir driver was used16:07
asaci really rely on very precise super tight summaries16:07
sergiusensasac: ogra_ well at least we can agree it was signed off16:07
asacat least makes my life super harder16:07
ogra_yes16:07
asacerr easier16:07
ogra_asac, "wrong deps in Mir that were ignored caused the wrong driver to be used in the rootfs, upstream testing instructions worked around the breakage so that QA signed off"16:08
ogra_asac, thats your one line summary :)16:09
asacogra_: cool. that was pretty good. were those special instructions just for that landing? e.g. didnt they use the normal wiki instructions?16:10
asacwell done summary :)16:10
ogra_you have to ask brendand or davmor2 for details on the testing plan16:10
ogra_i think there were specific instructions in the spreadsheet for this silo16:10
ogra_(as i understood it)16:11
ogra_heh16:12
asacok thanks16:12
asacguess its a first that someone has special install instructions for a silo16:12
asacseems easy to fix by just making it clear that special install instructions are never allowed :P16:13
* ogra_ has another 150m² to mow now :) 16:13
ogra_(after cigarette break)16:13
asacogra_: did you validate the recovery procedure?16:13
asacstill waiting for an ack16:13
ogra_one sec16:13
asacwould really like to go off and prep for travel16:13
* ogra_ needs to go to the office for that 16:13
asacshops are almost closed16:13
ogra_asac, all fine16:14
ogra_device is up and running16:14
asacogra_: ok one more time:16:14
asac1. connect your phone to your ubuntu desktop/laptop 2. boot phone into recovery (on mako: 1. vol-dn + power, 2. select recovery and press power) 3. on desktop/laptop run ubuntu-device-flash --channel=ubuntu-touch/ubuntu-rtm/14.09-proposed16:14
asacogra_: is that correct?16:14
ogra_yep16:14
ogra_thats what i did16:15
ogra_ogra@anubis:~/Devel/seeds/ubuntu-touch.utopic$ adb shell system-image-cli -i|grep "version version"16:15
ogra_version version: 8316:15
rsalvetihahaha16:15
ogra_on mako you should end up with version 83 then16:15
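Editor's sketch: the version check ogra_ ran by hand (adb shell system-image-cli -i, then grep for the version line) could be scripted roughly as below. The only output line confirmed by this log is "version version: 83"; the helper names, the constant, and the "at least 83" comparison are assumptions made for illustration, not part of any Ubuntu tool.

```python
import re
import subprocess

# Version ogra_ reported for the fixed mako RTM image in this log.
FIXED_MAKO_RTM_VERSION = 83


def parse_image_version(info_text):
    """Return the integer from a 'version version: N' line, or None."""
    match = re.search(r"^version version:\s*(\d+)\s*$", info_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_device_info():
    """Run the command quoted in the log; needs adb and an attached device."""
    return subprocess.check_output(
        ["adb", "shell", "system-image-cli", "-i"], universal_newlines=True
    )


def is_recovered(info_text, expected=FIXED_MAKO_RTM_VERSION):
    """Assumption: any version at or above the fixed one counts as recovered."""
    version = parse_image_version(info_text)
    return version is not None and version >= expected
```

With a device attached, is_recovered(read_device_info()) would report whether the flash landed on the fixed image. The parsing is kept separate from the adb call so it can be checked without hardware.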
rsalvetiit seems people are busy here16:16
ogra_quite16:16
rsalvetilet me see, mir + update alternatives?16:16
* ogra_ lost his whole day to that :P16:16
asacogra_: ok sent mail16:16
asachave fun16:16
ogra_rsalveti, how did you guess !16:16
rsalvetiyeah, had that yesterday when updating with apt-get16:16
asaci will leave support of follow up questions to the folks now16:16
asachave to really run16:16
rsalvetibut then went to bed thinking I was some sort of crazy16:16
ogra_yeah, safe travels16:16
* ogra_ thinks asac needs to move to a sane city ... in Kassel the shops only close at 10pm on saturdays16:17
ogra_Hamburg ... provinz ...16:17
ogra_anyway, back to lawn mowing so i get at least one thing done16:18
rsalvetiogra_: did we revert mir in the end?16:20
rsalvetilet me check changes16:21
rsalvetioh, the package got renamed16:21
Mirvrsalveti: nope, ogra got this in https://launchpad.net/ubuntu/+source/livecd-rootfs/2.25416:21
rsalvetidid someone test this before landing?16:21
rsalvetinot pointing fingers, just trying to understand what happened16:22
Mirvrsalveti: read the mailing list threads. QA signed it off originally because they were instructed to do a manual workaround (won't happen again...), but I'm not sure what's the story behind the mir stuck in -proposed and then two days later migrating to -release16:22
Mirvie this problematic Mir was published on Thu already16:23
rsalvetioh, right16:24
rsalvetiyeah, I remember we had a few issues with this landing16:24
rsalvetibut well, guess we're all fine now16:24
rsalvetilet me flash latest16:24
rsalvetimaybe the propose migration happened after meta was updated16:25
Mirveverything seems fine now again, aside from damage caused to dogfooders of course16:33
rsalvetiguess ci lab as well16:33
imgbot=== trainguards: IMAGE 277 DONE (finished: 20141011 17:00) ===16:59
imgbot=== changelog: http://people.canonical.com/~ogra/touch-image-stats/277.changes ===16:59
Wellarkhas there been any breakage reported yet on the taking down of the image server?17:29
Wellarkother than the bug I filed against ubuntu-device-flash17:29
slangasekogra_, asac: from the latest mails, I understand that a fixed image has been published; is there anything else needed this morning?18:13
cjwatsonMirv: it was stuck in 14.09-proposed for a while because it required ubuntu-touch-meta to be updated before it could migrate (due to dependencies on -android alternative packages from there, which had been renamed), and I therefore wanted to get the mir landing into utopic so that I could keep ubuntu-touch-meta in sync18:14
cjwatsonwe finished that off last night18:14
cjwatsonbut it appears nobody considered that livecd-rootfs changes might be needed18:15
cjwatsonI suspect the ultimate fix for this is per-silo image building18:15
cjwatsonwe could actually have done that even with today's technology, though it requires some setup18:15
cjwatsonWellark: I think it's harsh to say that you had no contact for the release team; I responded in under an hour from being SMSed, which OK may not be a stupendous SLA but I don't think it's bad for somebody not on duty on a Saturday18:17
slangasekogra_: right, I see your livecd-rootfs fix, I figured that's the approach you'd take to un-break it... I think that's fine for a quick fix, but I really don't like forcing things with update-alternatives.  I think we should instead fail the image build whenever this happens.18:18
cjwatsonWellark: and a community member took action 26 minutes after being asked on #ubuntu-release18:18
cjwatsonslangasek: agreed18:18
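Editor's sketch: slangasek's suggestion to fail the image build rather than force the alternative could look roughly like the check below. It scans a rootfs build log for update-alternatives lines in the format ogra_ pasted earlier in this log and flags any Mir alternative whose target is not under an android directory. The function names, the "/android/" rule, and the idea of running this as a post-build step are assumptions, not how livecd-rootfs actually works.

```python
import re

# Matches lines such as:
# update-alternatives: using <target> to provide <link> (<name>) in auto mode
ALTERNATIVE_LINE = re.compile(
    r"update-alternatives: using (?P<target>\S+) to provide (?P<link>\S+) "
    r"\((?P<name>[^)]+)\) in (?P<mode>auto|manual) mode"
)


def wrong_mir_alternatives(build_log, required="/android/"):
    """Return (name, target) pairs for Mir alternatives not using `required`."""
    bad = []
    for match in ALTERNATIVE_LINE.finditer(build_log):
        name, target = match.group("name"), match.group("target")
        if "mir" in name and required not in target:
            bad.append((name, target))
    return bad


def check_build_log(build_log):
    """Abort with a non-zero exit status if a wrong alternative was selected."""
    bad = wrong_mir_alternatives(build_log)
    if bad:
        details = ", ".join("%s -> %s" % pair for pair in bad)
        raise SystemExit("wrong Mir alternative selected: " + details)
```

Failing hard like this trades a broken image for a missing one, which is the outcome slangasek and cjwatson say they prefer.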
Wellarkcjwatson: at the time of the decision, we didn't have a contact18:23
Wellarkand there was no guarantee to have a contact on Saturday18:23
Wellarkthat was _not_ to blame the release-team not working on Saturday18:24
Wellarkbut there was no reliable ETA when we might get a hold on the team to do the image publishing18:25
Wellarkand we have IS for emergencies 24/718:25
cjwatsonright and I think that's the way it should be, but I'd like you to be a bit more careful about how you phrase it :)18:25
cjwatson(also, fixing ubuntu-rtm did not involve the release team)18:26
Wellarkcjwatson: I don't understand what you are referring to. Where should I have been "more careful" in my phrasing?18:28
Wellark+ the utopic images were broken as well (as far as we knew), so it's irrelevant for the decision making if ubuntu-rtm does not involve the release team18:31
Wellarkmost of our dogfooders are using utopic-proposed images on n418:32
ogra_slangasek, its all fine for now ... will need some cleanup work on monday though18:39
ogra_slangasek, my biggest obstacle here is that we dont have  an easy "one click rollback" mechanism for either ... (images as well as silos) and i think we should have a planning session for this at the sprint ...18:42
ogra_anyway ... back into my evening ... i spent enough time online today :)18:43
cjwatsonWellark: I thought it sounded as though you thought you (plural) ought to have had a contact in the release team19:24
cjwatsonWellark: My understanding is that the utopic images failed to build rather than breaking in this particular way.  I did upload a fix for the build failure, but my fix didn't land independently, it was superseded by ogra_'s upload and so there was no image build with just that.  Is that incorrect?19:26
cjwatsonWellark: The cdimage logs indicate that there was no successful utopic build between 20141010 (log timestamped 2014-10-10 02:57 UTC, so before the mir landing) and 20141011.1 (log timestamped 2014-10-11 15:55 UTC, so after the fixes)19:27
cjwatsonSo I don't think utopic-proposed can have been broken for users at any point19:28
