=== shardy_afk is now known as shardy === shardy is now known as shardy_mtg === shardy_mtg is now known as shardy [14:18] smoser: Are you going to be able to look at __builtin__ (i.e. https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/310432) this week, or shall I pick up work on it? [14:19] Odd_Bloke, if you can make tests pass, then i'm good with it in trunk [14:19] but, no. i wont get there. right now i'm workingon [14:19] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1611074 [14:19] smoser: OK, cool, I'll look at the tests. [14:19] and i'll hope to have some review from you on that... [14:19] its kind of a pita [14:19] smoser: Ugh, that bug. [14:19] the code that was there has always been racy [14:20] i do think that i have a reasonable plan [14:20] smoser: FYI, I'm out tomorrow, Monday and Tuesday. [14:20] you're on UTC hours or are you on this continent now. [14:21] smoser: UTC at the moment, flying to that continent tomorrow and back on Monday. [14:21] fun weekend. [14:22] Yeah, I'm having to do a quick trip to activate my permanent residency. === cpaelzer_ is now known as cpaelzer [14:39] Odd_Bloke, that doesnt really make sense. [14:39] :) [14:39] :) [14:39] Being with my Canadian wife (anywhere in the world) counts as "resident". [14:40] And my medical expires in December, so I'm having to make a trip before that. [15:58] smoser: Oops, you'll just have got a huge email from LP because of me; proposed merging my tests to your branch having forgotten that I rebased on to master. ¬.¬ [15:58] smoser: https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/311164 is your commit with an additional commit fixing the tests. [15:59] Odd_Bloke, nice. thank you! [16:43] Odd_Bloke: smoser: the "ephemeral0" disk in azure is mapped how? /dev/disk/by-label? by-id ? ie, what sort of kpath would such a device have ? [16:44] rharper: cloud-init ships udev rules that make it appear at /dev/disk/cloud/azure_resource [16:45] ok, lemme read those rules [16:45] rharper: That's only post-trusty, before xenial walinuxagent ships rules which name them slightly differently. [16:46] (Well, I think walinuxagent still ships those rules) [16:46] Odd_Bloke: ok, would you be able to get me access to a trusty and xenial instance in azure? I'm poking at a disk formatting and mounting bug there, https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1642383 [16:48] rharper: On it. [16:48] thanks [16:51] rharper: ubuntu@xenial-161117-1647.cloudapp.net [16:52] rharper: And ubuntu@trusty-161117-1648.cloudapp.net [16:52] I'll leave you to work out which one is which suite. ;) [16:53] Odd_Bloke: thanks! [16:53] rharper: I'm heading off for vacation, so could you shutdown the machines when you're done with them? [16:54] Odd_Bloke: yes! [16:54] Danke. :) [18:58] smoser: looks like we'll need this https://code.launchpad.net/~daniel-thewatkins/cloud-init/lp1460715 to fix disk_setup partitioning under xenial (sfdisk changes in 16.04) [19:02] rharper, yeah, there are other issues... i hope/expect to fix that bug with what i have for the other [19:02] part of my motivation to have you look at that one was so you'd understand my changes coming here. [19:02] and why they should work for both [19:03] but yea... using that dynamic path is part of this. [19:03] k [19:03] as well as making sure we wait for the device to appear [19:03] right, that's a 16.04 ism w.r.t kernel partition awareness under systemd [19:03] that sounded like the kpartx thingy [19:06] smoser: would really like to replace the cc_mounts/cc_disk_setup with curtin backends but that's not going to happen right now [19:10] rharper, yeah. that is absolutely true [19:11] and additionally we need to be able to take disk config in curtin like format. [19:11] y [19:11] another thing for the roadmap =) [19:12] the 33, 66, 88 thingy isn't non-parseable unless you read the help documentation, so moving to a more helpful format would be nice [19:53] rharper, [19:53] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/311205 [19:53] Odd_Bloke, ^ also [19:54] + if os.path.isfile(devpath + suff + "2"): [19:55] that seems off, what if there are 3 ? [19:55] also, I think it's possible to have un-ordered partitions (gpt) for example [19:55] well, it woudl fail if there was a partition 1 and partition 3 but *not* a partition 2 [19:55] which is possible [19:55] (you can do that with mbr also) [19:55] yeah, it is hueristic for sure. [19:55] i could probably use a function in disk_setup [19:56] but this was what i did initially [19:56] we have a sysfs device partition counting in curtin [19:56] yeah, but that wont work on non-linux [19:57] the sfdisk stuff in disk_ probably would [19:57] not with -L Linux [19:57] so yeah, thats not the greatest bit of code, but it is better than what *was* there. [19:57] oh sure; I just don't know how far we want to go here [19:57] -L linux isn't "linux specific stuff" [19:57] hehe [19:57] its just "dont complaina bout stuff linux doesnt care about" [19:58] well, it's deprecated anyhow [19:58] the code that was there before didn't even check for a second partition at all [19:58] but, it *is* linux specific [19:58] ! [19:58] ? [19:58] i suspect that sfdisk -L works on freebsd [19:58] Stderr: "sfdisk: --Linux option is unnecessary and deprecated\nsfdisk: unsupported unit 'M'\n" [19:58] in 16.04 [19:59] upstream util-linux changed [19:59] in 2014 or so [19:59] ah. right [20:01] https://bugs.launchpad.net/cloud-init/+bug/1460715 is still needed for partitioning on xenial [20:03] the can_dev_be_reformatted is specifically there to help reprevent dataloss if someone "used" the disk ... [20:04] certainly it's possible to populate it the ntfs existing partition though? we can't protect against that? [20:11] rharper, if there are files on it, we leave it alone [20:11] (other tahn the DATALOSS_ file) [20:11] ok, I'm posting comments, the msg on the return True, is confusing [20:17] rharper, addressed that.. you're right. i [20:17] it should have said no important files [20:17] I thought so after reading the rest of the code [20:18] and I switch to util.log_time should be easy and announce the wait on the log [20:18] i think i might be racing systemd and mount now. [20:18] I've had several folks ask about making sure cloud-init announces when it's blocking ( which I think it does most of the time) but we need to do that here too [20:19] rharper, ubuntu@40.78.152.32 [20:19] er... smoser@ [20:19] ok [20:20] byobu or /var/log/cloud-init.log [20:20] mount -a failed as already mounted. [20:20] which should not happen until [20:20] defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig [20:20] urg [20:21] maybe in new systemd world, a mount-a is going to fail in that scenario... i'm not sure, but i'm pretty sure it should *not* fail with already mounted [20:21] Nov 17 20:11:02 repro3 systemd[1]: Mounting /mnt... [20:21] Nov 17 20:11:02 repro3 systemd[1]: Mounted /mnt. [20:21] Warning: mnt.mount changed on disk. Run 'systemctl daemon-reload' to reload units. [20:22] $ journalctl -o short-precise --unit mnt.mount [20:22] -- Logs begin at Thu 2016-11-17 20:10:56 UTC, end at Thu 2016-11-17 20:22:01 UTC. -- [20:22] Nov 17 20:11:01.513996 repro3 systemd[1]: Unmounted /mnt. [20:22] Nov 17 20:11:02.345932 repro3 systemd[1]: Mounting /mnt... [20:22] Nov 17 20:11:02.820894 repro3 systemd[1]: Mounted /mnt. [20:22] check the timestamps with cloud-init log [20:22] well, the timestamps suck [20:22] Nov 17 20:11:02.868532 repro3 cloud-init[1037]: 2016-11-17 20:11:01,686 - util.py[WARNING]: Activating mounts via 'mount -a' failed [20:22] journalctl -o short-precise --unit cloud-init.service [20:22] will get you real timestamps [20:23] so, we were too late [20:23] you want short-monotonic here [20:23] as you said [20:23] sure [20:23] that too [20:23] as clock runs all over :) [20:23] hrm [20:23] that shows something different then [20:24] ~$ journalctl -o short-monotonic --unit cloud-init.service | grep mount [20:24] [ 15.757407] repro3 cloud-init[1037]: 2016-11-17 20:11:01,686 - util.py[WARNING]: Activating mounts via 'mount -a' failed [20:24] smoser@repro3:~$ journalctl -o short-monotonic --unit mnt.mount [20:24] -- Logs begin at Thu 2016-11-17 20:10:56 UTC, end at Thu 2016-11-17 20:22:01 UTC. -- [20:24] [ 14.406073] repro3 systemd[1]: Unmounted /mnt. [20:24] [ 19.680401] repro3 systemd[1]: Mounting /mnt... [20:24] [ 19.750265] repro3 systemd[1]: Mounted /mnt. [20:24] something is lieing [20:24] but it does seem like if we update fstab, we may need to systemctl daemon-reload [20:24] so generators trigger [20:24] which would do the fstab generator [20:24] and do the mount for us [20:24] IIUC [20:25] right. thats possible. [20:26] but both the old and the new fstab entry had not before cloud-init.service [20:26] so it really shuld not be doing anything until that is finished. [20:27] holy moley [20:27] the monotonic is not in order [20:27] journalctl -o short-monotonic [20:27] needs to sort [20:30] ew [20:30] that seems *wrong* [20:30] yeah. its probably because its just timestampping the different threads as the come in [20:32] so, something umounts /mnt right after cloud-init.service starts ( that's expected, right?) [20:32] [ 14.713559] repro3 cloud-init[1037]: Cloud-init v. 0.7.8 running 'init' at Thu, [20:32] [ 14.406073] repro3 systemd[1]: Unmounted /mnt. [20:33] actually before [20:33] not sure why [20:33] well, cloud-init *does* do that. or could [20:33] -local ? [20:33] if it was ntfs, then cloud-init mounts, reads, unmounts [20:33] well, this is the mnt.service saying [20:33] so, that happens after fstab generator runs [20:33] otherwise we wouldn't have a mnt.mount unit [20:33] probably not -local (looking for the datasource disk) but it coudl be [20:34] bbiab, picking up the kiddos, then back to help debug [20:34] yeah, everything happens after the fstab generator runs [20:58] smoser: back [21:00] i poked slangasek, he might be looking too [21:02] its kind of interesting/annoying that 'mount -a' failed [21:02] as i've never seen that before [21:02] "everything is already mounted" normally exits 0 [21:03] that seems new [21:03] if we run mount -a now, do we see ? [21:03] no [21:03] it returns 0 [21:03] right. [21:03] so, something is very strange [21:03] what's the RC? [21:03] so it could just be a really unlucky race [21:03] well, the return code should give us a hint [21:03] 32 [21:04] maybe it's ntfs return code [21:04] 32 [21:04] mount failure [21:04] that's so helpful [21:04] =( [21:05] https://linux.die.net/man/8/ntfs-3g.probe doesn't list 32 [21:05] so unlikely ntfs error [21:13] ov 17 20:11:02.868532 repro3 cloud-init[1037]: 2016-11-17 20:11:01,686 - util.py[WARNING]: Activating mounts via 'mount -a' failed [21:13] Nov 17 20:11:02.868575 repro3 kernel: EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null) [21:14] smoser: that does look like a race , mount -a got the OK to mount it, but something mounted it before it could ? [21:14] that's what 50 ms? [21:15] thats way shorter than that [21:16] 6 places isn't nano, ? [21:16] * rharper sucks at subsecond placement [21:16] oh. but you're looking at the timestamp of the message that it failed. [21:16] sure, but I still think it's related [21:16] 20:11:01,652 [21:17] is 'Running' [21:17] [ 14.425648] repro3 systemd[1]: Stopped File System Check on /dev/disk/cloud/azure_resource-part1. [21:17] [ 14.484319] repro3 systemd[1]: Starting File System Check on /dev/disk/cloud/azure_resource-part1... [21:17] [ 14.560400] repro3 systemd-fsck[1140]: /dev/sdb1: clean, 11/25688 files, 8896/102400 blocks [21:17] [ 15.757407] repro3 cloud-init[1037]: 2016-11-17 20:11:01,686 - util.py[WARNING]: Activating mounts via 'mount -a' failed [21:17] yeah, that just seems completely bogus [21:18] [ 15.440692] repro3 kernel: EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null) [21:18] something was "using" sdb1 (systemd-wise) [21:18] when cloud-init attempted to mount it [21:18] if fsck was still running, it would be seen as busy [21:18] ie, an open on the device [21:18] systemd-fsck [21:18] :-( [21:18] yeah [21:19] These services are started at boot if passno in /etc/fstab for the file system is set to a value greater than zero. [21:19] whichi s set to 2 [21:19] which triggers the service [21:21] we could try not setting the passno in the fstab entry and see if things just work [21:22] if so then we may need to mask systemd-fsck service prior to the mount (or as you see) it gets mounted automatically [21:22] ie, if we're system_is_systemd [21:22] we could skip the mount -a [21:24] right, on ubuntu we do not in theory need the mount -a now. [21:24] mask systemd-fsck service ? [21:28] :-( [21:28] another instance, i see: [21:28] 2016-11-17 21:20:04,125 - DataSourceAzure.py[DEBUG]: Azure ephemeral disk: All files appeared after ['/dev/disk/cloud/azure_resource'] seconds: 0 [21:28] 2016-11-17 21:20:04,125 - DataSourceAzure.py[DEBUG]: reformattable=False: device /dev/disk/cloud/azure_resource is not a file [21:29] which is very odd. [21:31] it's alink [21:31] right ? [21:31] right. but the check is os.exists [21:31] and it's /dev/disk/azure/resource [21:31] not azure_resource [21:31] err.. isfile [21:31] which isvalid [21:32] no its both [21:32] your instance says no [21:32] and cloud-init is the /dev/disk/cloud/azure_resource one. the other is from walinuxagent [21:32] ah [21:33] two paths [21:33] is suspect that's udev racing with symlink creation [21:34] but it shoudlnt ever stop existing [21:34] after it exists [21:35] i have to go... to teacher conferences [21:35] k [21:35] i'm using http://paste.ubuntu.com/23492436/ [21:35] to make it think its a resize [21:35] (just partitions it as 1 parittion and mkfs ntsf) [21:35] and then reboot ? [21:35] and then reboot [21:35] yeah [21:35] ok [21:36] here, i'll give you another instance [21:36] smoser@smoser1117x.cloudapp.net [21:37] shoot [21:37] isfile is just wrong [21:37] fix that [21:37] os.path.isfile("/dev/disk/cloud/azure_resource") [21:37] False [21:37] heh [21:37] islink [21:37] will work [21:37] >>> os.path.islink('/dev/disk/cloud/azure_resource') [21:37] True [21:38] well i dont want to know if its a link [21:38] cause that says it can be danling [21:38] you need readlink -f [21:38] I'll get a fix [21:38] i just changed to exists [21:38] and pushed [21:38] so that should be fine really [21:38] well, yeah [21:38] going afk [21:38] k [21:39] >>> os.path.realpath('/dev/disk/cloud/azure_resource') [21:39] '/dev/sdb' [21:39] >>> os.path.exists(os.path.realpath('/dev/disk/cloud/azure_resource')) [21:39] True [21:51] * rharper gives it a go [22:37] smoser: getting further; with the realpath and exists going, now we're looking to find sdb1 .. it's not immediately present when we're detecting if the devices can be reformatted .. [23:01] smoser: currently, sdb1 isn;t being detected as ntfs via the blkid util.find_devs_with(); debugging that [23:29] ah , looks like it's working, but we tried to mount /dev/sdb, instead of /dev/sdb1 (real_partpath) [23:35] and, wrong name of the semaphor file, config_disk_setup, not config_disk_config [23:35] almost there