[00:03] Bug #1748051 opened: [2.4, devel]
[00:18] Bug #1748052 opened: [2.4, devel] ] Unable to write to plugin cache /usr/lib/python3/dist-packages/twisted/plugins/dropin.cache: error number 13
[00:34] Bug #1748055 opened: [2.4, devel] While commissioning/testing
[01:36] my maas-proxy cache is consuming some disk space. is there an official way to clear out this cache or is "rm -r /var/spool/maas-proxy/*" an acceptable solution?
=== frankban|afk is now known as frankban
[08:17] Is maas itself providing the cloud-init content when deploying a host? I'm trying to deploy a 17.10 host here and I'm getting "404 Not found http://old-releases.ubuntu.com/ubuntu artful-security Release". Well, artful isn't an old-release, so why is it trying to pick it from there?
[13:15] Bug #1748187 opened: Only 16.04 Xenial available for commissioning
[13:54] Bug #1748187 changed: Only 16.04 Xenial available for commissioning
[14:24] Bug #1589140 changed: No WOL option in latest MAAS version for 16.04
[15:38] hi .. any update on bug 1673724
[15:39] bug #1673724
[15:56] ejat: seems fixed to me
[16:08] ejat, are you seeing it on a modern version of MAAS?
=== frankban is now known as frankban|afk
[19:35] roaksoax, https://bugs.launchpad.net/maas/+bug/1743144 is affecting HPe machines , not sure why that repository is added by default.
[19:38] roaksoax, https://pastebin.ubuntu.com/26542719/ fyi
[19:48] I need to deploy Debian 9, but it seems like `ifenslave` is not included by default. Is there a time in the curtin setup that I can make it install the package before it configures the NICs?
[19:48] (That is, Debian 9's default ISO doesn't include it. Perhaps I need to make a new Debian cloud installer image from scratch?)
[21:32] niedbalski: you can disable the config to install third party drivers on the settings
=== TJ- is now known as Guest27109
[21:32] niedbalski: or you can remove that from the config in /etc/maas/drivers.yaml
[21:33] roaksoax, yeah, but shouldn't this be disabled per series? as this isn't really available for xenial
[21:33] niedbalski: not really, no. We have no way of knowing whether it is in the repository or not
[21:34] niedbalski: but since there's options to disable/enable this or remove the use of the driver altogether
[21:34] i think there's ways to not be affected
=== TJ_Remix is now known as TJ-
[21:34] niedbalski: in fact, you could even change the repository where you get the drivers from
[21:34] roaksoax, well, a mention in the documentation is worth it then, it took me a while to discover it, as third party drivers are enabled by default.
[21:36] niedbalski: uhmm seems this section was removed from the docs
[21:36] roaksoax, looks like
[21:37] roaksoax, now I am hitting 1730456 :-)
[21:39] niedbalski: what's your rackd.conf and your regiond.conf ?
[21:41] roaksoax, https://pastebin.ubuntu.com/26543275/
[21:42] niedbalski: i bet that 10.10.1.7 is not the ip the machines can reach MAAS at
[21:46] roaksoax, https://pastebin.ubuntu.com/26543312/ .. yes, that's not the address the machines are reaching (192.168.100.0/24)
[21:47] niedbalski: so you have two options, leave rackd.conf as localhost and update regiond.conf correctly
[21:47] niedbalski: or modify rackd.conf
[21:48] roaksoax, do you see any evident correlation with the error that I just posted? I wonder if you know something I don't :-)
[21:49] niedbalski: that one could be a clock skew thing
[21:49] roaksoax, clock is aligned in both maas/deployed node
[21:49] anyways, I am ntp syncing and modifying regiond accordingly.
[21:50] yeah seems cloud-init is doing the right thing by fixing the clock skew
[21:50] niedbalski: did you fix rackd.conf, restart it and retry?
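The address mismatch discussed above (a configured MAAS address of 10.10.1.7 versus machines on 192.168.100.0/24) can be spotted with a quick check. This is only a minimal sketch using Python's standard `ipaddress` module; the helper name `on_machine_subnet` is hypothetical and not part of MAAS. An address outside the subnet may still be reachable through routing, so a `False` result is a hint to check the config, not proof of a problem.

```python
import ipaddress


def on_machine_subnet(maas_ip: str, machine_subnet: str) -> bool:
    """Return True if maas_ip lies inside machine_subnet (hypothetical helper)."""
    return ipaddress.ip_address(maas_ip) in ipaddress.ip_network(machine_subnet)


# Values taken from the conversation above.
print(on_machine_subnet("10.10.1.7", "192.168.100.0/24"))  # False
```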
[21:50] roaksoax, trying it, doc
[21:59] roaksoax, https://pastebin.ubuntu.com/26543384/
[22:00] niedbalski: release the machine, or abort it and try ?
[22:00] roaksoax, just did it.
[22:00] niedbalski: it could be due to the clock skew that it cannot authorize
[22:00] niedbalski: what about other machines ?
[22:00] roaksoax, yes, is there something like a cache for tokens or similar?
[22:01] no, we don't cache tokens
[22:01] each time a new token gets re-generated
[22:01] niedbalski: that said, is this a commissioning or enlistment ?
[22:01] niedbalski: err or deployment ?
[22:01] roaksoax, that's ok, other machines (non hp) work ok, and this started to happen after I disabled the third party drivers
[22:01] roaksoax, commissioning
[22:02] niedbalski: the other one could be that it's reading user data from the disk instead of the pxe process
[22:04] roaksoax, the mac address remains the same, i deleted/created the machine with another name/uuid just in case.
[22:05] niedbalski: yeah let's try that and see what happens
[22:05] that's a strange error though
[22:06] roaksoax, should I wipe out the disks before?
[22:06] niedbalski: if you could that'd be good
[22:06] roaksoax, might be the old userdata is being read from the disk
[22:06] indeed
[22:06] roaksoax, have you seen something like that?
[22:07] niedbalski: nope, i personally haven't, although we did fix a bug a long time ago that required a new cloud-init so that it wouldn't read from disk
[22:07] niedbalski: are your images the latest ?
[22:07] roaksoax, I think yes, they are in sync with images.maas.io
[22:08] i wonder if cloud-init could have regressed and no longer listens to this option we send it in the kernel params
[22:08] roaksoax, let me see, I will wipe out the disk arrays
[22:08] roaksoax, which option?
[22:09] we send an option on the kernel command line to always use the maas datasource
[22:11] roaksoax, commission with any non-available ppa, (commission fails), remove the ppa and retry commissioning
[22:11] simplest reproducer.
[22:17] roaksoax this memory bug is killing our performance. our clients are unhappy with the delays. how do we step up the level of troubleshooting on 1744765
[22:33] xygnal: at this point, the only thing i can think of is that it's related to you running on top of vmware
[22:33] xygnal: we have confirmed we have larger maas' (e.g. way many more machines in a single maas), running on hardware that don't exhibit these issues
[22:34] niedbalski: aha! you weren't giving me enough info. So it is clear now. cloud-init fails to configure the archive, it tells that to maas and the machine gets marked as failed commissioning, the oauth keys expire and you see that in the logs :)
[22:36] xygnal: on the same version that is
[22:36] xygnal: so you could run a test by dumping your db, and importing it in a cleanly installed maas of the same version
[22:36] xygnal: on different underlying hardware
[22:37] roaksoax, https://pastebin.ubuntu.com/26543565/
[22:38] roaksoax, ok, so the first failure was ok, then I removed the archive (disabled third party drivers completely) and re-commissioned
[22:39] niedbalski: yeah that makes sense
[22:41] roaksoax, ok, failed the commissioning with the same error
[22:41] i wiped out the disk in case it was using disk source
[22:45] niedbalski: can you show all the cloud-init-output log ?
[22:45] niedbalski: and all cloud-init.log too
[22:45] roaksoax, oops :-)
[22:46] roaksoax, q: do the oauth tokens get expired in case of failure during any of the cloud-init stages?
[22:47] niedbalski: yes, so if cloud-init says "hey maas i failed to configure this thing you told me to" maas sees the failure message from cloud-init, and marks it failed commissioning
[22:47] niedbalski: and then you would see those errors that you can't access the metadata
[22:48] roaksoax, gotcha, let me retry and upload the full cloud-init logs for you to look at.
[22:51] roaksoax, ok, now all went through .. i removed the maas squid cache, the problem was due to a hash mismatch while running the apt update phase on cloud-init, not sure how the proxy ended up that way.
[22:52] roaksoax, deploying now, let's see :-)
[22:53] cool
[22:57] roaksoax how much memory is on the bare metal region controllers you test on?
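The failure sequence described in the conversation (cloud-init reports a failed step, the machine is marked failed commissioning, its OAuth token stops working, and later metadata requests are rejected) can be illustrated with a toy model. This is a sketch under stated assumptions, not MAAS code: the class `ToyNode`, its method names, and the 200/401 return codes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional
import secrets


def new_token() -> str:
    """Generate a fresh random token (tokens are regenerated, never cached)."""
    return secrets.token_hex(8)


@dataclass
class ToyNode:
    """Hypothetical stand-in for a machine record; not the MAAS data model."""
    status: str = "commissioning"
    token: Optional[str] = field(default_factory=new_token)

    def fetch_metadata(self, token: str) -> int:
        """Return an HTTP-like code: 200 if the token is current, else 401."""
        if self.token is None or token != self.token:
            return 401
        return 200

    def report_failure(self, token: str) -> None:
        """A failure report marks the node failed and invalidates its token."""
        if self.fetch_metadata(token) == 200:
            self.status = "failed commissioning"
            self.token = None

    def recommission(self) -> str:
        """Starting over issues a brand-new token."""
        self.status = "commissioning"
        self.token = new_token()
        return self.token


node = ToyNode()
old = node.token
node.report_failure(old)            # e.g. the apt archive could not be configured
print(node.status)                  # failed commissioning
print(node.fetch_metadata(old))     # 401: the old token no longer works
fresh = node.recommission()
print(node.fetch_metadata(fresh))   # 200
```

In this model the metadata errors are a symptom of the earlier failure report rather than a separate fault, which matches the outcome in the log: the underlying cause was the hash mismatch during the apt update phase, and the token errors went away once that was resolved.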