[00:52] Bug #1588547 changed: Generated bonding configuration is incorrect.
[08:35] Bug #1588706 opened: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces
=== mup_ is now known as mup
=== kdavyd_ is now known as kdavyd
=== CyberJacob is now known as zz_CyberJacob
[12:27] Bug #1588706 changed: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces
[14:06] my successful deploy rate of 14.04 LTS is terrible
[14:06] 10 identical nodes: 3/10 successful deployments
[14:20] gimmic, using MAAS 2.0, right?
[14:20] gimmic, we need to fix that rate and find out what is causing this crappy ratio
[14:22] can i increase the logging with some sort of debug flag?
[14:22] I'm about to fire off another 15 nodes at once
[14:22] they're all identical hardware and in ready state
[14:23] gimmic, we tend to log everything
[14:23] gimmic, remind me, do the nodes commission fine? 20/20?
[14:23] single disk R620's. Literally the only customization I'm doing is changing the hostname and static assignment of the network interface.
[14:24] They all commission fine
[14:24] gimmic, always, right? never any commissioning failures?
[14:29] you know, a more global log viewer would be nice from the weberface
[14:29] like a general event log
[14:30] rather than host-specific only
[14:33] gimmic, yeah, I hear you
[14:33] gimmic, we have /something/ like that with remote rsyslog aggregation but I'm not sure it's working in 2.x
[14:34] gimmic, anyway, when deployments fail, is the failure reproducible or rather random?
[14:36] I fired off all ten last night before I left work and came back to the 3/10. Have not tshot it yet.
[14:36] will poke at this next batch of 15 since I'm assigning IP/hostnames now.
[14:36] okay
[14:36] Is there a document anywhere that describes where each subcomponent logs to? I see the logs under /var/log/maas
[14:41] gimmic, maas.io/docs has the architecture -- there are basically two main components, rackd and regiond
[14:41] rackd talks to the nodes (pxe, ipmi, etc)
[14:41] regiond hosts the API and Web server
[14:43] Bug #1588846 opened: [2.0b6] builtins.ValueError: invalid literal for int() with base 10: ''
[14:51] does anyone know how long 'Disk erasing' is expected to take on a 256GB SSD?
[14:52] is it a multi-pass secure wipe or something?
[14:53] roaksoax: ^^
[14:56] dimitern: nope, but it will take a while
[14:56] dimitern: it just wipes the whole disk
[14:57] roaksoax: yeah, it just finished ~10m for that 256GB SSD; interestingly the other 2 nodes with 120GB SSDs are taking longer (all were started pretty much at once)
[15:13] Bug #1588857 opened: [2.0b5] sudo: no tty present and no askpass program specified
[15:22] Bug #1588868 opened: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service
[15:43] Bug #1588875 opened: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin
[15:45] kiko: http://i.imgur.com/jzb7djY.png
[15:46] fired this off about 30 minutes ago, all the nodes are hung in deploying state, same logs:
[15:46] Node installation - 'curtin' failed: configuring storage Fri, 03 Jun. 2016 10:26:55
[15:46] Node installation - 'curtin' failed: configuring disk: sda Fri, 03 Jun. 2016 10:26:55
[15:46] Node installation - 'curtin' started: configuring disk: sda Fri, 03 Jun. 2016 10:26:54
[15:46] Node installation - 'curtin' started: configuring storage Fri, 03 Jun. 2016 10:26:54
[15:46] using LVM now
[15:47] none of them have failed yet, but I suspect there's a timeout somewhere ticking down
[15:47] gimmic: just filed that bug above for this
[15:48] dimitern: I'm not sure it's LVM related
[15:48] initially I was doing flat ext4
[15:48] and still run into it
[15:53] gimmic: it is
[15:54] gimmic: the default layout is flat, but I've created a VG on top of that
[15:54] in order to emulate having 2 distinct partitions
[15:57] Yeah, I'm saying I see the same error/behavior without using LV
[15:57] even installing w/ just flat ext4 curtain hangs in the same way
[15:57] ..curtin
[16:18] gimmic: ah, I see .. not good :/
[16:25] Bug #1588868 changed: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service
[16:44] gimmic, can you feed into https://bugs.launchpad.net/maas/+bug/1588875 as well?
[16:55] Bug #1588907 opened: [2.0b6] django.db.utils.IntegrityError: insert or update on table "piston3_consumer" violates foreign key constraint "piston3_consumer_user_id_4ac0863fa7e05162_fk_auth_user_id"
[17:04] hm
[17:31] Bug #1588914 opened: [2.0b6] MAAS writes DHCP multiple times while not much is going on
[17:55] so just fyi
[17:56] if I use ipmi to power cycle the failed deployment node, release it, and re-deploy
[17:56] it seems to deploy fine
[17:57] I suspect it's something with how the partition exit code is
[17:57] when I redeploy, the partitions are already there
[18:40] okay, confirmed.. If I just shut the node down and netboot it again
[18:40] it works fine after re-deploying
[18:40] so the partition creation is exiting with an error code, but completing
=== kwmonroe_ is now known as kwmonroe
[19:22] Bug #1588875 changed: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin
[19:36] roaksoax, smoser: is there a way for gimmic to stop the deployment process from rebooting and ssh in?
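The four curtin event lines pasted at 15:46 share only two distinct timestamps, so the order of the lines is what tells you which step failed first. A minimal Python sketch that parses lines in that exact format and returns the earliest failed step; the newest-first ordering (as the web UI excerpt appears to show) and the helper name `first_failed_step` are assumptions for illustration, not part of MAAS or curtin.

```python
import re

# Matches event lines in the format quoted at 15:46, e.g.
# "Node installation - 'curtin' failed: configuring disk: sda Fri, 03 Jun. 2016 10:26:55"
EVENT_RE = re.compile(
    r"Node installation - '(?P<source>[^']+)' (?P<status>\w+): "
    r"(?P<step>.+?) (?P<when>\w{3}, \d{2} \w{3}\. \d{4} \d{2}:\d{2}:\d{2})$"
)


def first_failed_step(lines, newest_first=True):
    """Return the earliest step reported as 'failed', or None.

    Timestamps only have 1-second resolution, so ordering relies on line
    position. By default the input is assumed to be newest-first.
    """
    events = []
    for line in lines:
        match = EVENT_RE.search(line.strip())
        if match:
            events.append((match.group("status"), match.group("step")))
    if newest_first:
        events.reverse()  # walk the events from oldest to newest
    for status, step in events:
        if status == "failed":
            return step
    return None
```

Fed the four lines from 15:46 in the order shown, this returns `configuring disk: sda`, which points at the per-disk step rather than the outer "configuring storage" wrapper.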
[19:38] so my current 'workaround' is let it time out to failed deployment, mark the nodes as broken, mark them as fixed, deploy
[19:38] that allows me to avoid fiddling with out of band ipmi resets
[19:38] I have now deployed 30 nodes using that methodology
[19:44] gimmic, 2 ways
[19:45] a.) maas server change /etc/maas/preseeds/curtin_userdata
[19:45] see 'power_state' there
[19:45] comment that out will do
[19:46] b.) ssh in during deployment and sudo touch /run/block-curtin-poweroff
[19:47] gimmic, that will let you try and re-run the partitioning command and see what's failing
[19:47] could it be a gpt/efi thing?
[19:48] kiko, you can also just put it in commissioning and have it not power itself off
[19:48] and then do whatever you want
[19:48] smoser, well, I think the problem is right now gimmic doesn't even know what curtin is doing
[19:50] so doing it in the deployment phase is going to be easier
[23:41] Bug #1588154 changed: [2.0b5] Deploying node fails when it has VLAN configuration
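The 19:38 workaround (mark broken, mark fixed, deploy) is easy to script across many machines. A minimal Python sketch that only builds the CLI calls; it assumes the MAAS 2.x CLI subcommands `machine mark-broken`, `machine mark-fixed` and `machine deploy`, and the profile name and system IDs shown are placeholders. Verify the subcommands against `maas <profile> machine --help` on your install before running anything.

```python
import shlex

# Assumed MAAS 2.x CLI operations, in the order used by the 19:38 workaround.
STEPS = ("mark-broken", "mark-fixed", "deploy")


def workaround_commands(profile, system_ids):
    """Return argv lists (not executed) for each machine and each step."""
    cmds = []
    for system_id in system_ids:
        for step in STEPS:
            cmds.append(["maas", profile, "machine", step, system_id])
    return cmds


if __name__ == "__main__":
    # "admin", "abc123" and "def456" are hypothetical placeholders.
    for argv in workaround_commands("admin", ["abc123", "def456"]):
        print(shlex.join(argv))
```

Keeping it as a dry run lets you review the commands first; to execute, pass each argv to `subprocess.run(argv, check=True)` so a failed step stops the sequence for that machine.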
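Option a.) at 19:45 amounts to commenting out the `power_state` section of /etc/maas/preseeds/curtin_userdata so the node stays up after curtin fails. A minimal Python sketch of that edit, assuming `power_state:` appears as a top-level YAML key at column 0 with indented child lines; the real template may also contain templating markup, so inspect and back up the file before applying anything like this. The helper name `comment_out_block` is hypothetical.

```python
def comment_out_block(text, key="power_state"):
    """Prefix '# ' to a top-level YAML key and its indented child lines.

    The block starts at a column-0 line beginning with 'key:' and ends at the
    next non-blank column-0 line. Blank lines are left untouched.
    """
    out = []
    in_block = False
    for line in text.splitlines(keepends=True):
        at_col0 = line[:1] not in (" ", "\t", "\n")
        if at_col0:
            # Any new top-level line either opens or closes the block.
            in_block = line.startswith(key + ":")
        if in_block and line.strip():
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)
```

Option b.) avoids editing the region controller at all: touching /run/block-curtin-poweroff on the node during deployment only affects that one install.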