mup | Bug #1588547 changed: Generated bonding configuration is incorrect. <sts-needs-review> <curtin:Confirmed> <MAAS:Opinion by mpontillo> <https://launchpad.net/bugs/1588547> | 00:52 |
---|---|---|
mup_ | Bug #1588706 opened: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces <MAAS:New> <https://launchpad.net/bugs/1588706> | 08:35 |
=== mup_ is now known as mup | ||
=== kdavyd_ is now known as kdavyd | ||
=== CyberJacob is now known as zz_CyberJacob | ||
mup | Bug #1588706 changed: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces <curtin:New> <MAAS:Invalid> <https://launchpad.net/bugs/1588706> | 12:27 |
gimmic | my successful deploy rate of 14.04 LTS is terrible | 14:06 |
gimmic | 10 identical nodes: 3/10 successful deployments | 14:06 |
kiko | gimmic, using MAAS 2.0, right? | 14:20 |
kiko | gimmic, we need to fix that rate and find out what is causing this crappy ratio | 14:20 |
gimmic | can i increase the logging with some sort of debug flag? | 14:22 |
gimmic | I'm about to fire off another 15 nodes at once | 14:22 |
gimmic | they're all identical hardware and in ready state | 14:22 |
kiko | gimmic, we tend to log everything | 14:23 |
kiko | gimmic, remind me, do the nodes commission fine? 20/20? | 14:23 |
gimmic | single disk R620's. Literally the only customization I'm doing is changing the hostname and static assignment of the network interface. | 14:23 |
gimmic | They all commission fine | 14:24 |
kiko | gimmic, always, right? never any commissioning failures? | 14:24 |
gimmic | you know, a more global log viewer would be nice from the weberface | 14:29 |
gimmic | like a general event log | 14:29 |
gimmic | rather than host-specific only | 14:30 |
kiko | gimmic, yeah, I hear you | 14:33 |
kiko | gimmic, we have /something/ like that with remote rsyslog aggregation but I'm not sure it's working in 2.x | 14:33 |
kiko | gimmic, anyway, when deployments fail, is the failure reproducible or rather random? | 14:34 |
gimmic | I fired off all ten last night before I left work and came back to the 3/10. Have not tshot it yet. | 14:36 |
gimmic | will poke at this next batch of 15 since I'm assigning IP/hostnames now. | 14:36 |
kiko | okay | 14:36 |
gimmic | Is there a document anywhere that describes where each subcomponent logs to? I see the logs under /var/log/maas | 14:36 |
kiko | gimmic, maas.io/docs has the architecture -- there are basically two main components, rackd and regiond | 14:41 |
kiko | rackd talks to the nodes (pxe, ipmi, etc) | 14:41 |
kiko | regiond hosts the API and Web server | 14:41 |
mup | Bug #1588846 opened: [2.0b6] builtins.ValueError: invalid literal for int() with base 10: '' <MAAS:Triaged> <https://launchpad.net/bugs/1588846> | 14:43 |
dimitern | does anyone know how long 'Disk erasing' is expected to take on a 256GB SSD ? | 14:51 |
dimitern | is it a multi-pass secure wipe or something? | 14:52 |
dimitern | roaksoax: ^^ | 14:53 |
roaksoax | dimitern: nope, but it will take a while | 14:56 |
roaksoax | dimitern: it just wipes the whole disk | 14:56 |
dimitern | roaksoax: yeah, it just finished ~10m for that 256GB SSD; interestingly the other 2 nodes with 120GB SSDs are taking longer (all were started pretty much at once) | 14:57 |
mup | Bug #1588857 opened: [2.0b5] sudo: no tty present and no askpass program specified <MAAS:New> <https://launchpad.net/bugs/1588857> | 15:13 |
mup | Bug #1588868 opened: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service <MAAS:New> <https://launchpad.net/bugs/1588868> | 15:22 |
mup | Bug #1588875 opened: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin <MAAS:New> <https://launchpad.net/bugs/1588875> | 15:43 |
gimmic | kiko: http://i.imgur.com/jzb7djY.png | 15:45 |
gimmic | fired this off about 30 minutes ago, all the nodes are hung in deploying state, same logs: | 15:46 |
gimmic | Node installation - 'curtin' failed: configuring storage (Fri, 03 Jun. 2016 10:26:55) | 15:46 |
gimmic | Node installation - 'curtin' failed: configuring disk: sda (Fri, 03 Jun. 2016 10:26:55) | 15:46 |
gimmic | Node installation - 'curtin' started: configuring disk: sda (Fri, 03 Jun. 2016 10:26:54) | 15:46 |
gimmic | Node installation - 'curtin' started: configuring storage (Fri, 03 Jun. 2016 10:26:54) | 15:46 |
gimmic | using LVM now | 15:46 |
gimmic | none of them have failed yet, but I suspect there's a timeout somewhere ticking down | 15:47 |
dimitern | gimmic: just filed that bug above for this | 15:47 |
gimmic | dimitern: I'm not sure it's LVM related | 15:48 |
gimmic | initially I was doing flat ext4 | 15:48 |
gimmic | and still run into it | 15:48 |
dimitern | gimmic: it is | 15:53 |
dimitern | gimmic: the default layout is flat, but I've created a VG on top of that | 15:54 |
dimitern | in order to emulate having 2 distinct partitions | 15:54 |
gimmic | Yeah, I'm saying I see the same error/behavior without using LV | 15:57 |
gimmic | even installing w/ just flat ext4 curtain hangs in the same way | 15:57 |
gimmic | ..curtin | 15:57 |
dimitern | gimmic: ah, I see .. not good :/ | 16:18 |
mup | Bug #1588868 changed: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service <MAAS:Incomplete> <https://launchpad.net/bugs/1588868> | 16:25 |
kiko | gimmic, can you feed into https://bugs.launchpad.net/maas/+bug/1588875 as well? | 16:44 |
mup | Bug #1588907 opened: [2.0b6] django.db.utils.IntegrityError: insert or update on table "piston3_consumer" violates foreign key constraint "piston3_consumer_user_id_4ac0863fa7e05162_fk_auth_user_id" <MAAS:New> <https://launchpad.net/bugs/1588907> | 16:55 |
kiko | hm | 17:04 |
mup | Bug #1588914 opened: [2.0b6] MAAS writes DHCP multiple times while not much is going on <MAAS:New> <https://launchpad.net/bugs/1588914> | 17:31 |
gimmic | so just fyi | 17:55 |
gimmic | if I use ipmi to power cycle the failed deployment node, release it, and re-deploy | 17:56 |
gimmic | it seems to deploy fine | 17:56 |
gimmic | I suspect it's something with how the partition exit code is | 17:57 |
gimmic | when I redeploy, the partitions are already there | 17:57 |
gimmic | okay, confirmed.. If I just shut the node down and netboot it again | 18:40 |
gimmic | it works fine after re-deploying | 18:40 |
gimmic | so the partition creation is exiting with an error code, but completing | 18:40 |
=== kwmonroe_ is now known as kwmonroe | ||
mup | Bug #1588875 changed: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin <curtin:Confirmed> <MAAS:Invalid> <https://launchpad.net/bugs/1588875> | 19:22 |
kiko | roaksoax, smoser: is there a way for gimmic to stop the deployment process from rebooting and ssh in? | 19:36 |
gimmic | so my current 'workaround' is let it time out to failed deployment, mark the nodes as broken, mark them as fixed, deploy | 19:38 |
gimmic | that allows me to avoid fiddling with out of band ipmi resets | 19:38 |
gimmic | I have now deployed 30 nodes using that methodology | 19:38 |
smoser | gimmic, 2 ways | 19:44 |
smoser | a.) maas server change /etc/maas/preseeds/curtin_userdata | 19:45 |
smoser | see 'power_state' there | 19:45 |
smoser | commenting that out will do | 19:45 |
smoser | b.) ssh in during deployment and sudo touch /run/block-curtin-poweroff | 19:46 |
kiko | gimmic, that will let you try and re-run the partitioning command and see what's failing | 19:47 |
kiko | could it be a gpt/efi thing? | 19:47 |
smoser | kiko, you can also just put it in commissioning and have it not power itself off | 19:48 |
smoser | and then do whatever you want | 19:48 |
kiko | smoser, well, I think the problem is right now gimmic doesn't even know what curtin is doing | 19:48 |
kiko | so doing it in the deployment phase is going to be easier | 19:50 |
mup | Bug #1588154 changed: [2.0b5] Deploying node fails when it has VLAN configuration <MAAS:Invalid> <https://launchpad.net/bugs/1588154> | 23:41 |
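
Editor's note: the curtin events gimmic pasted at 15:46 follow a regular shape, so a batch of them can be scanned for failing steps. The following is only a minimal sketch, not anything MAAS ships. It assumes each event reads `Node installation - 'curtin' <status>: <step>` with the timestamp already stripped, and the `finished` status and the `failed_steps` helper are hypothetical names introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed event shape, taken from the lines pasted in the log:
#   Node installation - 'curtin' <status>: <step>
# 'finished' is an assumed status; only 'started' and 'failed' appear in the log.
EVENT_RE = re.compile(
    r"Node installation - 'curtin' (started|failed|finished): (.+)"
)


def failed_steps(lines):
    """Return the steps that have a 'failed' event, in first-seen order."""
    statuses = defaultdict(set)
    order = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a curtin installation event
        status, step = match.group(1), match.group(2).strip()
        if step not in statuses:
            order.append(step)
        statuses[step].add(status)
    return [step for step in order if "failed" in statuses[step]]


# The four events from the log, newest first, without timestamps.
events = [
    "Node installation - 'curtin' failed: configuring storage",
    "Node installation - 'curtin' failed: configuring disk: sda",
    "Node installation - 'curtin' started: configuring disk: sda",
    "Node installation - 'curtin' started: configuring storage",
]

print(failed_steps(events))
```

Run against the events above, this reports both `configuring storage` and `configuring disk: sda` as failed, which matches gimmic's observation that the failure sits in the disk configuration step.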