/srv/irclogs.ubuntu.com/2016/06/03/#maas.txt

[00:52] <mup> Bug #1588547 changed: Generated bonding configuration is incorrect. <sts-needs-review> <curtin:Confirmed> <MAAS:Opinion by mpontillo> <https://launchpad.net/bugs/1588547>
[08:35] <mup_> Bug #1588706 opened: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces <MAAS:New> <https://launchpad.net/bugs/1588706>
=== mup_ is now known as mup
=== kdavyd_ is now known as kdavyd
=== CyberJacob is now known as zz_CyberJacob
[12:27] <mup> Bug #1588706 changed: MAAS should not add 'source /etc/network/interfaces.d/*.cfg' to /etc/network/interfaces <curtin:New> <MAAS:Invalid> <https://launchpad.net/bugs/1588706>
[14:06] <gimmic> my successful deploy rate of 14.04 LTS is terrible
[14:06] <gimmic> 10 identical nodes: 3/10 successful deployments
[14:20] <kiko> gimmic, using MAAS 2.0, right?
[14:20] <kiko> gimmic, we need to fix that rate and find out what is causing this crappy ratio
[14:22] <gimmic> can i increase the logging with some sort of debug flag?
[14:22] <gimmic> I'm about to fire off another 15 nodes at once
[14:22] <gimmic> they're all identical hardware and in ready state
[14:23] <kiko> gimmic, we tend to log everything
[14:23] <kiko> gimmic, remind me, do the nodes commission fine? 20/20?
[14:23] <gimmic> single-disk R620s. Literally the only customization I'm doing is changing the hostname and static assignment of the network interface.
[14:24] <gimmic> They all commission fine
[14:24] <kiko> gimmic, always, right? never any commissioning failures?
[14:29] <gimmic> you know, a more global log viewer would be nice from the weberface
[14:29] <gimmic> like a general event log
[14:30] <gimmic> rather than host-specific only
[14:33] <kiko> gimmic, yeah, I hear you
[14:33] <kiko> gimmic, we have /something/ like that with remote rsyslog aggregation but I'm not sure it's working in 2.x
[14:34] <kiko> gimmic, anyway, when deployments fail, is the failure reproducible or rather random?
[14:36] <gimmic> I fired off all ten last night before I left work and came back to the 3/10. Have not troubleshot it yet.
[14:36] <gimmic> will poke at this next batch of 15 since I'm assigning IP/hostnames now.
[14:36] <kiko> okay
[14:36] <gimmic> Is there a document anywhere that describes where each subcomponent logs to? I see the logs under /var/log/maas
[14:41] <kiko> gimmic, maas.io/docs has the architecture -- there are basically two main components, rackd and regiond
[14:41] <kiko> rackd talks to the nodes (PXE, IPMI, etc.)
[14:41] <kiko> regiond hosts the API and web server
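Until something like the rsyslog aggregation kiko mentions is confirmed working, a crude stand-in for the "general event log" gimmic asks for is to interleave the per-component logs by timestamp. This is a minimal sketch, assuming each entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that the region and rack logs live in files such as `/var/log/maas/regiond.log` and `/var/log/maas/rackd.log` (both assumptions to check on your install); `merge_logs` is a hypothetical helper, not a MAAS tool.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp prefix of each log entry
TS_LEN = 19  # len("2016-06-03 14:41:00")


def _timestamped(source: str, lines: Iterable[str]) -> Iterator[Tuple[datetime, str, str]]:
    """Yield (timestamp, source, line) for lines that begin with a parseable timestamp."""
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # skip continuation lines (e.g. traceback bodies) that carry no timestamp
        yield ts, source, line.rstrip("\n")


def merge_logs(sources: Dict[str, Iterable[str]]) -> List[str]:
    """Interleave already-sorted per-component logs into one chronological view.

    Hypothetical usage:
        with open("/var/log/maas/regiond.log") as r, open("/var/log/maas/rackd.log") as k:
            for entry in merge_logs({"regiond": r, "rackd": k}):
                print(entry)
    """
    streams = [_timestamped(name, lines) for name, lines in sources.items()]
    return [f"{src}: {line}" for _, src, line in heapq.merge(*streams)]
```

`heapq.merge` streams the inputs lazily, so this stays cheap on large logs; the trade-off is that multi-line entries lose their continuation lines.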
[14:43] <mup> Bug #1588846 opened: [2.0b6] builtins.ValueError: invalid literal for int() with base 10: '' <MAAS:Triaged> <https://launchpad.net/bugs/1588846>
[14:51] <dimitern> does anyone know how long 'Disk erasing' is expected to take on a 256GB SSD?
[14:52] <dimitern> is it a multi-pass secure wipe or something?
[14:53] <dimitern> roaksoax: ^^
[14:56] <roaksoax> dimitern: nope, but it will take a while
[14:56] <roaksoax> dimitern: it just wipes the whole disk
[14:57] <dimitern> roaksoax: yeah, it just finished in ~10 min for that 256GB SSD; interestingly, the other 2 nodes with 120GB SSDs are taking longer (all were started at pretty much the same time)
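Since roaksoax says the erase is a single full-disk wipe, its duration is roughly capacity divided by sustained write rate. A back-of-the-envelope sketch using only dimitern's approximate figure (~10 min for 256GB, decimal units assumed): the implied rate is about 427 MB/s, so a 120GB drive at the same rate would finish in under 5 minutes. The 120GB nodes taking longer therefore points to slower drives or another bottleneck, though the log does not say which.

```python
def erase_rate_mb_s(size_gb: float, minutes: float) -> float:
    """Average whole-disk write rate in MB/s (decimal units: 1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


def erase_minutes(size_gb: float, rate_mb_s: float) -> float:
    """Minutes to overwrite size_gb once at a sustained rate_mb_s."""
    return size_gb * 1000 / rate_mb_s / 60


rate = erase_rate_mb_s(256, 10)          # ~426.7 MB/s implied by the 256GB SSD
expected_120 = erase_minutes(120, rate)  # ~4.7 min if the 120GB drives matched that rate
```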
[15:13] <mup> Bug #1588857 opened: [2.0b5] sudo: no tty present and no askpass program specified <MAAS:New> <https://launchpad.net/bugs/1588857>
[15:22] <mup> Bug #1588868 opened: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service <MAAS:New> <https://launchpad.net/bugs/1588868>
[15:43] <mup> Bug #1588875 opened: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin <MAAS:New> <https://launchpad.net/bugs/1588875>
[15:45] <gimmic> kiko: http://i.imgur.com/jzb7djY.png
[15:46] <gimmic> fired this off about 30 minutes ago, all the nodes are hung in deploying state, same logs:
[15:46] <gimmic> Node installation - 'curtin' failed: configuring storage | Fri, 03 Jun. 2016 10:26:55
[15:46] <gimmic> Node installation - 'curtin' failed: configuring disk: sda | Fri, 03 Jun. 2016 10:26:55
[15:46] <gimmic> Node installation - 'curtin' started: configuring disk: sda | Fri, 03 Jun. 2016 10:26:54
[15:46] <gimmic> Node installation - 'curtin' started: configuring storage | Fri, 03 Jun. 2016 10:26:54
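The pasted events read newest first, so the sequence is: storage setup started, sda setup started, sda failed, then storage failed. The earliest failure, "configuring disk: sda", is the step to dig into. A minimal sketch of that reading, assuming the UI lists events newest first (consistent with the timestamps shown) and an English locale for parsing the `Fri, 03 Jun. 2016` dates; `parse_event` and `first_failure` are hypothetical helpers.

```python
from datetime import datetime

UI_TS = "%a, %d %b. %Y %H:%M:%S"  # date format as shown in the event list above


def parse_event(text: str, stamp: str):
    """Turn one event row (description, timestamp) into a (datetime, text) pair."""
    return datetime.strptime(stamp, UI_TS), text


def first_failure(events_newest_first):
    """Return the earliest 'failed' event, assuming the list is ordered newest first."""
    chronological = list(reversed(events_newest_first))
    times = [ts for ts, _ in chronological]
    if times != sorted(times):
        raise ValueError("events do not appear to be listed newest first")
    for ts, text in chronological:
        if " failed: " in text:
            return ts, text
    return None
```

Sorting by timestamp alone would not work here: both failures share the same second, so the UI's listing order is what separates them.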
[15:46] <gimmic> using LVM now
[15:47] <gimmic> none of them have failed yet, but I suspect there's a timeout somewhere ticking down
[15:47] <dimitern> gimmic: just filed that bug above for this
[15:48] <gimmic> dimitern: I'm not sure it's LVM-related
[15:48] <gimmic> initially I was doing flat ext4
[15:48] <gimmic> and still ran into it
[15:53] <dimitern> gimmic: it is
[15:54] <dimitern> gimmic: the default layout is flat, but I've created a VG on top of that
[15:54] <dimitern> in order to emulate having 2 distinct partitions
[15:57] <gimmic> Yeah, I'm saying I see the same error/behavior without using LV
[15:57] <gimmic> even installing w/ just flat ext4 curtain hangs in the same way
[15:57] <gimmic> ..curtin
[16:18] <dimitern> gimmic: ah, I see .. not good :/
[16:25] <mup> Bug #1588868 changed: [2.0b5] While monitoring service 'dhcpd/tgt/dhcpd6/proxy' an error was encountered: Unable to parse the output from systemd for service <MAAS:Incomplete> <https://launchpad.net/bugs/1588868>
[16:44] <kiko> gimmic, can you feed into https://bugs.launchpad.net/maas/+bug/1588875 as well?
[16:55] <mup> Bug #1588907 opened: [2.0b6] django.db.utils.IntegrityError: insert or update on table "piston3_consumer" violates foreign key constraint "piston3_consumer_user_id_4ac0863fa7e05162_fk_auth_user_id" <MAAS:New> <https://launchpad.net/bugs/1588907>
[17:04] <kiko> hm
[17:31] <mup> Bug #1588914 opened: [2.0b6] MAAS writes DHCP multiple times while not much is going on <MAAS:New> <https://launchpad.net/bugs/1588914>
[17:55] <gimmic> so just FYI
[17:56] <gimmic> if I use IPMI to power cycle the failed deployment node, release it, and re-deploy
[17:56] <gimmic> it seems to deploy fine
[17:57] <gimmic> I suspect it's something to do with the partitioning step's exit code
[17:57] <gimmic> when I redeploy, the partitions are already there
[18:40] <gimmic> okay, confirmed: if I just shut the node down and netboot it again
[18:40] <gimmic> it works fine after re-deploying
[18:40] <gimmic> so the partition creation exits with an error code, but still completes
=== kwmonroe_ is now known as kwmonroe
[19:22] <mup> Bug #1588875 changed: [2.0-b6] Deploying a trusty (but not xenial) node frequently fails during storage setup of curtin <curtin:Confirmed> <MAAS:Invalid> <https://launchpad.net/bugs/1588875>
[19:36] <kiko> roaksoax, smoser: is there a way for gimmic to stop the deployment process from rebooting and ssh in?
[19:38] <gimmic> so my current 'workaround' is to let it time out to Failed deployment, mark the nodes as broken, mark them as fixed, and deploy
[19:38] <gimmic> that allows me to avoid fiddling with out-of-band IPMI resets
[19:38] <gimmic> I have now deployed 30 nodes using that methodology
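gimmic's workaround is done in the web UI, but for batches of nodes it could be scripted. A minimal sketch, assuming the MAAS 2.x command-line client logged in under a profile (here the hypothetical `admin`) and exposing `machine mark-broken`, `machine mark-fixed` and `machine deploy`. Unlike the UI's Deploy button, the CLI typically needs a Ready machine to be allocated first, so an `allocate` step is included; its exact syntax is an assumption to verify against your MAAS version. `recycle` defaults to a dry run that only returns the commands.

```python
import subprocess
from typing import List


def recycle_commands(profile: str, system_id: str) -> List[List[str]]:
    """CLI calls mirroring the workaround: mark broken, mark fixed, then deploy again."""
    return [
        ["maas", profile, "machine", "mark-broken", system_id],
        ["maas", profile, "machine", "mark-fixed", system_id],
        # Assumption: allocation is required before deploy when driving MAAS via the CLI.
        ["maas", profile, "machines", "allocate", f"system_id={system_id}"],
        ["maas", profile, "machine", "deploy", system_id],
    ]


def recycle(profile: str, system_ids: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) the recycle sequence for each node in order."""
    cmds = [cmd for sid in system_ids for cmd in recycle_commands(profile, sid)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing CLI call
    return cmds
```

Keeping the dry run as the default makes it easy to eyeball the command list for a batch of failed nodes before touching any of them.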
[19:44] <smoser> gimmic, 2 ways
[19:45] <smoser> a.) on the MAAS server, change /etc/maas/preseeds/curtin_userdata
[19:45] <smoser> see 'power_state' there
[19:45] <smoser> commenting that out will do it
[19:46] <smoser> b.) ssh in during deployment and sudo touch /run/block-curtin-poweroff
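Option b.) is a single command on the node itself. Option a.) amounts to commenting out the `power_state` stanza of the preseed template on the MAAS server. A minimal sketch of that edit, assuming the key sits at column 0 with an indented body (for example `power_state:` followed by `  mode: reboot`) and that the next column-0 line, whether a YAML key or a template directive like `{{endif}}`, ends the block; `comment_out_key` is a hypothetical helper, and keeping a backup of the original file is advisable.

```python
import re


def comment_out_key(template: str, key: str = "power_state") -> str:
    """Prefix '#' to a top-level YAML key and its indented children.

    The block starts at a column-0 line beginning with `key:` and ends at the
    next non-blank line that starts at column 0.
    """
    out, inside = [], False
    for line in template.splitlines(keepends=True):
        if re.match(rf"{re.escape(key)}\s*:", line):
            inside = True
        elif inside and line.strip() and not line[0].isspace():
            inside = False  # the next top-level line closes the block
        out.append("#" + line if inside else line)
    return "".join(out)
```

Applied to the contents of /etc/maas/preseeds/curtin_userdata and written back, this should leave the rest of the template untouched; deleting /run/block-curtin-poweroff or restoring the backup undoes the respective option.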
[19:47] <kiko> gimmic, that will let you try and re-run the partitioning command and see what's failing
[19:47] <kiko> could it be a GPT/EFI thing?
[19:48] <smoser> kiko, you can also just put it into commissioning and have it not power itself off
[19:48] <smoser> and then do whatever you want
[19:48] <kiko> smoser, well, I think the problem right now is that gimmic doesn't even know what curtin is doing
[19:50] <kiko> so doing it in the deployment phase is going to be easier
[23:41] <mup> Bug #1588154 changed: [2.0b5] Deploying node fails when it has VLAN configuration <MAAS:Invalid> <https://launchpad.net/bugs/1588154>
