tdihpHi mpontillo03:43
tdihpAfter the update to https://askubuntu.com/questions/873701/maas-node-unable-to-resolve-its-own-hostname, I'm still trying to get commissioning working.03:48
tdihpthe current problem is that the storage info is still missing after commissioning, with a valid 00-maas-07-block-devices.out output (/dev/sda is listed).03:50
tdihpWhat else can I try to diagnose the situation?03:50
mpontillotdihp: I would run this script on the commissioning node while SSH'd in to see where the failure is.04:46
mpontillotdihp: that is, the commissioning node executes (pretty much) those commands to figure out what disks are on the system04:47
mpontillotdihp: so you should be able to tell where it's going wrong. I think lamont's theory was that it was timing out trying to contact an iSCSI disk specified by a name rather than an IP address04:47
tdihpThanks mpontillo, I'll try.04:50
mpontillotdihp: updated https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b -- now includes also the 'sudo' commands05:09
tdihpmpontillo: I've tried the script06:25
tdihpIt takes more than 2 minutes, with more than 6 sudo calls06:26
tdihpHowever the 00-maas-07-block-devices.err file just has 4 lines of sudo errors.06:27
tdihpmpontillo: Oddly though the script you provided did not reproduce those sudo error messages06:27
mpontillotdihp: my script redirects sudo's stderr to /dev/null =)06:29
mpontillotdihp: can you humor me and do two more things... (1) tell me the output of `iscsiadm -m session`, and (2) if you can determine that open-iscsi is trying to connect to MAAS via a hostname rather than an IP address (either iscsiadm or mount should tell you), run "sudo dpkg-reconfigure -plow maas-rack-controller" on the MAAS server, and make sure the MAAS URL06:32
mpontillois set to an IP address and not a hostname06:32
tdihpmpontillo: And yes, after removing those "2>/dev/null" redirects I saw the identical errors06:32
tdihpmpontillo: the iscsi output: tcp: [1],1 iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily (non-flash)06:33
mpontillotdihp: lamont's theory, by the way, was that the sudo errors are a red herring; I think he thought the actual issue was probing an iSCSI target by its hostname rather than an IP address06:34
tdihpmpontillo: I see. The iSCSI target is specified by IP, though.06:37
mpontillotdihp: yeah. I see that. I wonder if it was passed into the kernel that way. try:06:38
mpontillocat /proc/cmdline | tr ' ' '\n' | grep iscsi_target_ip06:38
mpontillotdihp: leave off the grep and you should also see a root= line that has the iSCSI /dev/ path in there.06:39
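The kernel command-line check suggested above can be wrapped in a small helper. This is only a sketch: the function name cmdline_params and its optional file argument (added so the helper can be tried against a saved copy of /proc/cmdline) are assumptions for illustration, not part of MAAS.

```shell
#!/bin/sh
# Hypothetical helper: print one kernel parameter per line, optionally
# filtered by a literal prefix such as "iscsi_" or "root=".
#   $1 = prefix to match (empty or omitted prints every parameter)
#   $2 = file to read (defaults to /proc/cmdline)
cmdline_params() {
    prefix="${1:-}"
    file="${2:-/proc/cmdline}"
    # Split on spaces, skip blank lines, keep lines that start with the prefix.
    tr ' ' '\n' < "$file" |
        awk -v p="$prefix" 'NF && (p == "" || index($0, p) == 1)'
}
```

On the commissioning node, `cmdline_params iscsi_` would list the iSCSI settings and `cmdline_params root=` would show the root device line.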
mpontillotdihp: hm. so when you tried my script, did storage information come back as expected?06:39
tdihpmpontillo: yes, the script did output sda sda1 and sdb06:41
tdihpThe only line(s) with any hostnames are: iscsi_initiator=pure-mammal -- ip=::::pure-mammal:BOOTIF06:42
mpontillotdihp: so it's curious that you don't have that same data when you commission then. in my setup, when I browse to the node details page, scroll down to commissioning details, and click on "00-maas-07-block-devices.out", I see a JSON object which is a list of two dictionaries (each disk in the system)06:43
tdihpmpontillo: Yes, my "00-maas-07-block-devices.out" also lists two items, curious truly :D06:45
mpontillotdihp: ok, what state is the node in? perhaps the commissioning is failing at some other point?06:47
tdihpmpontillo: BTW I've figured out a way to tap into the HTTP calls the node makes to the MAAS server06:47
mpontillotdihp: oh yeah? what was your method? I've captured packets in the past, but that was a pain. later I noticed we had a mass_get.py script or similar which made the calls, heh06:47
tdihpmpontillo: the node is now in "ready", just with "storage" as 0 GB06:48
tdihpmpontillo: Yeah I wish I could use mitmproxy too.06:48
tdihpmpontillo: https://sysdig.com/blog/decode-your-http-traffic-with-sysdig/ the echo_fds bit06:49
mpontillotdihp: so 00-maas-07-block-devices.out -- I assume it doesn't contain sizes then?06:49
mpontillotdihp: do the sudo commands in my script get the block size? (just noticed a typo, the second one should be bsz: and not size64:)06:50
mpontillotdihp: ah nice, that tool looks interesting06:50
mpontillotdihp: and I didn't use a MitM method, I just used "sudo tcpdump" on the rack controller, btw06:51
mpontillotdihp: but I had to wade through a huge packet capture ;-)06:51
tdihpmpontillo: So the point was, between the "starting 00-maas-07-block-devices" request and the report, there was less than 80 secs06:53
tdihpmpontillo: which matches the lag of 4 sudo calls (2 devices)06:55
mpontillotdihp: well, the important thing to me is whether or not you're getting the data. if sudo is slow, that's not good. but I guess it's aborting before it's able to get the data. which is weird... on my system, I still get the "unable to resolve" error, but it works immediately06:56
mpontillotdihp: I suspect that is because I am getting a NXDOMAIN response back from the DNS server, and it's able to fail fast, whereas in your case, for some reason, the packets don't make it, and it times out06:57
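One way to tell a fast NXDOMAIN failure from a lookup that times out is to measure how long resolving the node's own hostname takes. The sketch below assumes getent is available on the commissioning node; the function name lookup_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many whole seconds a hostname lookup takes,
# whether it succeeds or fails.
#   $1 = hostname to resolve
lookup_seconds() {
    start=$(date +%s)
    # The result is discarded; only the elapsed time matters here.
    getent hosts "$1" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

Running `lookup_seconds "$(hostname)"` on the node should print 0 or 1 when the resolver fails fast. A much larger number would be consistent with the timeout theory, since sudo would pay that delay on every call.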
tdihpmpontillo: the 00-maas-07-block-devices.out does include size for sda like: "SIZE": "500107862016",06:58
mpontillotdihp: well that's interesting. so I assume you have a 500 GB disk then06:59
mpontillotdihp: if that's the case then I'm surprised it isn't being reported correctly.07:00
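As a quick check of the arithmetic, the SIZE value from the commissioning output is a byte count, and dividing by 10^9 gives decimal gigabytes. The helper name bytes_to_gb below is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to decimal gigabytes (10^9 bytes).
#   $1 = size in bytes
bytes_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.1f GB\n", b / 1000000000 }'
}
```

`bytes_to_gb 500107862016` prints `500.1 GB`, which matches the 500 GB disk assumed above.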
tdihpmpontillo: Yeah, MAAS still reports "No storage information"07:00
mpontillotdihp: is there anything interesting under /var/log/maas/rsyslog/<hostname>/* on the MAAS server?07:06
mpontillotdihp: another thing you could do is look at: curl $(cat /proc/cmdline | tr ' ' '\n' | grep cloud-config-url | cut -d= -f2-)07:08
mpontillotdihp: you should see under datasource: and reporting: the URLs used to communicate back to the MAAS server. though I expect they're okay given that you seem to have commissioning output.07:09
mpontillotdihp: at this point, I would also check /var/log/maas/regiond.log, and check for any Python tracebacks. maybe that could provide a clue why the size couldn't be calculated07:09
tdihpmaasserver: [error] Error while calling DescribePowerTypes: Unable to get RPC connection for rack controller 'nv750' (k4am4k).07:12
mpontillotdihp: are the region and the rack on the same machine? is that consistent or a one-off?07:12
mpontillotdihp: that could happen if, for example, MAAS is restarting07:13
mpontillotdihp: what you might do is "tail -f" the log and try commissioning again, then see if any tracebacks occur during commissioning07:13
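A minimal sketch of the traceback check, assuming GNU grep (for the -A option) and that Python tracebacks in the log start with the standard "Traceback (most recent call last)" header. The function name show_tracebacks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every Python traceback header found in a log,
# with its line number and a few lines of trailing context.
#   $1 = log file to search
#   $2 = number of context lines after each match (defaults to 10)
show_tracebacks() {
    grep -n -A "${2:-10}" 'Traceback (most recent call last)' "$1"
}
```

After a commissioning run, `show_tracebacks /var/log/maas/regiond.log` lists anything that blew up. To watch live instead, `tail -f /var/log/maas/regiond.log | grep --line-buffered -A 10 Traceback` does roughly the same thing.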
tdihpmpontillo: It's the same machine07:14
mpontillook. probably a red herring then, unless you have firewall rules I don't know about ;-)07:14
tdihpmpontillo: and it's not a one-off07:15
mpontillotdihp: hm, well that is weird then. if you go to Nodes > Controllers and click your controller, is the status all green?07:16
tdihpmaasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.07:17
tdihpmpontillo: oops07:17
tdihpmpontillo: It seems the rack controller is offline on MAAS07:18
mpontillotdihp: well that's an interesting clue, however it seems DHCP is working, or you wouldn't be able to PXE boot?07:18
mpontillotdihp: ok, I would "tail -f rackd.log" and then "service maas-rackd restart"07:18
tdihpmpontillo: Yes, DHCP/TFTP seem to be working07:18
tdihpmpontillo: Oh, my IP has changed07:19
tdihpmpontillo: rackd is fixed, I'll try commissioning again07:21
mpontillotdihp: great. hoping it works this time =)07:21
tdihpmpontillo: I guess I still have to try harder07:28
tdihpmpontillo: is it possible that the MAAS server failed to update the node status when commissioning?07:31
mpontillotdihp: you can look at the node event log and find out07:32
tdihpmpontillo: there is a burst of events like this: pure-mammal ureadahead[463]: ureadahead:events/fs/open_exec/enable: Ignored relative path07:37
mpontillotdihp: that might be a red herring https://bugs.launchpad.net/maas/+bug/1643838 07:47
mpontillotdihp: any chance you can pastebin your regiond.log and rackd.log?07:47
mpontillotdihp: maas.log too for completeness07:48
tdihpmpontillo: Sure I can.07:48
tdihpmpontillo: https://1drv.ms/u/s!Ap2xOCxdN0NAjCHA48CAfrmpySNc 07:53
tdihpmpontillo: Can you download the file?07:54
tdihpOh, you mean there's a real pastebin tool07:54
tdihppls wait, I'll paste there07:55
mpontillotdihp: no that's fine, I grabbed it and was looking through08:01
mpontillotdihp: haven't found anything interesting though08:02
mupBug #1656717 opened: Juju -> MAAS [2.2+] API integration needs to account for null spaces <juju:Triaged> <MAAS:In Progress by mpontillo> <https://launchpad.net/bugs/1656717>08:03
mupBug #1643001 changed: Moonshot iLO4 'Power HW address' prevent ipmitool from working <MAAS:Expired> <https://launchpad.net/bugs/1643001>08:05
tdihpmpontillo: I hadn't mentioned it, but commissioning did work exactly once, and I could never reproduce that =)08:08
=== frankban|afk is now known as frankban
mpontillotdihp: when was the last time you reopened the browser tab with your MAAS UI? can you try force-reloading it? I may have just spotted the bug. it's possible that if the amount of storage changes, the UI won't refresh properly. it may be that your node commissioned just fine.08:15
mpontillotdihp: the fact that the node is in "Ready" state means that MAAS thinks the node is usable. it would be in "Failed Commissioning" state if MAAS thought otherwise.08:15
mpontillotdihp: if that doesn't work, I would try deleting all the storage devices and recommissioning the node (though it SHOULD work without doing that; MAAS tries to update existing devices though, and I'm wondering if that step somehow goes horribly wrong)08:17
tdihpmpontillo: Yes! Strangely enough, after opening the UI in a new browser session, the storage size showed up!08:19
* mpontillo groans08:19
tdihpI'll try deleting the node and do another check08:20
mpontillotdihp: glad we're finally through that =)08:20
* mpontillo should sleep soon; it's after midnight here08:20
tdihpmpontillo: Thanks really. Good night, cheers08:20
mpontillotdihp: happy to help; have a good day.08:21
jlec_Hi there08:51
jlec_What is the current way of creating custom CentOS images with MAAS-2.x?08:51
brendandjlec_, CentOS is available by default in MAAS now09:31
brendandno custom images needed09:32
jlec_brendand: yes, I know. But the official images provided by Canonical are older versions. I need 7.2/3 10:09
jlec_Secondly, I'd like to change the base image that gets deployed.10:10
jlec_And lastly, only 6.6 picks up an IP; 7.0 fails to do so10:10
mupBug #1656717 opened: Juju -> MAAS [2.2+] API integration needs to account for null spaces <juju:Triaged> <MAAS:In Progress by mpontillo> <https://launchpad.net/bugs/1656717>10:37
mupBug #1643001 changed: Moonshot iLO4 'Power HW address' prevent ipmitool from working <MAAS:Expired> <https://launchpad.net/bugs/1643001>10:38
YoofyI see that maas-image-builder is deprecated; do you know how to make a custom image?10:55
MrLeauDoes anybody have problems commissioning nodes using 16.04 with an offline MAAS install?12:47
MrLeauI'm getting "Hash sum mismatch"; I think it's because it's renaming the files in /var/lib/apt/lists12:49
=== dannf` is now known as dannf
mpontilloMrLeau: I have seen that before when my mirror was out of date. For me it meant that my images were more up-to-date than my mirror.16:44
mpontilloMrLeau: MAAS installs packages from the archive during commissioning, for things like checking lldp connectivity.16:44
MrLeauYes, it is very likely that my mirror is out of date, but I found that if I delete everything from /var/lib/apt/lists, apt re-downloads the lists and it works16:48
MrLeauIf there were a way to clean /var/lib/apt/lists, I think it would fix my problem, but the commissioning script runs after the standard apt-get update16:49
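The manual workaround described here amounts to emptying the APT lists directory and running apt-get update again. Below is a minimal sketch, assuming GNU find; the function name clear_apt_lists is hypothetical, and how to run such a step before MAAS's own apt-get update is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: remove cached APT index files so that the next
# "apt-get update" downloads them again.
#   $1 = lists directory (defaults to /var/lib/apt/lists)
clear_apt_lists() {
    dir="${1:-/var/lib/apt/lists}"
    # Refuse to do anything if the directory does not exist.
    [ -d "$dir" ] || return 1
    # Delete everything inside the directory but keep the directory itself.
    find "$dir" -mindepth 1 -delete
}
```

Run as root on the affected node, `clear_apt_lists && apt-get update` reproduces the manual fix.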
=== frankban is now known as frankban|afk
jlechi again. now that the maas-image-builder is deprecated, what is the state of the art way to build custom images?20:17
pmatulisjlec, i don't think there is a way afaik20:40
jlecpmatulis: but how does Canonical do it to provide the images to the community?20:41
jlecThere is something around curtin, but no real docs anywhere20:42
=== frankban|afk is now known as frankban
plarsI have a small maas system with http://maas.ubuntu.com/images/ephemeral-v2/daily/ as the boot image sync url, and a while back when I first set it up, I was able to use it to boot with either xenial or zesty.21:00
plarsbut now if I try to start a node with distro_series=zesty, it tells me:21:00
plars{"distro_series": ["'zesty' is not a valid distro_series.  It should be one of: '', 'ubuntu/trusty', 'ubuntu/wily', 'ubuntu/xenial'."]}21:00
plarsbut distro_series=xenial works just fine21:01
plarsbrendand: iirc you had helped me with this a while back when it first worked, any ideas?21:01
pmatulisplars, what version of maas are you using?21:32
plarspmatulis: 1.9.4+bzr45921:33
pmatulisplars, oh. you haven't up'd to the 2.x series. i only really started using maas with 2.0 and 2.121:34
plarspmatulis: for various reasons, I'm reluctant to update from that. We are trying to use an existing dns/dhcp server in this lab rather than letting maas handle that function. This worked for a long time, then broke in an update a while back. Folks here were able to help me convince it to (mostly) quit trying to force us down that path but warned me that21:35
plarsfuture updates might break it completely21:35
plarsand this did previously import the zesty images and let us deploy with it at one time, so I'm not sure when it stopped21:36
plarsor why21:36
pmatulisplars, maybe go directly to the URL and see if zesty is there21:37
plarspmatulis: I see it on that url21:37
plarsthere was an update a few days ago it seems21:37
plarsto that image21:37
pmatulisplars, i suggest filing a bug then21:38
=== frankban is now known as frankban|afk
