/srv/irclogs.ubuntu.com/2017/11/23/#ubuntu-kernel.txt

=== himcesjf_ is now known as him-cesjf
=== Elimin8r is now known as Elimin8er
Laneybah11:11
Laneynow I can't reproduce https://bugs.launchpad.net/bugs/1730717 very well11:11
ubot5Launchpad bug 1730717 in linux (Ubuntu Bionic) "Some VMs fail to reboot with "watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]"" [High,In progress]11:11
Laneyis there something I can install on all the test VMs to put them into high load easily?11:11
Laneyguess I can write a userdata thing to install stress-ng and run that?11:15
apwright, that thing will make it sad in all sorts of different ways; is it CPU load you need?11:29
Laneynot sure11:31
Laneywe were suspecting it happens more when the cloud is 'busy'11:31
Laneybut don't know in which way specifically11:31
LaneyI'll just turn on all the things I guess11:31
* Laney has cleared out lcy01 from adt instances for the time being11:32
LaneyGREAT now I can't schedule instances at all11:47
Laneyubuntu@laney-test1:~$ uptime 11:57:35 up 3 min,  2 users,  load average: 509.30, 210.40, 79.2711:57
Laneythat's working then :-)11:57
apwheheh12:09
TJ-Any ideas how to further diagnose an ecryptfs mount failure when the internal mount() call returns "No such file or directory"? This is on bare metal, and both source and target are accessible.12:16
TJ-Is there something internal via debugfs I could watch?12:17
Laney50 stressy units coming our way12:44
apwTJ-, hmmm13:29
apwTJ-, nothing in dmesg at the time it occurs ?13:31
apwTJ-, all the things in ecryptfs that could return ENOENT seem to report something in dmesg first13:33
apwTJ-, otherwise it is generic13:33
TJ-apw: no; it's actually quite a severe problem I've been working on for a few days with an Ubuntu 17.04 user. They recently used the GUI to change their user password, after which they could not log-in. Turns out the encrypted home was no longer being unlocked, suggesting the GUI somehow didn't rewrap the ecryptfs passphrase. The user has the original key (hex passphrase) but that wasn't working either.13:35
TJ- Then I spotted the Private.sig entries were not the current keys, so wrote a script to automate test and mount. We found /sbin/mount.ecryptfs_private was failing due to the underlying "mount()" call reporting "No such file or directory". So, looking for what the real cause is.13:35
TJ-This was discovered via strace13:35
apwTJ-, and nothing in dmesg, hmm13:36
TJ-apw: No13:36
apwwhat were the parameters to mount when it failed13:36
apwin your strace output, PM me if they might be sensitive13:36
TJ-They're elided unfortunately, but I used the source code to figure out the entire mount command and we tried that manually. Of course that wouldn't work, since mount needs calling with sudo but the root user doesn't have the user's keyring attached13:37
TJ-apw: The user account name is "a1". Everything about it checks out (ownership, permissions). Here's the strace: http://paste.ubuntu.com/26023608/13:38
TJ-This'd be the equivalent manually: sudo mount -t ecryptfs /home/a1/.Private /home/a1  -o rw,nosuid,nodev,relatime,ecryptfs_fnek_sig=782cb407b85d0079,ecryptfs_sig=769688550d78ced9,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs13:39
apwand .Private exists13:40
TJ-apw: oh yes, everything is there. Pastebin's available if you want to check13:40
apwno i trust you13:40
TJ-I may have missed something :)13:41
TJ-Interesting aside - I had to make strace setuid-root to be able to capture the trace, else the setreuid() failed13:42
apwTJ-, so if those signatures were not for keys the filesystem contains13:43
apwthen you would get ENOENT13:43
TJ-The key sigs matched the ones in Private.sig 13:44
apwthose are outside the filesystem though13:44
apwthat is nothing to do with what the filesystem has inside13:44
apwanyhow, maybe not that, i wonder what else13:46
TJ-Right. The thing is, a few days ago when trying to manually mount, the user may have accidentally mistyped the hex passphrase. The mount succeeded but obviously the content was garbage. I got him to unmount immediately, but the few files that got updated show up in .Private/ because the key-sig encodings changed for those files (in the filenames)13:47
apwso they have mounted using the wrong passphrase over that directory13:48
TJ-Yes, once.13:48
apwwhen you say they show up in .Private, when ?13:49
apwor are they the only things in there now13:49
TJ-But you can see it here, starting at line 55: http://paste.ubuntu.com/26023177/13:49
TJ-you'll see the original files-sigs prior to Nov 20th, and then the 15-ish changed when the wrong key was used13:49
TJ-I wonder, is there a way to convert the filename encoded signatures back to hex so they can be matched to the Private.sig entries?13:52
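A minimal sketch of how the signature could be read back out of an encrypted filename, in Python. It rests on assumptions recalled from fs/ecryptfs/crypto.c and keystore.c that should be verified against the kernel source before relying on it: names start with the prefix "ECRYPTFS_FNEK_ENCRYPTED.", the remainder packs 6 bits per character (most significant bits first) using the alphabet below, and the decoded bytes form a tag 70 packet laid out as tag byte 0x46, a one- or two-byte packet length, then the 8-byte FNEK signature. The function names are hypothetical.

```python
# Assumed constants; check them against the kernel source before trusting them.
PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."
ALPHABET = "-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
TAG_70 = 0x46
SIG_SIZE = 8  # raw bytes, i.e. 16 hex digits as stored in Private.sig


def decode_name(encoded: str) -> bytes:
    """Unpack 6 bits per character, most significant bits first."""
    bits = 0
    nbits = 0
    out = bytearray()
    for ch in encoded:
        # ALPHABET.index raises ValueError for characters outside the alphabet.
        bits = (bits << 6) | ALPHABET.index(ch)
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the bits not yet emitted
    return bytes(out)


def fnek_sig(filename: str) -> str:
    """Return the FNEK signature embedded in an encrypted filename, as hex."""
    if not filename.startswith(PREFIX):
        raise ValueError("not an FNEK-encrypted filename")
    packet = decode_name(filename[len(PREFIX):])
    if len(packet) < 2 or packet[0] != TAG_70:
        raise ValueError("missing tag 70 marker")
    # Assumed length encoding: one byte if below 192, otherwise two bytes.
    start = 2 if packet[1] < 192 else 3
    sig = packet[start:start + SIG_SIZE]
    if len(sig) != SIG_SIZE:
        raise ValueError("packet too short to hold a signature")
    return sig.hex()
```

Running fnek_sig() over each name in .Private/ and comparing the results with the two lines of Private.sig would show which files were written under a different filename key.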
=== ricotz_ is now known as ricotz
apwok there is definitely no report for ecryptfs in dmesg, if that is so13:55
TJ-no13:55
apwthen the ecryptfs mount did not fail, it would always have reported something itself if it fails13:55
TJ-right, which is why I was asking for inspiration to debug this further :)13:55
TJ-The user has a keepassxc database in there, else he'd just redo the entire thing13:57
apwwhere is . in that mount, is that home ?13:58
apwhave you tried mounting it somewhere else13:59
apwit would return ENOENT if . was a deleted directory14:02
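One way to check the deleted-directory case from userspace, as a hedged sketch in Python: on Linux, getcwd() fails with ENOENT once the current directory has been removed, which Python surfaces as FileNotFoundError. The helper name cwd_is_deleted is hypothetical, and this only covers the deleted case, not the dont_mount flag.

```python
import os


def cwd_is_deleted() -> bool:
    """Return True if the current working directory has been removed."""
    try:
        os.getcwd()
    except FileNotFoundError:
        # Linux reports ENOENT for an unlinked working directory.
        return True
    return False
```

It would need to run from the same directory the mount helper ends up in, so after changing into the user's home.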
TJ-Yes, it is. The "." is hard coded in mount.ecryptfs_private! That was the first thing I chased down, thinking it was wrong14:03
apwbut you could like14:03
apwmkdir tmp14:03
apwcd tmp14:03
apwand run it there ?14:03
TJ-circa line 687 of src/utils/mount.ecryptfs_private.c:: if (mount(src, ".", FSTYPE, MS_NOSUID | MS_NODEV, opt) == 0) {14:04
TJ-Tried that too; doing a  "pushd /tmp" just before the call to /sbin/mount.ecryptfs_private (in my test script)14:04
apwyou likely could not mount it over /tmp14:05
apwfor other reasons14:05
apwit will also return ENOENT if the "." in this context is marked dont_mount, which also seems to apply to removed/renamed things14:05
apwhave you tried mounting a ramfs over the "."14:06
apwthat might tell you if any of those tests would fail14:06
TJ-No, but it doesn't work that way in mount.ecryptfs_private - it calls getpwuid(), then does a chdir() to the home, *then* uses "."14:06
apwyou could try the tmpfs thing then14:07
apwsee if "." is sick for some other reason14:07
TJ-No, haven't tried mounting ramfs over ., but . will always be $HOME (as in what getent passwd a1 shows)14:07
apw        printk(KERN_ERR "%s; rc = [%d]\n", err, rc);14:07
TJ-I've run the same tests exhaustively on multiple test user accounts whilst messing up the sigs etc, and not been able to recreate this mount fail14:08
apwas we have that ^ in ecryptfs_mount, i don't see how it can be ecryptfs related, so it should fail for other things14:08
TJ-I wondered if it was keyring related, but the user wrote down the original hex key and is using that now14:09
apwhow did you look for the errors from btw14:09
TJ-errors from...? I'm not sure I understand what you're asking?14:10
apwin dmesg14:11
TJ-oh, got the user to pastebin the log14:11
apwhave you got that pastebin ?14:11
TJ-Not in a tab right now... I only kept the ones that had 'interesting' info - e.g. something related. We got through about 60 pastebins!14:12
apwas the messages are not tagged for ecryptfs14:12
TJ-But, I told him yesterday I'd seek advice then we'd do another debug session, so I can collect that14:12
apwit would say like "Getting sb failed; rc=N"14:13
apwrather than being obviously related14:13
apwrubbish errors for the win14:13
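Since those messages carry no ecryptfs tag, a saved dmesg capture can be searched for the message shape instead. A rough sketch in Python, assuming the lines follow the "%s; rc = [%d]" format of the printk quoted earlier and may carry a leading "[seconds.micros]" timestamp; the helper name is hypothetical and the sample wording used to try it out is made up.

```python
import errno
import re

# Optional "[ 12.345678] " timestamp, then "<message>; rc = [<n>]".
RC_LINE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?(?P<msg>.*); rc = \[(?P<rc>-?\d+)\]\s*$"
)


def find_rc_lines(lines):
    """Return (message, rc, errno name) for every line in the rc = [N] shape."""
    hits = []
    for line in lines:
        match = RC_LINE.search(line)
        if match:
            rc = int(match.group("rc"))
            # Kernel return codes are negative errno values; name is None if unknown.
            name = errno.errorcode.get(-rc)
            hits.append((match.group("msg"), rc, name))
    return hits
```

Feeding it the lines of the user's pastebinned log would surface any untagged failure, with -2 showing up as ENOENT.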
TJ-I was reading every line *very* carefully, also had "dmesg -w | tail" to capture anything that happened whilst the mount command was running14:13
TJ-I'll capture a dmesg run with the user next time I make contact14:14
TJ-I can get him to boot with 'debug' too which might add something14:14
TJ-in case it's of interest, this is the test/diagnostic script I've created: http://iam.tj/projects/ubuntu/ecryptfs-regenerate-wrapper_signatures_mount_test14:16
apwi cannot see anything outside of ecryptfs which would make this mount fail without, i believe, stopping any mount14:18
apwand if it was inside, then it must seemingly be logged in dmesg14:18
apwso i guess confirming dmesg is clear, and that mounting tmpfs or something in the same place first14:19
apwto confirm something can be mounted there14:19
apwother than that, hrm, you need some debug in your kernel i reckon14:19
TJ-Yeah... it's a weird one. What annoys me is, the origin of all this is the GUI failing to rewrap the passphrase on password-change; and I've hit that myself a few times at random since 14.04 at least but never figured out why14:20
TJ-well actually not even failing to rewrap... it does rewrap, but doesn't use the new user password, which is worse, so trying to unwrap with old or new password fails14:21
apwno i don't think i have ever really been comfortable with ecryptfs and its madness14:21
apwi am glad we are moving to using native encryption on the filesystem14:21
TJ-Thanks for your help on this, I'm glad it's not just me missing something stupidly obvious14:22
TJ-I was planning on bugging Tyler with it :)14:22
apwpossibly still worth it14:23
TJ-Yes... I'm going to write up a cogent bug report that doesn't look like a novel :)14:23
apwTJ-, i do wonder if there is any mileage in moving the things with the wrong FNEK out of there14:24
apwi can't see it checking if any of them are bad as part of mount, but14:24
TJ-apw: I was planning on making an entire copy to another location, so we can test safely :)14:25
TJ-The 'wrong key files' issue was the user logged in at the console and the $HOME/.ecryptfs/auto-mount file was there so pam_ecryptfs did the mount. As soon as I realised, I had that deleted14:26
TJ-apw: I think you're onto something with the different keys. Looking at the kernel source, all the -ENOENT are in keystore.c and most seem related to calls to ecryptfs_find_auth_tok_for_sig() ... looking at calls to that function, one that stands out is ecryptfs_parse_tag_70_packet() which parses the signatures out of the filename. Now, if the lower superblock is mounted and then it is trying to decrypt14:52
TJ-one of those filenames encoded with the possibly-mistyped-hex-key that might be triggering -ENOENT, but related to the keyring, not the file-system14:52
apwthat should be testable, make two test mounts and umount them, then copy a file from one to the other14:53
apwand see if that blows its brains up14:54
TJ-Yes14:55
TJ-I'm going to do that here after I've run the Huskies into the ground :)14:56
apwif it does that is one for the faq14:56
TJ-I need the fresh air after all this digging14:56
apw(of doom)14:56
apwheh i know the feeling, lick14:56
TJ-hehehe yeah14:56
apwluck, and lick if there are huskies14:56
TJ-We've got 50kmh wind gusts just now so I should take a hang glider out and let them tow me :D14:57
apwheh, a plan indeed, and much more fun than reading kernel code slowly14:58
