=== polymorp- is now known as polymorphic | ||
=== remolej4 is now known as remolej | ||
Guest47 | Hello everyone! Interesting problem I'm wrestling with: I have a user with 60TB of video data in two locations, one properly organized, the other location...not. The data within is reasonably certain to be the same. Doing a recopy would take a ridiculous amount of time. Just wondering if there's a way to script walking the "correct" system into a | 03:36 |
Guest47 | text file (a "map") and then on the remote system, reading the "map" and recreating the original layout by simply moving inodes around | 03:36 |
arraybolt3 | Guest47: What you're trying to do is possible, but will require a bit of programming/scripting. | 03:38 |
arraybolt3 | I probably can't write the script now, but what I'd do is something like this: | 03:38 |
arraybolt3 | 1. On the "good" machine, build a tree with the file and directory structure. Get a SHA256 hash of every file in the tree. | 03:39 |
arraybolt3 | 2. On the "bad" machine, create a similar tree, making the hashes. | 03:39 |
Guest47 | If there isn't an existing bit of code you all might know of, what are your thoughts on: "find $1 > map.txt" run on both systems, then a diff between the two maps. Finally, parse the diff file so if correct-folder-Ab exists, mv file-a folder-A. If the folder doesn't exist, mkdir -p, then move... simplistic, I know :-/ | 03:39 |
Guest47 | oh | 03:39 |
arraybolt3 | 3. Match the hashes up against each other. | 03:39 |
arraybolt3 | Then you can be sure what file is what and be able to move it where it goes. | 03:39 |
arraybolt3 | If the filenames are guaranteed to be unique *and* identical filenames on both systems indicate identical files, you can skip hashing and just use the filenames. | 03:40 |
arraybolt3 | Guest47: The way you're mentioning sounds fairly straightforward. Depending on how the data on the servers works, it might be that easy. | 03:40 |
arraybolt3 | I don't know for sure how exactly the data is going to behave though, so my suggestion should work in any event (so long as the videos are bit-for-bit identical). | 03:41 |
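A minimal sketch of that manifest-and-match approach, assuming SHA256 hashes are effectively unique and using /srv/video as a placeholder for the real top-level directory on both machines:

```bash
#!/bin/bash
# Step 1, on the "good" machine (then copy map.txt to the other side, e.g. with scp):
#   cd /srv/video && find . -type f -print0 | xargs -0 sha256sum > map.txt
#
# Step 2, on the "bad" machine: hash every file and move it to the path the
# map says it should have. All paths here are placeholders.
set -u
cd /srv/video || exit 1

declare -A want                                  # hash -> correct relative path
while read -r hash path; do
    want["$hash"]="$path"
done < map.txt

find . -type f ! -name map.txt -print0 |
while IFS= read -r -d '' f; do
    hash=$(sha256sum "$f" | awk '{print $1}')
    dest=${want[$hash]:-}
    if [ -n "$dest" ] && [ "$f" != "$dest" ]; then
        mkdir -p "$(dirname "$dest")"
        mv -n "$f" "$dest"                       # -n: never overwrite an existing file
    fi
done
```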
Guest47 | arraybolt3 thanks for the thoughts! I'm *hopeful* file names are distinct, but without some kind of checksum, no way to be certain right? | 03:41 |
arraybolt3 | True. | 03:41 |
arraybolt3 | And sha256sum'ing 60 TB of data is going to take *looooooong*. | 03:41 |
Guest47 | Is there something faster to checksum than sha? Is crc32 still considered reasonable? | 03:42 |
arraybolt3 | If your code can cope with hash collisions, then something like a CRC or perhaps MD5 might be sufficient. | 03:42 |
arraybolt3 | But if you do that, you *have* to be ready for hash collisions. | 03:42 |
sdeziel | Guest47: I'd also check if `rsync --fuzzy` would work for this. I don't know how flexible it is to find similar files in various directories | 03:43 |
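For reference, a rough idea of what that invocation could look like (hostname and paths are placeholders, and -n keeps it a dry run until the plan looks right):

```bash
# --fuzzy tells rsync to look for a similar basis file (same size/mtime or a
# similar name) in the destination directory, so matching data is not re-sent.
# Note it only searches the same directory unless given twice together with
# --compare-dest/--copy-dest/--link-dest, which is the flexibility caveat above.
rsync -avn --fuzzy la-server:/srv/video-good/ /srv/video-bad/
```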
arraybolt3 | MD5 is too likely to generate hash collisions to just be blindly trusted, whereas SHA256 is so unlikely to have a hash collision that it should be able to be trusted without any further checking. | 03:43 |
arraybolt3 | (At least this is what I have been led to believe from my research.) | 03:43 |
arraybolt3 | (If I'm wrong, someone please let me know.) | 03:43 |
arraybolt3 | Guest47: If rsync --fuzzy can't be made to work, I'd benchmark the speed of SHA256 on your hardware, then benchmark the network connection, and use whichever one's faster. | 03:46 |
arraybolt3 | If you have a 1 Gbps connection between the two, then probably the hashing will be quicker. If you have 100 Gbps, then you might want to just reclone and call it a day. | 03:46 |
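Some back-of-the-envelope numbers for the recopy option, plus one way to measure what the link really delivers (iperf3 and the hostname are assumptions, not something from the setup described here):

```bash
# 1 Gbps ≈ 125 MB/s ≈ 0.45 TB/hour, so a full 60 TB recopy needs roughly
# 60 / 0.45 ≈ 133 hours ≈ 5.5 days even at line rate.
iperf3 -s                        # run on one site
iperf3 -c la-server.example.com  # run on the other; reports achievable throughput
```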
Guest47 | Odds are probably pretty low of a collision if comparing the simple sum, and the filename.. Any collisions still found I guess I'd just log | 03:46 |
Guest47 | Sadly, it's fully remote, one system in LA, other near San Diego | 03:47 |
Guest47 | Best one could hope for is 1gbps, but likely less | 03:47 |
arraybolt3 | Blah. | 03:48 |
Guest47 | Does copy time exceed programming/debugging time? It never ever does hahaha | 03:48 |
sdeziel | arraybolt3: on some hardware, sha512sum is ~2x faster than sha256sum | 03:48 |
arraybolt3 | sdeziel: Oh really? Didn't know that. That would be quite handy. | 03:48 |
Guest47 | What is the "sum" command hashing? | 03:49 |
arraybolt3 | That's a 16-bit checksum. | 03:49 |
arraybolt3 | You are going to have so many many many collisions with that. | 03:49 |
sdeziel | for 60TB I'd try the rsync approach of checking the metadata (size, name, etc) | 03:49 |
sdeziel | comparing the size alone should give you a list of likely copies | 03:50 |
Guest47 | well... "man sum" seems to show the 16bit sum, and the file size. So that could be kind of useful (maybe?) | 03:51 |
arraybolt3 | sdeziel: Heh, what do you know, my system is one of the ones that can sha512 faster than it can sha256. | 03:51 |
sdeziel | arraybolt3: re sha512sum being faster, it's apparently due to operating on larger blocks (https://crypto.stackexchange.com/questions/26336/sha-512-faster-than-sha-256) | 03:51 |
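A quick way to check which digest wins on a particular CPU, without touching the disks, is OpenSSL's built-in benchmark:

```bash
# Prints throughput at several block sizes; on many 64-bit CPUs without
# SHA extensions, sha512 comes out ahead for large blocks.
openssl speed sha256 sha512
```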
arraybolt3 | (It also makes my system make an odd whining noise? That's creepy.) | 03:51 |
arraybolt3 | There's also blake2b. | 03:52 |
arraybolt3 | That one beats the sap out of sha256 and sha512. | 03:52 |
Guest47 | brb laundry. Before I go, can I just say that I really appreciate your inputs on this? Thanks arraybolt3 and sdeziel | 03:53 |
arraybolt3 | Guest47: Sure thing! | 03:53 |
arraybolt3 | Yeah, seeing the speed of b2sum, I'd use that. It makes long hashes so it will probably not have hash collisions, and it's operating at mind-bending speeds for me - almost a full GiB/s when being piped in /dev/zero. | 03:53 |
sdeziel | Guest47: np. Lastly, I wouldn't use `sum` as you end up reading 60TB from the disks anyway... checking the size would pull way less from the filesystem and give you a good enough first indicator | 03:54 |
arraybolt3 | And piping it an ISO file, still way way faster than SHA. | 03:54 |
sdeziel | I'd keep the heavy hash computation for files that are close in size to disambiguate them | 03:54 |
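A cheap first pass along those lines, assuming GNU find (paths are placeholders): only files whose sizes occur more than once need an expensive hash to tell them apart.

```bash
# On each machine: size (bytes) and path for every file; this reads only metadata.
find /srv/video -type f -printf '%s\t%p\n' | sort -n > sizes.txt

# Sizes that appear more than once are the only candidates that need hashing.
cut -f1 sizes.txt | uniq -d > ambiguous-sizes.txt
```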
arraybolt3 | (There's also b3sum which appears to be even faster, https://github.com/BLAKE3-team/BLAKE3) | 03:58 |
Guest47 | 2.54s for b2sum vs 1.95s for sum on the same 225 MB file. Hilariously, sum took 1m33.99s on an 8.01 GB file, b2sum took 1m18.26s on the same file | 04:41 |
Guest47 | I'll take a whack at writing the utility to do this tomorrow, I'll drop by and share it with you guys when I finish if you'd like | 04:47 |
arraybolt3 | Thanks! Good luck! | 04:50 |
alkisg | Guest47: when doing such benchmarks, you should make sure they run with the same cache state: either drop the caches first, or cache the whole file beforehand | 05:35 |
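That caveat matters because whichever tool runs second usually reads the file straight from RAM. A fairer comparison, assuming root access for the cache drop and a placeholder file name:

```bash
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # cold-cache run
time sha256sum sample.mp4
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
time b2sum sample.mp4

# Or the other way around: warm the cache once, then time each tool so both
# measure pure CPU cost rather than disk speed.
cat sample.mp4 > /dev/null
time sha256sum sample.mp4 && time b2sum sample.mp4
```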
=== xispita is now known as Guest4041 | ||
=== xispita_ is now known as xispita | ||
pvh_sa | hey everyone... I have a server affected by this bug - https://bugs.launchpad.net/ubuntu/+source/cloud-initramfs-tools/+bug/1958260 - as the bug discussion points out, this only happens if the /lib/modules dir for your kernel version is empty in initramfs - see this code: https://git.launchpad.net/cloud-initramfs-tools/tree/copymods/scripts/init-bottom/copymods#n52 | 11:42 |
-ubottu:#ubuntu-server- Launchpad bug 1958260 in cloud-initramfs-tools (Ubuntu) "cloud-initramfs-copymods hides the full list of modules from the system" [High, Incomplete] | 11:42 | |
pvh_sa | unfortunately since the server is now in this state it seems difficult to get it unstuck. I would like to manually intervene either in the initramfs or this script that is in there so that I can get my system back into a cleanly booting form... | 11:44 |
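Not from the bug thread, just a hedged sketch of the generic recovery path, assuming a shell can still be reached (an older kernel, the rescue target, or live media plus a chroot):

```bash
# Inspect what the current initramfs actually ships for this kernel; the
# modules directory may sit under lib/ or usr/lib/ depending on the release.
unmkinitramfs /boot/initrd.img-"$(uname -r)" /tmp/initrd-contents
find /tmp/initrd-contents -type d -name "$(uname -r)"

# If the copymods hook is what is hiding the modules, removing the package and
# regenerating the initramfs takes that script out of the boot path entirely.
sudo apt-get remove cloud-initramfs-copymods
sudo update-initramfs -u -k "$(uname -r)"
```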
athos | bryceh: this is an interesting git-ubuntu case in https://code.launchpad.net/~bryce/ubuntu/+source/nmap/+git/nmap/+merge/437077 | 12:27 |
athos | while it is my first time dealing with such a case, I suppose it is quite common... | 12:28 |
athos | rbasak: when the patch gets uploaded, should we expect the current jammy-devel branch to be completely replaced by whatever is in bryceh's branch? | 12:29 |
=== justache is now known as deliriumt | ||
=== deliriumt is now known as justache | ||
=== cpaelzer_ is now known as cpaelzer | ||
=== sdeziel_ is now known as sdeziel | ||
rbasak | athos: yes the branch pointer will just be updated | 14:13 |
rbasak | athos: https://bugs.launchpad.net/git-ubuntu/+bug/1852389 | 14:13 |
-ubottu:#ubuntu-server- Launchpad bug 1852389 in git-ubuntu "Branch pointers do not follow deletions, breaking ubuntu/devel and such" [Wishlist, New] | 14:13 | |
=== otisolsen70_ is now known as otisolsen70 | ||
baldpope | good morning all, attempting to set up apt-cacher-ng; the server is all set up and I'm now adding auto-apt-proxy to our local nodes. I have configured the hostname/dnsdomainname correctly on the node, but when running auto-apt-proxy, nothing is returned on my first node (it does return on the apt-cacher-ng node) | 17:21 |
baldpope | running nslookup with a SRV type for the _apt_proxy._tcp.domain.name returns as expected | 17:23 |
sdeziel | baldpope: if I understand correctly, you have to install the auto-apt-proxy package on your machines to (eventually) have them use your apt-cacher-ng, right? Instead of using a package to auto-detect, wouldn't it be simpler to deploy an apt.conf snippet saying to use a proxy? | 17:27 |
sdeziel | `/etc/apt/apt.conf.d/01proxy`: `Acquire::http::proxy "http://apt-cacher-ng.domain.name:3142/";` | 17:27 |
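A quick way to roll that snippet out on a node and confirm apt picked it up (the hostname is whatever actually resolves in this environment):

```bash
echo 'Acquire::http::proxy "http://apt-cacher-ng.domain.name:3142/";' |
    sudo tee /etc/apt/apt.conf.d/01proxy

apt-config dump | grep -i proxy   # should show the Acquire::http::proxy value
sudo apt-get update               # requests should now appear in apt-cacher-ng's log
```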
baldpope | looks like that would accomplish the same thing? | 17:29 |
baldpope | thanks sdeziel | 17:32 |
bryceh | athos, I don't know if it's common, this is the first time I've come across it myself | 18:45 |
=== chris15 is now known as chris14 | ||
=== remolej4 is now known as remolej |