/srv/irclogs.ubuntu.com/2023/02/09/#ubuntu-server.txt

=== polymorp- is now known as polymorphic
=== remolej4 is now known as remolej
Guest47Hello everyone! Interesting problem I'm wrestling with: I have a user with 60TB of video data in two locations, one properly organized, the other location...not. The data within is reasonably certain to be the same. Doing a recopy would take a ridiculous amount of time. Just wondering if there's a way to script walking the "correct" system into a03:36
Guest47text file (a "map") and then on the remote system, reading the "map" and recreating the original layout by simply moving inodes around03:36
arraybolt3Guest47: What you're trying to do is possible, but will require a bit of programming/scripting.03:38
arraybolt3I probably can't write the script now, but what I'd do is something like this:03:38
arraybolt31. On the "good" machine, build a tree with the file and directory structure. Get a SHA256 hash of every file in the tree.03:39
arraybolt32. On the "bad" machine, create a similar tree, making the hashes.03:39
Guest47If there isn't an existing bit of code you all might know of, what are your thoughts on: "find $1 > map.txt" run on both systems, then a diff between the two maps. Finally, parse the diff file so if correct-folder-Ab exists, mv file-a folder-A. If the folder doesn't exist, mkdir -p, then move... simplistic, I know :-/03:39
Guest47oh03:39
arraybolt33. Match the hashes up against each other.03:39
arraybolt3Then you can be sure what file is what and be able to move it where it goes.03:39
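A rough Python sketch of the three steps arraybolt3 lists: index a tree by SHA-256, then pair up paths that share a digest. The function names (`file_sha256`, `hash_tree`, `plan_moves`) are hypothetical, and the sketch assumes the index built on the "good" machine is shipped to the "bad" one in some form (for example as a JSON file). Digests that are missing or appear more than once on either side are reported rather than guessed at.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file digest to the list of paths (relative to root) that have it."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_sha256(full), []).append(rel)
    return index


def plan_moves(good_index, bad_index):
    """Return (moves, unmatched) for the "bad" tree.

    moves is a list of (current_path, wanted_path) pairs for digests that
    occur exactly once on each side. Everything else on the "bad" side
    (no counterpart, or duplicates) goes into unmatched for manual review.
    """
    moves, unmatched = [], []
    for digest, bad_paths in bad_index.items():
        good_paths = good_index.get(digest, [])
        if len(good_paths) == 1 and len(bad_paths) == 1:
            if bad_paths[0] != good_paths[0]:
                moves.append((bad_paths[0], good_paths[0]))
        else:
            unmatched.extend(bad_paths)
    return moves, unmatched
```

The sketch only plans the moves; nothing is renamed until the plan has been reviewed.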
arraybolt3If the filenames are guaranteed to be unique *and* identical filenames on both systems indicate identical files, you can skip hashing and just use the filenames.03:40
arraybolt3Guest47: The way you're mentioning sounds fairly straightforward. Depending on how the data on the servers works, it might be that easy.03:40
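For the filename-only variant Guest47 describes, a minimal Python sketch might look like the following. It assumes the map was produced with something like `find . -type f > map.txt` on the "good" system (files only, paths relative to the tree root) and that both trees live on a single filesystem each, so that `os.rename` only relinks the file instead of copying data. The helper names `index_by_name` and `apply_map` are made up for illustration.

```python
import os


def index_by_name(map_lines):
    """Map basename -> relative path; names listed more than once map to None."""
    index = {}
    for line in map_lines:
        rel = line.strip()
        if not rel:
            continue
        name = os.path.basename(rel)
        index[name] = None if name in index else rel
    return index


def apply_map(map_lines, bad_root):
    """Move files under bad_root to the relative paths listed in the map.

    Returns (moved, skipped). Files whose name is unknown or ambiguous in the
    map, or whose destination already exists, are skipped and left in place.
    """
    bad_root = os.path.abspath(bad_root)
    wanted = index_by_name(map_lines)
    # Snapshot the tree first so moved files are not visited a second time.
    current = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(bad_root)
        for name in filenames
    ]
    moved, skipped = [], []
    for src in current:
        rel = wanted.get(os.path.basename(src))
        if rel is None:
            skipped.append(src)
            continue
        dst = os.path.normpath(os.path.join(bad_root, rel))
        if os.path.normpath(src) == dst:
            continue  # already where the map says it belongs
        if os.path.exists(dst):
            skipped.append(src)
            continue
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # like mkdir -p
        os.rename(src, dst)
        moved.append((src, dst))
    return moved, skipped
```

Anything in `skipped` is a candidate for the log of collisions and leftovers to check by hand.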
arraybolt3I don't know for sure how exactly the data is going to behave though, so my suggestion should work in any event (so long as the videos are bit-for-bit identical).03:41
Guest47arraybolt3 thanks for the thoughts! I'm *hopeful* file names are distinct, but without some kind of checksum, no way to be certain right?03:41
arraybolt3True.03:41
arraybolt3And sha256sum'ing 60 TB of data is going to take *looooooong*.03:41
Guest47Is there something faster to checksum than sha? Is crc32 still considered reasonable?03:42
arraybolt3If your code can cope with hash collisions, then something like a CRC or perhaps MD5 might be sufficient.03:42
arraybolt3But if you do that, you *have* to be ready for hash collisions.03:42
sdezielGuest47: I'd also check if `rsync --fuzzy` would work for this. I don't know how flexible it is to find similar files in various directories03:43
arraybolt3MD5 is too likely to generate hash collisions to just be blindly trusted, whereas SHA256 is so unlikely to have a hash collision that it should be able to be trusted without any further checking.03:43
arraybolt3(At least this is what I have been led to believe from my research.)03:43
arraybolt3(If I'm wrong, someone please let me know.)03:43
arraybolt3Guest47: If rsync --fuzzy can't be made to work, I'd benchmark the speed of SHA256 on your hardware, then benchmark the network connection, and use whichever one's faster.03:46
arraybolt3If you have a 1 Gbps connection between the two, then probably the hashing will be quicker. If you have 100 Gbps, then you might want to just reclone and call it a day.03:46
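To put rough numbers on that comparison, here is a back-of-the-envelope calculation in Python. It assumes 60 TB means 60 × 10^12 bytes and that each link runs at its full nominal rate with no protocol overhead; the 500 MB/s hashing rate is a placeholder, not a measurement, and would need to be replaced with a benchmark from the actual hardware.

```python
def hours_to_process(total_bytes, bytes_per_second):
    """Hours needed to push total_bytes through a pipe of the given speed."""
    return total_bytes / bytes_per_second / 3600


TOTAL_BYTES = 60 * 10**12        # 60 TB (decimal terabytes, an assumption)
LINK_1_GBPS = 1e9 / 8            # 1 Gbps expressed in bytes per second
LINK_100_GBPS = 100e9 / 8        # 100 Gbps expressed in bytes per second
HASH_RATE = 500e6                # placeholder: 500 MB/s, measure your own

print("copy over 1 Gbps:  ", round(hours_to_process(TOTAL_BYTES, LINK_1_GBPS), 1), "h")
print("copy over 100 Gbps:", round(hours_to_process(TOTAL_BYTES, LINK_100_GBPS), 1), "h")
print("hash locally:      ", round(hours_to_process(TOTAL_BYTES, HASH_RATE), 1), "h")
```

Under these assumptions a full copy over an ideal 1 Gbps link takes about 133 hours (roughly five and a half days), while the same copy over 100 Gbps takes a little over an hour, which is the trade-off arraybolt3 is pointing at.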
Guest47Odds are probably pretty low of a collision if comparing the simple sum, and the filename.. Any collisions still found I guess I'd just log03:46
Guest47Sadly, it's fully remote, one system in LA, other near San Diego03:47
Guest47Best one could hope for is 1gbps, but likely less03:47
arraybolt3Blah.03:48
Guest47Does copy time exceed programming/debugging time? It never ever does hahaha03:48
sdezielarraybolt3: on some hardware, sha512sum is ~2x faster than sha256sum03:48
arraybolt3sdeziel: Oh really? Didn't know that. That would be quite handy.03:48
Guest47What is the "sum" command hashing?03:49
arraybolt3That's a 16-bit checksum.03:49
arraybolt3You are going to have so many many many collisions with that.03:49
sdezielfor 60TB I'd try the rsync approach of checking the metadata (size, name, etc)03:49
sdezielcomparing the size alone should give you a list of likely copies03:50
Guest47well... "man sum" seems to show the 16bit sum, and the file size. So that could be kind of useful (maybe?)03:51
arraybolt3sdeziel: Heh, what do you know, my system is one of the ones that can sha512 faster than it can sha256.03:51
sdezielarraybolt3: re sha512sum being faster, it's apparently due to operating on larger blocks (https://crypto.stackexchange.com/questions/26336/sha-512-faster-than-sha-256)03:51
arraybolt3(It also makes my system make an odd whining noise? That's creepy.)03:51
arraybolt3There's also blake2b.03:52
arraybolt3That one beats the sap out of sha256 and sha512.03:52
Guest47brb laundry. Before I go, can I just say that I really appreciate your inputs on this? Thanks arraybolt3 and sdeziel03:53
arraybolt3Guest47: Sure thing!03:53
arraybolt3Yeah, seeing the speed of b2sum, I'd use that. It makes long hashes so it will probably not have hash collisions, and it's operating at mind-bending speeds for me - almost a full GiB/s when being piped in /dev/zero.03:53
sdezielGuest47: np. Lastly, I wouldn't use `sum` as you end up reading 60TB from the disks anyway... checking the size would pull way less from the filesystem and give you a good enough first indicator03:54
arraybolt3And piping it an ISO file, still way way faster than SHA.03:54
sdezielI'd keep the heavy hash computation for files that are close in size to disambiguate them03:54
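sdeziel's size-first idea could be sketched in Python as below. This version narrows "close in size" down to "exactly equal in size", on the assumption that the copies are bit-for-bit identical; sizes that are unique on both sides are treated as likely matches, and only the remaining sizes would need a content hash. `hashlib.blake2b` stands in for `b2sum`, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sizes_in_tree(root):
    """Group relative paths under root by file size; only metadata is read."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            by_size[os.path.getsize(full)].append(os.path.relpath(full, root))
    return by_size


def match_by_size(good_sizes, bad_sizes):
    """Return (matched, need_hash).

    matched holds (bad_path, good_path) pairs for sizes seen exactly once on
    each side. need_hash lists the sizes where more than one file competes,
    so a content hash is required to tell them apart.
    """
    matched, need_hash = [], []
    for size in sorted(set(good_sizes) & set(bad_sizes)):
        good_paths, bad_paths = good_sizes[size], bad_sizes[size]
        if len(good_paths) == 1 and len(bad_paths) == 1:
            matched.append((bad_paths[0], good_paths[0]))
        else:
            need_hash.append(size)
    return matched, need_hash


def blake2b_digest(path, chunk_size=1 << 20):
    """Return the BLAKE2b hex digest of a file; call only for ambiguous sizes."""
    digest = hashlib.blake2b()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A size match alone is only an indicator, so spot-checking a few of the `matched` pairs with `blake2b_digest` before moving anything would be a cheap sanity check.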
arraybolt3(There's also b3sum which appears to be even faster, https://github.com/BLAKE3-team/BLAKE3)03:58
Guest472.54s for b2sum vs 1.95s for sum on the same 225mb file. Hilariously, sum took 1m33.99s on an 8.01gb file, b2sum took 1m18.26s on the same file04:41
Guest47I'll take a whack at writing the utility to do this tomorrow, I'll drop by and share it with you guys when I finish if you'd like04:47
arraybolt3Thanks! Good luck!04:50
alkisgGuest47: when doing such benchmarks, you should make sure they use the same cache. Either drop caches, or cache the whole file05:35
=== xispita is now known as Guest4041
=== xispita_ is now known as xispita
pvh_sahey everyone... I have a server affected by this bug - https://bugs.launchpad.net/ubuntu/+source/cloud-initramfs-tools/+bug/1958260 - as the bug discussion points out, this only happens if the /lib/modules dir for your kernel version is empty in initramfs - see this code: https://git.launchpad.net/cloud-initramfs-tools/tree/copymods/scripts/init-bottom/copymods#n5211:42
-ubottu:#ubuntu-server- Launchpad bug 1958260 in cloud-initramfs-tools (Ubuntu) "cloud-initramfs-copymods hides the full list of modules from the system" [High, Incomplete]11:42
pvh_saunfortunately since the server is now in this state it seems difficult to get it unstuck. I would like to manually intervene either in the initramfs or this script that is in there so that I can get my system back into a cleanly booting form...11:44
athosbryceh: this is an interesting git-ubuntu case in https://code.launchpad.net/~bryce/ubuntu/+source/nmap/+git/nmap/+merge/43707712:27
athoswhile it is my first time dealing with such case, I suppose it is quite common...12:28
athosrbasak: when the patch gets uploaded, should we expect the current jammy-devel branch to be completely replaced by whatever is in bryceh's branch?12:29
=== justache is now known as deliriumt
=== deliriumt is now known as justache
=== cpaelzer_ is now known as cpaelzer
=== sdeziel_ is now known as sdeziel
rbasakathos: yes the branch pointer will just be updated14:13
rbasakathos: https://bugs.launchpad.net/git-ubuntu/+bug/185238914:13
-ubottu:#ubuntu-server- Launchpad bug 1852389 in git-ubuntu "Branch pointers do not follow deletions, breaking ubuntu/devel and such" [Wishlist, New]14:13
=== otisolsen70_ is now known as otisolsen70
baldpopegood morning all, attempting to set up apt-cacher-ng, server is all set up and now adding auto-apt-proxy to our local nodes, have configured the hostname/dnsdomainname correctly on the node, but when running auto-apt-proxy, nothing is returned on my first node (it does return on the apt-cacher-ng node)17:21
baldpoperunning nslookup with a SRV type for the _apt_proxy._tcp.domain.name returns as expected17:23
sdezielbaldpope: if I understand correctly, you have to install the auto-apt-proxy package on your machines to (eventually) have them use your apt-cacher-ng, right? Instead of using a package to auto-detect, wouldn't it be simpler to deploy an apt.conf snippet saying to use a proxy?17:27
sdeziel`/etc/apt/apt.conf.d/01proxy`: `Acquire::http::proxy "http://apt-cacher-ng.domain.name:3142/";`17:27
baldpopelooks like that would accomplish the same thing?17:29
baldpopethanks sdeziel 17:32
brycehathos, I don't know if it's common, this is the first time I've come across it myself18:45
=== chris15 is now known as chris14
=== remolej4 is now known as remolej
