Reading through an article, What every computer science major should know, I came across a couple of interesting coreutils brain teasers under “The Unix philosophy” to play with of an evening. They turned out to be a great way to learn about some more tools, and to brush up on some of those I already knew.
Find the five folders in a given directory consuming the most space.
This one was pretty simple, though I still learned a new trick! First, I count the amount of space each directory is using, then sort from largest to smallest, strip the first line (which will be `.`), then cut out the directory sizes.
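A minimal sketch of that pipeline could look like the following. The function name `top_dirs` is made up for illustration, and `-d 1` assumes a `du` that supports a depth limit (both the BSD and recent GNU versions do).

```shell
# top_dirs DIR: print the five largest directories directly under DIR.
# (Hypothetical helper name; DIR defaults to the current directory.)
top_dirs() {
  # du -k -d 1 : size in KiB of DIR and of each directory one level down
  # sort -rn   : largest first, so DIR itself (the total) ends up on line 1
  # tail -n +2 : start from the 2nd line, dropping that total
  # head -n 5  : keep the top five
  # cut -f2-   : drop the size column, leaving just the paths
  du -k -d 1 "${1:-.}" | sort -rn | tail -n +2 | head -n 5 | cut -f2-
}
```

Calling `top_dirs ~/Music` would then list up to five paths, biggest first.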
I couldn’t find a good way to display the human readable size without counting twice, though this serverfault answer suggests that it might be possible with GNU coreutils >= 7.5.
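For what it's worth, here is a sketch of the single-pass version, assuming a `sort` that understands `-h` (GNU coreutils 7.5 or later). The name `top_dirs_human` is hypothetical, and this variant keeps the size column since that is the whole point.

```shell
# top_dirs_human DIR: like the numeric version, but with human-readable sizes.
# Assumes sort supports -h, which compares values such as 980K, 2.0M and 1.1G.
top_dirs_human() {
  # du -h prints sizes with unit suffixes; sort -rh orders them largest first.
  # tail -n +2 again skips the first line, which is DIR itself.
  du -h -d 1 "${1:-.}" | sort -rh | tail -n +2 | head -n 5
}
```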
But, I did learn something new! I wasn’t aware that you could pass `+x` to `tail` to have it start from the xth line.
Report duplicate MP3s (by file contents, not file name) on a computer.
This one was quite a bit more complicated! At a high level, it seemed like what I wanted here was to hash each file, then compare hashes across all files to find duplicates.
Hashing each file is trivial, as is figuring out which of the hashes are duplicates, but mapping them back to files is a little difficult (given that I’d also decided I was only going to use coreutils for this!)
The solution I ended up with is far from ideal - I end up hashing all the files twice. My original solution also contained a second call to `xargs sh` to get a subshell, till Theo showed me this cool bash-only subshell trick!
The core idea is that we find hashes for each file, cut out all but those that are repeated at least once, then search for files with any of those hashes.
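As a rough sketch of that idea (not necessarily the exact command described above): the helper names `hash_mp3s` and `dup_mp3s` are invented here, `sha256sum` is assumed to be available (on macOS, `shasum -a 256` prints the same format), and the `<(...)` syntax needs bash.

```shell
# hash_mp3s DIR: print "<hash>  <path>" for every .mp3 file under DIR.
# (Hypothetical helper; assumes sha256sum is installed.)
hash_mp3s() {
  find "$1" -type f -name '*.mp3' -exec sha256sum {} +
}

# dup_mp3s DIR: list the MP3s under DIR whose contents match another MP3.
dup_mp3s() {
  # First <(...): the hashes alone, sorted, with uniq -d keeping one copy of
  #               each hash that occurs more than once.
  # Second <(...): every file hashed again (this is the "hashing twice" cost).
  # grep -F -f   : keep the lines that contain one of the repeated hashes.
  grep -F -f <(hash_mp3s "$1" | cut -d ' ' -f 1 | sort | uniq -d) \
             <(hash_mp3s "$1") | sort
}
```

Sorting the final output groups identical files together, since each line starts with its hash.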
I learned a bunch from this task!
- I finally bothered to look up why `find` has such strange syntax and, as a consequence, why it never works as I expect. Turns out, the `-name` part is a pattern, which has to come after the folder you’re searching in.
- I discovered that `find` supports `-exec` as an option, and that it’s terrifying and should only be used with lots of care.
- While looking these up, I discovered that a better command to find your IP address than just running `ifconfig` is `ipconfig getifaddr en0`. Relatedly, `ipconfig` is a thing on non-Windows systems (on macOS, at least)!
- `xargs` is much less intimidating than I always thought it was. It turns out it’s really easy to use!
- `grep` can take patterns from a file! I’d seen this before, but it totally slipped my mind till Theo brought it up.
- `tr -d '\n'` is the best way to get rid of newlines from stdin (from an earlier solution).
- `<(...)` is a magic bash trick to create a pseudo-file that will act as a file object without actually touching disk (via the magic of `/dev/fd/`).
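A couple of these are easier to see in action. The snippet below is only a toy illustration; the function names `strip_newlines` and `grep_any` are hypothetical, and `<(...)` again requires bash.

```shell
# strip_newlines: delete every newline character from stdin.
strip_newlines() {
  tr -d '\n'
}

# grep_any PATTERN...: print the stdin lines containing any of the given
# fixed strings. grep -f wants a file of patterns (one per line), and
# <(...) hands it one without anything being written to disk.
grep_any() {
  grep -F -f <(printf '%s\n' "$@")
}
```

For example, `printf 'apple\nbanana\ncherry\n' | grep_any an rry` keeps `banana` and `cherry`.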
Take a list of names whose first and last names have been lower-cased, and properly recapitalize them.
I did some research into this, and I think it is only possible with the GNU version of `sed`. Unfortunately, I don’t have that, so I resorted to cheating:
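One portable way to cheat, sketched here with `awk` instead of `sed` (an illustration, not necessarily the command used above; the name `capitalize_names` is made up):

```shell
# capitalize_names: read names on stdin, one per line, and upper-case the
# first letter of each whitespace-separated word.
# With GNU sed the usual idiom is: sed 's/\b./\u&/g' (\b and \u are GNU extensions).
capitalize_names() {
  awk '{
    for (i = 1; i <= NF; i++)
      $i = toupper(substr($i, 1, 1)) substr($i, 2)
    print
  }'
}
```

This only touches the first letter of each word, so names like "mary-anne" or "mcdonald" would still need special handling.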
Find all words in English that have x as their second letter, and n as their second-to-last.
This one was also very familiar to me - I often use my computer to cheat at Scrabble.
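A sketch of the kind of `grep` that does the job, assuming a word list at `/usr/share/dict/words` (the function name `x_n_words` is hypothetical, and the list can be overridden by passing a path):

```shell
# x_n_words [WORDLIST]: words with x as the 2nd letter and n as the
# second-to-last. WORDLIST defaults to /usr/share/dict/words (an assumption).
x_n_words() {
  # ^.x : any first letter, then x
  # .*  : anything (or nothing) in between
  # n.$ : n, then exactly one final letter
  grep '^.x.*n.$' "${1:-/usr/share/dict/words}"
}
```

Because the pattern needs an x and a separate n, it only matches words of four letters or more.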
This does break in the case of the second-to-last letter preceding the second letter, but since this can only happen for the two-letter string “nx” (and that isn’t a word), I don’t think it’s a huge deal.