More believable for a Linux OS

nifty · 1 year ago

More believable for a Linux OS

@okamiueru · edit-2 1 year ago

I don’t think the Unix philosophy of having lots of small tools that do one thing and do it well that you compose together has ever been achieved

Why do you think this might be the case? It’s not remotely accurate, which suggests that you must understand it very differently than I do. To some extent, I am curious.

I’ll give you a recent example. Which is just from yesterday. I had a use case where some program had a memory leak, which would eventually lead to the system running out. So, I “built a program that would monitor this and kill the process that used the most memory”. I don’t know how complicated this is in windows and PS, but it took about 2 minutes in Linux, and it very much leverages the Unix philosophy.

Looks something like this:

get_current_available_memory_mb() {
    cat /proc/meminfo | grep MemAvailable | grep -oP '\d*' | xargs printf "%d / 1024  \n" | bc
}

Functionality based on putting together very small pieces that do their things well.

/proc/meminfo is a file pipe that gives you access to information related to memory usage.
cat just outputs data from a file or a named pipe, here the latter
grep lets you filter stuff. First time the relevant line. Then again to strip out the number with a regex.
xargs does one thing well, and lets you pass that on to another command as arguments, instead of stdin.
printf formats the output, here to express the numerical operation of dividing the value by 1024 as “[number] / 1024”
bc evaluates simple mathematical operations expressed in text

Result: 1 file pipe and 5 simple utilities, and you get the relevant data.

The PID of the process using the most memory you can get with something like:

ps aux --sort=-%mem | head -n2 | tail -n1 | awk '{print $2}'

Same sort of breakdown: ps gives you access to process information, and handles sorting by memory usage. head -n2 just keeps the first two lines, but the first one is a header so tail -n1 keeps the second line. awk is used here to only output the second column value. And, you get the relevant data. Also, with simple tools that leverage the Unix philosophy.

You then check if the available memory is below some threshold, and send a kill signal to the process if it does. The Unix way of thinking also stops you from adding the infinite loop in the script. You simply stop at making it do that one thing. That is, 1. check remaining memory. 2. if lower than X, kill PID". Let’s call this “foo.sh”.

You get the “monitoring” aspect by just calling it with watch. Something like watch -n 2 -- ./foo.sh.

And there you go. Every two seconds, it checks available free memory, and saves my system from freezing up. It took me 10 times longer to write this reply, than to write the initial script.

If memory serves me correctly, PS also supports piping, so I would assume you could do similar things. Would be weird not to, given how powerful it is.

I could give you an endless list of examples. This isn’t so much a case of “has ever been achieved”, but… a fundamental concept, in use, all the time, by at least a dozen people. A dozen!

Also yesterday, or it might have been Saturday. To give you another example, I scratched different itch by setting up a script that monitors the clipboard for changes, if it changes, and now matches a YouTube URL, it opens that URL in FreeTube. So… with that running, I can copy a YouTube URL, from anywhere, and that program will immediately pop up and play the video. That too, took about 2 minutes to do, and was also built using simple tools that do one thing, and one thing well. If you wanted it to also keep a local copy of that video somewhere, it wouldn’t be more effort than the 10 seconds it takes to also send that URL to yt-dlp. One tool, that does that one thing well. Want to also notify you when that download is complete? Just add a line with notify-send "Done with the thing". What about the first example, if you want to get a OS level notification that it killed the process? Just add a line to notify-send, same tool that does that same one thing well.

None of this takes much effort once you get into it, because the basic tools are all the same, and they don’t change much. The whole workflow is also extremely iterative. In the first example, you just cat meminfo. Then you read it, and identify the relevant line, so you add grep to filter out that line, and run the command again. It’s now a line containing the value, so you add another grep to filter it out the number, and again, run it. “Checks out”. So, you pipe that to printf, and you run it. If you fuck something up, no biggie, you just change it and run it again until that little step matches your expectations, and you move on.

AnyOldName3 · 1 year ago

I think you’ve misunderstood my complaint. I know how you go about composing things in a Unix shell. Within your post, you’ve mentioned several distinct languages:

sh (I don’t see any Bash-specific extensions here)
Perl-compatible regular expressions, via grep -P
printf expressions
GNU ps’s format expressions
awk

That’s quite a lot of languages for such a simple task, and there’s nothing forcing any consistency between them. Indeed, awk specifically avoids being like sh because it wants to be good at the things you use awk for. I don’t personally consider something to be doing its job well if it’s going to be wildly different from the things it’s supposed to be used with, though (which is where the disagreement comes from - the people designing Unix thought of it as a benefit). It’s important to remember that the people designing Unix were very clever and were designing it for other very clever people, but also under conditions where if they hit a confusing awk script, they could just yell Brian, and have the inventor of awk walk over to their desk and explain it. On the other hand, it’s a lot of stuff for a regular person to have in their head at once, and it’s not particularly easy to discover or learn about in the first place, especially if you’re just reading a script someone else has written that uses utilities you’ve not encountered before. If a general-purpose programming language had completely different conventions in different parts of its standard library, it’d be rightly criticised for it, and the Unix shell experience isn’t a completely un-analogous entity.

So, I wouldn’t consider the various tools you used that don’t behave like the other tools you used to be doing their job well, as I’d say that’s a reasonable requirement for something to be doing its job well.

On the other hand, PowerShell can do all of this without needing to call into any external tools while using a single language designed to be consistent with itself. You’ve actually managed to land on what I’d consider a pretty bad case for PowerShell as instead of using an obvious command like Get-ComputerInfo, you need:

(Get-WmiObject Win32_ComputerSystem).FreePhysicalMemory / 1024

Even so, you can tell at a glance that it’s getting the computer system, accessing it’s free physical memory, and dividing the number by 1024.

To get the process ID with the largest working set, you’d use something like

(Get-Process | Sort-Object WorkingSet | Select-Object -Last 1).Id
# or
(Get-Process | Sort-Object WorkingSet)[-1].Id

I’m assuming either your ps is different to mine, or you’ve got a typo, as mine gives the parent process ID as the second column, not the process’ own ID, which is a good demonstration of the benefits of structured data in a shell - you don’t need sed/awk/grep incantations to extract the data you need, and don’t need to learn the right output flag for each program to get JSON output and pipe it to jq.

There’s not a PowerShell builtin that does the same job as watch, but it’s not a standard POSIX tool, so I’m not going to consider it cheating if I don’t bother implementing it for this post.

So overall, there’s still the same concept of composing something to do a specific task out of parts, and the way you need to think about it isn’t wildly different, but:

PowerShell sees its jurisdiction as being much larger than Bash does, so a lot of ancillary tools are unnecessary as they’re part of the one thing it aims to do well.
Because PowerShell is one thing, it’s got a pretty consistent design between different functions, so each one’s better at its job as you don’t need to know as much about it the first time you see it in order to make it work.
The verbosity of naming means you can understand what something is at first glace, and can discover it easily if you need it but don’t know what it’s called - Select-String does what it says on the tin. grep only does what it says on the tin if you already know it’s global regular expression print.
Structured data is easier to move between commands and extract information from.

Specifically regarding the Unix philosophy, it’s really just the first two bullet points that are relevant - a different definition of thing is used, and consistency is a part of doing a job well.