Part 3: Hiding the Machines

Operating systems are where things start sounding familiar to normal people. An operating system is the software that gets run (almost) first when a computer starts. There are standardised protocols, but in short: when a machine is turned on it first runs a bit of code kept in a special memory chip built into the motherboard (usually EEPROM) – the BIOS. This does the basic checks – is the CPU plugged in, is there any RAM, is it working, is there a disk, etc. – sets up communication buses, synchronises various clocks, and so on. It then shows a picture to advertise the motherboard manufacturer, and waits a little in case the user wants to launch a simple GUI to fiddle with whatever configuration is available at that level. Then it goes to a standard place on the disk, looks for instructions about where to find the code for the operating system, and starts executing it. Yay.

Operating systems introduce concepts like file, folder, program, shared library, window… Up to that point it’s all (so-called) bare metal – RAM addresses, disk sectors, instructions, buses, pins, and such. From that point on, pretty much all interaction that other programs or humans have with the machine is actually interaction with the operating system.

There are really only two flavours of them that matter these days: Windows & Linux. Pretty much everything else (Android, Kindle OS, smart TV OSs … ) is a more or less customised version of Linux; Apple’s operating systems are the main exception – they descend from Unix rather than Linux, as we’ll see in a moment.

A Tiny Bit of History 

There were many other operating systems over the years. And there was Unix. It was developed in parallel with C, at Bell Labs in America. They got funding by promising it would help the company with their admin work, which is why it had things like a spell checker, various file editing tools, calculators, and such from early on. It was incredible compared to everything else around: more stable, faster, easier to use, quicker to boot, infinitely expandable, easier to write code for. Along the way, to make it all that, they developed C and a powerful scripting language (shell scripting). Most programming languages you see these days trace their lineage to one or both of these.

It was all command line driven, as was everything then – this is the late 60s, early 70s. However, there was no open source then, and it was expensive, intended for commercial users running (then) powerful machines. In their defence, there was little else available to run code on 🙂

It was a huge success. 

Then along came IBM, with the idea of making affordable PCs for everyone, and hired Bill Gates’s fledgling Microsoft to supply an operating system for them. Rather than writing one from scratch, they bought an existing one (QDOS), polished it up, and shipped it as DOS – a crappy baby cousin of Unix. This is still more or less what you get when you run the Windows command line.

Apple, on the other hand, eventually based their system on a rewrite of Unix too, but decided that the command line was ugly and added a window-based GUI on top. Then everyone started adding window-based GUIs, and Microsoft Windows was born.

In the meantime, Linus Torvalds, just some geek, organised a group of other geeks online (in newsgroups – there was no real web back then) to write their own version of Unix, and make it free. There were others around, but this one was the only one that was fully communal: no ownership, totally decentralised (led by Linus, but he was really just a techie there). So they made Linux. It was a raging success – and incredibly flexible – so most of the alternatives died off. Apple’s macOS is not based on Linux, but it is another copy of Unix – so similar that the difference only matters when geeking out. Importantly, they are all, at the core, written in C. The reason is that there wasn’t much else available to do the job – there still isn’t, really – and it matters because it explains the style of much of the internal APIs.

There are many differences between Windows and Linux, but as they chased each other over the decades, most have become quite subtle; except that when you start pushing it, Linux is faster, more stable, more secure, and needs a lot less to run (less disk space, less CPU power, less memory … it runs just fine on computers on which Windows crawls).

Why is this?

Firstly, it is just better. It was very well thought through to start with, then every line of code was scrutinised hundreds of times. 

Secondly, because there is still a crucial difference in the core design. From the outset Unix was built to be modular, open ended and extensible. Composing functionality out of simple, independent components in standardised ways, and the ability to install and configure them independently, were at the core of the design philosophy. It was a conscious decision and it is pervasive. Similarly for security – a proper user permissions setup was built in from the start. That makes it more stable too, because the idea extends throughout: programs running separately cannot easily affect each other; even programs have permissions. Finally, the internal layers of code are very clearly separated, and again very modular.
What this means in real life is: no lint, no accumulated junk. Everything starts off streamlined; you have to work really hard to add garbage. For geeking-out occasions: the “Unix Philosophy” is a thing (https://en.wikipedia.org/wiki/Unix_philosophy). Geeky, but probably the most influential set of ideas in software engineering ever; worth checking out.

The downside is that Unix is so componentised, so open ended and configurable, that out of the box it makes no decisions for you – want a word processor? There are 20 free ones, pick the one you like and figure out how to set it up. Want a windowing system? Sure, literally hundreds of options, pick one, install it. It’s great if you like researching available software options, but it used to be a lot of work.

It’s better now. They caught up with the idea of user friendliness through “Linux distributions” – pre-packaged configurations with all these choices already made for you, ready to use out of the box (Ubuntu, Debian, Mint … there are loads of those too). They are now actually quite good: super user friendly, free, pretty, well supported. But it took decades. Nowadays there are specialised distributions that don’t even mention Linux, but they still are Linux – Android, various smart TV OSs, Kindle OS, anything else you can think of.

Microsoft took a different approach: people want garbage and don’t want to think about it, and even when they do, they are too lazy to add it themselves – here, we’ll do it for you. What that meant, together with constant wars to monopolise the market, is that Windows still has to lug around code written 30 years ago, and live with decisions made back then, and lots of it. It is also very monolithic, back to front; some of it on purpose, even. For example: when the browser wars were in full swing, Microsoft deliberately tangled their code up with Internet Explorer so thoroughly that most of it is still there. They did this so that they could argue in anti-monopoly lawsuits that they simply had to make Internet Explorer the default browser.

They did improve a lot over time – major parts of Windows were completely rewritten, they even open-sourced a surprising amount of their tooling – but they will never quite shake this shameful legacy off. Be that as it may, they did a lot to make computers accessible, and using Windows is the single most transferable IT skill around. Also, the stuff they built around it is amazing – Excel, Word, their compilers.

Apple took the same approach, except for being elitist; yeah, their stuff is great, looks pretty, works out of the box, but few can afford it.

Once you put the hippie neoliberalism to one side, they all play a role. Windows will be running on desktops for a long time, Linux will be running on everything else, Mac OS will be there for those who want pretty things and can pay; and they will all talk over the same networks. And that’s not so bad. 

Where it becomes messy is that in practice code is written to work on a particular operating system – very, very rarely does code know anything about the lower levels of the hardware (more about this imminently). A lot has been done to make it easier to write portable code – programs that will just run under any operating system you choose to run them on. Python is really good at it, so is Java (in fact, that was the entire purpose of Java), and modern C and C++ are OK if you are careful. But you can never assume it; sometimes it’s a nightmare to port from one OS to another. It’s not only the code: they all have very different file systems, deal with libraries differently, handle security differently. Up until ten or so years ago, making sure your code was portable to start with was a lot of work, few could afford to spend time on it, and there is a lot of code around that is older than that. Add to that the fact that all these systems get regular version upgrades that are not backwards compatible, and that on Linux you have so much freedom of choice that in practice there is no guarantee code will run when you move it from one distribution to another, and … it’s hard. One of the reasons for Python’s popularity is that portability is much, much easier than in almost anything else; it’s there by default, and it very rarely breaks.

Ok, back to work. 

Operating System Concepts

Once the operating system is there, it abstracts away the hardware. This is a big deal. The big deal. Except very rarely, you don’t write code for a real computer, you write code for an operating system. Unless you really insist on it, you don’t even have access to the real memory, disks, CPU features, screens and so on. Compilers do, so that they can optimise things for you, and the OS does, so that it can run things for you. It is all very well documented, and if you need to make the compiler or the OS use this or that hardware feature, you can – but even that is abstracted away. As we go through the list here it will become clearer just how far this goes. The key point is that this is crucial for any portability whatsoever: as long as your code runs on Windows, it will run on any Windows machine (OK, as long as the machine has enough memory etc). It wasn’t always like that.

Drivers

Drivers are the abstraction layer between the operating system and individual pieces of hardware. There are conventions and very low level binary interfaces for each component of a computer and each peripheral; a driver either describes the details of these to the operating system, or actually implements the code that interfaces the hardware to the OS. They are provided by manufacturers. These days there are centralised registries, so they get installed automatically for you (Windows actually ships with loads of them just in case); you rarely have to do anything about it, but they are there.

Files

There are no such things as files or folders at the hardware layer. It is all completely imagined and implemented by the operating system. Windows and Linux have entirely different implementations of file systems, and several versions of each. They do all follow the same basic tree structure, but there are differences in syntax (‘\’ on Windows, ‘/’ on Linux) and the layout is very different. Windows has the notion of drives: the tree has one root for every drive (C:, D: …). Linux hides all this completely – you can, if you really care, figure out where a folder physically lives, but normally you don’t know; there is just one tree of folders with a single root.
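
To make this concrete, here is a tiny Python sketch – pathlib is Python’s way of papering over exactly these differences; the folder and file names are of course made up:

```python
# A minimal sketch using Python's pathlib, which hides the Windows/Linux
# path differences for you. The names below are just examples.
from pathlib import Path

p = Path("projects") / "notes" / "todo.txt"   # build a path without hard-coding separators
print(p)            # projects\notes\todo.txt on Windows, projects/notes/todo.txt on Linux
print(p.resolve())  # absolute version, anchored at whatever folder you ran this from
print(Path.home())  # the user's home folder, wherever the OS says it is
```

Run it on Windows and on Linux and you get the same logical path; only the separator and the kind of root it hangs off change.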

Also, in Linux the concept of a file is far more generic – it is the main unit of data storage OR transfer. For example, there are files that represent your screen: you write to the file, you draw on the screen. There are files for the keyboard, the mouse, etc. – you read from them, you see raw input from the keyboard or the mouse. People say that in Linux everything is a file … not quite, but almost.

Anyway, the crux of the matter is that every time you access a file in any way, you are asking the operating system to let you work in this imaginary structure. 

I/O

Input/output … mostly it is about files, but it is way more generic than that. Unix introduced the concept of streams. A stream is basically something you can keep reading data from in chunks, normally quite small chunks. Many languages implement this idea directly. It is extremely powerful, and one of the core features of Unix.

Unix also introduced the idea of “standard input” (stdin). By default it reads keystrokes from the keyboard, but, if you write your code to take input from standard input, you can later redirect it to use the output of another program instead, or a file, or the internet … whatever will provide data in the right format. There is also standard output (stdout); by default it’s the command line (terminal) screen, but again it can be redirected easily. This way you can set up “pipes” between programs and link them together super easily: redirect the output of one to be the input of another. So you write small programs that each do a simple thing, and hook them up together to do a complicated thing.

There is also a “standard error” stream (stderr), where errors and logs should get written. By default it ends up on the same terminal as stdout, but it is a separate stream, so you can redirect one without the other. It’s used more rarely, but it helps a lot when you want to keep logs apart from actual output.
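
A minimal sketch of what this looks like in Python – a tiny “filter” program that reads stdin, writes stdout, and keeps its moaning on stderr (the file name upper.py is just an example):

```python
# upper.py - a tiny "filter": reads stdin, writes stdout, reports on stderr.
import sys

for line in sys.stdin:                      # whatever provides the input: keyboard, file, pipe...
    sys.stdout.write(line.upper())          # the result goes to stdout
sys.stderr.write("upper.py: done\n")        # logs and errors go to stderr, separately
```

On Linux you would run it as something like cat notes.txt | python upper.py > shouty.txt; on Windows, type notes.txt | python upper.py > shouty.txt. The stderr line still shows up on the terminal, because only stdout was redirected.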

All operating systems adopted these, some better than others, but they are there. Reading from standard input and writing to standard output are what most programming languages’ documentation means when it talks about input and print statements.

The idea of internet streams is the same, though they don’t work anywhere near as smoothly or easily. 

It extends further: in most languages you can make almost anything into a stream … read values from a list? Sure, there’s a stream for that. From a string? Of course. Same for writing. Some people are so clever that they do this all the time. For most of us it’s not actually that useful – too much abstraction for no good reason. It has its place, of course, but why it makes people think they are cool when they use these all the time, I’ll never know.
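
For what it’s worth, here is what that looks like in Python – a plain string pretending to be a file-like stream:

```python
# io.StringIO treats an in-memory string as a stream you can read in chunks,
# exactly like a file or stdin.
import io

stream = io.StringIO("line one\nline two\n")
for line in stream:           # the same loop you would use on a real file
    print(line.strip())
```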

Most of the time, though, you figure out what functions, libraries and syntax a particular language uses for a particular kind of input or output, and you learn it and use it. It’s often quite different from one language to another, but it always relies on the OS to deal with it for you.

Programs/Apps

This is a surprising one. To the hardware there isn’t really a concept of a program, just some instructions that it executes in order, once it knows where to start. The OS creates and implements the idea of machine code (numbers, binary stuff) being stored in a file, with certain conventions about where the code starts, and bits and pieces like that. The file itself is called an “executable”; on Windows they conventionally have the extension .exe, on Linux there are no naming rules – a file just gets marked as executable.
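
You can actually see these conventions from ordinary code. A small Python sketch that peeks at the first few bytes of a file – Windows executables start with the letters “MZ”, Linux ones with the ELF signature (the path at the bottom is just an example; point it at any program on your machine):

```python
# Peek at the first bytes of a file to see which executable convention it follows.
def kind_of_executable(path):
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == b"MZ":          # Windows .exe/.dll files start with "MZ"
        return "Windows PE executable"
    if head == b"\x7fELF":         # Linux executables and .so files start with 0x7f 'E' 'L' 'F'
        return "Linux ELF executable"
    return "not an executable I recognise"

print(kind_of_executable("/bin/ls"))   # on Linux; try C:\Windows\notepad.exe on Windows
```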

When you execute a program, what happens is that the OS finds the file for it, allocates some stack space for it (see later), loads it into RAM (or at least the portion that is about to execute), and tells the CPU to jump to where the code from the program file was loaded and start executing it. Once this starts happening, it’s called a process.
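
From your own code, asking the OS to do all of that is a one-liner. A minimal Python sketch – it just launches another copy of the Python interpreter, so there is nothing extra to install:

```python
# Ask the OS to find an executable, load it, and run it as a new process.
import subprocess, sys

result = subprocess.run([sys.executable, "--version"], capture_output=True, text=True)
print("the new process wrote:", result.stdout or result.stderr)
print("and exited with code:", result.returncode)
```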

It can also load multiple programs into memory and switch between them (hooray, multitasking). There are algorithms for deciding how to switch between processes; mostly it’s just round-robin, but you can give processes priorities, and it gets clever when something is working intensely or is particularly sleepy.

Static Libraries

This is really a C concept, but C is so intimately tied to OS implementations that there’s no real difference. In all programming languages you can write code that is reusable (doh). You can take that code, compile it into machine code separately, and save the result into a separate file. Then, when you compile your own program, the code from this file gets linked into it and becomes a part of it.

Not all languages use this mechanism directly; many implement their own versions of it from scratch, because static libraries are so tied to C that integrating their code directly into code written in another language is usually more trouble than it’s worth.

Dynamic/Shared Libraries

This is where it becomes cleverer. Instead of integrating the code into your own program, how about you instead load the file that has the functions you need at runtime (once your program is already executing), and then call the functions you need directly?

It can work with pretty much any language – you are not mixing code, you are just switching execution from one place in RAM to another (from where your code is, to where the shared library is loaded). For those languages that can do object oriented programming, you can also expose classes in this way. 

This happens ALL the time. It is the primary way of sharing functionality. In fact, all OS functionality is provided this way, so there’s actually little choice – as soon as your program writes or reads anything, eventually it calls a function in a shared library. Sometimes people call this a “binary interface” (ABI), because the code in the shared libraries is already compiled.

Different programming languages deal with this differently in the syntax details, but there is always a way to do it. Often it’s called the language’s interface to C. Java has JNI (the Java Native Interface) for this, Python has ctypes and the CPython C API (the one full of PyObject pointers), and so on.
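
To give a flavour of it, here is a minimal Python sketch using ctypes, the simplest of these interfaces. It loads the C runtime library that every OS ships with and calls one of its functions directly – the library name differs per OS, which is exactly the kind of detail these interfaces make you deal with:

```python
# Load the C runtime as a shared library and call one of its compiled functions directly.
import ctypes, ctypes.util

libc_name = ctypes.util.find_library("c") or "msvcrt"   # "libc.so.6" on Linux; msvcrt on Windows
libc = ctypes.CDLL(libc_name)

# libc's abs() is ordinary compiled C code; calling it jumps straight into the shared library.
print(libc.abs(-42))     # -> 42
```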

In most languages this works both ways; there are ways to invoke Python code from C or C++. Often this doesn’t feel natural, or looks ugly; because you are applying abstractions from one language to another, but that can’t be helped. 

There are tools and libraries to help make it smoother. SWIG is the oldest and most popular one – you write your C or C++ code, add a single file to tell SWIG what to expose, then run SWIG as a separate application and it generates interface files for almost any target language you like: wrapper code in the target language that hides all this nastiness, so it looks like you are invoking ordinary code in that language, while underneath it does the interfacing with the shared C++ code. There are many other tools that target specific languages. It’s fiddly, but the ability to share functionality across the board is supported by every operating system, and it is crucial.

Why does it even work? At the end of the day, all code that gets executed is translated to machine instructions. You need a few conventions about how to pass parameters to functions at that level (calling conventions – we will talk about them later, but they exist); but ultimately you tell the CPU “here’s where your next instruction is in memory, do your thing”. The address of the first instruction in a function is called a “function pointer” in C, C++, and most other languages.

This is insanely powerful. Not only can you share functionality, it also lets you establish two-way communication between unrelated pieces of code. Two pieces of code can pass function pointers to each other dynamically – a pointer is really just a number, the index of a cell in RAM – and then two pieces of code (programs or libraries) that have never heard of each other can call each other. This is how events and pushing data work.

Say you have a library A that receives data from the internet; maybe it polls a database every second to check if there is something new there. It exposes a function via its shared library interface to anyone who wants to use it – call it “subscribe”. This function takes as a parameter a pointer to another function. We document somewhere what parameters that other function takes, but that is really it. Then you have a program B which wants to be told about the new data. That program implements a function “receive” that follows the documentation from A – it takes the prescribed parameters. Program B loads library A, invokes its function “subscribe”, and passes it the pointer to “receive”. When library A sees new data, it invokes “receive”. It knows nothing about it, other than that it takes certain parameters – and doesn’t really care. The function “receive” is implemented by you in program B, again without any knowledge of how library A works internally.
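
In Python, functions are already first-class objects, so the whole dance collapses into just passing a function around. A sketch of the same subscribe/receive idea – all the names here are made up for illustration:

```python
# --- library A (sketch): polls for new data and calls back whoever subscribed ---
_subscribers = []

def subscribe(callback):              # "callback" plays the role of the function pointer
    _subscribers.append(callback)

def _on_new_data(data):               # imagine this fires when the poll finds something new
    for callback in _subscribers:
        callback(data)                # library A calls code it knows nothing about

# --- program B (sketch): implements "receive" with the agreed parameters ---
def receive(data):
    print("program B got:", data)

subscribe(receive)                    # hand over the "pointer" to receive
_on_new_data({"price": 101.5})        # simulate new data arriving -> program B prints it
```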

There is a lot of detail to it; not all languages implement data types the same way, they don’t handle memory the same way and so on, but it can all be dealt with. Generally this is hidden from you by third party libraries that take care of this binary level interfacing, really just to hide the mess. 

Another important thing to remember is that you are dealing with function pointers all the time. It matters because they always refer to addresses in the machine memory – this does not work across machines. The simplicity of passing pointers around makes it super efficient, but one machine cannot know what is going on in another machine’s RAM (RAM is a very busy place – hundreds of programs use it at the same time, and it may look very different on different machines even physically – different sizes, different operating systems, different technologies etc). For networks there are equivalent concepts, but we will talk about that later. 

Philosophically, while the idea of a function exists in human minds even when writing assembly code or thinking about algorithms, at that level it is very thin – really just a way of thinking about how to split and reuse bits of code, and you have to implement it all yourself. The concept of a function is made concrete at the operating system layer, primarily to make this mechanism work.

Many programming languages have their own version of all this too, specifically for code written in that language. They call it different names (modules, libraries, beans … all the same idea, implemented variously using the features of the language in question). It is tempting to mix the two up, and often you can – sharing code in a library is often all you need to know. But the distinction is important: dynamically loaded libraries work at the OS level. This is also how C and C++ share binary code, naturally, because all these OSs are implemented in C (or C++). All other code sharing happens at the next level of abstraction – the implementation of programming languages – and is specific to the language in question.

On Windows these libraries are called DLLs (Dynamic Link Libraries), on Linux just shared libraries (they come with the extension .so). There are standards on each OS for how to load them, how to invoke functions and how to pass parameters.

Commonly, a program would run with anything between 5 and 50 of these libraries loaded. What often happens is that there are interdependencies between them. As with all code though, there will be many versions of each released. Keeping this all in sync is hard, and when it goes wrong it’s called “DLL hell” (no, really). Mostly when people talk about problems with incompatible versions of software, this is what it’s about. It’s way worse on Windows (hence DLL hell), but really, still quite bad everywhere – Linux does give you more control over it, but it’s still hard. You just gotta be careful all the time; and hope it’s enough, it ain’t always.

They are sometimes a little painful to build and use, because there will always be details specific to abstractions at a lower level than your code – mostly about how the OS you are on implements them. You will end up googling how to do it when you need it. I really like this write up https://caiorss.github.io/C-Cpp-Notes/DLL-Binary-Components-SharedLibraries.html (no idea who wrote it or maintains it, it’s just a very good reference).

Memory

At the machine hardware level, memory normally refers to RAM, and it is fairly simple – an array of cells, addressable via an index (pointer). 

Operating systems add a layer of abstraction to it. In usage, it is still the same, and much of it maps to the actual RAM, but the implementation is complicated in order to enable two important features.

  1. Virtual Memory

We mentioned this, here it is in more detail.

RAM is relatively expensive per megabyte. Disk space is far cheaper. Many programs happily fit all their needs in the size that is available in RAM, but not always. Also, run more of them at once and you quickly start running out. This includes the operating system itself – its components are also code, so they also need RAM. 

So what operating systems do is assign a file on the disk to act as extra memory (the swap or page file – this whole arrangement is what “virtual memory” loosely refers to). They map it as an extension of RAM, broken into equal-sized blocks called “pages”. As long as your code accesses memory through the operating system, it has no idea where the memory actually lands; it just uses the address space it was given. If an address happens to land on a part that currently lives in the file on disk, the operating system pauses the execution of your code, drops something else from RAM if RAM is full, copies the needed page from the file into RAM, and then lets your code run again. Along the way it updates its address-translation tables, so the addresses your code uses still point at the right place, wherever the page physically ended up.

It helps a lot, almost nothing you use daily would run without it. The trouble is, it’s real slow. Disks are slow compared to RAM, the algorithms to figure out where to fit pages and which ones to drop are slow, copying data from disk to RAM is slow. Computers often have a light that lights up when you are accessing the disk, hard disks make noise – you actually can hear or see this when a computer is struggling; it buzzes, fans switch on too, lights blink. This is why. 

This is also why optimising how much memory your code uses is often more important than optimising actual algorithms. A lazy answer is “buy more RAM”; it helps of course, more RAM is the best way to make a computer faster really, but if you are not careful you will run out eventually; also, why should you not help people by saving them a little money with a bit of effort?

This is also why “memory leaks” matter (we’ll talk about those more later). Also why SSD drives make everything better – they are faster than hard disks.  

  2. Multi-tasking

When you run multiple programs with a single CPU, what actually happens is that the OS keeps switching between them. Run one a little, then another, then another, back to the first one and so on. The CPU just keeps executing stuff, and the OS keeps swapping the instructions it executes. The thing is, each of these programs also needs to be accessing memory; and it has no idea about all these other programs. 

So the OS does this memory mapping for it. As far as your code knows, it just runs happily on some CPU, using one linear block of memory – you don’t have to write it any differently just because other stuff will be running. The OS allocates blocks of RAM for it, remaps all the addresses your instructions refer to, and lets it run when its turn comes.

Then it keeps track of everything and keeps moving things about as needed, using virtual memory if need be. Your code knows nothing about it, CPU knows nothing about it, happy days. 

These algorithms are complicated; a lot of work has gone into improving them over the years. Linux is still much better at this than Windows, which is part of the reason it runs faster.

When there are multiple CPUs, or multiple cores, the same thing happens, only the execution of instructions can happen in parallel. It’s more complicated – more work for the hardware and the OS to make sure cores get their instructions and data to and from RAM without stepping on each other’s toes; lots of fancy hardware and algorithms deal with it, but generally you don’t see much of it. The OS handles it to make things faster in a generic way. There are situations where you may, and can, and do, optimise for this: if you know how many cores your target machine has – some server somewhere – you can use that knowledge to parallelise your code very efficiently. E.g. make sure there is not much else running on the machine, then make sure there are exactly as many processes as there are cores, so each core deals with one and no swapping happens. It’s really fine-level tuning, and very rare, but it does happen sometimes – importantly, on clouds, Kubernetes (or similar) tries to do this for you.

Because of all this mess happening in memory, it is technically possible for one application to write into memory allocated to another. That’s catastrophic – the other application doesn’t know how to interpret the data and generally everything crashes. The OS is in charge of not letting this happen. Linux was always great about it; Windows used to be totally crap about it, but has improved a lot, so you rarely see it now. Mostly, when it does happen, it’s a security weakness and gets used maliciously.

Heap and Stack Memory

These are parts of dealing with memory, but are so important we’ll look into them separately. 

Heap and stack are names of data structures – we’ll talk about them more later – but here they refer to two separate ways in which programs acquire and use memory while they are running. Operating systems handle these two use cases differently, and the implementations vary, but one of them really is managed as a stack, and the other got the name “heap”; hence the names.

When an application is first launched by the OS, it is assigned a certain amount of memory to use as its stack. It is not large – typically around 1 MB on Windows and 8 MB on Linux these days. This is where things like local variables, function parameters and return addresses get stored (global variables actually live in a separate, static area). Programming languages hide this from you; it is the stuff that happens automatically when you declare a variable or call a function. It is very standard and very fast to access (we talked about this in Part 1). But it has to always be there for a program to run, so it can’t be very large. You rarely run out of stack space, but it does happen. You can request more of it when you run your code, but 99.99% of the time running out happens because there is something wrong with your code (almost always infinite recursion somewhere – we’ll talk about this when we talk about programming languages).
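
The classic way to see this for yourself: write a function with no way to stop calling itself. Python keeps its own guard rail (a recursion limit) so it gives up long before the real OS stack is exhausted, but the failure is the same idea:

```python
# Each nested call adds a frame to the stack; if nothing ever returns, you run out.
import sys

def count_down_forever(n):
    return count_down_forever(n - 1)    # bug: no base case, so the stack just grows

print(sys.getrecursionlimit())          # Python's own safety limit (usually 1000)
try:
    count_down_forever(10)
except RecursionError as e:
    print("ran out of stack frames:", e)
```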

When a program needs more space, it can ask the OS for more memory to be allocated to it dynamically. This is what happens in Java (or C++) when you use the “new” keyword. The OS (together with the language’s allocator) then goes through the memory that is free, finds a block large enough for what you asked for, and allocates it to your program. It’s yours to use now. Programming languages have various ways of dealing with it from there; generally they populate this memory with a bit pattern that represents an object of a class, or put some text in it, or whatever. In high level languages like Python, or even Java, you don’t really see this – the language layer takes care of it for you. In others (C, C++), you have to do more or less of it yourself.

The thing is, this memory is now allocated to you and nothing else can use it. So when you are done with it, you should release it – tell the OS it’s free to use again. If you don’t, it’s just some unused memory that nobody else can access. This is what a memory leak is. As a program runs, there is a good chance that the bit of code that leaks memory gets executed again and again (because most code does), and slowly you exhaust the supply. In many languages you are mostly isolated from this – Python is one of them, Java another – and you have to try quite hard to create a leak; but it can still be done.
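
Even so, here is a sketch of how you leak memory by accident in a garbage-collected language – not by forgetting to free anything, but by keeping references you never let go of (all names made up):

```python
# A classic accidental "leak" in a garbage-collected language:
# objects kept alive forever by a cache nobody ever empties.
_cache = {}                        # module-level, lives as long as the program does

def process_request(request_id, payload):
    result = payload.upper()       # stand-in for real work
    _cache[request_id] = result    # kept "just in case", never removed
    return result

# Call this a few million times with unique ids and memory use only ever goes up,
# even though the garbage collector is doing its job perfectly.
for i in range(5):
    process_request(i, "some payload")
print("objects we will never let go of:", len(_cache))
```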

Because of all this packing, assigning, releasing, moving about and finding the right sized chunks of memory, this is much slower than using stack memory. The pool of memory being managed this way is called the heap (the name stuck historically; despite appearances it isn’t the same thing as the heap data structure). So people talk about allocating memory on the heap vs the stack, using heap memory vs stack memory, and so on.

In most cases, it is fast enough anyway, but when you can, you should optimise for this; it does actually help a lot if you pay a little attention to it. 

Multi-threading

You can make use of multitasking features of the OS within a single application using threads. OSs provide this functionality, various languages provide different interfaces to it; often it looks just built into the language or a library. 

It is a fairly high level abstraction. In general, you write a function and you tell the operating system to start running that function in parallel to the rest of your code. It then does the same trickery as above, swapping execution in or out, assigning threads of execution to different cores if they are available and so on. 

The difference is that from within your code, you will want to pass some data to the threads that you spawn, or even get some data back from them. For example, slow running functions triggered by a GUI. You click a button and something happens. That something may be going to a far away database for its data, or may take a long time to execute just because it’s complicated. What you normally do is spawn a thread to run this task and get on with your life, so the GUI is still usable. When the task is finished, just before the thread dies, it somehow needs to tell the thread that is running your GUI that it’s done, results are ready and where they are. 

Many ways to do that, many pitfalls (you have to synchronise all this), it’s not really very hard, just a bit boring and you have to be very careful to avoid deadlocks between multiple threads and such. It is considered an advanced programming topic, but not really, just needs some thinking, reading, and practice. There are standardised abstractions, data types, and procedures, a bit of terminology around it too. We’ll deal with that separately somewhere. 
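
A minimal sketch of that pattern in Python – a slow task running on its own thread, handing its result back through a queue, while the “main loop” stays responsive (the function names are made up; a real GUI loop would go where the while loop is):

```python
# A slow task runs on its own thread and hands its result back through a queue,
# so the "main" code (a GUI loop, say) stays responsive.
import threading, queue, time

results = queue.Queue()

def slow_task():
    time.sleep(1)                       # stand-in for a faraway database call
    results.put("here is your answer")  # Queue handles the synchronisation for us

threading.Thread(target=slow_task, daemon=True).start()

while True:                             # stand-in for the GUI's main loop
    try:
        answer = results.get(timeout=0.25)
        print("worker finished:", answer)
        break
    except queue.Empty:
        print("still drawing the GUI, staying responsive...")
```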

Some languages implement their own threading on top of, or instead of, the OS mechanism (early Java had “green threads”; Go schedules its own goroutines; Python uses real OS threads but famously only lets one of them run Python code at a time). The abstractions look the same though – you don’t really notice while coding.

As OSs got better and more features were added, and disks became faster, it became clear that it is often way simpler to just launch another process, run another copy of the program, instead of dealing with all this synchronisation mess – with processes, OS deals with all that for you. Also, if a thread crashes, the whole program crashes; if another program that you launched crashes, yours is still running and can pick up the pieces – much more stability with complex stuff. 

Why wouldn’t you do that all the time? It is more costly – a whole program takes up more space than a single function, and somehow you need to pass data between processes (you either use files, which are slow, or built-in OS features that help but are not standard across OSs). However – Google tried this with Chrome, and it works really well (this is why, when you look in the task manager, you see so many instances of Chrome). Ever since then, using multiple processes instead of multiple threads has become more and more popular, and that is a good thing. We still need threads sometimes – when you need lots of lightweight workers sharing the same memory, or when your task is simple and easy to set up – but you should always ask yourself if it’s worth using a process instead.
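
For comparison, the “just use processes” version in Python – multiprocessing spawns full OS processes and handles the data passing for you (the crunch function is a made-up stand-in for real work):

```python
# The "just launch more processes" version: each worker is a separate OS process,
# so they get their own memory, and a crash in one does not take down the others.
from multiprocessing import Pool

def crunch(n):
    return sum(i * i for i in range(n))   # stand-in for heavy work

if __name__ == "__main__":                # needed on Windows and macOS, where children start fresh
    with Pool(processes=4) as pool:
        print(pool.map(crunch, [10_000, 20_000, 30_000, 40_000]))
```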

GUI

The idea of a window-based user interface came up in the early 70s, in Xerox’s research labs (the mouse came a little earlier, from Douglas Engelbart’s lab at SRI, but Xerox is where it all came together). Cute Quora post about it, with a picture too: https://www.quora.com/Who-came-up-with-the-Windows-operating-system-before-Microsoft-Who-invented-it-Did-anyone-else-use-it-before-Microsoft-bought-the-rights-for-it-from-them

Everyone caught on as soon as they could that this is way better than typing weird stuff all the time; it seems obvious now, but it was extremely imaginative, and it’s not surprising that it didn’t occur naturally – someone invented it intentionally. Over the years they all caught up with the design; it’s all pretty standard these days. An important distinction with Linux is that the windowing part of the system (the window manager / desktop environment) is just another replaceable component, communicating with the rest of the system via a standard protocol of messages and functions (X, the X Window System). In the Linux world you can swap these very easily and completely change the look and feel of the desktop; there are hundreds if not thousands of options. Apple’s system is built around the same idea of a separate display layer, but they don’t let you change anything – their philosophy throughout is that the beauty of their design is the crucial part of the product (they basically convinced everyone that they know best what beauty means). Windows made a mess of it, partly on purpose: you can’t ever untangle anything, you cannot replace anything, so they invented “customisation” to market the ability to change the background picture as an invention that will make you happy.

Overall, marketing and efficiency have buried the realisation that what we are really doing is using ever-changing, responsive paintings to make miraculously engineered hardware, relying on quantum effects, do things only humans could once do. It is, however, all still there.

So how does it work?

Firstly, there is a lot of technical design thought put into it. All these systems define the basic graphical elements in great detail: a window is a rectangle placed at certain coordinates and of a certain size, it has a border of a given thickness, a title bar of a certain size, a menu bar, minimise/close/maximise buttons … and so on. Every detail is measured. Then there is a list of “widgets”, similarly specified – buttons, lists, edit boxes, menus, and so on. There are specialised kinds of windows that have a “canvas”, an area where your code can draw its own stuff (anything that doesn’t fit into the usual widgets). You can extend this yourself if you really want to: implement your own kinds of windows or widgets (which are technically also treated as windows, just with special behaviour). There are interfaces for drawing things like borders, letters, arrows, icons … should you want to do it yourself.

On top of that is a layer of code that is in charge of actually painting these things, it can draw lines, borders, shadows, letters … basic painting elements for the widgets, the idea being that they all end up with a consistent look. 

The interaction with your code is by “events” – a user clicks a button and this fires an “event” into your application, to which you react by executing an appropriate piece of code. The idea of event-based programming existed before, but this is really where it shines, and what made it clear that it is a worthy paradigm.

There are many frameworks that simplify GUI implementation and implement events using function pointers; basically you call a function in the library and pass it a pointer to your function that deals with the event, with maybe some more parameters that basically tell it: “yo, framework, when the user clicks this particular button, you execute this function for me”. 

At the lowest level, though, the OS implementation works by polling. The OS constantly monitors the keyboard and the mouse, and when something of interest happens (a mouse click within the area of the screen assigned to a button, for example), it creates small data structures called messages. A click on a button would end up generating something like: LeftButtonDownMessage at such-and-such coordinates within widget x, then LeftButtonUpMessage at such-and-such coordinates within widget x. If the click was fast enough, the OS will figure this out and also generate something like LeftButtonClickMessage at such-and-such coordinates within widget x. All of this is encoded in three or four numbers per message.

These are then placed on a “message queue” – a queue data structure that, erm, contains messages. They sit there until something removes them. There are lots of implementation details – can all programs access the messages in the queue that are meant for some other program? Can they put their own messages on the queue? Stuff like that. In most cases you actually get a lot of freedom here if you want it; it lets you create add-ons and such. Rarely done these days, but it was helpful with older systems where lots of features were missing.

With all this in place, the structure of an application that has a GUI is along the lines of:

  1. At startup, tell the OS what to draw for you – windows, menus, buttons, other stuff.
  2. Run a message loop (aka message pump). 

The message pump is just a function that loops forever and checks the message at the front of the message queue. If the message is for this application, it removes it from the queue and does whatever it wants with it (“ha, button clicked – calculate something”). It normally then sleeps a little (just so as not to be too busy), gets the next message, and so on forever, until it receives a message telling it that the “quit” or “close” button was clicked. Then it dies.

For quite a few years now, seeing these in the wild has been extremely rare. They are always there, but wrapped in libraries that try to make things easier, or at least deal with this stuff in some boilerplate code. Even with the wrapper libraries, GUIs are hard to write – there is a lot of detail that needs to be taken care of, and it needs to be done efficiently. Some tools make it very easy (Visual Basic style: you design your GUI with drag-and-drop tools, and it puts placeholder functions that handle the relevant events into the code for you). They are cool for simple stuff, but at the end of the day they have to make too many assumptions about the details – they are never pretty enough.
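
Just to show what the wrapped-up version looks like, here is a complete little GUI in Python’s built-in Tkinter (on some Linux distributions the tk package has to be installed separately). mainloop() is the message pump; everything before it is the “tell the OS what to draw” step:

```python
# A tiny GUI using Tkinter, which wraps the native widgets and message queue for you.
import tkinter as tk

def on_click():                              # runs when a "button clicked" message arrives
    label.config(text="something happened")

root = tk.Tk()                               # step 1: tell the OS what to draw
root.title("Tiny GUI")
label = tk.Label(root, text="waiting...")
label.pack(padx=20, pady=10)
tk.Button(root, text="Click me", command=on_click).pack(pady=10)

root.mainloop()                              # step 2: the message loop, until the window closes
```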

None of the native GUI stuff is portable at all. Libraries that wrap it up (and often enrich it) very often are, though. But always, to make them beautiful, you have to do a whole lot of stuff yourself, and it will be system dependent. 

To finish this part – command line interfaces are still very powerful and in many cases can be designed to feel really good. They are much faster and easier to implement, and they are far more portable. 

Debugging

Debugging features are always, to some extent, supported by the operating system. At the very least, debuggers need a way to pause and restart execution, and to get information about what is running right now, what memory it is using, and so on. Debuggers interface with the operating system to do the low level basics, and add a whole lot of prettiness and usability on top.
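
Most of the time you meet this through a language-level debugger rather than the OS facilities directly – in Python, for instance, a one-liner drops you into its built-in debugger (pdb); debuggers for compiled code do the same sort of thing by leaning on the OS hooks described above. A tiny sketch (the function and values are made up):

```python
# Drop into Python's built-in debugger (pdb) at a chosen point in the code.
def total(prices):
    subtotal = sum(prices)
    breakpoint()          # execution pauses here; type 'p subtotal' to inspect, 'c' to continue
    return subtotal * 1.2

print(total([10, 20, 12.5]))
```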

Other Stuff

There is a whole lot more – access to time, access to internet, audio, keyboards etc. All sorts of tools that are deeply integrated into the OS (task managers, command line shells, desktop interfaces, ability to install new programs, … ). 

Over time, for speed and standardisation, some of the OS functionality migrated to hardware; or direct support for some of the features was built into hardware. If you dig into it, some things do get mentioned in hardware documentation that really is OS stuff. But who cares.

Essentially, you can assume that everything you ever interact with, either through code or as a user, is actually the operating system, not the hardware directly. You would very, very, very rarely be wrong.

Part 4: Machines Talk to Each Other


