-->


The Executable's Song

Computation is a game of life and death, and it's not just about fatal software bugs. Martin Howse examines how free software enables us to trace the nature and culture of the executable as it impacts on the physical.

________

Under a proprietary model, the executable is all, bypassing all that is cultural or textual, and ignorant of any code above buried instruction sets. With a transparency plagued only by always impending crash and implied viral burn, the icon is the action; a conditioned reflex with GUI designers very much as Pavlov's eager pupils. It's all a game of click and run, or, with the cynical humour of small-time peripheral pedlars, plug and pray. Just remember to keep your data safe as the creaking ship of an insecure OS tosses and heaves on an ever angry networked ocean.

Yet back in the day Unix users could well be regarded as furnished with a tougher deal, given the plain sailing on the opaque waters of closed code before the Net storm, the revenge of a bullying binary become promiscuous. Unix users, and by descent, GNU/Linux users are well famed for bare-backed metal riding, herding naked process through an intricate codescape. We can choose to kill processes, track down zombies and fork and die in a realm of execution under which parent processes outlive their children. The parent process is a spirit medium awaiting signals from its offspring from the other side. And readily available free source code is riddled with such references which could well rival the lyrics of many a death metaller. Open source as open history and open discourse allows us to trace this stench of death within computation, to ask why giving the breath of life to a new process is termed execution and how the already dead refuse to be killed. As programmers, with a few conditionals we can readily question the Shakespearean terms of sleep, or even usleep, with C code sleep() function specified in seconds. Echoing the title of many a B grade horror film, we can choose to scream and die, or rather simply print an error message before crashing out. And crash itself, explored in issue fifty, remains with achingly obvious impact on computation.

The Game of Life

The list of these terms, however metaphorical, could well be extended within living code, contemporary free computation, whilst at the same time we can look back over the history and culture of computing to view how such references perhaps embed earlier involvement with a very deadly game; cybernetics and computation viewed primarily as a theory and technology of war. It's no coincidence that the very first processes modelled by computers were both those of ballistic trajectory and atomic events, in the latter instance leading towards the design of the hydrogen bomb. To compress a story which could occupy many pages, these twin technologies found later application within a cold war arms race which was equally well simulated as computational process, with John Von Neumann's Monte Carlo methodology, spurred on by his interest in game theory, as ultimate model within both technology and politics; the Prisoner's Dilemma as hardcore coding exercise for a technology and society of destruction. Yet what of another programming game so beloved of legendary hackers such as Bill Gosper, Richard Stallman's mentor at MIT; the Game of Life? Another history could well be written in this light, perhaps with mention of computing pioneer Konrad Zuse, who makes plain that his first prototype programmable computer, designed and built before all others, had nothing to do with German wartime efforts.

At the heart of the culture of free software lies a playfulness which is far distant from the death drive of Von Neumann, the model for Stanley Kubrick's crazed Dr Strangelove. And it's worth remembering that the free rolling ITS (Incompatible Timesharing System) which lies at the roots of the GNU system attempted to prevent malicious system crashes by simply making it all too easy, taking all the fun out of the enterprise with a basic kill system command.

At the same time, Alan Turing's Bletchley Park decryption efforts, practical applications of the Turing machine so relevant to any theory of computation, were rapidly exploited for cold war use. And it's worth remembering that Von Neumann leaves his legacy, the architecture named after him, within almost all contemporary processors or CPUs. Under such a system, data and software are treated as equivalent, stored in the same memory, with other means employed to distinguish matters. This design opens coding up to the spiralling domain of self-modification and the exhilarating land of the metacircular. Programs can write programs and modify their own processes, their inner workings in unheard of manners. Yet the price paid for the self coding of automata, for the unification of data with code, is crash. All living programs are mortal and in some way Von Neumann's design says something very interesting about life within computation. After all Von Neumann was famously interested in self replicating automata, and with data or the description of possible offspring, and code or process, the physical machine itself, on an equal footing, his architecture readily embeds self-reproduction. It's a necessarily viral architecture with replication abounding.
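The equivalence of data and code can be hinted at in a few lines. What follows is only a high-level analogy in Python, not machine-level self-modification; the names make_adder_source and build are hypothetical, chosen for illustration. A program is first assembled as plain text, which is to say data, and only then brought to life.

```python
def make_adder_source(n):
    """Return program text (plain data) for a function that adds n."""
    return f"def add(x):\n    return x + {n}\n"


def build(source):
    """Turn the text into live code and hand back the new function."""
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["add"]


source = make_adder_source(3)   # at this point: just a string
add3 = build(source)            # now: something that runs
print(repr(source))
print(add3(4))
```

The same string could as easily be rewritten, copied or corrupted before it is run, which is the flip side of the bargain the paragraph above describes.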

The politics of process

Yet, reducing this deadly crash potential as symptom of a flattening of privilege means introducing hierarchy into the equation. Segmentation, segregation, and a division into kernel and user modes mark splits which render the process political at a number of levels. The division between so called real and protected mode embedded within all x86 compatible CPUs exposes the degree of control which is afforded hardware vendors. The hierarchy of user and kernel mode heralds a rough division between user, or rather application programmer, and kernel coder. The kernel hacker dictates how less enlightened app monkeys code GUIs which users are then forced to adapt to. It's a loose chain of command, which, travelling up from hardware, through the layers of abstraction within software, arrives at the interface; a return journey which process undertakes and which can well be traced. At the same time a similar route is ventured at boot time; a trip from bare hardware, through BIOS in real mode, straight to protected mode as the OS bootstraps the commandline interface into existence.

Already, there's a good deal on offer here, material ripe for analysis and using a free software OS implementation, with ready source code access under grep and friends, and with open tools such as the GNU debugger GDB and strace system call tracing tool we can readily ask essential questions, interrogating the political nature of computational process. With /usr/src/linux as our starting point we can well illuminate computation through analysis of what happens at the micro and macro levels; the magical transformation from code to action which process marks in a journey from user to kernel space and back again, passing through the analogue barrier of noise at the electrical level of the CPU. With reference to the parallel journey at boot time into protected mode, we can grep through code under /usr/src/linux/arch/i386/boot to find exactly when the transition is made.

Fork and die

Yet before we're ready to walk the code, or strace the process, an enlightening affair both for debugging and performance tuning, it's worth asking what process encompasses under computation, and what exactly constitutes a process within the specific terms of a Unix-like OS. In the first instance the transition from code to execution, from command to result, marks the process. Many writers have also described computational process with reference to magic; the process as an invisible spirit conjured up by a sorcerer's spells, or programs, yet still obtaining quite physical effects in the real world.

Process is the only entry point into crash and the virus. At the same time computers simulate and model processes from the real world, and the relation between the two marks process as interface. Drilling down to a more specific context, and sticking with the commandline, process and program, or executable become separated. Under the terms of a Unix operating system, a program is run, that is to say executed, within a process. Process is that which renders the executable, a bland ELF (Executable and Linking Format) file with additional libraries, runnable, passing code to the kernel by way of the ubiquitous interface of the system call which functions as a speakeasy style hole in the wall between user and kernel spaces.

The process keeps track of the state of the running program. GNU/Linux users will no doubt be familiar with the common ps command which details a vast array of such information regarding processes. All that runs, and in some instances, as we'll see in the case of zombies, which doesn't run, is process, assigned a unique ID by the kernel. But how do such magical processes come into being, if we need a process to execute and manage code, and where does this process come from? In Unix there is but one parent process, init, whose invocation we can again trace to source outlining the boot code. Init is the ancestor, or parent, of all processes, which are known as child processes. Init has a process identifier, a PID, of one. It's our original unit, from which the shell is spawned. And the shell, from which we issue commands, can further deliver offspring which are themselves capable of birthing processes. Such a spawning is known as forking, under which the existing process is duplicated with the child receiving new parameters. The parent is responsible for the child process, hence the common Unix geek joke that "children are forked by their parents who then wait for them to die..."
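The joke can be acted out directly. The sketch below uses Python's os module, whose fork() and waitpid() are thin wrappers over the Unix calls of the same name; the function name fork_and_wait and the exit code 7 are arbitrary choices made for the example.

```python
import os


def fork_and_wait(exit_code=7):
    """Fork a child, let it die, and have the parent collect its status."""
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the same program under a new PID.
        # Leave at once, skipping the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: block until the child terminates, then read its exit status.
    reaped_pid, status = os.waitpid(pid, 0)
    return pid, reaped_pid, os.WEXITSTATUS(status)


child, reaped, code = fork_and_wait()
print(f"parent {os.getpid()} forked child {child}, which exited with {code}")
```

fork() returns twice: zero in the child and the child's PID in the parent, which is how the two otherwise identical copies tell themselves apart.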

Process creation and termination can be readily tracked down to fork.c and exit.c under the main kernel directory of /usr/src/linux which well provide clues to process management under GNU/Linux. The attributes of a process are crammed into a data structure called task_struct which allows for scheduling (see sched.c) and memory management, enabling switching between processes without losing their state and context. After all it's worth remembering that one main role of a multitasking kernel is to control when the process has access to the bare metal of the CPU. The kernel dictates the physicality of execution, acting as key interface between code and the real world. Without OS the executable is just junked data.
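Some of the kernel's bookkeeping for each process can be glimpsed without reading kernel source, since Linux exports a summary under /proc. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name process_status and the choice of fields are ours.

```python
import os


def process_status(pid, fields=("Name", "State", "Pid", "PPid", "Threads")):
    """Read selected fields of /proc/<pid>/status (Linux only)."""
    info = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in fields:
                info[key] = value.strip()
    return info


me = process_status(os.getpid())
for key, value in me.items():
    print(f"{key:8} {value}")
```

This is much the same information that ps reads and tabulates.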

Yet forking doesn't quite take us the full distance into the land of execution. Forking means that we have two copies of the same program running. One of them, usually the child, then executes the binary by way of one of the exec() family of calls, which replaces the program running in that process with another while the process itself, PID and all, lives on. The exec() system call locates the binary image of the executable, loads it, and runs it, taking care of binary formats and shared libraries.
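The two steps together, fork then exec, are roughly what a shell does for every command typed. A hedged sketch in Python follows; spawn is a hypothetical helper name, and the exit code 127 for a failed exec merely follows shell convention.

```python
import os
import sys


def spawn(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace this copy of the program with a new one.
            # On success execvp never returns.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if the exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Run a second Python interpreter which exits with status 3.
status = spawn([sys.executable, "-c", "import sys; sys.exit(3)"])
print("child exit status:", status)
```

Note that nothing after a successful exec is ever run by the child: the old program text is gone, swapped out from under the still living process.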

Happy families

The great family tree of init can be examined on any live system by issuing the simple pstree command. On the author's system we see init on the far left, with multiple descendants, such as the various system daemons which kick off at boot time. We can follow the branch which marks our current process from init, through login, Bash, startx, xinit, X into window manager Ion and thence from xterm, and Bash into GNU Emacs.
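The single branch that pstree draws for the current process can be reconstructed by hand. This sketch assumes Linux and its /proc filesystem; parent_of and ancestry are names invented for the example. It follows parent PIDs upwards until the chain runs out, which on an ordinary system happens at init, PID 1, though inside a container the walk may stop earlier.

```python
import os


def parent_of(pid):
    """Return (name, ppid) for pid, parsed from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Format: "pid (name) state ppid ..."; the name may itself contain
    # spaces or parentheses, so split around the last closing parenthesis.
    name = data[data.index("(") + 1:data.rindex(")")]
    ppid = int(data[data.rindex(")") + 2:].split()[1])
    return name, ppid


def ancestry(pid):
    """Walk from pid up through its parents until the chain runs out."""
    chain = []
    while pid > 0:
        name, ppid = parent_of(pid)
        chain.append((pid, name))
        pid = ppid
    return chain


lineage = ancestry(os.getpid())
print(" <- ".join(f"{name}({pid})" for pid, name in lineage))
```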

Another view of process is offered by the strace command, itself built on the ptrace system call which can well be used to investigate all aspects of the running system. We issue strace followed by the name of the program we wish to trace, and can use the -f switch, amongst a roster of other options, to examine children forked from our process. As well as showing off the wonderfully interrogative nature of a free software environment, such examples also expose the coarse interactions of the shell, operating system, libraries, abstraction layers and the binary. We can learn a lot from such code and tools, though unfortunately, as the relevant man page remarks, tracing good old grandfather init is forbidden. The same manual also makes explicit that what we are viewing is the flow of data between user and kernel space, with the action very much on the latter side. The interface is the system call, with examples such as open, read and write omnipresent given the ubiquitous file mentality of Unix.
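To have something small to trace, the sketch below leans on Python's os module, whose open, read, write and close functions sit close to the system calls of the same names. The helper name roundtrip is ours, and exactly which calls appear in a trace (open or openat, for instance) depends on the C library and kernel in use.

```python
import os
import tempfile


def roundtrip(message):
    """Write bytes to a file and read them back using raw descriptors."""
    fd, path = tempfile.mkstemp()            # returns an open file descriptor
    try:
        os.write(fd, message)                # write(2)
        os.close(fd)                         # close(2)
        fd = os.open(path, os.O_RDONLY)      # open(2) or openat(2)
        data = os.read(fd, len(message))     # read(2)
        os.close(fd)
    finally:
        os.unlink(path)                      # unlink(2)
    return data


print(roundtrip(b"everything is a file\n"))
```

Saved to a file and run under strace -f, the script should show these calls scattered amongst the interpreter's own considerable startup chatter.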

And what of zombies, those uncared for, unkillable children of the process world marked with a disparaging Z in the table of processes returned by ps? As always irresponsible parents are very much to blame and, behind them, lazy coders once again. Under Unix law parent processes must hang around for their children to throw them, by way of the OS, the exit() system call, signalling their departure. Yet if for whatever reason this call never reaches the parent, or the parent refuses to reap the zombie with earlier use of wait(), then we have a zombie. And if the parent has departed before her child exits, the zombie is passed up the tree of process, eventually to reach init, the parent who is always waiting; happy families indeed.
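A zombie is easily raised and just as easily laid to rest. The following sketch assumes Linux, reading the one-letter process state from /proc; state_of and make_zombie are hypothetical names. The parent forks a child which dies at once, holds off on wait() long enough to catch the Z, and only then reaps.

```python
import os
import time


def state_of(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rindex(")") + 2]


def make_zombie():
    """Fork a child that exits at once; observe it before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child dies immediately
    # The parent deliberately does not wait yet. Poll until the kernel
    # marks the dead child as a zombie, giving up after about a second.
    state = state_of(pid)
    for _ in range(100):
        if state == "Z":
            break
        time.sleep(0.01)
        state = state_of(pid)
    os.waitpid(pid, 0)                   # reap: the entry is released
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone


state, gone = make_zombie()
print("state before wait():", state, "| entry gone after wait():", gone)
```

Until the waitpid() call the dead child still occupies a slot in the process table, which is all that a zombie amounts to.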

Reality bites

Yet though we have traced process and examined basic process management at the level of the operating system, it's very much a case of peeling away one complex layer to reveal another abstraction. Computation can be viewed as obscuration (see OS Nature). The magic of the executable still remains as long as we remain solely with software, attending little to an entry into the processor or CPU itself obscured by inscribed software, or microcode, and the obfuscations of pipelining and cache. With process passing on the baton of system call, to the kernel which throws a return value back, we're no nearer the heart of execution. And with a reported 719,000 CPU clock cycles required simply to start a process under GNU/Linux it's looking like we're entering distinctly complex terrain, which it is the very job of the OS to mask. Strace will only reveal the blackboxed encounter with the kernel.

Yet free software entices this entry into the microscopic, which with both GDB, a common crash investigation tool, and the ptrace system call in hand we can well undertake, with both tools allowing us to dig deep into registers and the like at a very low level.

With overarching view of Von Neumann inspired architecture in place we can begin to descend into the dark heart of the binary, proceeding as we've seen down from process fork and the OS interface of the system call, with in most instances exec or rather execve, as can be gleaned from the first line of say strace ls, kicking off the action. At this point we need to consider what exactly constitutes an executable and how we arrive at such a beast. Rather than clouding the issue with reference to interpreters, themselves enchained by the binary, we'll stick with common or garden compilation under C. The sheer size of an executable for a simple hello world implementation provides us with a clue that there's a good deal going on here, and coders' attempts to hack by hand the size of a basic binary down to its very core prove highly illuminating as to content and organisation of the ELF format file. With this view of the executable file, called by execve, in place we can jump right into the processor.
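The opening bytes of any ELF file already say a good deal about what it is. As a minimal sketch, assuming a Linux system on which the Python interpreter is itself an ELF binary, the following reads the identification bytes and the two fields which follow them; elf_summary is a name invented here, and only a handful of the header's fields are decoded.

```python
import struct
import sys


def elf_summary(path):
    """Parse the first bytes of an ELF file's header."""
    with open(path, "rb") as f:
        ident = f.read(16)               # e_ident
        rest = f.read(4)                 # e_type and e_machine
    if ident[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    bits = {1: 32, 2: 64}[ident[4]]      # EI_CLASS
    endian = {1: "<", 2: ">"}[ident[5]]  # EI_DATA: little or big endian
    e_type, e_machine = struct.unpack(endian + "HH", rest)
    kinds = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
    return {"bits": bits, "type": kinds.get(e_type, e_type), "machine": e_machine}


print(elf_summary(sys.executable))
```

Many modern binaries are built position independent and so announce themselves as shared objects rather than plain executables; the machine field is a numeric code identifying the target architecture.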

In contrast to strace, GDB, used in a manner which could well be described as reverse engineering and well entering into that bit obsessed domain, can be coaxed into providing us with a vast level of process and processor detail. We can set all manner of breakpoints to surrender execution into the textual hands of the debugger. Once arrested it's possible to interrogate the executable, walking through the code step by step. We can view running code as C source, or disassemble into assembly mnemonics. From here it's possible to dump portions of memory and investigate CPU registers. We're more or less at the bare metal, of operations carried out within the minute internal architecture of the CPU.

Yet it's a rather static view of execution; a portrait of execution on ice which could well be spliced frame by frame and then projected as moving image to reconstruct a complex microcosm and interplay, a tiered almost holographic affair. How to grasp or even represent after the fact such a fine and complex hierarchy of detail in constant motion with OS further shuffling the CPU deck as it juggles processes? The ptrace process tracing system call, central to GDB itself, with some wrapper code allows us to take things one dynamic and reflective step further.

Ptraced

Ptrace allows one process to spy on and even control the execution of another process, going so far as to change the image of another process, access private data structures, play with the core CPU registers or even kill the process outright. It's the ultimate interrogative tool which well exposes the dynamic nature of hierarchical process, deeply entwined with the kernel and impacting on a supremely low level architecture. Ptrace is a concrete tool which can well be used for artistic exploration of what could readily be termed OS nature; a joy in the simple terms of expression which well suits the nature and philosophy of Unix. It's a radical exposure of how an OS builds on the low level givens of technology; a natural landscape able to be interpreted and reframed. Ptrace can be used in highly imaginative manner, for example plotting the changing CPU registers which a given process uses.

We can ptrace a process forked from our own code, or attach to an already existing process, stepping through the code one bare instruction at a time by passing PTRACE_SINGLESTEP to the ptrace() call. Yet there are restrictions on ptrace use. As we can imagine it's forbidden to trace or control the veteran init process and no process can control itself; simply fork a copy for the purposes of rather vertigo inducing self-examination.
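The forked variant can be sketched from Python by calling the C library's ptrace() through ctypes. Everything here is an assumption made for illustration rather than a finished tool: a Linux system, request numbers as found in sys/ptrace.h, a sandbox which permits tracing, and the helper name single_step. The child asks to be traced and execs a program; the parent then advances it one instruction at a time, up to a limit, before killing it off.

```python
import ctypes
import os
import signal
import sys

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_TRACEME = 0
PTRACE_SINGLESTEP = 9

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long,
                        ctypes.c_void_p, ctypes.c_void_p]


def single_step(argv, limit=50):
    """Run argv under ptrace and step through at most `limit` instructions."""
    pid = os.fork()
    if pid == 0:
        try:
            libc.ptrace(PTRACE_TRACEME, 0, None, None)  # ask to be traced
            os.execv(argv[0], argv)   # a traced exec ends in a SIGTRAP stop
        finally:
            os._exit(127)             # only reached if the exec failed
    _, status = os.waitpid(pid, 0)    # child stopped at the start of the new program
    steps = 0
    while os.WIFSTOPPED(status) and steps < limit:
        if libc.ptrace(PTRACE_SINGLESTEP, pid, None, None) != 0:
            break
        _, status = os.waitpid(pid, 0)    # stopped again, one instruction on
        steps += 1
    if os.WIFSTOPPED(status):
        os.kill(pid, signal.SIGKILL)      # enough: put the tracee down
        os.waitpid(pid, 0)
    return steps


print("instructions stepped:", single_step([sys.executable, "-c", "pass"]))
```

Between steps the parent could go on to read the child's registers or memory with further ptrace requests, though the register layout differs from one architecture to the next and is left out here.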

Instruction set

Yet we're still sitting very much on one side of the thorny fence of execution, still enjoying our stay in the land of human readable code and manpaged tools and system calls. We've yet to stray totally into the bland terrain of the machinic. With ptrace() and friends allowing for an overmapped view of registers and memory, rendering dynamically visible the architecture of an elaborated Turing machine, we can begin to see how the crossover into the physical, the central point of a journey which returns to the interface and which could well be termed as interface, is very much a question of rendering the executable into its most basic components. The simplicity of a Turing machine and that of a contemporary CPU, though masked by talk of cache, pipelining and other optimisations, are parallel. Under the hegemony of the GUI it's easy to forget that it's all about bits; decoding and encoding, descending and ascending through hardware and software abstractions. The compiled executable under process and ptrace() can be pinned down to a single instruction, a pattern of ones and zeros travelling at high speed on microscopic parallel wires into the CPU, a realm of heat subject to the tyrannies of noise. The processor decodes instructions, in some instances into microcode which is further broken down into a base instruction set, transmitting the binary encoded results to the relevant embedded electronic execution unit; tiny purpose built devices responsible for a specific function. This is where we really hit the metal. Multiplexing and demultiplexing are the order of the day, with bits on the wire expressing a vast array of possibilities.

Binary becomes a living code through a five volt injection into the chipset. Machine code is transformed into lines of power with select mechanisms removed from meaning as code is detached from the equation. All of computation is possible thanks to the binary modulation of the physical, in this case electricity. On or off. Killing the power destroys our edifice as under the white interrogative light of free software we view the executable in parallel as binary and as living code within a rich culture of computation.

OS nature

Under a proprietary OS the system is quite obviously obscured, conditioning the user under a supposedly natural interface; the programmer as god hiding behind veiled error messages masking a very human failure. The error message is considered as a natural sign, a true symptom with some link to an underlying natural cause. A psychosomatic response issued by a body of code, a componentless purely binary body. It appears to the casual user as set in stone, filled with dark intent as it seemingly hails from the other side of crash land. After all how can a crashing system report its own death?

Yet, its arbitrary nature is well belied by free software with apparent code, searchable as to exact message and cause. The decisions of a now humbled programmer or collective are exposed alongside colloquial comments. Rather than viewing a message as totally meaningful, as imbued with an oracular aura, free software reveals the truly error prone and noisy realm of analogue code. Coders write error messages, not computers, and programmers can always make mistakes. Error codes need make little sense, with lazy grasp-all cases embracing all manner of ills under the same banal message.

Exposure is a key concept for both free software and what could be referred to as code art, rendering a geological aesthetic, which may be contrasted with the art of the landscape, along the rich seam of executable or runnable strata meeting the readable and writable. Technique and technology are revealed neither as ends in themselves nor as tools for the production of stock works but rather as an explosive and extremely gratifying self exposure of ideas. In the words of Wildean programmer, Alan J. Perlis, prefacing one of the greatest computer science works, Structure and Interpretation of Computer Programs, "The source of the exhilaration associated with computer programming is the continual unfolding within the mind and on the computer of mechanisms expressed as programs and the explosion of perception they generate." Computers model process, execution is process and free software exposes all the processes, social, cultural, economic and otherwise which wrap up code.

Key links

Von Neumann: http://www.redfish.com/dkunkle/vonNeumann

Prisoner's Dilemma: http://www.prisoners-dilemma.com

ITS and GNU: http://www.gnu.org/philosophy/stallman-kth.html

GDB: http://www.gnu.org/software/gdb/gdb.html

GDB tutorial: http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html

strace: http://www.liacs.nl/~wichert/strace/

ptrace: http://www.linuxjournal.com/article/6100

ELF and tiny programs: http://www.ubergeek.org/~breadbox/software/tiny/home.html

ptrace example: http://www.1010.co.uk/self2.tar.gz

Structure and Interpretation of Computer Programs: http://mitpress.mit.edu/sicp/full-text/book/book.html

Updated: 2006-09-08
