ReadyExec

Introduction

The Problem

With the rise of scripting in high-level languages such as Python and Perl, we are writing such programs for a growing variety of situations. One of these is the replacement of shell-style programs, which are executed repeatedly but at low cost, since the underlying programs (grep, cut, etc.) are compiled down to efficient machine code.

However, once we start writing these tiny programs in interpreted rather than natively compiled languages, we find that the cost of running these 'small' programs repeatedly adds up. A small constant cost incurred once is hardly anything; the same cost incurred n times is another matter. One common place this cost shows up is when Python or Perl programs are used in combination with systems such as procmail.

The problem is generally not that the program is written in Python or Perl as such; the interpreters themselves tend to start up and compile quickly, and execution time tends to be good as well.

The killer, however, is when your 'small' script loads 10 other modules because of the great functionality they offer. These modules have to be re-compiled or re-interpreted each time your program is run. This gets costly, and incurs a lot of (expensive) I/O.
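To get a rough feel for this cost, the sketch below times a fresh interpreter start with and without a handful of imports. The module list is an arbitrary pick from the standard library (the text above names no specific modules), and the numbers will vary from machine to machine.

```python
import subprocess
import sys
import time


def startup_seconds(code, runs=3):
    """Average wall-clock time of running `code` in a fresh interpreter."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", code], check=True)
    return (time.perf_counter() - start) / runs


# Arbitrary standard-library modules standing in for the "10 other modules".
bare = startup_seconds("pass")
loaded = startup_seconds(
    "import email, json, decimal, xml.dom.minidom, urllib.request"
)

print(f"bare interpreter: {bare:.3f}s per run")
print(f"with imports:     {loaded:.3f}s per run")
```

The difference between the two figures is the price paid on every single invocation.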

One way to avoid this problem, then, is to avoid third-party modules and write the minimum functionality you need yourself. However, we're then destroying good reusability. Software quality goes down when each person is re-implementing an RFC 822 module or a GnuPG interface.

Another possibility is to write a client-server protocol. A thin client, likely written in C, makes requests to a heavy daemon process that does all the thinking. However, who wants to write another C client? We're trying to avoid C as it is. And designing yet another protocol? Ick.

So what to do? We'd like to use third-party modules, but their load-time is so expensive when repeatedly incurred. And we don't want to have to re-create a client-server protocol.

Enter ReadyExec.

A Solution - ReadyExec

ReadyExec is a client-server framework, but unlike most query/retrieve systems, the application code does not implement queries and requests to a server. Instead, your scripting code, which was expensive to start up repeatedly, works within the ReadyExec framework without any special code or knowledge of ReadyExec.

To make this a little easier to understand, let's walk through an example of ReadyExec being used. Let's assume that your application code can be run by calling the function 'foo.bar' (in Python terms, foo is a module and bar is a function within that module).

What ReadyExec does is start up a code-serving daemon, which is told two things:

a function to execute that runs the application code (in our case, foo.bar)
a user-specified Unix-domain socket to listen on (e.g., /tmp/foo.bar)
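As an illustration of the first item, a dotted name such as foo.bar can be turned into a callable with the standard importlib module. This is only a sketch of one way to do it; resolve_function is a hypothetical helper name, not part of ReadyExec, and the daemon's actual lookup code may differ.

```python
import importlib


def resolve_function(dotted_name):
    """Turn a name such as 'foo.bar' into the callable bar from module foo."""
    module_name, _, func_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_name!r}")
    # The expensive module load happens once, here, in the long-lived process.
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{dotted_name} is not callable")
    return func


# Demonstration with a standard-library function in place of foo.bar.
basename = resolve_function("os.path.basename")
print(basename("/tmp/foo.bar"))  # prints: foo.bar
```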

So the ReadyExec daemon starts up and begins listening. And waits. It waits for...

The client program of ReadyExec, readyexec, is the 'pipe', or 'conduit', through which a parent process can 'execute' foo.bar. readyexec is called with one argument for itself, followed by as many others as foo.bar wants in its sys.argv. To be clear, the arguments given to readyexec are the following:

the location of the ReadyExec daemon socket which it should talk to (in our case, /tmp/foo.bar)
any number of 'command-line' arguments for foo.bar, as if it were being 'executed' on the command line

Back on the server end, the daemon receives the request from readyexec, and will soon start executing the foo.bar function. But before it does so, here is where ReadyExec comes in handy: it connects the stdin/stdout/stderr of readyexec to the stdin/stdout/stderr of foo.bar, and ensures that when foo.bar is run, it sees the same arguments that were passed to the readyexec command, as if it were being called from the command line directly.

In doing this, foo.bar can read the stdin that readyexec reads in, and when it writes to stdout or stderr, readyexec writes these out in turn.

Furthermore, whatever exit code foo.bar exits with, readyexec will exit with the same code.
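The server-side behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not ReadyExec's actual implementation: it swaps Python-level stream objects inside one process, whereas how the real daemon connects the client's streams is not described here. run_as_command and shout are hypothetical names.

```python
import io
import sys


def run_as_command(func, argv, stdin, stdout, stderr):
    """Call func() with sys.argv and the standard streams swapped in.

    Returns the exit code a command-line run would have produced.
    """
    saved = (sys.argv, sys.stdin, sys.stdout, sys.stderr)
    sys.argv, sys.stdin, sys.stdout, sys.stderr = list(argv), stdin, stdout, stderr
    try:
        func()
        return 0
    except SystemExit as exc:
        # Mirror the interpreter: None means success, an int is the code,
        # anything else is printed to stderr and counts as failure.
        if exc.code is None:
            return 0
        if isinstance(exc.code, int):
            return exc.code
        print(exc.code, file=stderr)
        return 1
    finally:
        sys.argv, sys.stdin, sys.stdout, sys.stderr = saved


def shout():
    """Stand-in for foo.bar: upper-cases stdin, exits with len(sys.argv)."""
    sys.stdout.write(sys.stdin.read().upper())
    sys.exit(len(sys.argv))


out = io.StringIO()
code = run_as_command(shout, ["foo.bar", "-v"], io.StringIO("hi\n"), out, io.StringIO())
print(code, repr(out.getvalue()))  # prints: 2 'HI\n'
```

The function itself never learns that it was not started from a shell; it only ever touches sys.argv, sys.stdin, sys.stdout and sys.exit.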

Thus, we have created a system where the foo.bar code is loaded once but can be run multiple times, as transparently to itself as if it were being run from the command line.

How to set up your code for use with ReadyExec

It is very easy to get your application code working with ReadyExec. ReadyExec currently only supports Python, so we'll deal with that.

First, however, let's understand the limitation of ReadyExec: it works when your application takes its input from sys.argv and sys.stdin, and writes its output to sys.stdout and sys.stderr.

Let's say that your script is called blarg. blarg is your Python script that reads in standard input and writes to standard output the number of lines it came across (you are re-implementing wc -l). What you need to do is move the blarg code into a function in a module, and place that module where Python can find it. The function should not require any parameters; it should just start executing your blarg code. Let's say that you put this function into module 'foo'.
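A minimal sketch of such a module might look like the following. The file name foo.py follows from the module name used above; the body is an illustrative guess at what a line counter would contain, not code shipped with ReadyExec.

```python
# foo.py (hypothetical contents, following the example in the text)
import sys


def blarg():
    """Count the lines on standard input and write the total to standard output."""
    count = 0
    # Look up sys.stdin when called, not at import time, so that each
    # invocation reads whatever stream is current for that run.
    for _ in sys.stdin:
        count += 1
    sys.stdout.write(f"{count}\n")
```

Note that the function takes no parameters and only touches sys.stdin and sys.stdout. Unlike wc -l, this version also counts a final line that lacks a trailing newline.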

Next, execute readyexecd.py /tmp/blarg foo.blarg. It will sit there waiting for connections to /tmp/blarg. Now, instead of executing your script blarg from the command line, execute readyexec /tmp/blarg. Your foo.blarg code in the server will start executing, reading from the standard input of readyexec, and when finished, will write the number of lines it came across, as if readyexec were blarg.

Why should I use ReadyExec?

ReadyExec allows you to write programs with a high startup cost that need only be loaded once, without having to implement your own client-server mechanism. ReadyExec is completely neutral to what it 'conduits'; it tries to be as transparent as possible for both the end developer making use of it and the end user executing code via it.