Hacker Superstitions

I’m currently taking a course called “Macbeth from Page to Stage”. We’re discussing superstitions in theatre, and it got me thinking about hacker myths. I wrote this for the assignment, but I thought others might find it interesting.

As a hacker and programmer, I’m part of a culture that is both dismissive of and fascinated by the supernatural. Of course, as people who have to perform extreme rationality and logic in order to be taken seriously, nobody really believes in ghosts or spirits or divination – and yet we have the “demo gods” to whom laptops, chickens, and goats are (figuratively) sacrificed in order to ensure the proper operation of live demonstrations at conferences. We have the custom of conference goons giving shots of vodka (or sometimes whiskey) to first-time conference speakers (and the absolute conviction that those who don’t or can’t drink them will have their demos fail, their projectors turn off, or their audio equipment be full of feedback).

We also have a strange level of personification of the machines and computer programs we work on – machines are said to be “fighting over” resources, protocol handlers sometimes “get confused” when given incorrect input, and the phrase “this subroutine’s goal in life is…” is quite common. The personification of these machines isn’t literal, but it’s not totally figurative either. Sometimes computers do things we just can’t understand.

We call the most skilled hackers “wizards”. Hacking on compilers or writing machine code directly is “deep wizardry”, and doing so maliciously (or sometimes just in a way nobody else can understand) is “black magic”. We also have our koans, little stories about “disciples” becoming “enlightened” or the exploits of the “masters” in the heyday of the AI Lab. Consider this (from the New Hacker’s Dictionary):

A novice was trying to fix a broken Lisp machine by turning the power off and on.
Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
Knight turned the machine off and on.
The machine worked.

Such stories from the MIT AI Lab abound, and are the foundation for much of the folklore of the hacker community.

Finally, some notable hackers who have died are considered to haunt our systems. These legends are then blamed (or applauded) for everyday events. For instance, at DEF CON 24, all the Bally’s Casino ATMs were broken, so someone hung a sign saying “Barnaby Jack Was Here”, Barnaby Jack being a hacker who died under mysterious circumstances a few years after demonstrating a remote, Internet-based attack that could entice an ATM to literally spit out cash onto the floor.

I don’t think I personally believe in these things – that the AI Lab masters were bodhisattvas, that the Demo Gods are watching over us, that my thirty thousand lines of C are alive, or that Barnaby Jack still haunts Vegas ATMs. But I do still participate in the customs, still make the jokes, still run the fortune command on Stallman’s birthday every year. So whether or not a ghost has ever taken over my computer, the supernatural has certainly affected my life.

If you enjoyed this post, you might like some of my other musings on software and retro hardware, or some of my technical content on Rust or Python. Also, please feel free to comment on this post with some of your own thoughts or favorite myths!

Paper Rocket Engines

Why?

I’ve been interested in rocket engines for some time now, especially the design of combustion chambers. I’m no physicist, and I’ve never claimed to be, nor do I have a budget. On the other hand, there’s no way to test out one’s understanding of combustion but to try it – so I decided to build a small rocket engine.

A First Attempt

The easiest way to build an engine is simply to roll up a tube of paper, fill it with butane, and light it. This produces a tiny bit of thrust out both ends. Capping one end directs all of that thrust out the other end, and boom: a tiny, pointless rocket engine.

I built a number of these, with increasing sophistication. I found that, with the addition of two spools from Scotch tape reels to reinforce the ends, I could create a double-walled chamber that wouldn’t burn my hands. Here’s how it works.

Take four index cards and align them. Then roll them up and seat them in the groove that runs around the circumference of one of the tape rolls. Place the other tape roll on top and use Scotch or duct tape to secure the joint so that it forms a pressure seal. You can check the seal by covering one end with your hand and blowing into the other.

Now cap one end with some layered tape. Duct tape works, but Scotch lets you see the combustion inside. Then take a fifth index card and wrap it around the outside, securing it with tape. The air between the inner and outer wall will absorb radiated heat and protect your hand.

Hold the engine with the open side up and place a lighter upside-down with the head inside the engine. Depress the release valve without igniting and allow butane gas to enter the engine. It’s heavier than air, so it will fall to the bottom.

Now flip the engine so the open end is just a few degrees down from horizontal and hold the lighter two to three centimeters from the opening, and ignite it. Whoosh! Thrust.

I added a conical nozzle to the end, but I made little effort to optimize its geometry; this is a project for the future.

[Photo: the bottle connected to the engine.] This simple rocket actually kicks pretty hard when the bottle is filled with butane.

[Photo: just the engine, with the bottle detached.] This can be sealed over and fired by itself for a very tiny impulse.

[Schematic: the bare engine, the engine with oxygen and fuel feeds, and the engine with nozzle and bottle.] These are some schematics I sketched while working on this tiny engine. The two-tube-feed idea is a no-go, for reasons that I hope are obvious when one considers the problems with gaseous oxygen being fed through a flammable tube.

A Simple Fuel Tank

In order to produce some real impulse, we need much, much more butane/air mix to feed through our engine. I made a first stab at this problem by taking a simple Pepsi bottle and drilling out the end of the cap. I then removed the seal at the end of the engine and glued the cap into place such that it sealed with the top Scotch tape roll. I could now simply fill the “fuel tank” about half full, screw the engine onto the top, and tip it so that the butane begins to flow. Lighting the engine then resulted in an appreciable impulse – and a very hot Pepsi bottle, because the bulk of the combustion occurred in the “fuel tank”! Not at all what we want.

I’m currently trying to find a good gas pump, or some way to atomize some kind of liquid fuel, which, along with an 80mm fan to feed atmospheric oxygen, will create what is essentially a jet engine – I’ll post about that if I ever get it working.

Session Types

What are session types?

Session types are, essentially, a technique for using a rich type system like that of Rust or OCaml to encode semantic meaning and prevent the representation of certain kinds of illegal states, especially states that violate the intended ordering (the causality) of operations.

What is the use-case?

Let’s take the somewhat contrived example of a system for packaging and shipping boxes. I want to create a Package data structure, pack data into it, close it (preventing further additions), address it, and then ship it. It makes no sense to send an unaddressed Package, or to insert data into a closed one.

We could represent this state with a couple of flags:

struct Package<T> {
    is_closed: bool,
    is_addressed: bool,
    data: T
}

We would then need runtime checks everywhere to verify that packages submitted for shipping were closed and addressed, which costs time and is error-prone.
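For instance, a hypothetical ship method on the struct above would end up re-checking the flags by hand (a sketch, not part of the final design):

impl<T> Package<T> {
    // Every operation has to re-check the flags at runtime and report
    // failures manually; nothing stops a caller from ignoring the Result.
    fn ship(&self) -> Result<(), String> {
        if !self.is_closed {
            return Err("cannot ship an open package".into());
        }
        if !self.is_addressed {
            return Err("cannot ship an unaddressed package".into());
        }
        // ... actually hand the package off for delivery ...
        Ok(())
    }
}

Wouldn’t it be nice if the compiler could enforce these rules instead?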

How is it implemented?

The easiest way to implement this is with three different types: an OpenPackage, a ClosedPackage, and an AddressedPackage.

struct OpenPackage<T> {
    pub contents: T
}

Crucially, the contents field is pub. You can poke and prod and change that data all you want.

This has a few capabilities: pack, which takes control of whatever is supposed to go in the package and creates an OpenPackage around it, unpack, which destroys the OpenPackage and gives back its contents, and finally close, which converts the OpenPackage to a ClosedPackage.

 

impl <T: Sized> OpenPackage<T> {
    fn new(contents: T) -> Self {
        OpenPackage::<T> {contents: contents}
    }

    fn pack(contents: T) -> Self {
        println!("\tPut some data in a package.");
        Self::new(contents)
    }

    fn unpack(self) -> T {
        println!("\tPackage unpacked.");
        self.contents
    }

    fn close(self) -> ClosedPackage<T> {
        println!("\tPackage closed.");
        ClosedPackage::<T>::new(self.contents)
    }
}

This leads naturally to the ClosedPackage struct:

struct ClosedPackage<T> {
    contents: T
}

Very similar, but without pub on contents. This means that a ClosedPackage isn’t in danger of having its contents manipulated in any way.

ClosedPackages can be opened again, yielding an OpenPackage, or addressed, creating an AddressedPackage.

impl <T: Sized> ClosedPackage<T> {
    fn new(contents: T) -> Self {
        ClosedPackage::<T> { contents: contents }
    }
    fn open(self) -> OpenPackage<T> {
        println!("\tPackage opened.");
        OpenPackage::<T>::new(self.contents)
    }
    fn address(self, address: String) -> AddressedPackage<T> {
        println!("\tAddressed a closed package.");
        AddressedPackage::new(self.contents, address)
    }
}

Finally, the AddressedPackage struct represents one with a specified destination. I used a String for the address here, but it would be trivial to create a generic version.

struct AddressedPackage<T> {
    contents: T,
    pub address: String
}

To understand the access controls here, just think of a physical package. The address is on the outside; anyone can read it or cross it out with a sharpie. The contents, however, are sealed away.

This struct can be turned back into a ClosedPackage by receiving it:

impl <T: Sized> AddressedPackage<T> {
    fn new(contents: T, address: String) -> Self {
        AddressedPackage::<T> { contents: contents, address: address }
    }
    fn receive(self) -> ClosedPackage<T> {
        println!("\tPackage recieved.");
        ClosedPackage::<T>::new(self.contents)
    }
}

Finally, I created an example function to “send” the package somewhere.

fn send_package<T: Sized+std::fmt::Display>(package: AddressedPackage<T>) -> Result<String, String> {
    // Save the address.
    let address = package.address.clone();

    // Destroy the package to get at the contents
    let contents = package.receive().open().unpack();
    println!("Destination recieved: {}", contents);
    // Success! Theoretically this function could fail, but not with this implementation.
    Ok(format!("Sent package to {}", address))
}

fn main() {
    // Make a box and then destroy it.
    let contents: String = "Here is some data.".into();
    let package = OpenPackage::pack(contents);
    // The package owns its contents.
    // println!("{}", contents); is invalid.
    println!("{}", package.unpack());

    // Now, make a box, close it, and address it.
    let contents: String = "Here is some MORE data.".into();
    let package = OpenPackage::pack(contents);
    let closed_package = package.close();
    // We now can't unpack the package to get to its contents. This is an error:
    // let contents = closed_package.unpack();
    // because ClosedPackage doesn't have .unpack()
    // Also, package is no longer valid, so no duplication can occur.
    // Finally, we can't send_package() this package; we have to address it.
    let addressed_package = closed_package.address("6902 East Pass, Madison, WI".into());
    // Now we can send the package.
    println!("{:?}", send_package(addressed_package));
}

This ends up printing out:

$ ./session_types
    Put some data in a package.
    Package unpacked.
Here is some data.
    Put some data in a package.
    Package closed.
    Addressed a closed package.
    Package received.
    Package opened.
    Package unpacked.
Destination received: Here is some MORE data.
Ok("Sent package to 6902 East Pass, Madison, WI")

In the real world, the Rust crate hyper makes heavy use of session types to ensure the integrity of HTTP requests and responses.

socketserver: the Python networking module you didn’t know you needed

I occasionally spend time randomly surfing the Python standard library docs; there is a lot of useful functionality included in the language’s standard distribution. One example is the socketserver module, which I didn’t know about until this evening and which is one of the most useful modules I’ve come across in a while. As ever, the docs are straightforward in their self-description:

The socketserver module simplifies the task of writing network servers.

This is something of an understatement. To demonstrate, here is a simple CaaS (capitalization as a service) server written with the bare socket module, followed by the same thing written with socketserver:

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(('', 50006))
    s.listen(1)
    while True:  # handle one connection at a time
        conn, addr = s.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data.upper())

And here is the same functionality with socketserver:

import socketserver

class CaaSHandler(socketserver.StreamRequestHandler):
    def handle(self):
        data = self.rfile.readline()
        self.wfile.write(data.upper())

if __name__ == "__main__":
    server = socketserver.TCPServer(('', 50007), CaaSHandler)
    server.serve_forever()

Both of these take connections synchronously and sequentially, capitalize the data they receive, and send it back. The main difference is the interface: the socketserver version reads a full line at a time through a file-like object, however long that line is, while the socket version deals in raw fixed-size chunks (1024 bytes in this example).

This is because socketserver’s StreamRequestHandler provides the file-like objects rfile and wfile, which expose all the normal luxuries of Python 3 files, like readline and read. The parent class of the handler you write deals with setting the buffer size, looping until a newline or EOF is encountered, and handling both client-first and server-first protocols. We could just as easily add a welcome message/prompt to the program; just make the CaaSHandler class look like this:

class CaaSHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(b"Enter some data to be capitalized:\n")
        data = self.rfile.readline()
        self.wfile.write(data.upper())

without any changes to the client’s behavior. Adding that functionality in the socket version is somewhat nontrivial; how, for instance, one would handle both clients that expect to send data first and clients that expect to receive it first is less than obvious.

The second useful facility that socketserver provides is the xxxServer classes. I used TCPServer here, passing it a (host, port) tuple and my handler class, CaaSHandler. I could also have used UDPServer for datagrams, or UnixStreamServer/UnixDatagramServer for Unix sockets.

The socketserver module also provides mixins for threading and forking servers, which makes writing asynchronous network services much less painful than using socket and threading or even asyncio.
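For example, if I’m reading the docs right, making the CaaS server handle each connection in its own thread is just a mixin away (a sketch, not something I’ve battle-tested):

import socketserver

class CaaSHandler(socketserver.StreamRequestHandler):
    def handle(self):
        data = self.rfile.readline()
        self.wfile.write(data.upper())

# The mixin must come first so its threaded process_request() takes precedence.
class ThreadedCaaSServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

if __name__ == "__main__":
    server = ThreadedCaaSServer(('', 50007), CaaSHandler)
    server.serve_forever()  # each incoming connection is handled in its own thread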

Full Stack By Example 02: Dice… on the Web!

In the last post, we created a simple set of functions that can generate dice rolls with the appropriate probability distribution. Now, we’re going to make it available on the web!

To do this, we’ll need the Flask web framework. Luckily, Python has a great package manager that makes installing things such as Flask really simple:

pip install flask

Depending on your system setup you might need to add sudo in front of this command for it to work.

Now that Flask is installed, open up the file with your dice roller code in it.

As a reminder, it should look like this:

def roll_die(number_of_sides):
    " Simulate rolling a single die with number_of_sides sides"
    import random # We need randint from the random module
    # This is exactly what we were doing by hand before: a random number between 1 and the number of sides
    random_number = random.randint(1, number_of_sides)
    # Give back the number we generated
    return random_number

def roll_dice(number_of_dice, number_of_sides):
    " Return a random number in the same distribution as rolling number_of_dice dice of number_of_sides sides "
    accumulator = 0 # This variable will "accumulate" a value as the loop runs 
    for roll_number in range(number_of_dice):
        # We don't actually use roll_number for anything, it's just a placeholder
        # You could use it for debugging messages if you want
        accumulator += roll_die(number_of_sides)
    return accumulator

That’s our “business logic” – the part of the program that does actual work for the user. We’ll add a bit of additional code before it to set up the Flask framework (at the top of the file):

from flask import Flask  # Brings in the Flask object from the module we installed
app = Flask(__name__)   # Creates a new Flask app.
app.config['DEBUG'] = True

This imports the Flask object, which is the main way we will interact with the Flask library. Then, we create a new app (called app) and set it to DEBUG mode.

Finally, add the following below the definition of roll_dice:

@app.route("/")
def main():
    return "Result of roll: {}".format(roll_dice(1, 6)) 

if __name__ == "__main__":
    app.run()

This is a little more complicated, so let’s break it down. First, @app.route("/") tells Flask that the following function defines what should be done when someone fetches the index page of the website. Below that is the function definition, which does only one thing: roll some dice using our previously-defined dice rolling function, put the result in a string, and return it. Below that is a bit of “magic” code; __name__ is a variable that Python sets for you so your code can check whether it’s the main program or a library. This if checks if the current script is the main one, and if so, it runs the app.

If you’ve followed along so far, you should be able to run python3 filename.py, where filename is whatever you called the file, and see something a bit like this:

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger pin code: 118-607-219

Your debugger code will be something different. If you go to localhost:5000 in your favorite web browser (Chrome, Firefox, etc), you should see a simple page with “Result of roll:” and then a number. You’ll notice that every time you refresh the page, you get a new dice roll, and in the console window, something like this appears:

127.0.0.1 - - [21/Oct/2016 16:38:01] "GET / HTTP/1.1" 200 -

Let’s analyze this. First comes an IP address (127.0.0.1); that tells you who accessed your web server. Then come the date and time, followed by the request they sent: GET / HTTP/1.1. This has three parts: GET is the “method”, / is the resource being asked for, and HTTP/1.1 is the version of the protocol being used. Finally, the number 200 is the status code returned by the server. 200 means “OK” – everything went through as expected.
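If you have curl installed, you can watch the same exchange from the client’s side; the output below is heavily trimmed and will vary with your curl and Flask versions, but the request line should look familiar:

$ curl -v http://localhost:5000/
...
> GET / HTTP/1.1
> Host: localhost:5000
...
< HTTP/1.0 200 OK
...
Result of roll: 3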

Let’s break it! Try, instead of asking for localhost:5000, typing in localhost:5000/this_page_is_not_real. You’ll get back an error saying 404 Not Found, and in the console you’ll see something like:

127.0.0.1 - - [21/Oct/2016 16:44:00] "GET /this_page_is_not_real HTTP/1.1" 404 -

Most of this is the same as the successful request, but you’ll see that / has been replaced with /this_page_is_not_real, because we requested a different resource, and the 200 OK status has been replaced with 404, which means Not Found. We asked for a page that doesn’t exist, and the server told us so.

Congratulations! You’ve just built a web service. Granted, it doesn’t do much, but it works, and as far as your web browser is concerned it acts just like every other web site out there. In the next post, you’ll create a form that allows the user to specify what kind of dice to roll, and how many.

Rewriting tinyhttpd in Rust, Part One

In 1999, J. David Blackstone, or, as he is known online, jdavidb, was taking CSE 4344 (Network Concepts) at UT Arlington. Those were the glory days of Sparc Solaris, and Blackstone wrote, for his college course, a C program called tinyhttpd. It is, essentially, a very short version of the immensely complex programs that seem to run the world these days: web servers. Unlike the million-line behemoths (think Apache, nginx, et cetera), tinyhttpd is an HTTP 1.1 web server in 532 lines of well-commented C.

HTTP 1.1 is a ubiquitously supported protocol that is useful for a great many applications, and in this modern era of embedded (a.k.a. “Internet of Things”) computing, small web servers have never been more important.

This program is also a small, manageable example of a legacy application – an old program written for an obsolete operating system that still gets the job done, but exposes any organization using it not only to the cost of maintaining ancient operating systems and hardware, but also to the risk of the security vulnerabilities present in tinyhttpd itself and the software it needs to run.

 

For the purposes of these posts, I’ll be looking at tinyhttpd from the perspective of a company that uses it internally and wants to transition to a more modular, portable, and maintainable design. The perspective of a company that ships it as a product, or that bought it from another vendor and wants to replace it, is similar but comes with additional challenges.

The first thing to do is to analyze the existing source. I’ve gone ahead and created a GitHub repository to host both the old and new source code, and I’ll link to specific commits in these posts. For instance, here is the commit with nothing but the unmodified source of the legacy app.

Step one is to build the existing app. In order not to clutter the repository with object files, I created a .gitignore file from GitHub’s default C template. Now all I have to do is run make, right?

11:32:58: leo [~/Projects/rtinyhttpd/legacy]
$ make
gcc -W -Wall -lpthread -o httpd httpd.c
/tmp/ccbqEOVd.o: In function `main':
httpd.c:(.text+0x1a85): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
Makefile:4: recipe for target 'httpd' failed
make: *** [httpd] Error 1

What’s this, it doesn’t compile? Well, you’ll remember I mentioned it was written for an ancient version of Sparc Solaris – that’s the whole reason we’re rewriting it. Luckily, the original author anticipated this. Looking at legacy/httpd.c (where the error is), I see this comment at the top:

/* This program compiles for Sparc Solaris 2.6.
 * To compile for Linux:
 * 1) Comment out the #include <pthread.h> line.
 * 2) Comment out the line that defines the variable newthread.
 * 3) Comment out the two lines that run pthread_create().
 * 4) Uncomment the line that runs accept_request().
 * 5) Remove -lsocket from the Makefile.
 */

I made a note of this in my analysis folder and made those changes – except that they didn’t quite apply: the Makefile didn’t have -lsocket, and there was only one occurrence of pthread_create. The changes I could make did get the app to build, but it didn’t work!

In order to figure out what’s happening, I looked up pthread_create on man7.org. It’s part of the POSIX threading API, and it is definitely available on Linux. Furthermore, if we look at the main() function, we can see why commenting out those lines caused a problem – it’s an infinite loop that does nothing but accept connections!

while (1)
{
    client_sock = accept(server_sock, (struct sockaddr *)&client_name, &client_name_len);
    if (client_sock == -1)
    {
        error_die("accept");
    }

    // Commented out in order to build on Linux
    /* if (pthread_create(&newthread, NULL, (void *)accept_request, (void *)&client_sock) != 0)
    {
        perror("pthread_create");
    }
    */
}

So, we need to get POSIX threads working to make this app run properly. (Note that this problem isn’t an uncommon one when looking at legacy apps; there is often not a good set of build instructions.)

In our case, luckily, this is easy: just revert the commenting and change -lpthread in the Makefile to -pthread, as mentioned on the manual page.

Doing this allows the app to build and run correctly, binding to port 9999. When I open localhost:9999 in my web browser, I get a page back. Success!

Now that we have a compiling and running version of the legacy tinyhttpd, it’s time to go through the source code. Luckily for us, tinyhttpd is entirely contained in a single file. Let’s start off with the top:

/* J. David's webserver */
/* This is a simple webserver.
 * Created November 1999 by J. David Blackstone.
 * CSE 4344 (Network concepts), Prof. Zeigler
 * University of Texas at Arlington
 */
/* This program compiles for Sparc Solaris 2.6.
 * To compile for Linux:
 * 1) Comment out the #include <pthread.h> line.
 * 2) Comment out the line that defines the variable newthread.
 * 3) Comment out the two lines that run pthread_create().
 * 4) Uncomment the line that runs accept_request().
 * 5) Remove -lsocket from the Makefile.
 */

This header is typical of legacy programs: a short note about the author and purpose of the program, plus some (in this case out-of-date and inaccurate) instructions for building and running it. Removing the misleading lines would make this section a lot more concise and is probably a good idea.

Skipping the #includes, which aren’t very helpful in this case, we find two #define statements:

#define ISspace(x) isspace((int)(x))

#define SERVER_STRING "Server: jdbhttpd/0.1.0\r\n"

The SERVER_STRING definition is pretty straightforward; it’s an identifier for the software, which will be sent to clients. In our version, I would prefer not to include the \r\n terminator in the definition itself. The ISspace definition, though, I’m not immediately sure about. A quick search of the source shows no definition of a function isspace taking an integer, so it’s probably coming from one of the includes.

If this program had multiple files, I’d search through them next; but, since there’s only the one, I’m going straight to the Internet. It turns out isspace comes from the standard header ctype.h and does just what you’d expect – it checks whether a given integer, interpreted as a character, is whitespace. The macro simply allows calling it directly on char values without writing out an explicit cast every time. I’ve made a note of this in my analysis documents.

After these macros, we can see declarations (prototypes) for all of the functions used in the program.

void accept_request(void *);
void bad_request(int);
void cat(int, FILE *);
void cannot_execute(int);
void error_die(const char *);
void execute_cgi(int, const char *, const char *, const char *);
int get_line(int, char *, int);
void headers(int, const char *);
void not_found(int);
void serve_file(int, const char *);
int startup(u_short *);
void unimplemented(int);

Because they have no comments, these declarations are not particularly useful, so let’s go down to the bottom of the file and look at which functions are called in the program’s entry function, main().

int main(void) {
    int server_sock = -1;
    int client_sock = -1;
    u_short port = 9999;
    struct sockaddr_in client_name;
    socklen_t client_name_len = sizeof(client_name);
    pthread_t newthread;

    signal(SIGPIPE, SIG_IGN);

    server_sock = startup(&port);
    printf("httpd running on port %d\n", port);

    while (1)
    {
        client_sock = accept(server_sock, (struct sockaddr *)&client_name, &client_name_len);
        if (client_sock == -1)
        {
            error_die("accept");
        }

        if (pthread_create(&newthread, NULL, (void *)accept_request, (void *)&client_sock) != 0)
        {
            perror("pthread_create");
        }
    }

    close(server_sock);
    return(0);
}

Let’s break this down further. This function takes void, meaning that the program has no arguments or command line options. This probably means it’s not very customizable, something I’d like to change in the rewritten version.

After the function signature come the definitions of some local variables: server_sock, client_sock, port, client_name, client_name_len, and newthread. server_sock and client_sock are just ints, but they represent file handles, as we’ll see in a moment. port is clearly a port number, client_name is the address of the client, and client_name_len is its length.

Below that, the program uses signal() to ignore SIGPIPE, the signal a program receives when it writes to a pipe or socket whose other end has been closed. It seems to me that this should be handled more appropriately in the rewrite.

Immediately afterward, the server_sock variable is filled by the result of the function startup, which is given a pointer to the port number. This seems odd to me – why does it need a reference and not just the value? – so I look at that function’s definition. It is commented with:

/**********************************************************************/
/* This function starts the process of listening for web connections
 * on a specified port. If the port is 0, then dynamically allocate a
 * port and modify the original port variable to reflect the actual
 * port.
 * Parameters: pointer to variable containing the port to connect on
 * Returns: the socket */
/**********************************************************************/

That makes more sense now – it allows dynamically generating a port number. That’s useful, but the functionality isn’t exposed through the command line interface, which is annoying. In our program, I’d like to expose that, and I’d also like to move away from the C convention of modifying inputs. In the rewrite, I think I’ll return a tuple. Since this is a fairly complex idea, I’ll take this time to write some notes down.
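Concretely, I’m picturing something like this sketch, using the standard library’s TcpListener (not the final code, just the shape of it):

use std::net::TcpListener;

// Return the listener and the port actually bound, instead of mutating an
// argument. Asking for port 0 lets the OS pick a free port for us.
fn startup(port: u16) -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind(("0.0.0.0", port))?;
    let actual_port = listener.local_addr()?.port();
    Ok((listener, actual_port))
}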

That’s enough to understand a bit more about main. After a simple status message, the program moves on to the main loop:

while (1)
{
  client_sock = accept(server_sock, (struct sockaddr *)&client_name, &client_name_len);
  if (client_sock == -1)
  {
    error_die("accept");
  }

  if (pthread_create(&newthread , NULL, (void *)accept_request, (void *)&client_sock) != 0)
  {
    perror("pthread_create");
  }
}

This is an infinite loop which accepts a connection; looking up accept(), which is where client_sock gets its value, shows that it returns a file handle representing the new connection’s socket, or -1 if it fails for some reason. The next few lines check for that eventuality. This is another suboptimal design imposed by C’s lack of algebraic data types – in Rust, this idea can be represented with an Option or a Result.
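In the rewrite, that whole check could collapse into a match on a Result, roughly like this sketch, again leaning on the standard library:

use std::net::TcpListener;

fn serve(listener: TcpListener) {
    // incoming() yields io::Result<TcpStream>, so there is no -1 sentinel to check.
    for stream in listener.incoming() {
        match stream {
            Ok(_stream) => {
                // hand the connection off to the request handler
            }
            Err(e) => eprintln!("accept failed: {}", e),
        }
    }
}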

The next few lines try (and handle errors for) spawning a new thread that runs accept_request. Looking at the comments here is not quite as illuminating as one might hope:

/**********************************************************************/
/* A request has caused a call to accept() on the server port to
 * return. Process the request appropriately.
 * Parameters: the socket connected to the client */
/**********************************************************************/

I’m not really sure what processing the request “appropriately” entails. For now, though, it’s enough to know that this is the main function for dealing with incoming requests.

The only code after this is cleanup code we won’t need in the rewrite, so we have enough info to write a short pseudocode summary of the server:

Open and configure a server socket

Until the process is terminated:
     Wait for a client to request a connection
     Try to open a connection with that client
     Open a new thread to deal with that connection's request

That’s a lot simpler than one might have imagined from the length of this post, and it doesn’t tell us much about the actual functionality of the server, but it gives you a good idea of the process one often has to go through to understand legacy code.

Now that we have examined the basic structure of the server’s execution, I’m going to dive into the actual functionality and logic of the server, which is encapsulated primarily in the function accept_request, whose signature is void accept_request(void *arg). This signature is totally unrevealing, and working with it in Rust would require unsafe code: the function takes a raw pointer with no type information at all. We’ll have to do quite a bit of work to understand what the function actually does.

First of all, are there any clues about what the argument might represent? Well, we can look back at how the function is called:

pthread_create(&newthread , NULL, (void *)accept_request, (void *)&client_sock)

This is a little complicated, but essentially a new thread is being spawned which will execute accept_request(&client_sock). This is the only place the function is called, so the argument is evidently expected to be a pointer to the integer file descriptor of a socket – but the compiler knows none of that! That’s a lot of unchecked assumptions and unsafe memory access. Rust, and more importantly the Rust standard library, has better invariant checking, which will make the re-implementation a great deal safer and thus easier to extend.

Moving on to the body of the function, we see the creation of a lot of local variables which I’ll go into as they’re used. It is important to note, though, that there is a group of buffers created with absolute lengths. These appear, at first glance, to be possible introduction points for overflow vulnerabilities – something that is mitigated by the Rust idiom of defaulting to using Vecs instead of arrays.

One of these buffers, of length 1024, is populated using the function get_line, which, according to the comments above its definition, reads a line into a buffer and null-terminates it, with length checking, and returns the number of bytes stored. That buffer is printed and dissected over the course of the next 90 lines or so.
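In the Rust version, I expect most of that buffer management to disappear behind the standard library; a rough (and not byte-for-byte) counterpart to get_line might look like this:

use std::io::{BufRead, BufReader, Read};

// Reads one line from the stream into a growable String; read_line stops at
// '\n' or EOF, so there is no fixed 1024-byte buffer to overflow. In a real
// server we'd keep one BufReader per connection rather than building a new
// one for every line.
fn get_line<R: Read>(stream: R) -> std::io::Result<String> {
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}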

Now that it’s clear how I dissect each line of code, I’m going to move a bit faster, translating the entire program into pseudocode function by function. What we currently have is this:

Open and configure a server socket

Until the process is terminated:
 Wait for a client to request a connection
 Try to open a connection with that client
 Open a new thread to deal with that connection's request

And we’re examining the idea of “dealing with the client”. This is all done in the accept_request function, whose pseudocode looks a bit like this:

accept_request takes a socket connecting to the client
     read a line from the client
     log (to stdout) the received request
     
     copy what is assumed to be the method into another buffer
     if the method isn't GET or POST:
          handle an unimplemented method somehow
     if the method is POST:
          make a note that this request will require executing a CGI script

     copy what is assumed to be the URL into another buffer
     if the method is GET and the url has a ? in it:
          make a note that this request will require executing a CGI script
     construct the path to the requested resource by prepending "htdocs" to the url
          (note - I'd like to make this customizable in the rewrite)

     if the URL is /:
          add 'index.html' to the path

     if the resource being requested doesn't exist:
          handle a not found error somehow
     if the resource is a directory:
          append "/index.html" to the file path
          (note- the existence of THIS file isn't checked!)
     if the file is executable:
          make a note that this request will require executing a CGI script
     
     if this request requires executing a CGI script:
          handle executing a CGI script
     otherwise:
          handle serving a static file

This analysis is pretty revealing: essentially all this function does is determine some properties of a request and then pass it off to be handled appropriately by other functions.

This particular function should be fairly easy to translate into more efficient Rust code, especially if we look at using Rust’s more advanced type system. In particular, rather than having a large number of buffers, I’d like to use slices and ADTs. For example, I might create an enum HTTPMethod:

enum HTTPMethod {
     Get,
     Post,
     Other
}

Then I could use a match expression to appropriately dispatch the request, whether to the static server, CGI handler, or error response.
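Using that enum, the dispatch might look something like this sketch (parse_method, dispatch, and the wants_cgi flag are all placeholders I’m inventing for illustration):

// Parse the raw method token from the request line into the enum above.
fn parse_method(raw: &str) -> HTTPMethod {
    match raw {
        "GET" => HTTPMethod::Get,
        "POST" => HTTPMethod::Post,
        _ => HTTPMethod::Other,
    }
}

fn dispatch(method: HTTPMethod, wants_cgi: bool) {
    match method {
        HTTPMethod::Get if wants_cgi => { /* execute a CGI script */ }
        HTTPMethod::Get => { /* serve a static file */ }
        HTTPMethod::Post => { /* POST always means CGI in tinyhttpd */ }
        HTTPMethod::Other => { /* respond with 501 Unimplemented */ }
    }
}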

In the next post, I’ll take a look at the handler functions and how they handle the various conditions and actions a request can trigger – unimplemented method, resource not found, static file serving, and CGI execution. I’ll also discuss the Rust idioms that can be used to better model the intended behavior and internal structure of this server.

Am I in a Terminal?

Sometimes, it can be useful to know if your program is running in a terminal. Since Python 3.3, this functionality has been available in the os module:

#!/usr/bin/env python3

# Test if this Python script is running in a terminal or not.

import os

try:
    size = os.get_terminal_size()
    print("I am in a terminal of size {}x{}"
        .format(size[0], size[1]))
except OSError:
    print("I am not in a terminal.")

Here is an example of it in operation:

$ ./am_i_term.py 
I am in a terminal of size 80x24

$ ./am_i_term.py | tee
I am not in a terminal.

This is useful for many reasons. For example, scripts which have interactive “beautifications” like progress bars, no-freeze spinners, and animations should cease these antics when piped into the input of other programs or redirected to files. Additionally, programs being run from scripts can disable all performance-impacting interactivity, including interactive KeyboardInterrupt handling; if a user Ctrl+C’s a script, they want it to stop, immediately, not ask to quit.
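As a sketch of what I mean, a script can make the check once up front and pick its output style accordingly (this reuses the same try/except as above; the “work” is obviously just a stand-in):

import os
import time

def in_terminal():
    try:
        os.get_terminal_size()
        return True
    except OSError:
        return False

INTERACTIVE = in_terminal()

for step in range(1, 6):
    if INTERACTIVE:
        # overwrite a single status line, the way a progress bar would
        print("\rworking... {}/5".format(step), end="", flush=True)
    else:
        # plain line-oriented output, safe for logs and pipes
        print("working... {}/5".format(step))
    time.sleep(0.2)

if INTERACTIVE:
    print()  # finish the status line with a newline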

Learning Japanese the Python Way

Now that I’m in college, I’m taking a lot of non-computer science classes, and one of them is Japanese. I’m just starting out, and I need to be able to rapidly read numbers in Japanese and think about them without translating them consciously. I could make a bunch of flash cards, or use a service like Quizlet… or I could write some Python!

For those of you who are unfamiliar, Japanese doesn’t have the ridiculous numerical system that English does. One through ten are defined, and eleven is simply (ten)(one). Twenty three, for example, is (two)(ten)(three) (に じゅう さん). This means that rather than having a long list of numbers and special cases, I can just have the numbers zero to ten “hard coded”.

After that, the program is pretty simple: if the number is 10 or less, simply look it up. If it’s between 11 and 19, build it from じゅう plus the ones digit. If it’s 20 or larger, build it from the tens digit, then じゅう, then the ones digit (if there is one).
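Stripped of the error checking and the quiz loop, that logic comes down to something like this sketch (readings simplified to one form each):

NUMBERS = {
    0: "ゼロ", 1: "いち", 2: "に", 3: "さん", 4: "よん", 5: "ご",
    6: "ろく", 7: "なな", 8: "はち", 9: "きゅう", 10: "じゅう",
}

def to_japanese(n):
    "Spell out an integer from 0 to 99 in hiragana."
    if n <= 10:
        return NUMBERS[n]
    tens, ones = divmod(n, 10)
    parts = []
    if tens > 1:                      # 23 -> に じゅう さん, but 13 -> じゅう さん
        parts.append(NUMBERS[tens])
    parts.append("じゅう")
    if ones:
        parts.append(NUMBERS[ones])
    return " ".join(parts)

print(to_japanese(23))  # に じゅう さん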

The interactive part is pretty simple too: it runs a loop that randomly generates numbers, checking that they haven’t been done before, translates them, and asks me to translate them back. If I succeed, it moves on; if not, it doesn’t record the number as having been completed, so I have to do it again at some point in the same run.

This simple program came out to 136 lines of very verbose and error-checked Python. It’s a good piece of code for a beginner to try and modify – for example, can you get it to incorporate the alternate form of four (し) as well as the primary form? Can you make one that teaches Kanji numbers? (I plan to do both of those things at some point.)

Why Linux on the PC Needs a Focus on Hardware Support

A few days ago, I had an interesting and somewhat frustrating experience with a friend of mine. Their laptop was dying, so they asked me to give them some suggestions for a new one.

Their requirements were a computer with a display that was good for reading, enough power to be responsive and able to multitask well, and rapidly accessible storage, but not necessarily a lot of it. Of course, I immediately thought of the System76 Lemur, which happened to be on sale at the time; however, after going through a whole list of pros and cons of Ubuntu with the friend, they told me that they wanted to go with Windows or Mac OS X as they “didn’t have time to tinker”.

They’re right, of course, but this really got under my skin. The thing is, if you’re not trying to game on Linux, there are almost never problems using it on a desktop. Desktop hardware is nicely standardized and “just works”. But on a laptop? Nope. It’s a crapshoot whether your wireless Internet and Bluetooth will work, or whether your touchpad’s multi-touch will be usable. On my Dell Inspiron 7000-series laptop, which otherwise works almost flawlessly with Ubuntu GNOME, the wireless chipset will occasionally forget that any network exists other than a single one I’m not connected to.

Why is this? Well, it’s because laptops are often very, very custom. They have custom form-factor motherboards with non-standard sets of features. Battery life, rather than standards compliance, is often the primary concern, and release cycles are very tight, so when new hardware is developed, drivers are written for the target platform (almost always Windows) and released there. The Linux community has to hack together its own drivers after the fact.

Windows runs on most laptops, and has a lot of big issues. Privacy concerns, resource overutilization, extremely poor real-time performance, and a massive lack of customization are the obvious ones, along with a downright byzantine user interface without much power to back it up (in the consumer versions, that is). Mac OS X looks simple on the surface while still exposing the massive power of a UNIX to its power users and developers (although this is becoming progressively less true). On the other hand, it costs an absolute fortune to buy into that ecosystem, and that is where Linux comes in.

In reality, Linux is a modern UNIX like Mac OS X, and it is far more flexible and powerful, but to many people, it’s just “Windows that costs less”. What we need to be is a Mac OS X that can run anywhere. Linux needs to be simple on the surface, which most DEs accomplish brilliantly, while exposing the power of the underlying OS, which isn’t hard given a terminal emulator. Where Linux falls short is the “runs anywhere” part.

Porting Deucalion to Rust

A few months ago, I made a proof-of-concept for an RPG engine based on SFML and the Tiled map editor, called Deucalion. Over time, the source code became unwieldy, leaked a great deal of memory, and was nearly impossible to build. I ended up spending more time configuring build systems than actually working on the code, and I abandoned it in favor of SBrain and schoolwork.

Recently, though, the Rust game development story has gotten a lot better, and I’ve gotten a bit of free time. With the help of a friend of mine, Dan Janes, I’ve been porting the existing code to Rust and refining the design for the game-dev-facing API. It’s been interesting, since it’s my first time running a project on which I am not the sole contributor.

I’ve certainly run into some problems because of the relative immaturity of the Rust ecosystem – for example, many projects don’t use the standard Error trait, which makes using the handy macros that rely on it such as try! nearly impossible, but I’ve also found that as a whole, the community is very responsive to having these issues pointed out and solved.

Deucalion isn’t quite at the level it was before I decided to port it – I’m still struggling to get tilemaps to draw with decent performance, and a lot of design work needs to be done – but it’s doing better than I thought it would, and I’ve discovered some of the best features of Rust so far.

For example, while Rust doesn’t have exceptions (because exception handling requires a heavyweight runtime), the convention of returning Result<T, Error> from functions that might fail allows programs to act as if it did. Deucalion implements a single, shared error type, DeucalionError, that encapsulates every possible error (currently IoError, LuaError, TiledError, NotImplementedError, and Other), allowing callers of risky functions to act according to the actual failure that occurred.
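Sketched very roughly (as an enum, with the lua- and tiled-specific variants left out), the pattern looks like this:

use std::io;

#[derive(Debug)]
enum DeucalionError {
    Io(io::Error),
    NotImplemented(String),
    Other(String),
}

// With a From impl for each wrapped type, `?` (or the older try! macro)
// converts errors into DeucalionError automatically.
impl From<io::Error> for DeucalionError {
    fn from(e: io::Error) -> Self {
        DeucalionError::Io(e)
    }
}

fn read_map(path: &str) -> Result<String, DeucalionError> {
    let contents = std::fs::read_to_string(path)?;
    Ok(contents)
}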

I also like the module system much more than I thought I would at first. While learning when and where to use mod vs use can be a bit of a hassle, the fact that multiple includes create an automatic compiler error is very welcome when compared with C++.
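For anyone who hasn’t run into the distinction, the split looks roughly like this (the module and type names here are made up for the example):

// main.rs
mod tilemap;            // declares the module; the compiler looks for tilemap.rs
use tilemap::TileMap;   // brings a name from that module into scope
// use tilemap::TileMap; // importing the same name twice is a compile error

fn main() {
    let _map = TileMap::new();
}

// tilemap.rs
pub struct TileMap;

impl TileMap {
    pub fn new() -> Self {
        TileMap
    }
}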

Rust is a great language, and its ecosystem is on its way to becoming as good as that of Python or Ruby. I’m excited for every step along the way.