Do you know how much your computer can do in a second?

Let's find out how well you know computers! All of these programs have a variable NUMBER in them. Your mission: guess how big NUMBER needs to get before the program takes 1 second to run.

You don't need to guess exactly: the answers are all between 1 and a billion. Just try to guess the right order of magnitude!
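(On the quiz site the timing is done for you. If you're playing along at home, a rough harness like the one below is enough to check a guess. It's just a sketch: ./sum stands in for whichever program you're testing.)

#!/usr/bin/env python

# A rough harness for checking guesses by hand (not part of the
# quiz): keep doubling NUMBER until one run takes a full second.
# './sum' is a stand-in for whichever program you want to test.

import subprocess
import time

n = 1
while True:
    start = time.time()
    subprocess.check_call(['./sum', str(n)])
    elapsed = time.time() - start
    print n, elapsed
    if elapsed >= 1.0:
        break
    n *= 2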

Good luck! We were surprised by a lot of these. We'll be anonymously collecting your answers, so expect some graphs in the future! =D

Made for you by Julia Evans (@b0rk) and Kamal Marhubi (@kamalmarhubi).

Welcome to the first program! This one is just to get you warmed up: how many times can you go through a simple loop in a second? (It might be more than you think!)

sum.c

Guess: iterations in one second
#include <stdlib.h>

// Number to guess: How many iterations of
// this loop can we go through in a second?

int main(int argc, char **argv) {
    int NUMBER, i, s;
    NUMBER = atoi(argv[1]);

    for (s = i = 0; i < NUMBER; ++i) {
        s += 1;
    }
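    // note: s is never read after the loop, so an optimizing
    // compiler could legally throw the whole loop away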

    return 0;
}

loop.py

Guess: iterations in one second
#!/usr/bin/env python

# Number to guess: How many iterations of an
# empty loop can we go through in a second?

def f(NUMBER):
    for _ in xrange(NUMBER):
        pass

import sys
f(int(sys.argv[1]))

Now that we know the most we can expect from Python (100 million things per second), let's explore a slightly more realistic use case. Dictionaries are used just about everywhere in Python, so: how many entries can we add to a Python dictionary in a second?

Once you've gotten that one, let's look at a more complicated operation: using Python's built-in HTTP request parser to parse a request.
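(If you want to poke at numbers like these on your own machine, the standard library's timeit module is handy. A quick sketch, measuring dictionary inserts the same way dict.py below does; this isn't how the quiz itself was timed:)

# a sketch of measuring dict inserts with timeit
import timeit

# 1000 distinct keys, like the `i % 1000` trick below
seconds = timeit.timeit(
    "for i in xrange(1000): d[i] = i",
    setup="d = {}",
    number=10000,
)
print "%d inserts took %.2f seconds" % (1000 * 10000, seconds)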

dict.py

Guess: iterations in one second
#!/usr/bin/env python

# Number to guess: How many entries can
# we add to a dictionary in a second?

# Note: we take `i % 1000` to control
# the size of the dictionary

def f(NUMBER):
    d = {}
    for i in xrange(NUMBER):
        d[i % 1000] = i

import sys
f(int(sys.argv[1]))

parse_http_request.py

Guess: HTTP requests parsed in one second
#!/usr/bin/env python

# Number to guess: How many HTTP requests
# can we parse in a second?

from BaseHTTPServer import BaseHTTPRequestHandler
from StringIO import StringIO

class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        self.rfile = StringIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message):
        self.error_code = code
        self.error_message = message

request_text = """GET / HTTP/1.1
Host: localhost:8001
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
"""

def f(NUMBER):
    for _ in xrange(NUMBER):
        HTTPRequest(request_text)

import sys
f(int(sys.argv[1]))
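(In case you haven't seen this trick before: once parse_request() succeeds, the handler exposes the parsed pieces as plain attributes, so you can pull fields out of a request like this:)

# poking at a parsed request, using the HTTPRequest class above
req = HTTPRequest(request_text)
print req.command            # GET
print req.path               # /
print req.request_version    # HTTP/1.1
print req.headers['Host']    # localhost:8001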

Next up, we have downloading a webpage vs. starting the Python interpreter! Hint: both answers are less than 100 million :)

download_webpage.py

Guess: HTTP requests completed in one second
#!/usr/bin/env python

# Number to guess: How many times can we
# download google.com in a second?

from urllib2 import urlopen


def f(NUMBER):
    for _ in xrange(NUMBER):
        r = urlopen("http://google.com")
        r.read()

import sys
f(int(sys.argv[1]))

run_python.sh

Guess: iterations in one second
#!/bin/bash

# Number to guess: How many times can we start
# the Python interpreter in a second?

NUMBER=$1

for i in $(seq "$NUMBER"); do
    python -c '';
done

How many bytes can you write to disk in a second? We all know writing to memory is faster, but how *much* faster? This code was run on a computer with an SSD.

write_to_disk.py

Guess: bytes written in one second
#!/usr/bin/env python

# Number to guess: How many bytes can we write
# to an output file in a second?
# Note: we make sure everything is sync'd to disk
# before exiting :)
import tempfile
import os

CHUNK_SIZE = 1000000
s = "a" * CHUNK_SIZE

def cleanup(f, name):
    f.flush()
    os.fsync(f.fileno())
    f.close()
    try:
        os.remove(name)
    except OSError:
        pass

def f(NUMBER):
    name = './out'
    f = open(name, 'w')
    bytes_written = 0
    while bytes_written < NUMBER:
        f.write(s)
        bytes_written += CHUNK_SIZE
    cleanup(f, name)

import sys
f(int(sys.argv[1]))

write_to_memory.py

Guess: bytes written in one second
#!/usr/bin/env python

# Number to guess: How many bytes can we write
# to a string in memory in a second?

import cStringIO

CHUNK_SIZE = 1000000
s = "a" * CHUNK_SIZE

def f(NUMBER):
    output = cStringIO.StringIO()
    bytes_written = 0
    while bytes_written < NUMBER:
        output.write(s)
        bytes_written += CHUNK_SIZE

import sys
f(int(sys.argv[1]))

File time! Sometimes I run a huge grep and it takes FOREVER. How many bytes can grep search in a second?

Note that for this one, the bytes grep reads are already in memory. This gives us an idea of how much of grep's slowness comes from the search itself, and how much comes from needing to read from disk.

Listing files also takes time! How many files can find list in a second?

grep_bytes.sh

Guess: bytes searched in one second
#!/bin/bash

# Number to guess: How many bytes can `grep`
# search, unsuccessfully, in a second?
# Note: the bytes are in memory

NUMBER=$1

cat /dev/zero | head -c "$NUMBER" | grep blah
exit 0

find-filenames.sh

Guess: files listed in one second
#!/bin/bash

# Number to guess: How many files can `find` list in a second?
# Note: the files will be in the filesystem cache.

find / -name '*' 2> /dev/null | head -n $1 > /dev/null

Serialization is a pretty common place to spend a lot of time, and it can really hurt, especially if you end up serializing/deserializing the same data repeatedly. Here are a couple of benchmarks: parsing 64K of JSON, and parsing the same data encoded in msgpack format.

json_parse.py

Guess: iterations in one second
#!/usr/bin/env python

# Number to guess: How many times can we parse
# 64K of JSON in a second?

import json

with open('./setup/protobuf/message.json') as f:
    message = f.read()

def f(NUMBER):
    for _ in xrange(NUMBER):
        json.loads(message)

import sys
f(int(sys.argv[1]))

msgpack_parse.py

Guess: iterations in one second
#!/usr/bin/env python

# Number to guess: How many times can we parse
# 46K of msgpack data in a second?

import msgpack

with open('./setup/protobuf/message.msgpack') as f:
    message = f.read()

def f(NUMBER):
    for _ in xrange(NUMBER):
        msgpack.unpackb(message)

import sys
f(int(sys.argv[1]))
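(We haven't shown the setup scripts, but here's a guess at how the msgpack copy of the message could be generated from the JSON one:)

# a sketch of producing message.msgpack from message.json;
# the quiz's actual setup script isn't shown here
import json
import msgpack

with open('./setup/protobuf/message.json') as f:
    data = json.load(f)

with open('./setup/protobuf/message.msgpack', 'wb') as f:
    f.write(msgpack.packb(data))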

DATABASES. We don't have anything fancy like PostgreSQL for you, but we made two copies of a SQLite table with 10 million rows: one indexed and one unindexed.
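(The setup script isn't shown, so the schema below is a guess based on the queries; but building the two databases looks roughly like this:)

# a sketch of building the two test databases; the schema and
# index name are guesses, since the real setup isn't shown
import sqlite3

conn = sqlite3.connect('./indexed_db.sqlite')
c = conn.cursor()
c.execute("create table my_table (key integer, value text)")
c.executemany("insert into my_table values (?, ?)",
              ((i, "some value") for i in xrange(10000000)))
# for the unindexed copy, skip this line
c.execute("create index key_idx on my_table (key)")
conn.commit()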

database_indexed.py

Guess: queries in one second
#!/usr/bin/env python

# Number to guess: How many times can we
# select a row from an **indexed** table with 
# 10,000,000 rows?

import sqlite3

conn = sqlite3.connect('./indexed_db.sqlite')
c = conn.cursor()
def f(NUMBER):
    query = "select * from my_table where key = %d" % 5
    for i in xrange(NUMBER):
        c.execute(query)
        c.fetchall()

import sys
f(int(sys.argv[1]))

database_unindexed.py

Guess: queries in one second
#!/usr/bin/env python

# Number to guess: How many times can we
# select a row from an **unindexed** table with 
# 10,000,000 rows?

import sqlite3

conn = sqlite3.connect('./unindexed_db.sqlite')
c = conn.cursor()
def f(NUMBER):
    query = "select * from my_table where key = %d" % 5
    for i in xrange(NUMBER):
        c.execute(query)
        c.fetchall()

import sys
f(int(sys.argv[1]))

Hashing time! Here we'll compare MD5 (which is designed to be fast) to bcrypt (which is designed to be slow). You can hash quite a bit of stuff in a second with MD5; not so with bcrypt.

hash.py

Guess: bytes hashed in one second
#!/usr/bin/env python

# Number to guess: How many bytes can we md5sum in a second?

import hashlib

CHUNK_SIZE = 10000
s = 'a' * CHUNK_SIZE

def f(NUMBER):
    bytes_hashed = 0
    h = hashlib.md5()
    while bytes_hashed < NUMBER:
        h.update(s)
        bytes_hashed += CHUNK_SIZE
    h.digest()

import sys
f(int(sys.argv[1]))

bcrypt_hash.py

Guess: # passwords hashed in one second
#!/usr/bin/env python

# Number to guess: How many passwords
# can we bcrypt in a second?

import bcrypt

password = 'a' * 100

def f(NUMBER):
    for _ in xrange(NUMBER):
        bcrypt.hashpw(password, bcrypt.gensalt())
        
import sys
f(int(sys.argv[1]))
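(The slowness is the whole point, and it's tunable: gensalt() takes a cost factor, defaulting to 12, and each +1 roughly doubles the time per hash. A sketch:)

# bcrypt's cost factor makes it as slow as you like:
# each +1 roughly doubles the time per hash
import bcrypt

password = 'a' * 100

cheap = bcrypt.hashpw(password, bcrypt.gensalt(4))    # minimum cost
costly = bcrypt.hashpw(password, bcrypt.gensalt(14))  # ~1000x slower than cost 4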

Next up, let's talk about memory access. CPUs these days have L1 and L2 caches, which are much faster to access than main memory. This means that accessing memory sequentially (where the CPU can load a bunch of data into a cache) will normally give you faster code than accessing memory out of order.

Let's see how that shakes out in practice! You might want to refer to Latency Numbers Every Programmer Should Know to guess at this one.

fill_array.c

Guess: bytes written in one second
#include <stdlib.h>
#include <stdio.h>

// Number to guess: How big of an array (in bytes)
// can we allocate and fill in a second?

// this is intentionally more complicated than it needs to be
// so that it matches the out-of-order version :)

int main(int argc, char **argv) {
    int NUMBER, i;
    NUMBER = atoi(argv[1]);

    char* array = malloc(NUMBER);
    int j = 1;
    for (i = 0; i < NUMBER; ++i) {
        j = j * 2;
        if (j >= NUMBER) {
            j = j - NUMBER;
        }
        array[i] = j;
    }

    printf("%d", array[NUMBER / 7]);
    // so that -O2 doesn't optimize out the loop

    return 0;
}

fill_array_out_of_order.c

Guess: bytes written in one second
#include <stdlib.h>
#include <stdio.h>

// Number to guess: How big of an array (in bytes)
// can we allocate and fill with 5s in a second?
// The catch: We do it out of order instead of in order.
int main(int argc, char **argv) {
    int NUMBER, i;
    NUMBER = atoi(argv[1]);

    char* array = malloc(NUMBER);
    int j = 1;
    for (i = 0; i < NUMBER; ++i) {
        j = j * 2;
        if (j >= NUMBER) {  // >=, so j stays a valid index into array
            j = j - NUMBER;
        }
        array[j] = j;
    }

    printf("%d", array[NUMBER / 7]);
    // so that -O2 doesn't optimize out the loop

    return 0;
}