Vatsal Parekh

Your friendly web dev

Reading fortune dat files with python using structs

2017-12-17

Well, first a bit about the fortune program:

fortune - print a random, hopefully interesting, adage.

Taken from the man page of fortune. Read man fortune for more.

I always have the fortune command inside my .bashrc, so whenever I open a terminal I am greeted with a fortune cookie. I have grown rather fond of it, to the point that it is one of the first things I install whenever I set up a new distro. I also make others install it. =)


So back to the topic at hand.

fortune, when run with no arguments, prints a random fortune taken from the fortune files that are installed along with it.

Now these files, usually stored inside /usr/share/fortune, each have a corresponding dat file. The fortune program uses both files to serve cookies. The fortune cookie file has this format (% is the delimiter):

3rd Law of Computing:
    Anything that can go wr
fortune: Segmentation violation -- Core dumped
%
667:
    The neighbor of the beast.
%

So the fortune file stores the actual cookies and the dat file holds the metadata about the fortune cookie file.
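
To make the cookie file format concrete, here is a minimal sketch that splits such a file's text into individual cookies. The function name split_cookies and the sample text are made up for illustration, and it assumes that a line holding only % separates cookies.

```python
def split_cookies(text):
    """Split the text of a fortune cookie file into individual cookies."""
    cookies = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.rstrip("\n") == "%":
            # A line holding only the delimiter closes the current cookie.
            cookies.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        # Tolerate a final cookie that is not followed by a delimiter line.
        cookies.append("".join(current))
    return cookies


sample = "667:\n    The neighbor of the beast.\n%\nSecond cookie.\n%\n"
cookies = split_cookies(sample)
print(len(cookies))
print(cookies[0], end="")
```

This reads every cookie just to pick one, which is exactly the work the dat file lets fortune avoid.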

The dat files are made by a program called strfile. strfile writes a header with info about the fortune cookie file into the dat file, followed by the offset of each cookie within the cookie file. This allows random access to the strings. Read man strfile for more.

Now, using Python to read the dat files created by strfile:

We can do this by using a module named struct from the standard library. This module performs conversions between Python values and C structs represented as Python bytes objects.

First, the header structure that strfile uses:
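
To show what these offsets mean, here is a rough sketch that computes them for a cookie file held in memory. It only mimics the idea and is not how strfile itself is implemented; compute_offsets and the sample data are made up for illustration, and it assumes every cookie ends with a line holding only the delimiter.

```python
def compute_offsets(data, delim=b"%"):
    """Return the byte offset where each cookie starts, plus the end position."""
    offsets = [0]
    pos = 0
    for line in data.splitlines(keepends=True):
        pos += len(line)
        if line.rstrip(b"\n") == delim:
            # The next cookie starts right after the delimiter line.
            offsets.append(pos)
    if offsets[-1] != pos:
        # The data did not end with a delimiter line: close the last cookie at the end.
        offsets.append(pos)
    return offsets


sample = b"First cookie.\n%\nSecond cookie,\non two lines.\n%\n"
offsets = compute_offsets(sample)
print(offsets)

# Cookie i lives between offsets[i] and offsets[i + 1], minus the 2 bytes of b"%\n".
first = sample[offsets[0]:offsets[1] - 2]
print(first)
```

With a table like this, picking cookie number i only needs two integers from the table, not a scan of the whole file.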

#define VERSION 1
unsigned long str_version; /* version number */
unsigned long str_numstr; /* # of strings in the file */
unsigned long str_longlen; /* length of longest string */
unsigned long str_shortlen; /* shortest string length */
#define STR_RANDOM 0x1 /* randomized pointers */
#define STR_ORDERED 0x2 /* ordered pointers */
#define STR_ROTATED 0x4 /* rot-13'd text */
unsigned long str_flags; /* bit field for flags */
char str_delim; /* delimiting character */ 

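With 4-byte fields, as in the dat files read here, the header data takes 21 bytes (five 4-byte integers plus a 1-byte delimiter), and it is padded with 3 bytes to make 24. Also, strfile writes all fields (i.e. header fields and offsets) in network byte order (big-endian).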
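
The size arithmetic can be checked with struct.calcsize before touching a real file. This is a small sketch; HEADER_FORMAT is just a name chosen here, and the header values are sample numbers.

```python
import struct

# Five big-endian 4-byte unsigned ints, one char, three pad bytes.
HEADER_FORMAT = ">IIIIIcxxx"

header_size = struct.calcsize(HEADER_FORMAT)
print(header_size)  # 24

# Round trip: pack sample header values, then unpack them again.
packed = struct.pack(HEADER_FORMAT, 1, 10, 1103, 106, 0, b"%")
unpacked = struct.unpack(HEADER_FORMAT, packed)
print(len(packed))
print(unpacked)

# Big-endian means the most significant byte comes first.
print(packed[:4])
```

The pad bytes are written as zero bytes by struct.pack and skipped by struct.unpack, so they never show up in the resulting tuple.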

import struct

with open('somefortune.dat', 'rb') as dat:
    # (version, numstr, len of longest string, len of shortest string, flags, delimiter)
    header = struct.unpack(">IIIIIcxxx", dat.read(24))
    offsets = []  # for offsets from dat file
    # str_numstr + 1 == no. of offsets (starting from 0 to str_numstr)
    for i in range(header[1] + 1):
        offsets.append(struct.unpack(">I", dat.read(4)))  # each entry is a 1-tuple
print(header)
print(offsets)

struct.unpack takes a format string and a buffer to unpack, and returns a tuple, even when the format describes a single value.

>   for big-endian
I   unsigned int (4 bytes when used with >)
c   char
x   pad byte

The output of the above code would be something like this:

(1, 10, 1103, 106, 0, b'%')
[(0,), (215,), (1219,), (2324,), (2432,), (3070,), (3815,), (4468,), (4755,), (5777,), (5894,)]
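
Each offset above is wrapped in a 1-tuple because struct.unpack always returns a tuple. As a possible variation, all offsets can be unpacked with one call into a flat list of integers, and the header can be given field names. This is only a sketch: Header and read_dat are names made up here, and the fake dat content is built in memory with invented values so the example runs without a real file.

```python
import io
import struct
from collections import namedtuple

# Hypothetical names for the six header fields, in the order they are unpacked.
Header = namedtuple("Header", "version numstr longlen shortlen flags delim")


def read_dat(stream):
    """Read the header and a flat list of offsets from a binary stream."""
    header = Header(*struct.unpack(">IIIIIcxxx", stream.read(24)))
    count = header.numstr + 1
    # One format string covering all offsets, e.g. ">3I" for three of them.
    offsets = list(struct.unpack(">%dI" % count, stream.read(4 * count)))
    return header, offsets


# Fake dat content with made-up values: 2 cookies, so 3 offsets.
fake_dat = struct.pack(">IIIIIcxxx", 1, 2, 31, 14, 0, b"%")
fake_dat += struct.pack(">3I", 0, 16, 47)

header, offsets = read_dat(io.BytesIO(fake_dat))
print(header)
print(offsets)
```

With a flat list, code that indexes the offsets no longer needs the trailing [0] on each entry.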

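Now that we have the offsets and the header, we can use them with the fortune cookie file to get a random fortune for ourselves, by doing something like this:

import random

def get_fortune(filename, header, offsets):
    # pick a cookie number between 1 and str_numstr
    random_number = random.randint(1, header[1])
    with open(filename, 'rb') as file:  # offsets are byte positions, so read bytes
        fortunes_all = file.read()
    # each offsets entry is a 1-tuple, hence the [0]
    start, end = offsets[random_number-1][0], offsets[random_number][0]
    fortune = fortunes_all[start:end-2]  # -2 to remove '%\n'
    return fortune.decode('utf-8')  # assumes the cookie file is UTF-8 text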
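
Since the offsets are byte positions, the cookie file does not have to be read completely. Here is a minimal sketch of true random access using seek. It assumes offsets is a flat list of integers (not the 1-tuples used above) and that the cookie file is UTF-8 text; read_cookie and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def read_cookie(path, offsets, index):
    """Read cookie number `index` (0-based) by seeking to its byte offset."""
    start, end = offsets[index], offsets[index + 1]
    with open(path, "rb") as cookie_file:
        cookie_file.seek(start)
        raw = cookie_file.read(end - start)
    if raw.endswith(b"%\n"):
        # Drop the trailing delimiter line.
        raw = raw[:-2]
    return raw.decode("utf-8")


# Demo with a throwaway cookie file and offsets that match its content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"First cookie.\n%\nSecond cookie,\non two lines.\n%\n")
    demo_path = tmp.name

demo_offsets = [0, 16, 47]
first = read_cookie(demo_path, demo_offsets, 0)
second = read_cookie(demo_path, demo_offsets, 1)
os.remove(demo_path)

print(first, end="")
print(second, end="")
```

For small cookie files the difference is negligible, but this is the access pattern the offsets table is designed for.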

I made a small script to read a random cookie and print it to the terminal. You can check it here.

Here is one fortune cookie read with the script:

❯ python get_fortune.py education
Try not to have a good time ... This is supposed to be educational.
        -- Charles Schulz

Thanks for reading.