Faster sieves

Although the standard Unix shell pipe operator can only construct linear pipelines, the pipe() system call can be used to build arbitrary data-flow networks. Named pipes (also known as fifos) allow this to be done at shell level, albeit clumsily. A case in point is a sieve that determines primality of n by testing prime divisors only up to the square root of n instead of n.

The following Haskell code does the trick. The sieve has an auxiliary input, the stream of primes itself. The first prime, 2, is given explicitly, so it is available for the guard n<p^2 that postpones testing for multiples of prime p.


    primes = 2 : sieve primes [3..]
    sieve ps@(p:ps') (n:ns)
        | n < p^2   = n : sieve ps ns
        | otherwise = sieve ps' (filter ((!=0).(`mod`p)) ns)
The data stream connections in this algorithm can't be represented using only the shell operator '|'. First, the stream primes is delivered both to standard output and to sieve. And second, the feedback of that stream into sieve induces a dataflow cycle.

Using less common programming techniques, we can translate the Haskell into shell code. To split a stream for delivery to two places, we use the Unix plumbing command tee, with a named pipe for the alternate destination. The pipe is opened for reading on file descriptor 4 in the sink process.


#!/bin/bash
cull() {             # cull p deletes multiples of p
   while true
   do 
       read n
       (($n % $1  != 0)) && echo $n
   done
}

sink() {             # emit primes < p^2, then cull p               
    read -u4 p
    while
        read n
        (($n < $p * $p))
    do
        echo $n
    done
    cull $p | sink &
}

mkfifo f
seq 3 1000000 | (echo 2; sink 2 4<f) | tee f
The shell code and the Haskell code are not as exactly parallel as they were in the basic sieve model because reads from a pipe are destructive, while reads from a Haskell list are not. Thus the shell code reads prime p once and uses it in a loop, while the Haskell code reads it repeatedly from the list.