Safe streaming with linear types

Adaptation of the streaming library for increased safety through linear types. Summer of Haskell 2017 project in collaboration with Tweag I/O. My name is Edvard Hübinette, and I study computer science @ Chalmers University of Technology in Gothenburg, Sweden.

View the Project on GitHub

Linear types 101 and its relevance to streams

19 Jun 2017

Linear types is closely related to linear logic, where the idea of value consumption is introduced, meaning a value is only valid for one use. In the 90’s, this was found a neat way to represent finite resources which is very interesting for programming languages; this viewpoint is well explained here.

In practice, this whole idea builds on the rule a value of a linear type has to be used exactly once:

-- this does not use x in a linear way since it duplicates it
dup :: a -> (a, a)
dup x = (x, x)

-- neither does this, since it ignores it
derp :: a -> ()
derp x = ()

-- but this uses x exactly once
id :: a -> a
id x = x

-- and so does this
twiceWithFive :: Int -> (Int, Int)
twiceWithFive x = (x*2, 5)

This consumption requirement is explicitly indicated with a linear function arrow: , also called lollipop. This is also what is used in the Tweag I/O GHC fork, which extends the type system to check for correct multiplicity when such arrows are used, in a backward compatible way. In my bachelor thesis we weren’t as clever and split the type universe into linear and non-linear counterparts and did checks this way, which made it much more difficult to use in practice since we couldn’t write functions compatible with both.

What do we gain?

This is actually all there is to it in the practical sense, but this simple concept has very nice consequences. Most of all, it creates a resource aware type system, which not only knows types of values but also number of uses; a linearly typed value is guaranteed to only have one reference to it at any time, duplication or ignoring will not compile.

For example, let’s imagine that we want to model arrays, including an update function:

data Arr a = { ... }

update :: Arr a -> Arr a

since we do not know if there are other references to this array when we update it, we need to allocate fresh memory for this updated version. However, if we instead type it as

update :: Arr a  Arr a

and only allow linear operations on the arrays, the type system guarantees us that there is only one reference to the incoming array, this one! That means we could implement destructive updates (change the allocated memory in-place) safely, which is splendid since it is both faster and does not require garbage collection since there is always only one reference to the array.

Relevance to streams

With effectful streams (which here refers to the streaming library), the stream elements perform effects in some monad, and access to old stream states can result in really wonky behaviour. Consider this example:

import System.IO 
import Control.Monad.Trans
import Streaming.Prelude

streamTwoLines :: Stream (Of (IO String)) IO ()
streamTwoLines = do
    -- open up a file handle
    h <- lift $ openFile "temp" ReadMode 
    -- stream two lines
    yield $ hGetLine h
    yield $ hGetLine h
    -- close the file handle again
    lift $ hClose h


timetravel = do
    -- s here is the rest of the stream
    Right (v1, s) <- next streamTwoLines
    Right (v2, s') <- next s
    -- Stream is done, so the next 'next' will close the file
    Left () <- next s'
    
    -- but we still have a reference to the old state...
    Right (v3, _) <- next s -- note the same s here, 'next':ed again
    v3 >>= putStrLn
    putStrLn "Succesful streaming without runtime errors"

Running timetravel:

**** Exception: temp: hGetLine: illegal operation (handle is closed)`

Yikes! The point of a strong type system is to get rid of pesky runtime errors, but the streaming library makes it possible (even though we can mostly blame it on IO).

The underlying problem is that old stream references repeat their monadic action when called, meaning that any action can be repeated again, leading to almost arbitrary results if one is not careful. This is not limited to IO, but applies to any monadic action! This means that whatever you can do with a monad, streaming could repeat where you least expect it. How could we eliminate this problem?

With linear types of course! This is a perfect example of where non-linearity can lead to errors, even in a setting as strict as Haskell, if one is not careful. If the streaming combinators would return linear stream references, errors like this would be caught at compile time; coincidentally this is exactly what I will be doing for the rest of the summer!

Further example of vulnerability

The example from above can be modified to read the whole file, from a stream only containing two lines. This is because the monadic action at the head of s reads a line from that file, and since we didn’t close the file handle (by calling next again on s') we can keep grabbing lines by repeating that action.

infRead = do
    Right (v1, s) <- next f  
    Right (v2, s') <- next s
    v1 >>= putStrLn
    v2 >>= putStrLn
    forever $ do
        Right (secret, _) <- next s
        secret >>= putStrLn

Running infRead:

a b c
d e f
secret line 1
secret line 2
*** Exception: temp: hGetLine: end of file