======================[ Laziness: handle with care ]======================

The idea of lazy computation as the default mode of execution for a
programming language was revolutionary. Lisp and Scheme started with
eager evaluation; it was not until functional programming was already
mature that the idea of lazy computation as its natural companion
emerged---and quickly took over in a progression of languages that led
to Haskell. This story is told in "A History of Haskell: Being Lazy
With Class" by the co-creators of Haskell and modern functional
programming. The central place of laziness is apparent even from the
title---and you will find its beginnings described in the first few
pages:

http://haskell.cs.yale.edu/wp-content/uploads/2011/02/history.pdf

You can trace the origins of this idea to one of the key papers that
argued for it, "CONS should not evaluate its arguments",
http://www.cs.indiana.edu/pub/techreports/TR44.pdf . Observe how the
jargon of PL has changed since this paper---and what remains the same
to this day.

For applications of laziness in JavaScript, Scala, Swift, and Racket
(a popular teaching dialect of Scheme) see
http://matt.might.net/articles/implementing-laziness/

Laziness is in near-perfect correspondence with pattern matching: we
need to evaluate the matched data structure only as far as the pattern
requires. In a world where typed values can only be built by
constructors, and every function that works on them must start by
taking them apart with patterns, this correspondence can save us the
runtime effort of evaluating most of these constructors.

--------------------[ An example: lazy lists ]--------------------

Lists are perhaps the simplest example. Recall that lists come with
the built-in constructors : and [] , and any other way to create or
handle them is "syntactic sugar" that reduces to applying these
constructors and/or pattern-matching on them.

For example, "head (1:2:3:4:5:[])" need not construct the full list;
head's definition, "head (x:_) = x", can make do with just one : ,
ignoring the rest. Moreover, the rest of the list may not even be
defined:

Prelude> head (1:undefined)
1

This ability to evaluate an expression whose parts may be undefined is
referred to as "non-strict semantics". Non-strictness is strongly tied
to laziness, but the two are not quite synonymous, as
https://wiki.haskell.org/Lazy_vs._non-strict explains.

Observe that the above does even less work than printing in the REPL:
an attempt to print 1:undefined fails with an exception,

Prelude> 1:undefined
[1*** Exception: Prelude.undefined

whereas the call to head above succeeds.

Of course, the same idea is behind Haskell's infinite lists like
[1..]. Printing one leads to an infinite loop in the REPL,

Prelude> [1..]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,^C
^C

whereas evaluating functions on such lists makes for all kinds of neat
tricks like those in
http://www.techrepublic.com/article/infinite-list-tricks-in-haskell/

----------------------[ Thunks ]----------------------

Unlike in LISP or Scheme, the default "product" of a Haskell function
or operator such as "+" or ":" is not a value. Rather, it is a
*thunk*: a recipe for computing that value, which will be evaluated
further only if that value is actually needed.

You can think of a thunk as an "environment" data structure paired
with the generic code that computes that kind of value; when called,
the environment guides that generic code to compute this particular
value. This should remind you of continuations, a closely related idea
in which a captured environment data structure guides postponed
execution---except that in Haskell this postponed execution can also
be pattern-matched on.
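You can watch thunks being created and forced in GHCi: its :sprint
command prints a binding's evaluated part and shows each unevaluated
thunk as _ . A small demonstration (the :: [Int] annotation merely
avoids type-defaulting noise):

Prelude> let xs = map (*2) [1..5] :: [Int]
Prelude> :sprint xs
xs = _
Prelude> head xs
2
Prelude> :sprint xs
xs = 2 : _

Printing head xs forced exactly one cons cell and its first element;
everything after it is still the single unevaluated thunk shown as _ .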
----------------------[ Surprises ]----------------------

You should remember that thunks will not just "do what you mean". They
will happily accumulate and consume memory even if you don't intend
them to. For example(*)(**), the simplest kind of recursive sum
function,

(*) this example and some of the further ones are from
    http://eax.me/lazy-evaluation/ (a Russian programmer blog)

(**) the code for this and further examples can be found in hs/lazy/ ,
     and the commands to build and run them in hs/lazy/Makefile (read
     it from the bottom up: I kept adding new targets at the top of
     the file, so that plain "make"---which by default processes the
     first target---runs the newest target's commands)

$ cat mysum.hs
mysum [] = 0
mysum (x:xs) = x + mysum xs

main = putStrLn $ show $ mysum $ take 100000 $ [1..]

will happily accumulate thunks of "+" that consume megabytes of
memory; they are forced only at print time, and then
garbage-collected. This can be seen by compiling the above program
with memory statistics enabled:

$ cat Makefile
mysum: mysum.hs
	ghc -rtsopts mysum.hs
	./mysum +RTS -sstderr

$ make
ghc -rtsopts mysum.hs
[1 of 1] Compiling Main             ( mysum.hs, mysum.o )
Linking mysum ...
./mysum +RTS -sstderr
5000050000
    21,813,256 bytes allocated in the heap
    12,048,632 bytes copied during GC
     6,571,152 bytes maximum residency (5 sample(s))
        25,472 bytes maximum slop
            11 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0        29 colls,     0 par    0.01s    0.01s     0.0003s    0.0006s
Gen  1         5 colls,     0 par    0.01s    0.01s     0.0021s    0.0039s

INIT    time    0.00s  (  0.00s elapsed)
MUT     time    0.01s  (  0.01s elapsed)
GC      time    0.02s  (  0.02s elapsed)
EXIT    time    0.00s  (  0.00s elapsed)
Total   time    0.02s  (  0.03s elapsed)

%GC     time      71.5%  (71.5% elapsed)

Alloc rate    3,989,256,766 bytes per MUT second

Productivity  28.2% of total user, 22.7% of total elapsed

Note that this simple program uses up 11MB of RAM---certainly not what
you'd expect of a simple integer-summing loop!

You might think that this is due to accumulation of stack frames,
since the mysum function above is not tail-recursive. However, tail
recursion doesn't in fact help:

$ cat mysum-tr.hs
mysum acc [] = acc
mysum acc (x:xs) = mysum (acc+x) xs

main = putStrLn $ show $ mysum 0 $ take 100000 $ [1..]

$ cat Makefile
mysum-tr: mysum-tr.hs
	ghc -rtsopts mysum-tr.hs
	./mysum-tr +RTS -sstderr

$ make
ghc -rtsopts mysum-tr.hs
[1 of 1] Compiling Main             ( mysum-tr.hs, mysum-tr.o )
Linking mysum-tr ...
./mysum-tr +RTS -sstderr
5000050000
    20,994,040 bytes allocated in the heap
    18,621,440 bytes copied during GC
     5,915,616 bytes maximum residency (5 sample(s))
        91,232 bytes maximum slop
            11 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0        35 colls,     0 par    0.01s    0.01s     0.0003s    0.0006s
Gen  1         5 colls,     0 par    0.01s    0.01s     0.0021s    0.0044s

INIT    time    0.00s  (  0.00s elapsed)
MUT     time    0.00s  (  0.01s elapsed)
GC      time    0.02s  (  0.02s elapsed)
EXIT    time    0.00s  (  0.00s elapsed)
Total   time    0.02s  (  0.03s elapsed)

%GC     time      74.7%  (74.0% elapsed)

Alloc rate    4,797,541,133 bytes per MUT second

Productivity  25.0% of total user, 20.0% of total elapsed

If anything, tail recursion leads to _more_ garbage collection, not
less: ~18MB copied during GC rather than ~12MB! The culprit is still
the accumulating thunks of "+", despite the tail-recursion savings on
stack frames (about ~1MB of overall allocation).
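To see why tail recursion alone cannot help, unfold a few steps of
mysum-tr by hand (a sketch of the reduction, not GHC's literal
evaluation order). Nothing demands the accumulator's value until the
very end, so it grows into a chain of suspended additions:

mysum 0 (1:2:3:...)
  =>  mysum (0+1) (2:3:...)
  =>  mysum ((0+1)+2) (3:...)
  =>  mysum (((0+1)+2)+3) (...)

Each pending "+" in that chain is a heap-allocated thunk. Only when
show finally demands the result do all 100000 additions run, after
which the entire chain becomes garbage, which explains the GC numbers
above.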
----------------------[ Forcing strictness with seq ]----------------------

This problem is so common that Haskell includes the function seq to
force evaluation of thunks. seq evaluates an expression only until it
reaches the first (outermost) constructor, such as that of a cons cell
or a pair, and goes no further.
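The source of mysum-seq.hs is not reproduced here; a minimal version
consistent with its Makefile target would force the accumulator at
every step, along these lines (my reconstruction, not necessarily the
file's exact text):

-- force each new accumulator value before recursing
mysum acc [] = acc
mysum acc (x:xs) = let acc' = acc + x
                   in acc' `seq` mysum acc' xs

main = putStrLn $ show $ mysum 0 $ take 100000 $ [1..]

Here acc' `seq` ... evaluates acc' down to a plain number before the
recursive call, so a chain of "+" thunks never forms. The effect shows
up immediately in the statistics: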
$ cat Makefile
mysum-seq: mysum-seq.hs
	ghc -rtsopts mysum-seq.hs
	./mysum-seq +RTS -sstderr

$ make
ghc -rtsopts mysum-seq.hs
[1 of 1] Compiling Main             ( mysum-seq.hs, mysum-seq.o )
Linking mysum-seq ...
./mysum-seq +RTS -sstderr
5000050000
    14,451,704 bytes allocated in the heap
        16,768 bytes copied during GC
        44,312 bytes maximum residency (2 sample(s))
        21,224 bytes maximum slop
             1 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0        26 colls,     0 par    0.00s    0.00s     0.0000s    0.0001s
Gen  1         2 colls,     0 par    0.00s    0.00s     0.0005s    0.0008s

INIT    time    0.00s  (  0.00s elapsed)
MUT     time    0.00s  (  0.00s elapsed)
GC      time    0.00s  (  0.00s elapsed)
EXIT    time    0.00s  (  0.00s elapsed)
Total   time    0.00s  (  0.01s elapsed)

%GC     time       8.0%  (18.5% elapsed)

Alloc rate    5,112,028,298 bytes per MUT second

Productivity  90.3% of total user, 63.1% of total elapsed

Much better---this cut the garbage-collected thunks to mere kilobytes!
Note that the list [1..100000] is still costing us a lot (see this by
reducing 100000 to 100 and rerunning), but in this example our target
is the thunks, not the data structure that the list creates.

----------------------[ -XBangPatterns ]----------------------

Besides seq, there are a number of ways to enforce strict evaluation.
The simplest---not in standard Haskell, but available as a language
extension---is marking an argument with ! in the function's pattern
match, to indicate that the argument expression should be evaluated
strictly:

$ cat mysum-bang.hs
-- Uncomment the following to have the code compile without
-- the -XBangPatterns option to ghc:
--{-# LANGUAGE BangPatterns #-}
mysum !acc [] = acc
mysum !acc (x:xs) = mysum (acc+x) xs

main = putStrLn $ show $ mysum 0 $ take 100000 $ [1..]

(run this and see the GC stats)

----------------------[ -O2 ]----------------------

Compiling with -O2 will catch some cases where strict evaluation is
called for. See the mysum-tr-o2 target in hs/lazy/Makefile .

However, -O2 strictness analysis is not magic. The very next target,
mysum-rest, fails to get optimized. In mysum-rest the mysum function
returns a tuple (pair) of the list's sum up to its third argument and
the object representing the rest of the list (in our example, an
infinite list!). The compiler fails to deduce the desired strictness
of the first argument, and the megabytes of GC come back. This example
can be fixed by explicitly specifying strict patterns, as in
mysum-rest-bang , or by using seq, as in mysum-rest-seq .

More about Haskell's strictness analysis:
https://wiki.haskell.org/Performance/Strictness

----------------------[ seq vs deepseq ]----------------------

seq is O(1): it stops at the first constructor. Thus seq works for
arithmetic operators, but stops at, e.g., a pair constructor. Consider
mysum-and-len , which returns the pair of the list's sum and length,
computed in two accumulator arguments. Its thunks cause a ton of
GC---around ~35MB, twice as much as before, despite the tail
recursion---because the additional accumulator argument creates its
own thunks.

This is fixed by using strict patterns as in mysum-and-len-bang ---but
let's try something else. Let's rewrite the function to take its two
accumulators as a pair, and use the seq trick above to try to force
strict evaluation. Unlike in the previous examples of mysum-rest-seq.hs
and mysum-seq.hs, this trick ( mysum-and-len-seq ) does not work! The
forced evaluation caused by seq stops at the tuple constructor---with
the familiar result of a ton of GC.

For just such cases, the deepseq function imported from Control.DeepSeq
breaches this obstacle. Note that deepseq is O(N) in the size N of the
thunk expression, as it fully traverses that expression (cf. the deep
copying of LISP lists). The mysum-and-len-deepseq example shows how.

Note that deepseq in my version of GHC has trouble inferring the
types, and gives "ambiguous type" errors; for this reason, in this one
example we need an explicit type signature for mysum.

This example, too, can be fixed by explicit -XBangPatterns strictness
( mysum-and-len-bang ).

----------------------[ Strict data structures ]----------------------

Finally, another way of enforcing strictness is to declare it in type
definitions. Compare mysum-and-len-spair.hs , which defines its own
pair type for the sum and length of the list, with
mysum-and-len-spair-strict.hs , which stipulates that strict
evaluation must be applied whenever an SPair constructor is called.
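Neither spair file is reproduced here, but the essence of the
difference is a single pair of strictness annotations ( ! ) on the
constructor's fields. A sketch in the spirit of
mysum-and-len-spair-strict.hs (my reconstruction; the actual files may
differ in details):

-- deleting the two !s below gives the lazy variant,
-- and the megabytes of GC come back
data SPair = SPair !Integer !Integer deriving Show

mysum (SPair s l) []     = SPair s l
mysum (SPair s l) (x:xs) = mysum (SPair (s+x) (l+1)) xs

main = putStrLn $ show $ mysum (SPair 0 0) $ take 100000 $ [1..]

Because every recursive call pattern-matches on SPair, the constructor
is forced at each step, and the ! annotations then force both fields,
so neither accumulator can build up a chain of thunks. Strictness
annotations in data declarations are standard Haskell and need no
language extension.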