Rob's Scala Blog

Saturday, 15 December 2012

Sorting an immutable list (in Scala)

[2013-01-07 Update: Results, Discussion and Conclusion updated in light of new benchmarks obtained using a different set-up.]

This post presents a method that does not resort to using arrays, as the built-in methods do, but instead uses scala.collection.immutable.List itself.

Motivation

Finding myself writing Java code making heavy use of arrays, one minute, and then Scala code in which everything should be immutable, the next, I decided to try to rewrite some of the former in the manner of the latter.

The code in question examines a set of 2-d points, and finds all those points which lie on hidden straight lines (with at least four points per line); the object of the exercise was to demonstrate a clever algorithm which employs sorting to reduce the time taken, so that it becomes proportional to N*log(N) rather than N*N (for N points).

Having eventually succeeded in rewriting this in Scala, using immutable lists instead of arrays, I began to wonder how the sort method I was blithely calling was actually implemented.

The answer is that the list is first copied to an array, which is then sorted (using one of Java's built-in sort methods), before being copied to a new immutable list.

Well, well!

No wonder the immutable list version was taking longer than the array one, with all that copying going on.

Why not do the sorting by creating new immutable lists directly?

So I set out to find out.

All been done before?

Searching the web, the best algorithm I could find was one for sorting mutable linked lists, with a sample implementation written in C. Therefore, using this one as a starting point, I began to devise one of my own, using Scala.

This could easily have been one of the FPPinS assignments, but without the skeleton solution!

After a great deal of perseverance, writing out of lists using pencil and paper, and an exercise in divining recursive functions from sets of similar ones, the following algorithm and implementation was arrived at.

Algorithm for sorting an immutable list

Sorting is done by first recognising that the number of elements in the list may be reduced to a sum of factors which are decreasing powers of 2; e.g. 15 = 8 + 4 + 2 + 1. The list to be sorted is

then considered to be the concatenation of subsequences of these lengths, and is sorted in two

stages:

each subsequence, starting with the longest, is sorted and prepended to a list;
each element of the tail of that list is then merged with the head.

Each subsequence in stage 1. is sorted by repeatedly considering it to be the concatenation of pairs of sorted subsequences, initially of length, 1, which are then merged.

In this way, it was found possible to use only constant-time list-operations (head, tail, ::) throughout.

Cases

Each time a pair of subsequences is merged, heads must be prepended to the result first, so that the order of the result is the reverse of those of the subsequences being merged. Therefore, it is important to keep track of the order of each subsequence, and ensure that the order of the result of the very first merge is chosen so that very last one will be from low to high. This was achieved by recognising that there are three cases which can arise, based on the number of stage 1. subsequences, each one with its own sequence of orders (where, in the following, '<' indicates low-to-high, '>', high-to-low):

Single subsequence: <
Even number of subsequences: >(<>...<>)>
Odd number of subsequences (> 1): >(<>...<>)<<

Note that the the number of subsequences may be derived from the number of elements using Java's Integer.bitCount method.

The cases can be neatly implemented using case-classes, of a trait supplying a method that returns the order as a function of both the position of the subsequence and the previous order.

Algorithm analysis

The number (strictly, order of growth) of comparisons of elements (for a list of length, N) appears to be optimal for sorting:

stage 1. N*log(N)
stage 2. N

Stability

If possible, the order of two elements considered equal by the sort criterion should be preserved (in case it might be significant), in which case the sorting algorithm is termed 'stable'. Here, stability was achieved by being more careful when selecting the particular inequality to test for when comparing elements.

Results

The implementation, complete with tests of performance as well as for correctness, is to be found in a GitHub project called list-sort.

Performance

Performance testing was carried out using the rather useful ScalaMeter framework (which I learned of here).

As shown in the figure, the answer to the above question soon became apparent - it takes about 7 times longer to sort an immutable list of 30,000 integers, compared with using the built-in sorted method (which first copies the elements to an array, and sorts that), falling to about 4 times longer for 150,000 integers.

[2013-01-07 The above results were obtained on a dual-core MacBook Pro, using--as pointed out by Eugene Platonov (see comments)--input arrays/lists which were already sorted. So, new results have been obtained, this time using input arrays/lists in reverse-sorted order (thereby representing the worst case), and on two different set-ups (thanks to results submitted by Eugene in the comments).

Note that the 'AMD PC' is a quad-core set-up, running Java 6, versus Java 7 on the MacBook Pro (see comments). 2013-01-07]

Lines of code

Given that it is possible to sort a list in just a few lines of code if the restriction to constant-time list operations is removed, at about 100 lines of code excluding doc-comments, this particular algorithm and implementation are relatively complex and lengthy.

Discussion

One observation which goes some way towards accounting for the relatively poor performance is
that the recursive function to sort the first (largest) power-of-2 elements is not tail-recursive.

Note that this should not however (in itself) result in excessive use of memory, since only log2(len) frames will accumulate (which is only 17 in the largest test case).

[2013-01-07 The new results show a significant difference in performance characteristics between the two set-ups.

With Java 7 on a dual-core (Intel) Mac, the built-in sorting methods perform much better than the current algorithm, and this does not appear to be affected by whether or not the input is already sorted.

With Java 6 on a PC with a quad-core AMD CPU, the current algorithm's performance is on par with that of the built-in method for sorting an array, and exceeds that of the build-in method for sorting a list. 2013-01-07]

Conclusion

A stable algorithm for sorting an immutable list, using only constant-time list-operations, was devised and implemented in Scala.

One of the reasons for going to this trouble was to try to find out if it might be more efficient to sort such a list using functional programming, rather than by first copying it to an array. The above performance testing suggests that it definitely is not, and would appear to be in keeping with advice from Martin Odersky (skip to 1:07:30) about when to 'drop down' to using imperative arrays:

where functional programming really matters is [programming] 'in the large'
okay provided access is controlled and interface is purely functional.

[2013-01-07 The contributed results suggest that, on the contrary, using functional programming to sort a list can indeed be more efficient than using imperative arrays, depending on the particular set-up.

They also suggest that the built-in sorting methods of Java 7 and available on (Intel) Macs are highly optimised in comparison to those of Java 6 and available on AMD hardware. 2013-01-07]

Comments welcome.

Friday, 23 November 2012

Dependent-type 'revolution'

Just prior to Scala eXchange 2012 (which took place earlier this week), I came across a (quite accessible) blog post entitled, Unifying Programming and Math - The Dependent Type Revolution.

An intriguing title, referring to one of those subjects I hadn't quite got round to investigating yet - still on my to-do list was to listen to an interview with Miles Sabin talking about, amongst other things, dependent types.

Although Miles was at the conference on the first day, he wasn't around on the second, so I missed my chance to talk to him. However, there was always the panel discussion, so I made do ;-) with addressing a question to Martin Odersky on the matter (22:20 mins into the video).

Here's what he had to say, with my own, possibly wrong-end-of-the-stick, embellishments:

There's an interesting area between Haskell and maths, that's populated by programming languages [featuring dependent types] such as Agda, and theorem provers such as Coq.
According to the Curry-Howard Isomorphism,

there's a strong correspondence between types and properties [of programs] (so we can encode the properties of programs in types);

[By properties of programs is meant things like the property of associativity of the concatenation operator of a list.]

if a type is a property [of a program], then the [existence of that working] program [involving that type] is a proof of that property.

Certain theorems in maths (e.g. Four-colour theorem) can only be proved (practically) by writing a program [in a language like Agda, expressing the theorem as dependent types].
Applications of such proofs in computing are currently somewhat specialised/limited:

proof of correctness wrt attacks on cryptographic systems.
proof of correctness of whole programs, using theorem provers such as Z3 to do the constraints solving, is currently limited to programs in the 10s of lines.

Scaling of proof of whole-program correctness to much larger programs, or the obviation of unit tests, is unlikely to happen any time soon.
We are constrained by the need to reconcile that which it is possible to do with types, with that which humans can comprehend.
Having type systems which are easy to use is the more pressing challenge.

To delve deeper into dependent types using Scala, it looks like Miles Sabin's Shapeless project and blog are the places to go.

Saturday, 11 August 2012

Don't quite get how FP handles state?

At first, neither did I.

Reading Learn You a Haskell (Chapter 13) or Functional Programming in Scala (Chapter 6), I was asked to call something that obtains a value that depends on some state, a state!

In other words, the type of that something is called State:

Okay, so it's really short for 'stateful computation' or 'state action/transition', but they shortened it to 'State', even though this is what s/S represents.

Confusing, right?

So what would I have called it?

How about, simply, Get,

since,

instances obtain values
the statefulness may be conveyed in the name of the trait/class/object of which it's a member (e.g. Stateful).

One argument against this name is that instances are sometimes used to set values (and store them in the state). However, it looks like this is still implemented by doing a get, using an instance of Get in which A is Unit.

Anyway, consider changing the name yourself if you too were confused. You never know, it might catch on.

Thursday, 14 June 2012

Fixing scala.Either - unbiased vs biased

[2012-06-18 Update: details of new branch added, where Either gains right-biased capability while retaining its unbiased one.]
[2012-06-25 Update: details of new branch added, where Either gains a withFilter method.]
[2012-06-29 Update: link to SIP candidate added.]

This latest effort is the result of a stalled attempt to write a formal proposal (SIP) explaining exactly how scala.Either should be fixed.

It became apparent that the need to first deprecate things, as opposed to simply removing them, might make the approach I was advocating less attractive.

So, with a view to helping the community to come to an informed decision, I now present a workable set of changes that leaves scala.Either unbiased, together with one that gives it a bias towards its Right subtype, for comparison.

Unbiased

To recap, we wish to make the following changes to Either's Left-/RightProjection inner classes:

have map return a Left-/RightProjection instead of an Either
have flatMap also do this, and also take an A => Left-/RightProjection instead of an A => Either
remove filter

Although we can deprecate filter, we cannot do this to the existing versions of its map and flatMap.

Therefore, we have no choice but to deprecate Left-/RightProjection themselves, and in turn, Either's left and right methods.

The alternative names I came up with are Left-/RightProj, lp and rp. At least they are shorter, and anyway, left and right would no longer be correct, since their map and flatMap now return a Left-/RightProj instead of an Either.

One advantage of this is that we can now just leave out filter, rather than have to deprecate it.

Right-Biased

Here we need to do the following:

deprecate Either's left and right methods
deprecate Left-/RightProjection themselves
add the following methods to Either itself

get
getOrElse
foreach
forall
exists
flatMap
map
toSeq
toOption

Note the absence of filter.

Decision time

Please take a look at the unbiased (lp_rp branch) and Right-biased (right-biased branch) end results, together with the respective versions of the accompanying test code:

unbiased

Right-biased

The reasons to remain unbiased, which occur to me, are as follows,

any use-cases demanding the ability to choose from which subtype (Left or Right) the value passed to map, etc. is taken, will not be excluded
Either does not imply any bias, although Left and Right do imply a weak one

while those to go biased, would seem to me to be that,

there is no need to append .rp (or .lp) in for comprehensions, which must be done for each generator
there is no need to append .e in order to obtain the Either from the result yielded.

Please register your preference on this scala-debate thread, or else below.

Thanks!

2012-06-18 Following a highly productive exchange of ideas on the above thread, a new branch entitled add_right-bias_2-10 has been added to the project. In this version of Either, the question of unbiased vs biased goes away! Since these two usage modes turn out not to be mutually exclusive, a right-bias has been added, and the existing unbiased capability (through the lp and rp methods) retained.

2012-06-25 add_right-bias_2-10_withFilter branch added to the project. This version adds a withFilter method to Either, such that right-biased for comprehensions may now use if and involve refutable pattern-matching, provided the appropriate implicit conversions are supplied (as indicated by the compiler), that will be used to create a Left value whenever the expression following if is false, or there is no match.

2012-06-29 SIP candidate now written.

Thursday, 7 June 2012

Okay/Reason extends Checked (a biased scala.Either)

[2012-06-08 Update: added link to 'tests involving Option', demonstrating how 'if' may still be used in for-comprehensions involving Checked.]
[2012-06-09 Update: last four 'tests involving Option' now demonstrate converting from a Checked[A, R] to a Checked[B, R].]
[2012-06-10 Update: the last eight 'tests involving Option' now demonstrate the above; links to Musicians and printMusicians test code added.]

This blog has so far been concerned with how to enhance the existing scala.Either class.

The first three posts presented certain 'extras' that can be used together with the unmodified Either,

while the previous one presented a modified version which claims to fix certain of its shortcomings.

Today's post, however, presents an alternative to Either, and is perhaps the culmination of the above.

Motivation and credits

The current effort is intended to help inform any decision to add an alternative to Either to the Scala library. It owes much to the original author of the Either class, Tony Morris, and to contributors to the debate referred to in the previous post, wherein other alternatives are discussed.

The 'biased alternative' to go alongside the unbiased Either envisaged in that post has now been written. What follows is a brief tutorial, followed by some instructions for using it yourself.

Tutorial

Checked[A, R] is for use as the return type, in place of some type, A, wherever an exception might otherwise be thrown and a more functional style is preferred. Such a result will contain either an A wrapped in an instance of Okay or else an R wrapped in an instance of Reason, depending on whether or not an exceptional condition arose.

Checked vs Either

There are three main differences.

Firstly, rather than have to specify that it's only the value contained in the Right (or perhaps, Left) instance of an Either that's to be passed to the foreach, map or flatMap methods, it is only ever the value contained in Okay instances of Checked, that is passed.

Secondly, Checked has no filter or withFilter method, and consequently, no support for if in for comprehensions, on the grounds that the result should never be empty - either it's Okay, or else there's a Reason why not. Note, however, that if may still be used following a conversion from a Checked to a scala.Option, which does have both these methods, as demonstrated here.

Thirdly, Checked comes complete with implicit support similar to that provided for Either by EitherExtras:

a.succeed becomes a.toOkay
a.fail becomes a.toReason
a.fastCheck becomes a.ff (as in fail-fast)
a.slowCheck becomes a.fs (as in fail-slowly)
fast(f) <*> ... becomes ff(f) <*> ... where f is curried
slow(f) <*> ... becomes fs(f) <*> ... ditto

Mix-in or import

Checked, together with its implicit support, is nested in a trait called Checking. As is the case for EitherExtras, this has an abstract type, now called R, which must be set to the type of the value wrapped by Reason when using the <*> operator.

As is also the case for EitherExtras, Checking has a companion object which extends its companion trait, and sets the abstract type to String.

For example code, please see the following test code:

Instructions

To make use of the Scala 2.9.2-compatible class files now available in the Maven central repository, add the following to your build.sbt,

libraryDependencies += "org.lafros" %% "scala-checking" % "1.0"

or the following to your pom.xml.

<dependency>
<groupId>org.lafros</groupId>
<artifactId>scala-checking</artifactId>
<version>1.0</version>
</dependency>

The code is available under the Apache v2.0 licence, and maintained in a Github project called scala-checking.

Comments welcome.

Monday, 28 May 2012

Fixing scala.Either - left/right.map returns projection

[2012-06-06 Update: support for 'if' in for-comprehensions now removed - result must be either Left or Right, and never empty!]
[2012-06-11 Update: added link to 'tests involving Option'.]

That Either can, after all, participate in for comprehensions involving multiple generators was a revelation of an earlier post.

To recap, you need to indicate which of the two possible results you're interested in, by using either the left: LeftProjection or right: RightProjection of the Either. For example, multiple checks (each returning an Either[..., Int]) on some a: Int, would be carried out as follows:

What needs fixing

However, it has subsequently been pointed out (in this bug report), that definitions aren't available,

and worse still, I notice that neither can you use if together with multiple generators and yield:

Debate

The above bug-report resulted in a debate about a proposal to fix Either by modifying it in such a way that it would behave like its right: RightProjection. This so-called right-biasing would therefore eliminate the need to specify which result you were interested in (and involve left and right being deprecated).

However, not everyone agreed with this, with some calling for an alternative class to be introduced.

My own opinion is that Either should just be patched so as to enable the above two cases to work. (I don't agree with the right-biasing proposal, since the class's very name does not imply any bias; I'd sooner see an unbiased Either alongside a biased alternative, with a name such as Checked.)

Patch

So I ended up trying to do the patching myself, and against the odds, now appear to have succeeded.

Rather than have left/right.map return an Either, it now returns a Left-/RightProjection.

This means that yield will result in a Left-/RightProjection instead of an Either, which just requires that you add .e to obtain the Either:

The patched Either is maintained in a Github project called scala-either-proj-map-returns-proj, including for comprehension tests, tests involving Option, and the test app from which the example above was taken.

Feel free to build the project yourself, and add further tests that try to break something.

Comments welcome.

Wednesday, 16 May 2012

EitherExtras class files now published

EitherExtras is now in a package called org.lafros.scala, and licensed for use under the Apache v2.0 licence.

A jar file containing Scala 2.9.2-compatible class files is now available in the Maven central repository. To make use of it, add the following to your build.sbt,

libraryDependencies += "org.lafros" %% "scala-either-extras" % "1.0"

or the following to your pom.xml.

<dependency>
<groupId>org.lafros</groupId>
<artifactId>scala-either-extras</artifactId>
<version>1.0</version>
</dependency>

Please report issues here.

Many thanks to Sonatype for providing their publishing service, and to Josh Suereth et al. for providing instructions.