What's wrong with SBT?

Before

1. What will the text be about?

What the author does not like about SBT

2. What questions am I looking to answer?

I know working with SBT was difficult. I am looking to answer WHY working with SBT was difficult.

3. What do I know about the topic already?
  • Working with SBT sucks.
  • Error messages are cryptic
  • Compilation is slow
4. Skim the text to get an idea for the structure of the text
  • Unwanted Data Model
  • Too many layers of interpretation
  • Poor tooling
  • Semantic Duplication
  • Global, mutable Execution Model
  • Global namespacing
  • Lack of caching
  • Single process/Classpath architecture

During

What's wrong with SBT?
  • SBT is commonly expanded as "Scala Build Tool" (the name originally stood for "Simple Build Tool")
  • Things that can be fixed easily:

    1. The difference between run in (Test, myproject) and myproject/test:run
    2. Leaking implementation details like Def.task{...}.taskValue (https://github.com/sbt/sbt/pull/2943)
    3. Syntactic bugs, like lazy val (js, jvm) = crossProject() not being allowed
    4. The unusual evaluation order of foo.value expressions
  • Some warts, such as obtuse operator definitions like ++>>, have recently been deprecated
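
To make the "unusual evaluation order of foo.value" point concrete, here is a minimal build.sbt sketch, assuming sbt 1.x. The three keys (first, second, both) are hypothetical names invented for this illustration and do not come from the original post.

```scala
// build.sbt (sbt 1.x assumed). All three keys are hypothetical.
lazy val first  = taskKey[Unit]("prints 'first'")
lazy val second = taskKey[Unit]("prints 'second'")
lazy val both   = taskKey[Unit]("depends on first and second")

first  := println("first")
second := println("second")

both := {
  println("body start")
  first.value    // reads like a call made here, but sbt's macro turns it into a dependency
  println("body middle")
  second.value   // also a dependency, not something that runs at this line
  println("body end")
}
```

Running `sbt both` should print "first" and "second" (in no guaranteed order) before "body start", because every .value inside the block is evaluated ahead of the task body rather than at the line where it appears.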
What data model does SBT use?
  • A 2D grid - tasks vs. subprojects
  • The grid ends up being sparse in practice
  • Violating the rule of "make illegal states unrepresentable"
  • The author proposes a hierarchical model
  • Any software project can be composed of the following

    1. Files grouped together into folders
    2. Classes grouped together in nested packages
    3. Methods grouped into classes
    4. Control-flow structures are nested and grouped together in methods
  • If you are organizing files in such a way on disk, why should the build tool view the project any differently?
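
A rough sketch of the two data models in plain Scala, to show why the grid ends up sparse. The project names, task names and the Module type are all assumptions made for illustration; none of this is sbt's real API.

```scala
object DataModelSketch {
  // Grid view: one global list of task names crossed with every subproject.
  val projects: List[String] = List("core", "web", "docs")
  val tasks: List[String]    = List("compile", "test", "fastOptJS", "renderMarkdown")

  // The only (project, task) combinations that mean anything (assumed for this example).
  val defined: Set[(String, String)] = Set(
    "core" -> "compile", "core" -> "test",
    "web"  -> "compile", "web"  -> "fastOptJS",
    "docs" -> "renderMarkdown"
  )

  val gridCells: Int  = projects.size * tasks.size // every cell the grid can address
  val emptyCells: Int = gridCells - defined.size   // addressable but meaningless cells

  // Hierarchical view: a module only carries the tasks it declares, and modules nest.
  final case class Module(tasks: Set[String], children: Map[String, Module] = Map.empty) {
    def has(path: List[String]): Boolean = path match {
      case Nil           => false
      case task :: Nil   => tasks.contains(task)
      case child :: rest => children.get(child).exists(_.has(rest))
    }
  }

  val build: Module = Module(Set.empty, Map(
    "core" -> Module(Set("compile", "test")),
    "web"  -> Module(Set("compile", "fastOptJS")),
    "docs" -> Module(Set("renderMarkdown"))
  ))

  def main(args: Array[String]): Unit = {
    println(s"grid cells: $gridCells, empty: $emptyCells")
    println(build.has(List("docs", "fastOptJS"))) // this path simply does not exist in the tree
  }
}
```

In the grid, docs/fastOptJS is a cell that has to be filled with something or left empty; in the tree there is nothing to leave empty, which is the sense in which the illegal state becomes unrepresentable.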
What layers of interpretation exist in SBT?
  1. Run Scala code to construct the Seq[Setting] that gets passed to each subproject

  2. Interpret Seq[Setting], with all the := and ++= operations, to create Task graph

  3. Interpret Task Graph to actually do the work, scheduling the work in a topological order, and caching where necessary

  4. Each layer is executed sequentially and there exists no overlap between the layers

  5. Trying to trace through the Layer 1 Scala code that is constructing the Seq[Setting] for a subproject is difficult

  6. Following the interpretation of Layer 2's settings to figure out what ends up being bound to each key is difficult

  7. Figuring out how in Layer 3 SBT is going to execute the task you told it to execute, along with all its dependencies, is difficult

  8. Other libraries do have two layers, but many times one of the layers is so thin that it is transparent to the user
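
The three layers can be imitated with a toy model in plain Scala. This is only a sketch of the idea under simplifying assumptions: the Setting, Assign, Append and Task types below are invented for illustration and are far simpler than sbt's real ones.

```scala
object LayersSketch {
  // What Layer 1 produces: plain data describing how keys should be bound.
  sealed trait Setting
  final case class Assign(key: String, deps: List[String], run: List[String] => String) extends Setting
  final case class Append(key: String, suffix: String) extends Setting

  final case class Task(deps: List[String], run: List[String] => String)

  // Layer 1: ordinary Scala code that builds the Seq[Setting].
  def settings: Seq[Setting] = Seq(
    Assign("sources", Nil, _ => "A.scala"),
    Append("sources", ",B.scala"),
    Assign("compile", List("sources"), in => s"classes(${in.head})"),
    Assign("package", List("compile"), in => s"jar(${in.head})")
  )

  // Layer 2: walk the settings in order; later entries override or extend earlier ones.
  def interpret(ss: Seq[Setting]): Map[String, Task] =
    ss.foldLeft(Map.empty[String, Task]) {
      case (graph, Assign(key, deps, run)) => graph.updated(key, Task(deps, run))
      case (graph, Append(key, suffix)) =>
        val prev = graph(key) // throws if the key was never assigned; acceptable for a sketch
        graph.updated(key, Task(prev.deps, in => prev.run(in) + suffix))
    }

  // Layer 3: run the target after its dependencies, computing each task at most once.
  def execute(graph: Map[String, Task], target: String): (String, List[String]) = {
    val results = scala.collection.mutable.LinkedHashMap.empty[String, String]
    def go(key: String): String = results.get(key) match {
      case Some(done) => done
      case None =>
        val task = graph(key)
        val out  = task.run(task.deps.map(go))
        results(key) = out
        out
    }
    val out = go(target)
    (out, results.keys.toList) // result plus the order in which tasks completed
  }

  def main(args: Array[String]): Unit =
    println(execute(interpret(settings), "package"))
}
```

Even in this toy version, a debugger stepping through settings only sees data being constructed. What "sources" is finally bound to is decided inside interpret, and when anything runs is decided inside execute, which is the tracing difficulty the notes above describe.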

What is semantic duplication?
  • SBT's Layer 2 interpreter is basically a full-fledged programming language: it walks a sequence of instructions, manipulating, mutating and reassigning variables (the taskKeys and settingKeys). This results in several concepts in Layer 2 which look strikingly similar to concepts in Layer 1 (directly executing Scala code to construct the Seq[Setting]):
  • SBT's mySettingKey vs Scala's val myValue
  • SBT's foo := bar assignment vs Scala's val foo = bar
  • SBT's foo += 1 mutation vs Scala's var foo = ...; foo += 1
  • SBT's scope-delegation (letting you fall back to broader scopes) vs Scala's lexical scope-delegation (which also lets you access values defined in broader scopes)
  • Fundamentally, understanding the concepts of scoping, mutation, and overriding in any programming language is difficult. SBT's flavor of these concepts is similar enough to normal Scala to be just as difficult to pick up, but also different enough that you cannot leverage your existing Scala (or any other programming language) experience to help you. The fact that they are similar-but-different adds additional confusion when a programmer mistakes one for the other.
  • Overall, the complexity of Layer 2's evaluation semantics and its not-quite-the-same similarity to normal Scala semantics adds a huge load to anyone trying to understand how SBT works.
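
A small build.sbt fragment, assuming sbt 1.x, that puts the look-alike constructs next to their plain Scala counterparts. The greeting key is hypothetical and exists only for this illustration.

```scala
// build.sbt (sbt 1.x assumed). `greeting` is a hypothetical key.
lazy val greeting = settingKey[String]("A demo setting")

// Layer 2 "assignment": looks like `val greeting = "hello"`,
// but it only records an instruction for the settings interpreter.
greeting := "hello"

// Layer 2 "reassignment": refers to the previous binding of the same key,
// which a Scala `val` cannot do.
greeting := greeting.value + ", world"

// Layer 2 "mutation": looks like `var opts = ...; opts += "-deprecation"`,
// but nothing is mutated while this Scala line executes.
scalacOptions += "-deprecation"
```

None of these lines changes a value when the Scala code runs. Each one builds a Setting that is applied later, so the familiar syntax hides unfamiliar semantics.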

What exactly is lacking in the tool support?

  • Scala's debugging tools only allow you to debug layer 1 because it is plain Scala
  • Once you are in Layer 2 (or Layer 3), you are no longer writing Scala code. All your Scala tooling no longer applies:
  • Your Scala editors can't jump to where your Layer 2 settingKey or taskKey values were assigned,
  • Your Scala debuggers can't step through the Seq[Setting] interpreter
  • You can't put printlns in between Settings to see what a settingKey or taskKey is bound to at a particular point in the Layer 2 execution.
  • Rather than writing Scala code to "do stuff", you end up programming a meta-interpreter running Scala along with two strange, un-specced programming languages with no tool support, all just so you can zip your classfiles into a jar.
How does sbt handle namespacing?
  • It doesn't
  • Keys in sbt are not namespaced into modules
  • Programmers can prefix keys with subproject names like C programmers used to
  • Layer 1 is immutable, but layer 2 is not
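
A plain Scala sketch of the difference between one flat, mutable key table and nested namespaces. The registry and the core/web objects are assumptions made for illustration, not how sbt stores keys internally.

```scala
object NamespaceSketch {
  // Flat and mutable: every key lives in one global table,
  // so a second definition silently replaces the first.
  val flat = scala.collection.mutable.Map.empty[String, String]

  // Returns the value that was overwritten, if any.
  def define(key: String, value: String): Option[String] = flat.put(key, value)

  // Nested and immutable: the same short name can exist once per module
  // without any hand-written prefix.
  object core { val version = "1.0" }
  object web  { val version = "2.0" }

  def main(args: Array[String]): Unit = {
    println(define("version", "1.0"))
    println(define("version", "2.0")) // the earlier binding is gone
    println((core.version, web.version))
  }
}
```

With the flat table, the only way to keep both values is to rename the keys by hand (coreVersion, webVersion), which is the C-style prefixing mentioned above.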
How does sbt handle caching?
  • By default, nothing is cached
  • It is left to library developers to determine what is cached and what is not
  • Even makefiles have a concept of only recompiling what has changed and what modules depend on it
  • Try using Bazel for this type of functionality
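
A minimal sketch of the "only redo work when the inputs changed" idea in plain Scala. It keys the cache on a content hash of the input, which is an assumption of this example (make compares file timestamps instead), and CachedTask is a hypothetical helper, not part of sbt or Bazel.

```scala
import java.security.MessageDigest

object CachingSketch {
  // Hex-encoded SHA-256 of the input text.
  def hash(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Wraps a unit of work and re-runs it only when the input hash changes.
  final class CachedTask(work: String => String) {
    private var last: Option[(String, String)] = None // (input hash, output)
    var runs: Int = 0                                 // how often `work` really ran

    def apply(input: String): String = {
      val h = hash(input)
      last match {
        case Some((`h`, out)) => out // same input as last time: reuse the output
        case _ =>
          runs += 1
          val out = work(input)
          last = Some((h, out))
          out
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val compile = new CachedTask(src => s"compiled($src)")
    compile("object A")
    compile("object A") // unchanged input, served from the cache
    compile("object B") // changed input, recomputed
    println(compile.runs)
  }
}
```

A real build tool would also have to track which downstream tasks depend on a changed output. This sketch covers a single task only.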
How does sbt handle classpaths?
  • By default, all projects will be jammed into a single classpath
  • This allows the sbt file to be type safe, but disallows versioning of modules
  • Type safety is not MORE important than the other features many package managers provide
How bad is it?
  • Fixing these problems will take more than a PR
  • At least a succinct analysis of the tool's flaws allows us to approach the problem thoughtfully

After

Generate a few questions that you'd ask someone else to gauge how well they comprehended this text


References

original post
