In a statically typed language like Java, we can significantly reduce the number of bugs in our code by leveraging types to model the problems we are solving.
The more accurately we model our problem, the fewer invalid states will even compile.
Let’s consider a method that processes a learner based on their current class:
public boolean doSomething(int userId, String studentClass){
//...
}
For this example, only four values really make
sense, conceptually, but the String type permits
a significantly larger set of values.
Almost every possible input will be invalid!
With sufficient unit tests, we might be able to make it safe…
When a String is used to represent data that has a more specific,
constrained set of values, it’s often referred to as being
“stringly typed”.
This is generally considered an anti-pattern because it defers type-checking to runtime, requires extensive validation, and reduces readability.
As an alternative, we could simply create a four-valued type which exactly matches the valid states in our model.
public enum StudentClass {
FRESHMAN,
SOPHOMORE,
JUNIOR,
SENIOR
}
With this type, we can change the signature to:
public boolean doSomething(int userId, StudentClass studentClass){
//...
}
Now, we don’t need to test for invalid classes. Actually, we can’t even write such tests, because the invalid code won’t compile!
(ignoring null values…)
A well-modeled, constrained type also makes other mistakes easier to catch. Imagine code like this:
if(studentClass.equals(FRESHMAN)){
return freshmanStuff;
}else if(studentClass.equals(JUNIOR)){
return juniorStuff;
}else{
return seniorStuff;
}
We accidentally forgot the Sophomores, who now get processed as Seniors!
Using modern Java features, we could instead write:
return switch(studentClass){
case FRESHMAN -> freshmanStuff;
case JUNIOR -> juniorStuff;
case SENIOR -> seniorStuff;
};
At compile time, this will fail, because we have not
exhaustively covered all cases! This will force us
to either add logic for Sophomores or to add a default
case (which you should generally avoid!).
This also protects us in the future! Imagine if
we get a feature request to add support for GRAD
students. We can add this to the enum, and the compiler
will tell us all locations in the code that must be
updated to support this new value in the type!
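For completeness, here is a minimal sketch of what the exhaustive version might look like once Sophomores are handled. The `Advising` class, the `advisorFor` method, and the returned strings are placeholders invented for illustration; they are not part of the code above.

```java
// Hypothetical, self-contained sketch; names and return values are placeholders.
enum StudentClass { FRESHMAN, SOPHOMORE, JUNIOR, SENIOR }

class Advising {
    // No default branch: deleting a case, or adding a new constant such as GRAD
    // to the enum, makes this switch expression fail to compile.
    static String advisorFor(StudentClass studentClass) {
        return switch (studentClass) {
            case FRESHMAN -> "first-year advisor";
            case SOPHOMORE -> "second-year advisor";
            case JUNIOR -> "third-year advisor";
            case SENIOR -> "fourth-year advisor";
        };
    }
}
```

If we had added a default branch instead, a new enum value would silently fall into it, which is why leaving the default out is usually the safer choice.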
There is even support for testing such types in unit tests:
@ParameterizedTest
@EnumSource(StudentClass.class)
void someTest(StudentClass studentClass){
// ... Set up test data ...
final var result = doSomething(id, studentClass);
// ... Test result ...
}
If we add an enum value, and have parameterized the unit tests for all code using the enum, then all of our tests will automatically start covering the new value!
Even when a large primitive type accurately models some data, it is still a good idea to wrap it in a dedicated type so that values cannot be used in the wrong place.
For instance, instead of passing in an int, we could
define:
public class UserId{
private final int id;
// ...
}
Now we don’t have to worry about passing in an int that does not represent a user’s id:
public boolean doSomething(UserId id, StudentClass studentClass){
// ...
}
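As a rough sketch, such a wrapper could also be written as a record (Java 16+). The positivity check, the `CourseId` type, and the `Enrollment.describe` method are assumptions made up for this example, not part of the original code.

```java
// Sketch only: the "must be positive" rule is an assumed invariant.
record UserId(int value) {
    UserId {
        if (value <= 0) {
            throw new IllegalArgumentException("User id must be positive: " + value);
        }
    }
}

// A second, hypothetical id type that also wraps an int.
record CourseId(int value) {}

class Enrollment {
    // Swapping the two arguments at a call site is a compile error here,
    // which it would not be if both parameters were plain ints.
    static String describe(UserId user, CourseId course) {
        return "user " + user.value() + " in course " + course.value();
    }
}
```

A call such as `Enrollment.describe(new CourseId(101), new UserId(7))` is rejected by the compiler, even though both wrappers hold an int.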
Some languages, like Ada, allow you to derive new types from existing ones that share a representation but are mutually incompatible:
type Celsius is new Float;
type Fahrenheit is new Float;
Now, you cannot accidentally assign a Celsius
value to a variable expecting Fahrenheit, even
though they are both just floats at runtime!
This is much nicer than Java, because you don’t have to unwrap values to access the underlying primitives.
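For comparison, here is a rough Java approximation of the two Ada types, using records as wrappers. The `Temperatures` class and `toFahrenheit` method are hypothetical names, and `double` is an assumed stand-in for Ada's `Float`.

```java
// Sketch: two wrapper types with the same underlying representation.
record Celsius(double degrees) {}
record Fahrenheit(double degrees) {}

class Temperatures {
    // The conversion has to unwrap the Celsius value and re-wrap the result,
    // which is the extra ceremony Ada's derived types avoid.
    static Fahrenheit toFahrenheit(Celsius c) {
        return new Fahrenheit(c.degrees() * 9.0 / 5.0 + 32.0);
    }
}
```

Passing a `Celsius` where a `Fahrenheit` is expected does not compile, but every arithmetic operation has to go through `degrees()`.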
Many languages offer type “aliases”, which give an existing type an alternative name. They do not create a new type, though, so the alias and the original can be used interchangeably!
There are some obvious limitations, though:
At some point we have raw input that could be invalid, but we only have to validate once at the edge of our code.
Java doesn’t care if you just sprinkle null values around, so many of the protections assume you have not done so irresponsibly.
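The first limitation can be sketched as a small parsing step at the boundary. The `StudentClassParser` class and its `parse` method are hypothetical; the trimming and case-insensitive matching are assumptions about what the raw input might look like.

```java
import java.util.Locale;
import java.util.Optional;

enum StudentClass { FRESHMAN, SOPHOMORE, JUNIOR, SENIOR }

class StudentClassParser {
    // Hypothetical boundary helper: raw input is converted exactly once here,
    // and everything past this point works with the enum only.
    static Optional<StudentClass> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(StudentClass.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            // Enum.valueOf throws this for any name that is not a constant.
            return Optional.empty();
        }
    }
}
```

Returning an `Optional` instead of null also keeps the second limitation in check, since the caller has to deal with the invalid case explicitly.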
Often, people implement an expression tree using a single node type like this:
public class Expression{
private int value;
private Operation op;
private Expression left;
private Expression right;
}
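This requires a lot of safety checks to make sure the code is used correctly: does this node hold a value, or an operation with two children? Instead, we can give each kind of node its own type: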
public interface Expression {}
public class Constant implements Expression {
private final int value;
// ...
}
public class BinOp implements Expression {
private final Operation op;
private final Expression left;
private final Expression right;
// ...
}
Now, we don’t need to check if there are children or a value, etc. This is encoded in the type itself.
We can also leverage the types to process the tree:
int compute(Expression e){
return switch(e){
case Constant c -> c.value();
case BinOp bo -> doOp(
compute(bo.left()),
compute(bo.right()),
bo.op());
default -> throw new IllegalArgumentException("Unknown Type");
};
}
Because Expression is a standard interface, the compiler
cannot guarantee that Constant and BinOp are the
only implementations.
Therefore, the switch is considered not exhaustive,
and we are forced to handle the default case (or potential
unknown types) at runtime.
We can fix this by sealing the interface:
public sealed interface Expression
permits Constant, BinOp {}
Now the compiler knows the hierarchy is closed. It can
prove exhaustiveness at compile time, meaning we can
safely remove the default case!
The doOp helper gets the same guarantee from a switch over the Operation enum:
public int doOp(int left, int right, Operation op){
return switch(op){
case ADD -> left + right;
case MULT -> left * right;
case SUB -> left - right;
};
}
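Putting the pieces above together, here is a compact sketch that compiles on its own. It assumes Java 21 or later (pattern matching for switch), uses records in place of the classes whose bodies are elided above, and wraps the methods in a hypothetical `Evaluator` class.

```java
// Sketch assuming Java 21+; records stand in for the elided class bodies.
enum Operation { ADD, MULT, SUB }

sealed interface Expression permits Constant, BinOp {}

record Constant(int value) implements Expression {}

record BinOp(Operation op, Expression left, Expression right) implements Expression {}

class Evaluator {
    // Exhaustive over the enum, so no default branch is needed.
    static int doOp(int left, int right, Operation op) {
        return switch (op) {
            case ADD -> left + right;
            case MULT -> left * right;
            case SUB -> left - right;
        };
    }

    // Exhaustive over the sealed hierarchy, so no default branch is needed here either.
    static int compute(Expression e) {
        return switch (e) {
            case Constant c -> c.value();
            case BinOp bo -> doOp(compute(bo.left()), compute(bo.right()), bo.op());
        };
    }
}
```

For example, `Evaluator.compute(new BinOp(Operation.MULT, new BinOp(Operation.ADD, new Constant(2), new Constant(3)), new Constant(4)))` evaluates (2 + 3) * 4.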
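Alternatively, each operation can be its own type, so the operation is encoded in the type rather than stored in a field:
public class Constant implements Expression{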
// ...
}
public interface BinOp extends Expression{}
public class Mult implements BinOp {
// ...
}
public class Add implements BinOp {
// ...
}
// ...
return switch(e){
case Constant c -> c.value();
case Add a -> compute(a.left()) + compute(a.right());
case Mult m -> compute(m.left()) * compute(m.right());
// ...
};
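For this per-operation model to get the same compile-time exhaustiveness check, both levels of the hierarchy would need to be sealed. A sketch under that assumption, again using records and Java 21+ (the `Sub` type and the `TreeEvaluator` class are names added for the example):

```java
// Sketch assuming Java 21+; both interfaces are sealed so the switch is exhaustive.
sealed interface Expression permits Constant, BinOp {}

record Constant(int value) implements Expression {}

sealed interface BinOp extends Expression permits Add, Mult, Sub {
    Expression left();
    Expression right();
}

record Add(Expression left, Expression right) implements BinOp {}
record Mult(Expression left, Expression right) implements BinOp {}
record Sub(Expression left, Expression right) implements BinOp {}

class TreeEvaluator {
    // Covers Constant plus every permitted BinOp, so no default branch is needed.
    static int compute(Expression e) {
        return switch (e) {
            case Constant c -> c.value();
            case Add a -> compute(a.left()) + compute(a.right());
            case Mult m -> compute(m.left()) * compute(m.right());
            case Sub s -> compute(s.left()) - compute(s.right());
        };
    }
}
```

Adding a new operation type to the `permits` list makes every such switch fail to compile until the new case is handled.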
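Or, keeping a single BinOp type with an operation field, we can compute both operands once and then switch on the operation:
return switch(e){
case Constant c -> c.value();
case BinOp b -> {
final var left = compute(b.left());
final var right = compute(b.right());
yield switch(b.op()){
case ADD -> left + right;
case MULT -> left * right;
case SUB -> left - right;
// ...
};
}
};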