API Analysis: Google Summer of Code 2022
This summer I wrote static analysis tools and performed experiments on the pub.dev ecosystem of packages.
Introduction
Every Dart package has a public API: the set of symbols available for other packages and applications to use in their code. It is the job of developers and package maintainers to assign version numbers to the different releases of their package according to the semantic versioning (semver) specification[1]. By looking at the changes in version numbers between releases of a package, developers know what they can expect to change between releases without examining the source code themselves. However, if a package maintainer assigns an incorrect version number to a package release, unexpected bugs can appear in dependent packages because of the false assumptions made by the developers of those dependent packages.
It is often challenging to make backwards-compatible changes to an existing package, which is already depended on by other packages and applications. When developers make breaking changes to a package without honouring semver, they risk breaking other people’s dependent code.
This problem can also apply in reverse: when the developers of a dependent package update their code to rely on a breaking change made by a dependency, it is easy to forget to update the version constraint associated with this dependency, so the constraint still allows releases of the dependency from before the breaking change. I will call this reverse problem a “lower bound constraint issue”, or “issue” for short; I will call the process of identifying issues “lower bound constraint analysis”.
My work aims to provide a framework for automatically identifying these problems: incompatibilities between a package and a particular version of one of its dependencies, or semver violations between two versions of one package (this can also be thought of as a kind of ‘incompatibility’). I will call this process “API analysis”.
In particular, I have written an experimental tool which can identify a subset of issues that exist in a given package.
Package summary, the *Shape model
Before lower bound constraint analysis can take place, it is necessary to be able to build a model of the public API of a package, and determine which symbols are available for other packages and applications to use.
For the purposes of API analysis, the public API of a package is summarized as a PackageShape object, which itself contains various other *Shape objects describing the members of the package, such as top-level getters/setters, functions, classes, extensions and typedefs, as well as information on libraries and the symbols that they export. The members of each class and extension are also recorded. This will be referred to as a “package summary”, or “summary” for short.
Note that in place of properties, the summary contains discrete getters and setters. Note also that no type information is stored. See shapes.dart for the complete *Shape model.
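To make the structure concrete, here is a minimal sketch of what such a summary might look like. It is written in Python for brevity, and the class names, field names and the example symbols (Greeter, name, volume) are simplified assumptions for illustration, not the actual definitions in shapes.dart.

```python
from dataclasses import dataclass, field


@dataclass
class ClassShape:
    """Hypothetical, simplified stand-in for one class in a summary."""
    name: str
    getters: list = field(default_factory=list)
    setters: list = field(default_factory=list)
    methods: list = field(default_factory=list)


@dataclass
class PackageShape:
    """Hypothetical, simplified stand-in for a whole package summary."""
    name: str
    version: str
    functions: list = field(default_factory=list)
    classes: list = field(default_factory=list)


def add_property(shape, name, is_final):
    """Record a property as a discrete getter (and a setter, unless final).

    Only names are kept: no type information is stored.
    """
    shape.getters.append(name)
    if not is_final:
        shape.setters.append(name)


greeter = ClassShape(name="Greeter", methods=["greet"])
add_property(greeter, "name", is_final=True)     # getter only
add_property(greeter, "volume", is_final=False)  # getter and setter

summary = PackageShape(name="foo", version="1.0.0",
                       functions=["sayHello"], classes=[greeter])
```

The point of the sketch is that a summary only needs to answer "does a symbol with this name exist?", which is why names are stored and types are not.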
Example
Consider a package consisting of the following .dart library files:
lib/a.dart
lib/b.dart
The JSON form of the summary of this package is the following:
Lower bound constraint analysis
The second problem discussed in the introduction, a lower bound constraint issue, is loosely defined as follows: a package allows a wider range of versions of a particular dependency than the range of versions which actually define the symbols that the package uses from this dependency. Specifying the wrong dependency constraint is a bug.
Lower bound constraint analysis aims to identify cases of these issues across the Dart ecosystem.
Example
Let’s take a closer look at what can happen to cause an issue.
Suppose you are the developer of bar, which is a package that depends on foo with the constraint ^1.0.0. For now, foo v1.0.0 is the only available version.
The developer of foo decides to release v1.1.0 of their package, which changes its public API.
While working on your package bar, you run dart pub upgrade and rewrite a part of your code. You don’t realise that you are using a symbol which is only available in foo v1.1.0. dart analyze reports no errors or warnings[2]. An issue is introduced.
You publish your package as bar v1.0.0.
Now I decide to write an app (or a package) named myapp. For one reason or another, I have decided that I don’t want foo v1.1.0, so I express that in my pubspec. I also want some version of bar.
This is all well and good, but when I need to execute f(), which is defined in bar, a bizarre thing happens: my code fails to compile due to errors in bar, which are out of my control. This is a difficult thing to debug, because there is nothing wrong with any of the .dart library files in either foo, bar or myapp; rather, this is a bug in the dependency constraint imposed by bar on foo.
It should have been corrected to ^1.1.0 when you started using sayHi() instead of sayHello().
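The difference between the two constraints can be checked with a small sketch of caret-constraint matching. This is a simplified model written for illustration, not the pub_semver implementation: it assumes plain major.minor.patch versions and ignores pre-release and build suffixes. The function names are hypothetical.

```python
def parse_version(text):
    """Parse 'major.minor.patch' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def caret_allows(constraint, version):
    """Return True if a caret constraint such as '^1.0.0' allows `version`."""
    if not constraint.startswith("^"):
        raise ValueError("only caret constraints are handled in this sketch")
    lower = parse_version(constraint[1:])
    # The upper bound is the next breaking version: the next major version,
    # or the next minor version when the major version is 0.
    if lower[0] > 0:
        upper = (lower[0] + 1, 0, 0)
    else:
        upper = (0, lower[1] + 1, 0)
    return lower <= parse_version(version) < upper


print(caret_allows("^1.0.0", "1.0.0"))  # True: the buggy constraint admits foo v1.0.0
print(caret_allows("^1.0.0", "1.1.0"))  # True
print(caret_allows("^1.1.0", "1.0.0"))  # False: the corrected constraint excludes it
```

With ^1.0.0, a version solver is free to pick foo v1.0.0 for myapp, even though bar no longer compiles against it.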
Approach
Lower bound constraint analysis looks at packages like bar and tries to find issues like the one outlined above. I will call the analysed package the “target package”, or “target” for short.
To perform lower bound constraint analysis on a target package, the following procedure is performed:
1. Identify the dependencies of the target and produce a summary of each dependency at its lowest allowed version.
2. Identify invocations, in the source code of the target, of symbols which are defined in one of its direct dependencies. For each invocation, check whether the corresponding identifier exists in the summary of the corresponding dependency produced in step 1. If not, an issue has been identified.
3. For each issue identified in step 2, produce a report.
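The procedure above can be sketched over toy data. In this illustration, a summary is reduced to a set of identifiers, and the invocations are given directly rather than extracted from Dart source code; the Invocation type, the find_issues function and the source locations are all hypothetical, and the real tool works on resolved Dart code instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    dependency: str  # direct dependency in which the symbol is defined
    identifier: str  # name used to access the symbol
    location: str    # where the invocation appears in the target


def find_issues(lowest_summaries, invocations):
    """Map each missing (dependency, identifier) pair to its references.

    `lowest_summaries` maps a dependency name to the set of identifiers
    present at its lowest allowed version (step 1).
    """
    issues = {}
    for invocation in invocations:
        summary = lowest_summaries.get(invocation.dependency)
        if summary is None:
            continue  # not a direct dependency, so not considered here
        if invocation.identifier not in summary:  # step 2: symbol is missing
            key = (invocation.dependency, invocation.identifier)
            issues.setdefault(key, []).append(invocation.location)
    return issues


# Step 1 (given): foo at its lowest allowed version, v1.0.0, only has sayHello.
lowest_summaries = {"foo": {"sayHello"}}
invocations = [Invocation("foo", "sayHi", "lib/bar.dart:3")]

# Step 3: report each issue together with its references.
for (dependency, identifier), locations in find_issues(lowest_summaries, invocations).items():
    print(f"{identifier} is missing from the lowest allowed version of {dependency}: {locations}")
```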
Reporting
Distinctions must be made between issues, incorrect lower bound constraints and references to symbols associated with an issue. These distinctions are important when interpreting results from running analysis on pub.dev.
- An issue is a case of one symbol being missing from the lowest allowed version of the dependency in which it is defined.
- The symbol referenced in one issue may have a list of references, namely invocations of the missing symbol in source code of the target package.
- One incorrect lower bound constraint can give rise to many issues, in cases where multiple symbols are missing from the lowest allowed version of one dependency, in which they are defined.
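The relationship between the three quantities can be illustrated with a short grouping sketch. The references below are made up for illustration (including the sayBye symbol and the source locations), and group_references is a hypothetical helper, not part of the tool.

```python
from collections import defaultdict

# Each reference is (dependency, missing identifier, location in the target).
references = [
    ("foo", "sayHi", "lib/bar.dart:3"),
    ("foo", "sayHi", "lib/bar.dart:9"),
    ("foo", "sayBye", "lib/bar.dart:12"),
]


def group_references(references):
    """Group references into issues, and issues into incorrect constraints."""
    issues = defaultdict(list)  # (dependency, identifier) -> locations
    for dependency, identifier, location in references:
        issues[(dependency, identifier)].append(location)
    constraints = defaultdict(set)  # dependency -> missing identifiers
    for dependency, identifier in issues:
        constraints[dependency].add(identifier)
    return dict(issues), dict(constraints)


issues, constraints = group_references(references)
print(len(references), len(issues), len(constraints))  # prints "3 2 1"
```

Here three references collapse into two issues, which in turn stem from a single incorrect constraint on foo.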
Example
The following is an example report associated with an incorrect lower bound dependency constraint imposed by the package mockito on the package analyzer.
Over-approximations and simplifications
Dart is a complex language and it is difficult to identify what is and isn’t a breaking change in a public API of a given package. The criterion for a breaking change for the purposes of API analysis is the removal/renaming of a public symbol, for example a class, a class member or a top-level function.
There exist many ways to subtly cause the same problems as issues do, like changing the arguments list or the return type of a method/function. In particular, the inheritance relationships that exist between types make it difficult to determine whether changing the type of a symbol from one to another is a breaking change. For these reasons, lower bound constraint analysis only considers the existence of public symbols and the identifiers (names) which can be used to access them. Restricting the analysis to this smaller class of bugs makes it easier to identify them with static analysis.
The target package
As the version solving algorithm used by pub favours more recent versions, issues will likely not lead to unexpected behaviour when developing the target package. However, any packages which themselves have a dependency on the target may introduce tighter dependency constraints, leading to the possibility that symbols required by the target are not found and the dependent package fails to compile as a result.
Despite the fact that compilation errors do not usually occur when developing the target, issues are always caused by wrong dependency constraints in the pubspec of the target, not in that of a package which depends on the target. The fix for an issue is always to raise the lower bound of the version constraint on the dependency which provides the symbol in question. In this way, any packages depending on the target and imposing even tighter version constraints (as demonstrated in the example) might fail version solving instead of failing to compile, which is a separate problem for the developers of the dependent package to solve.
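As an illustration of what raising the lower bound means in practice, the sketch below searches for the lowest version of a dependency whose summary contains every identifier the target uses. It is not part of the tool described here: the function name is hypothetical, summaries are reduced to sets of identifiers, and the contents of foo v1.1.0 are assumed. It also assumes that symbols are not removed again in later versions within the allowed range.

```python
def lowest_sufficient_version(summaries_by_version, used_identifiers):
    """Return the lowest version defining every used identifier, or None.

    `summaries_by_version` maps (major, minor, patch) tuples to the set of
    identifiers present in that version.
    """
    for version in sorted(summaries_by_version):
        if used_identifiers <= summaries_by_version[version]:
            return version
    return None


# Assumed contents of the two foo releases from the example.
foo_summaries = {
    (1, 0, 0): {"sayHello"},
    (1, 1, 0): {"sayHello", "sayHi"},
}

version = lowest_sufficient_version(foo_summaries, {"sayHi"})
print("^" + ".".join(str(part) for part in version))  # prints "^1.1.0"
```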
Dev dependencies and the dummy package
When static analysis was to be performed on a published package, a dummy package was created with just one direct dependency specified in its pubspec, constrained to an exact version: the name and version of the published package to be analysed.
This approach avoids the need to download dev dependencies, which should not be taken into account for the purposes of lower bound constraint analysis.
Note that this approach can only be used to perform static analysis on packages published to pub.dev.
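A pubspec for such a dummy package could be generated along the following lines. This is a sketch only: the package name "dummy", the SDK constraint and the helper name are assumptions, and the exact pubspec used by the tool may differ.

```python
def dummy_pubspec(package, version):
    """Build pubspec text with one direct dependency pinned to an exact version."""
    return (
        "name: dummy\n"                  # assumed name for the dummy package
        "environment:\n"
        "  sdk: '>=2.12.0 <3.0.0'\n"     # assumed SDK constraint
        "dependencies:\n"
        f"  {package}: '{version}'\n"    # a bare version allows exactly that version
    )


print(dummy_pubspec("bar", "1.0.0"))
```

Because the published package is only a regular dependency of the dummy package, resolving the dummy package pulls in the regular dependencies of the published package but not its dev dependencies.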
Development and source code
The development history and code review associated with this project can be found at these pull requests, in chronological order:
- https://github.com/CicadaCinema/pana/pull/1
- https://github.com/CicadaCinema/pana/pull/2
- https://github.com/dart-lang/pana/pull/1121
Refer to this document for information on how to run API analysis yourself, or this document for a starting point on the tests written alongside this project, which demonstrate many summary and lower bound constraint analysis features at a glance.
Running analysis on pub.dev
To see whether issues are prevalent in the pub.dev ecosystem of packages, lower bound constraint analysis was run on all eligible packages published to pub.dev. The following are the results of this analysis (see the Reporting section for an important distinction to be aware of when interpreting these results):
- Analysis was performed with the version of the tools associated with this commit.
- 18326 packages were considered eligible at the time of analysis, of which 45 failed version solving.
- 568 issues were identified.
- 206 distinct incorrect dependency constraints were identified.
- The following table shows the 10 most frequently occurring dependency packages across all the aforementioned incorrect dependency constraints:

| dependency | frequency |
| --- | --- |
| analyzer | 24 |
| url_launcher | 16 |
| dio | 11 |
| get | 9 |
| collection | 6 |
| json_annotation | 6 |
| permission_handler | 6 |
| code_builder | 5 |
| image_picker | 5 |
| petitparser | 5 |

- 171 distinct target packages imposed at least one incorrect constraint on at least one of their dependencies. This represents over 0.9% of eligible packages.
- 97 distinct packages were depended upon with an incorrect dependency constraint.
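The ratios implied by these figures can be checked with a few lines of arithmetic. Only the four counts come from the results above; the variable names are mine.

```python
eligible_packages = 18326
issues = 568
incorrect_constraints = 206
affected_targets = 171

# Share of eligible packages with at least one incorrect constraint.
share_of_targets = affected_targets / eligible_packages * 100
# Average number of missing symbols behind each incorrect constraint.
issues_per_constraint = issues / incorrect_constraints

print(f"{share_of_targets:.2f}% of eligible packages")                 # 0.93%
print(f"{issues_per_constraint:.2f} issues per incorrect constraint")  # 2.76
```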
Impact
As part of my work, I filed the following pull requests upon noticing an issue in a notable package:
- https://github.com/dart-lang/sdk/pull/49552
- https://github.com/dart-lang/pana/pull/1098
- https://github.com/dart-lang/pana/pull/1103
- https://github.com/flutter/plugins/pull/6176
- https://github.com/dart-lang/mockito/pull/558
- https://github.com/flutter/plugins/pull/6202
A new CI check was introduced in the flutter/plugins repository (also see this associated issue).
Related
https://github.com/dart-lang/dartdoc
https://github.com/google/dart-shapeshift
[1] In fact, the Dart ecosystem makes a few modifications to semver, which you can read about on the versioning page of the Dart website and in the readme of the pub_semver package.
[2] Running dart pub downgrade before dart analyze would have caught the issue in this case, and this often works on real world packages, but in other cases, there are more dependencies at play, notably dev dependencies, which make issues impossible to catch using this method.