---
title: 'The controversy and misconception around package manager lockfiles in
libraries'
date: '2022-01-31T00:48:00+01:00'
tags: ['lockfiles', 'package-managers', 'packaging', 'dependencies',
'reproducible-builds', 'reproducible', 'deterministic']
description: "This post describes the common misconception and controversy
around the philosophy, held by package managers, that lockfiles are an
abomination in packages, more specifically in libraries."
---

**NOTE:** I'm writing this because I feel it needs to be clarified. I have had
a lot of discussions where people were biased by other people's opinions,
mostly because of widespread misconceptions about these files. I may update
this later with a more fundamental rationale if I see even more controversy in
discussions in the near future.

A lot of package managers use a lockfile mechanism to reliably reproduce their
packages across different environments. The mechanism matters when other
environments build a package whose dependency versions are not pinned: without
it, they can end up with a different version that still satisfies the version
expression specified in the package manifest file.

## But what is a lockfile?

A lockfile is normally a file generated by the package manager that records
the exact dependency versions currently used to build a package successfully,
the way it is intended to be built. Some lockfiles also include other metadata
about each dependency, such as checksums to ensure integrity.

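As a minimal sketch of the checksum idea, assume a lock entry stores a SHA-256
digest of the dependency's artifact. The entry layout, the field names and the
`verify_artifact` helper are hypothetical and do not match any particular
package manager's format:

```python
import hashlib

# Hypothetical lock entry; real lockfile formats use their own field names.
LOCK_ENTRY = {
    "name": "example-lib",
    "version": "1.4.2",
    "sha256": hashlib.sha256(b"example-lib 1.4.2 contents").hexdigest(),
}


def verify_artifact(data: bytes, entry: dict) -> bool:
    """Return True when the downloaded bytes match the digest in the lock entry."""
    return hashlib.sha256(data).hexdigest() == entry["sha256"]


# The untouched artifact passes, a modified one does not.
print(verify_artifact(b"example-lib 1.4.2 contents", LOCK_ENTRY))
print(verify_artifact(b"example-lib 1.4.2 contents, tampered", LOCK_ENTRY))
```

The version number alone cannot tell these two artifacts apart; the recorded
digest can.
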
### Advantages

One of the biggest advantages is deterministic dependency resolution. With it,
package managers can more easily replicate the same build in different
environments, and you can build your own package more consistently. Both help
users and developers track down problems and can reduce test suite failures
during development.

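To make "deterministic" concrete, here is a toy resolver, not how any real
package manager works. It assumes versions are `(major, minor, patch)` tuples
and that the manifest only constrains the major version; `resolve` and the two
registry snapshots are made up for illustration:

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def resolve(available: List[Version], major: int,
            locked: Optional[Version] = None) -> Version:
    """Pick the locked version if there is one, else the newest with that major."""
    if locked is not None:
        if locked not in available:
            raise LookupError(f"locked version {locked} is not available")
        return locked
    candidates = [v for v in available if v[0] == major]
    if not candidates:
        raise LookupError(f"no version with major {major} is available")
    return max(candidates)


# Two hypothetical snapshots of the same registry, a few days apart.
registry_monday = [(1, 4, 1), (1, 4, 2)]
registry_friday = registry_monday + [(1, 5, 0)]

# Without a lockfile the result drifts as new releases appear.
print(resolve(registry_monday, major=1))
print(resolve(registry_friday, major=1))

# With a lockfile both days resolve to the same version.
print(resolve(registry_friday, major=1, locked=(1, 4, 2)))
```
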
### Security risk

The advantages of a lockfile are straightforward to understand, but some
people don't understand the _"disadvantages"_ and intentionally skip reviewing
lockfile changes, which is a tremendously bad idea. Let me explain why.

For example, GitHub has a bad security issue related to detached references
that allows any user to fork a repo and have a commit from the fork show up
under the original repo. See the example on Linus Torvalds' `linux` mirror with
[this](https://github.com/torvalds/linux/tree/8bcab0346d4fcf21b97046eb44db8cf37ddd6da0)
commit. Because git obviously allows anyone to create commits with any email,
GitHub automatically attributes it to the real Linus Torvalds. So a user can
easily introduce a security vulnerability, bump a patch version and change the
commit hash in the lockfile accordingly, poisoning the environment without it
being too obvious.

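A reviewer needs to see at least which lock entries changed. The sketch below
assumes the old and new lockfiles have already been parsed into dictionaries
keyed by package name; the `version` and `rev` fields, the package names and
the `changed_entries` helper are all hypothetical:

```python
def changed_entries(old: dict, new: dict) -> list:
    """List (name, old_entry, new_entry) for every dependency whose entry differs."""
    names = sorted(set(old) | set(new))
    return [(n, old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)]


# Hypothetical parsed lockfiles before and after a "harmless" patch bump.
old_lock = {
    "left-lib": {"version": "1.4.2", "rev": "aaa111"},
    "right-lib": {"version": "2.0.0", "rev": "bbb222"},
}
new_lock = {
    "left-lib": {"version": "1.4.3", "rev": "ccc333"},
    "right-lib": {"version": "2.0.0", "rev": "bbb222"},
}

for name, before, after in changed_entries(old_lock, new_lock):
    print(name, before, "->", after)
```

Listing the changes is only the first step. Someone still has to check that
the new `rev` really points at what upstream released.
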
Similar problems can arise with other external services. The underlying
problem is trusting the source and the service hosting it, and so accepting
changes that can't be easily verified just by looking at them. Should we trust
any commit coming from `github.com/torvalds/linux`? You would think so, but
apparently not.

This can be mitigated using GPG signatures. Linux releases, for example, are
all signed with a key from a trusted set. That way we can trust a release tag
by verifying its signature.

Therefore, changes to these files should be committed and reviewed with
caution and in a trusted environment.

## You might say, why not pin the versions instead?

Well, if the [semantic versioning](https://semver.org/) standard were strictly
followed by package maintainers, breaking changes wouldn't be that big of a
deal, although the resulting package would still differ from build to build.
In the real world, breaking changes happen all the time, even when the
intention is only a bug fix. Sometimes things get out of control and, because
of the complexity of today's systems, regression bugs can easily happen.

However, specifying a version range that covers patch or minor releases is
more useful in practice when the latest non-breaking version is preferable.

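The difference between a pin and a range can be sketched with a simplified
caret-style check. It assumes plain `MAJOR.MINOR.PATCH` strings and a base
version with a major of at least 1 (real caret rules treat `0.x` differently
and also handle pre-release tags); `satisfies_caret` is a hypothetical helper,
not a package manager API:

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies_caret(version: str, base: str) -> bool:
    """True when version >= base and both share the same major (base major >= 1)."""
    v, b = parse(version), parse(base)
    if b[0] < 1:
        raise ValueError("this sketch only handles base versions with major >= 1")
    return v[0] == b[0] and v >= b


available = ["1.4.1", "1.4.2", "1.4.3", "1.5.0", "2.0.0"]

# A pin accepts exactly one version, so the 1.4.3 bug fix is never picked up.
pinned = [v for v in available if v == "1.4.2"]

# A caret-style range accepts later patch and minor releases, but not 2.0.0.
ranged = [v for v in available if satisfies_caret(v, "1.4.2")]

print(pinned)
print(ranged)
```
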
Also, just pinning versions doesn't solve integrity issues, whereas lockfiles
that record checksums do.

## Why are reproducible builds important?

Quoting the Reproducible Builds project:

> The motivation behind the Reproducible Builds project is therefore to allow
> verification that no vulnerabilities or backdoors have been introduced during
> this compilation process. By promising identical results are always generated
> from a given source, this allows multiple third parties to come to a
> consensus on a “correct” result, highlighting any deviations as suspect and
> worthy of scrutiny.

You can read more about the project and its motivation, along with tools to
make your builds more reproducible, [here](https://reproducible-builds.org/).

## The controversy and misconception part

Here is where the rant starts. A lot of package managers and maintainers hold
a strong philosophy of not including lockfiles in libraries. This is something
I can't really understand, given the rationale.

For example, the Rust package manager, Cargo, advises against checking
lockfiles into version control for libraries:

> This property is most desirable from applications and packages which are at
> the very end of the dependency chain (binaries). As a result, it is
> recommended that all binaries check in their Cargo.lock.
>
> For libraries the situation is somewhat different. A library is not only used
> by the library developers, but also any downstream consumers of the library.
> Users dependent on the library will not inspect the library’s Cargo.lock
> (even if it exists). This is precisely because a library should not be
> deterministically recompiled for all users of the library.

You can read more about this in "The Cargo Book",
[here](https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries).

The last sentence is just wrong. Libraries should indeed be deterministically
recompiled to ensure consistency. That might not matter for the end user, but
it is essential for library developers to detect whether a change they
introduced caused a problem. Essentially, packages should be tested against
both their supported version ranges and their locked, reproducible
environment. Testing only one of those is wrong, and that is probably the root
of the misconception.

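One way to picture "test both" is to build a small list of environments for
the test suite to run in. Using the oldest and newest supported versions to
stand in for the whole range is my simplification, and `environments`, the
labels and the sample data are hypothetical:

```python
def environments(lock: dict, supported: dict) -> list:
    """Return (label, {dependency: version}) pairs to run the test suite against."""
    oldest = {name: min(versions) for name, versions in supported.items()}
    newest = {name: max(versions) for name, versions in supported.items()}
    unique, seen = [], []
    for label, env in [("locked", lock), ("oldest", oldest), ("newest", newest)]:
        # Skip an environment identical to one already listed.
        if env not in seen:
            seen.append(env)
            unique.append((label, env))
    return unique


# Versions as (major, minor, patch) tuples so min() and max() order them correctly.
lock = {"dep": (1, 4, 2)}
supported = {"dep": [(1, 4, 0), (1, 4, 2), (1, 5, 0)]}

for label, env in environments(lock, supported):
    print(label, env)
```

The locked run tells you whether your own change broke something. The range
runs tell you whether a dependency release did.
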
Many other package managers claim the same thing, and they seem to justify
themselves with each other's claims. The worst part is that some maintainers
refuse to include lockfiles in their projects, and others open pull
requests/issues asking to remove lockfiles, without thinking logically about
the problem and its consequences, hence my frustration.

Fortunately, there are some clarifying articles out there, including the
[yarn](https://classic.yarnpkg.com/blog/2016/11/24/lockfiles-for-all/) blog
post and [Shalvah's blog
post](https://blog.shalvah.me/posts/understanding-lockfiles), that you should
check out, although there are also a lot of bold claims out there that don't
make sense.

From
[sindresorhus/ama/issues/479](https://github.com/sindresorhus/ama/issues/479#issuecomment-309440715):

> The lockfile defeats the whole purpose of the caret ^ that is the default
> save behavior. And it prevents us from getting security patches immediately,
> which is insane. There are more good updates than bad updates, so it does
> more harm than good. The idea that it protects us from malicious code is
> silly because there's no way in hell that people are actually auditing the
> entire dependency graph when they do finally get around to updating the
> lockfile. It's a fallacy that leads to a false sense of security.

Lockfiles are NOT there to prevent security issues; they are there to
reproduce environments. If you rely on lockfiles for security, you are doing
it wrong. Nothing prevents you, as a user, from ignoring the lockfile, and if a
dependency has a security issue you should patch it upstream. As a developer
you might want to proactively update that file, but also keep it so you can
reproduce your application/library.

From [dev.to, When not to use
package-lock.json](https://dev.to/gajus/stop-using-package-lock-json-or-yarn-lock-3ddi):

> The origin of this misuse is NPM documentation. It should instead explain
> that package-lock.json should only be committed to the source code version
> control when the project is not a dependency of other projects, i.e.
> package-lock.json should only by committed to source code version control for
> top-level projects (programs consumed by the end user, not other programs).
>
> [...]
>
> I would support a variation of package-lock.json if it could somehow only
> apply to devDependencies. I can see some (albeit small and with tradeoffs)
> benefit to wanting your development environment not break if there is a
> broken release among your dependencies.

Testing only with lockfiles is wrong, and so is living on the bleeding edge by
testing only with the latest versions. You should test both or, ideally, all
the versions your manifest file supports. Covering only `devDependencies` is
also a claim that makes zero sense. Normal `dependencies` may not be part of
the build process, but they are part of the execution/runtime of your
application/library, and should indeed be reproducible.

## Conclusion

The conclusion is simple: please consider using a lockfile. Don't assume that
semantic versioning is followed strictly, because that is utopian. Also, make
your test suite deterministic and wide enough to cover your dependency
requirements. Sooner or later someone from the outside will touch your
library, possibly try to contribute, and complain about the test suite failing
due to an unknown
[Heisenbug](https://ipfs.io/ipfs/bafkreigrsldz4g6eubx47ubp7qh7bqr4cd4copde35awdac3w6bwbt2lem),
most likely a side effect caused by a dependency, and will waste effort
hunting down the problem, just because you are against lockfiles.