---
title: 'The controversy and misconception around package managers lockfile in libraries'
date: '2022-01-31T00:48:00+01:00'
tags: ['lockfiles', 'package-managers', 'packaging', 'dependencies',
'reproducible-builds', 'reproducible', 'deterministic']
description: "This post describes the common misconception and controversy
around package managers philosophy about the abomination of lockfiles in
packages, more specifically in libraries."
---

**NOTE:** I'm writing this because I feel this topic needs clarifying. I have
had a lot of discussions where people were biased by other opinions, mostly due
to widespread misconceptions about these files. I may update this post later
with a more fundamental rationale if I see even more controversy in the
discussions I have in the near future.

A lot of package managers use a lockfile mechanism to reliably reproduce their
packages across different environments. The mechanism matters when a package
does not pin its dependency versions: without a lockfile, another environment
building that package can end up with a different dependency version, one that
is still compatible with the version expression specified in the package
manifest file.

## But what is a lockfile?

A lockfile is normally a file generated by the package manager that records the
exact dependency versions used to build a package successfully, the way it is
intended to be built. Some lockfiles include other metadata about the packages
used, such as checksums to ensure integrity.

### Advantages

One of the biggest advantages is deterministic dependency resolution. This way,
package managers can more easily replicate the same builds in different
environments. With that, you can also build your own package more consistently.
Both help users and developers track down problems and can decrease test suite
failures during development.
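
To make the difference between a version range and a locked version concrete,
here is a minimal Python sketch. It is not how any real package manager is
implemented: the caret rule is simplified (it ignores the special handling of
`0.x` versions and pre-releases), and the package name `example-lib`, the
version lists and the artifact bytes are all hypothetical.

```python
import hashlib


def parse(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, base: str) -> bool:
    """Simplified '^base' check: same major version and not older than base."""
    v, b = parse(version), parse(base)
    return v[0] == b[0] and v >= b


def resolve(available: list[str], base: str) -> str:
    """Pick the newest available version that matches '^base'."""
    candidates = [v for v in available if satisfies_caret(v, base)]
    if not candidates:
        raise LookupError(f"no version satisfies ^{base}")
    return max(candidates, key=parse)


def lock(name: str, version: str, artifact: bytes) -> dict:
    """Build a hypothetical lockfile entry: exact version plus a checksum."""
    return {
        "name": name,
        "version": version,
        "sha256": hashlib.sha256(artifact).hexdigest(),
    }


def verify(entry: dict, artifact: bytes) -> bool:
    """Check that a downloaded artifact matches the locked checksum."""
    return hashlib.sha256(artifact).hexdigest() == entry["sha256"]


# The same manifest range resolves differently as new releases appear.
print(resolve(["1.2.0", "1.2.1"], "1.2.0"))                    # 1.2.1
print(resolve(["1.2.0", "1.2.1", "1.3.0", "2.0.0"], "1.2.0"))  # 1.3.0

# A lockfile entry keeps pointing at the exact version and contents.
entry = lock("example-lib", "1.2.1", b"contents of 1.2.1")
print(verify(entry, b"contents of 1.2.1"))  # True
print(verify(entry, b"tampered contents"))  # False
```

The manifest range alone gives a different answer once `1.3.0` is published,
while the lock entry still names `1.2.1` and can detect an artifact whose
contents changed.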

### Security risk

It is straightforward to understand the advantages of a lockfile, but some
people don't understand the _"disadvantages"_ and intentionally skip reviewing
lockfiles, which is a tremendously bad idea; let me explain why.

For example, GitHub has a bad security issue related to detached references
that allows any user to create a fork of a repo and have a commit from that
fork show up under the original repo. See the example of Linus Torvalds'
`linux` mirror with
[this](https://github.com/torvalds/linux/tree/8bcab0346d4fcf21b97046eb44db8cf37ddd6da0)
commit. Because git allows anyone to create commits with any email address,
GitHub automatically attributes it to the real Linus Torvalds. So an attacker
can easily introduce a security vulnerability, bump a patch version and change
the commit hash in the lockfile accordingly, poisoning the environment without
being too obvious.

Similar problems can arise with other external services. The problem here is
trusting the source and the underlying service, and so accepting changes that
can't be easily verified just by looking at them. Should we trust any commit
coming from `github.com/torvalds/linux`? Ideally yes, but apparently not.

This can be mitigated using GPG signatures. Linux releases, for example, are
all signed with a set of trusted keys. That way we can trust a release tag by
verifying the underlying signature.

Therefore, changes to these files should be committed and reviewed with caution
and in a trusted environment.

## You might say, why not pin the versions instead?

Well, if the [semantic versioning](https://semver.org/) standard were strictly
followed by package maintainers, breaking changes wouldn't be that big of a
deal, although the package would still not be identical across builds. In the
real world, breaking changes happen all the time, even if the intention is only
a bug fix.
Sometimes things get out of control and, because of the complexity of today's
systems, regression bugs can easily happen.

However, specifying a version range covering patch or minor versions is more
practical in situations where the latest non-breaking version is preferable.

Also, just pinning versions doesn't solve integrity issues, whereas lockfiles
that record checksums do.

## Why are reproducible builds important?

Quoting the Reproducible Builds project:

> The motivation behind the Reproducible Builds project is therefore to allow
> verification that no vulnerabilities or backdoors have been introduced during
> this compilation process. By promising identical results are always generated
> from a given source, this allows multiple third parties to come to a
> consensus on a “correct” result, highlighting any deviations as suspect and
> worthy of scrutiny.

You can read more about the project and its motivation, along with tools to
make your builds more reproducible, [here](https://reproducible-builds.org/).

## The controversy and misconception part

Here is where the rant starts. A lot of package managers and maintainers have,
behind them, a strong philosophy about not including lockfiles for libraries.
This is something I can't really understand, given the rationale.

For example, the documentation of Cargo, the Rust package manager, advises
against checking the lockfile of a library into version control:

> This property is most desirable from applications and packages which are at
> the very end of the dependency chain (binaries). As a result, it is
> recommended that all binaries check in their Cargo.lock.
>
> For libraries the situation is somewhat different. A library is not only used
> by the library developers, but also any downstream consumers of the library.
> Users dependent on the library will not inspect the library’s Cargo.lock
> (even if it exists).
> This is precisely because a library should not be
> deterministically recompiled for all users of the library.

You can read more about this in "The Cargo Book",
[here](https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries).

The last sentence is just wrong. Libraries should indeed be deterministically
recompiled to ensure consistency. That might not matter for the end user, but
it is essential for library developers to detect whether an introduced change
caused problems. Essentially, packages should be tested against both their
supported version ranges and their locked, reproducible environment. Testing
only one of those is wrong, and that is probably the root of the misconception.

Many other package managers claim the same thing, and they seem to justify
themselves with each other's claims. The worst part is that some maintainers
decline to include lockfiles in their projects, and others create pull
requests/issues requesting that lockfiles be removed, without thinking
logically about the problem and its consequences, hence my frustration.

Fortunately, there are some clarifying articles out there, including the
[Yarn](https://classic.yarnpkg.com/blog/2016/11/24/lockfiles-for-all/) blog
post and [Shalvah's blog
post](https://blog.shalvah.me/posts/understanding-lockfiles), that you should
check out, although there are also a lot of bold claims out there that don't
make sense.

From
[sindresorhus/ama/issues/479](https://github.com/sindresorhus/ama/issues/479#issuecomment-309440715):

> The lockfile defeats the whole purpose of the caret ^ that is the default
> save behavior. And it prevents us from getting security patches immediately,
> which is insane. There are more good updates than bad updates, so it does
> more harm than good.
> The idea that it protects us from malicious code is
> silly because there's no way in hell that people are actually auditing the
> entire dependency graph when they do finally get around to updating the
> lockfile. It's a fallacy that leads to a false sense of security.

Lockfiles are NOT there to prevent security issues; they are there to reproduce
environments. If you rely on lockfiles for security, you are doing it wrong.
Nothing prevents you from ignoring the lockfile as a user, and you should patch
upstream if there is a security issue in some dependency. As a developer, you
might want to proactively update that file, but also keep it so you can
reproduce your application/library.

From [dev.to, When not to use
package-lock.json](https://dev.to/gajus/stop-using-package-lock-json-or-yarn-lock-3ddi):

> The origin of this misuse is NPM documentation. It should instead explain
> that package-lock.json should only be committed to the source code version
> control when the project is not a dependency of other projects, i.e.
> package-lock.json should only by committed to source code version control for
> top-level projects (programs consumed by the end user, not other programs).
>
> [...]
>
> I would support a variation of package-lock.json if it could somehow only
> apply to devDependencies. I can see some (albeit small and with tradeoffs)
> benefit to wanting your development environment not break if there is a
> broken release among your dependencies.

Testing only with lockfiles is wrong, and so is living on the bleeding edge by
testing only with the latest versions. You should test both or, ideally, all
the versions your manifest file supports. Covering only `devDependencies` is
also a suggestion that makes zero sense.
Normal `dependencies` may not be part of the build process, but they are part
of the execution/runtime of your application/library, and should indeed be
reproducible.

## Conclusion

The conclusion is simple: please consider using a lockfile. Don't assume that
semantic versioning is followed strictly, because that is utopian. Also, make
your test suite deterministic and broad enough to cover your dependency
requirements. Otherwise, someone from the outside will touch your library,
possibly try to contribute, and complain about the test suite failing due to an
unknown
[Heisenbug](https://ipfs.io/ipfs/bafkreigrsldz4g6eubx47ubp7qh7bqr4cd4copde35awdac3w6bwbt2lem),
most likely a side effect caused by a dependency, and then spend effort hunting
down the problem, just because you are against lockfiles.