An Experience Report on Producing Verifiable Builds for Large-Scale Commercial Systems
Build verifiability is a safety property for a software system which can be used to check against various security-related issues during the build process. In summary, a verifiable build generates equivalent build artifacts for every build instance, allowing independent auditors to verify that the g...
Saved in:
Published in: | IEEE transactions on software engineering Vol. 48; no. 9; pp. 3361 - 3377 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: |
New York
IEEE
01-09-2022
IEEE Computer Society |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Build verifiability is a safety property for a software system which can be used to check against various security-related issues during the build process. In summary, a verifiable build generates equivalent build artifacts for every build instance, allowing independent auditors to verify that the generated artifacts correspond to their source code. Producing a verifiable build is a very challenging problem, as non-equivalences in the build artifacts can be caused by non-determinsm from the build environment, the build toolchain, or the system implementation. Existing research and practices on build verifiability mainly focus on remediating sources of non-determinism. However, such a process does not work well with large-scale commercial systems (LSCSs) due to their stringent security requirements, complex third party dependencies, and large volumes of code changes. In this paper, we present an experience report on using a unified process and a toolkit to produce verifiable builds for LSCSs. A unified process contrasts with the existing practices in which recommendations to mitigate sources of non-determinism are proposed on a case-by-case basis and are not codified in a comprehensive tool. Our approach supports the following three strategies to systematically mitigate non-equivalences in the build artifacts: remediation, controlling, and interpretation. Case study on three LSCSs within <inline-formula><tex-math notation="LaTeX">{{\sf Huawei}}</tex-math> <mml:math><mml:mi mathvariant="sans-serif">Huawei</mml:mi></mml:math><inline-graphic xlink:href="cogo-ieq1-3092692.gif"/> </inline-formula> shows that our approach is able to increase the proportion of verified build artifacts from less than 50 to 100 percent. To cross-validate our approach, we successfully applied our approach to build 2,218 open source packages distributed under <inline-formula><tex-math notation="LaTeX">{{\sf CentOS}}</tex-math> <mml:math><mml:mi mathvariant="sans-serif">CentOS</mml:mi></mml:math><inline-graphic xlink:href="cogo-ieq2-3092692.gif"/> </inline-formula> 7.8, increasing the proportion of verified build artifacts from 85 to 99 percent with minimal human intervention. We also provide an overview of our mitigation guideline, which describes the recommended strategies to mitigate various non-equivalences. Finally, we present some discussions and open research problems in this area based on our experience and lessons learned in the past few years of applying our approach within the company. This paper will be useful for practitioners and software engineering researchers who are interested in build verifiability. |
---|---|
ISSN: | 0098-5589 1939-3520 |
DOI: | 10.1109/TSE.2021.3092692 |