An Experience Report on Producing Verifiable Builds for Large-Scale Commercial Systems

Bibliographic Details
Published in:	IEEE transactions on software engineering Vol. 48; no. 9; pp. 3361 - 3377
Main Authors:	Shi, Yong, Wen, Mingzhi, Cogo, Filipe R., Chen, Boyuan, Jiang, Zhen Ming
Format:	Journal Article
Language:	English
Published:	New York IEEE 01-09-2022 IEEE Computer Society
Subjects:	build system Codification Determinism Equivalence large scale commercial system Process control Safety Security Software Software engineering Source code System implementation trustworthiness Verifiable build

Description
Summary:	Build verifiability is a safety property of a software system that can be used to guard against various security-related issues during the build process. In short, a verifiable build generates equivalent build artifacts for every build instance, allowing independent auditors to verify that the generated artifacts correspond to their source code. Producing a verifiable build is a very challenging problem, as non-equivalences in the build artifacts can be caused by non-determinism from the build environment, the build toolchain, or the system implementation. Existing research and practices on build verifiability mainly focus on remediating sources of non-determinism. However, such a process does not work well with large-scale commercial systems (LSCSs) due to their stringent security requirements, complex third-party dependencies, and large volumes of code changes. In this paper, we present an experience report on using a unified process and a toolkit to produce verifiable builds for LSCSs. A unified process contrasts with existing practices, in which recommendations to mitigate sources of non-determinism are proposed on a case-by-case basis and are not codified in a comprehensive tool. Our approach supports the following three strategies to systematically mitigate non-equivalences in the build artifacts: remediation, controlling, and interpretation. A case study on three LSCSs within Huawei shows that our approach is able to increase the proportion of verified build artifacts from less than 50 percent to 100 percent.
To cross-validate our approach, we successfully applied it to build 2,218 open source packages distributed under CentOS 7.8, increasing the proportion of verified build artifacts from 85 to 99 percent with minimal human intervention. We also provide an overview of our mitigation guideline, which describes the recommended strategies to mitigate various non-equivalences. Finally, we present some discussions and open research problems in this area based on our experience and lessons learned in the past few years of applying our approach within the company. This paper will be useful for practitioners and software engineering researchers who are interested in build verifiability.
ISSN:	0098-5589 1939-3520
DOI:	10.1109/TSE.2021.3092692
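The summary defines a verifiable build as one whose build instances yield equivalent artifacts, and it reports results as a "proportion of verified build artifacts". The following is a minimal sketch of how such a proportion could be computed for two independent build outputs. It assumes that "equivalent" means bit-identical (compared by SHA-256 digest), which is stricter than the paper's notion, because its interpretation strategy can also accept explained differences. The function names `digest_tree` and `verified_proportion` are hypothetical and are not taken from the authors' toolkit.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verified_proportion(build_a: Path, build_b: Path) -> tuple[float, list[str]]:
    """Return the fraction of artifacts that are bit-identical across two builds,
    plus the sorted list of artifacts that differ or exist in only one build.

    Assumption (hypothetical): bit-identity stands in for "equivalence"; an
    artifact missing from either build counts as not verified.
    """
    a, b = digest_tree(build_a), digest_tree(build_b)
    names = sorted(set(a) | set(b))
    if not names:
        return 1.0, []  # nothing to compare, so nothing fails verification
    mismatched = [n for n in names if a.get(n) != b.get(n)]
    return 1 - len(mismatched) / len(names), mismatched


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        for root, stamp in ((Path(d1), b"built 10:00"), (Path(d2), b"built 10:05")):
            (root / "lib.so").write_bytes(b"\x7fELF same bytes")
            (root / "VERSION").write_bytes(stamp)  # embedded timestamp causes a non-equivalence
        ratio, diffs = verified_proportion(Path(d1), Path(d2))
        print(f"{ratio:.0%} verified; differing: {diffs}")  # 50% verified; differing: ['VERSION']
```

In this toy run, the embedded build timestamp is a typical source of non-determinism from the build environment. Removing it from the build would be a form of remediation, pinning it to a fixed value would be a form of controlling, and documenting it as a harmless difference would be a form of interpretation, in the sense of the three strategies named above.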