Write custom tooling for publishing to BCR
I had hoped to use Publish to BCR, but there are a few issues with it.

First, a security issue: Publish to BCR requires granting a third-party
app write access to the GitHub repository, even though it only reads
from the repository, and reading requires no special privileges. See
bazel-contrib/publish-to-bcr#157.

Second, merely cutting a release is not sufficient to satisfy
https://blog.bazel.build/2023/02/15/github-archive-checksum.html
One needs to manually upload a release tarball that GitHub then stores
explicitly. (Perhaps someone should define a deterministic tarball
creation process for git revisions and end this silliness.) Since that
tarball is added by an individual developer, it seems poor that nothing
checks it against the git repository.

The BCR repository itself has some tooling for making a release. It
works by interactively asking questions (not automatable), but then
saves an undocumented JSON file with the answers. I've written a script
that generates the JSON file we need from a git tag. These JSON files
need to reference file paths, so they cannot be made standalone. (See
bazelbuild/bazel-central-registry#2781)
Instead, the script drops everything into a temporary directory.

Since BCR's limitations force us to do a lot of custom processing
anyway, I made the script check that:

1. The release tarball matches the archive tarball. Archive tarballs are
   stable enough in practice that anyone can perform an easy (still
   GitHub-dependent) check that the two match, unless GitHub changes
   the hash.

2. The tarball's contents match the git tag in the local repository, so
   we verify GitHub against the developer's workstation.

The script then prints a command to run in a local fork of the
bazel-central-registry repository to make a PR. Alas, even downloading
the tarball from GitHub takes a few seconds, so I had a bit of fun with
the script output.

Change-Id: I2a748309f63848ff097ee3c3e93e11751ef65cd7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/71307
Reviewed-by: Adam Langley
Auto-Submit: David Benjamin
Commit-Queue: David Benjamin
davidben authored and Boringssl LUCI CQ committed Sep 16, 2024
1 parent ea0f164 commit fb61601
Showing 5 changed files with 773 additions and 5 deletions.
10 changes: 5 additions & 5 deletions .bcr/README.md
@@ -1,6 +1,6 @@
-# Publish to BCR Configuration
+# BCR Configuration
 
-This directory contains configuration for the Publish to BCR app, which
-automates publishing releases to the Bazel Central Registry. See
-https://github.com/bazel-contrib/publish-to-bcr/tree/main/templates for
-details.
+This directory contains configuration information for BCR. It is patterned after
+the [Publish to BCR app](https://github.com/bazel-contrib/publish-to-bcr/tree/main/templates),
+which we have [opted not to use](https://github.com/bazel-contrib/publish-to-bcr/issues/157).
+However, `presubmit.yml` is used by [our own BCR tooling](../docs/releasing.md).
26 changes: 26 additions & 0 deletions docs/releasing.md
@@ -0,0 +1,26 @@
# Cutting Periodic "Releases"

The [Bazel Central Registry](https://github.com/bazelbuild/bazel-central-registry)
needs versioned snapshots and cannot consume git revisions directly. To cut a
release, do the following:

1. Pick a new version. The current scheme is `0.YYYYMMDD.0`. If we need to cut
multiple releases in one day, increment the third digit.

2. Update `MODULE.bazel` with the new version and upload to Gerrit.

3. Once that CL lands, make an annotated git tag at the revision. This can be
[done from Gerrit](https://boringssl-review.googlesource.com/admin/repos/boringssl,tags).
The "Annotation" field must be non-empty. (Just using the name of the tag
again is fine.)

4. Create a corresponding GitHub [release](https://github.com/google/boringssl/releases/new).

5. Download the "Source code (tar.gz)" archive from the new release and
re-attach it to the release. (The next step will check that the archive is
correct.)

6. Run `go run ./util/prepare_bcr_module TAG` and follow the instructions. The
tool does not require special privileges, though it does fetch URLs from
GitHub and read the local checkout. It outputs a JSON file for BCR's tooling
to consume.
225 changes: 225 additions & 0 deletions util/prepare_bcr_module/git.go
@@ -0,0 +1,225 @@
// Copyright (c) 2024, Google Inc.
//
// Permission to use, copy, modify, and/or distribute this software for any
// purpose with or without fee is hereby granted, provided that the above
// copyright notice and this permission notice appear in all copies.
//
// THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
// WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
// SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
// WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
// OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
// CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

package main

import (
	"bytes"
	"cmp"
	"crypto/sha256"
	"fmt"
	"os/exec"
	"slices"
	"strings"
	"sync"
)

type treeEntryMode int

const (
	treeEntryRegular treeEntryMode = iota
	treeEntryExecutable
	treeEntrySymlink
)

func (m treeEntryMode) String() string {
	switch m {
	case treeEntryRegular:
		return "regular file"
	case treeEntryExecutable:
		return "executable file"
	case treeEntrySymlink:
		return "symbolic link"
	}
	panic(fmt.Sprintf("unknown mode %d", m))
}

type treeEntry struct {
	path   string
	mode   treeEntryMode
	sha256 []byte
}

func sortTree(tree []treeEntry) {
	slices.SortFunc(tree, func(a, b treeEntry) int { return cmp.Compare(a.path, b.path) })
}

func compareTrees(got, want []treeEntry) error {
	// Check for duplicate files.
	for i := 0; i < len(got)-1; i++ {
		if got[i].path == got[i+1].path {
			return fmt.Errorf("duplicate file %q in archive", got[i].path)
		}
	}

	// Check for differences between the two trees.
	for i := 0; i < len(got) && i < len(want); i++ {
		if got[i].path == want[i].path {
			if got[i].mode != want[i].mode {
				return fmt.Errorf("file %q was a %s but should have been a %s", got[i].path, got[i].mode, want[i].mode)
			}
			if !bytes.Equal(got[i].sha256, want[i].sha256) {
				return fmt.Errorf("hash of %q was %x but should have been %x", got[i].path, got[i].sha256, want[i].sha256)
			}
		} else if got[i].path < want[i].path {
			return fmt.Errorf("unexpected file %q", got[i].path)
		} else {
			return fmt.Errorf("missing file %q", want[i].path)
		}
	}
	if len(want) < len(got) {
		return fmt.Errorf("unexpected file %q", got[len(want)].path)
	}
	if len(got) < len(want) {
		return fmt.Errorf("missing file %q", want[len(got)].path)
	}
	return nil
}

type gitTreeEntry struct {
	path       string
	mode       treeEntryMode
	objectName string
}

func gitListTree(treeish string) ([]gitTreeEntry, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("git", "ls-tree", "-r", "-z", treeish)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("error listing git tree %q: %w\n%s\n", treeish, err, stderr.String())
	}
	lines := strings.Split(stdout.String(), "\x00")
	ret := make([]gitTreeEntry, 0, len(lines))
	for _, line := range lines {
		if len(line) == 0 {
			continue
		}

		idx := strings.IndexByte(line, '\t')
		if idx < 0 {
			return nil, fmt.Errorf("could not parse ls-tree output %q", line)
		}

		info, path := line[:idx], line[idx+1:]
		infos := strings.Split(info, " ")
		if len(infos) != 3 {
			return nil, fmt.Errorf("could not parse ls-tree output %q", line)
		}

		perms, objectType, objectName := infos[0], infos[1], infos[2]
		if objectType != "blob" {
			return nil, fmt.Errorf("unexpected object type in ls-tree output %q", line)
		}

		var mode treeEntryMode
		switch perms {
		case "100644":
			mode = treeEntryRegular
		case "100755":
			mode = treeEntryExecutable
		case "120000":
			mode = treeEntrySymlink
		default:
			return nil, fmt.Errorf("unexpected file mode in ls-tree output %q", line)
		}

		ret = append(ret, gitTreeEntry{path: path, mode: mode, objectName: objectName})
	}
	return ret, nil
}

func gitHashBlob(objectName string) ([]byte, error) {
	h := sha256.New()
	var stderr bytes.Buffer
	cmd := exec.Command("git", "cat-file", "blob", objectName)
	cmd.Stdout = h
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("error hashing git object %q: %w\n%s\n", objectName, err, stderr.String())
	}
	return h.Sum(nil), nil
}

func gitHashTree(s *stepPrinter, treeish string) ([]treeEntry, error) {
	gitTree, err := gitListTree(treeish)
	if err != nil {
		return nil, err
	}

	s.setTotal(len(gitTree))

	// Hashing objects one by one is slow, so parallelize. Ideally we could
	// just use the object name, but git uses SHA-1, so checking a SHA-256
	// hash seems prudent.
	var workerErr error
	var workerLock sync.Mutex

	var wg sync.WaitGroup
	jobs := make(chan gitTreeEntry, *numWorkers)
	results := make(chan treeEntry, *numWorkers)
	for i := 0; i < *numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				workerLock.Lock()
				shouldStop := workerErr != nil
				workerLock.Unlock()
				if shouldStop {
					break
				}

				sha256, err := gitHashBlob(job.objectName)
				if err != nil {
					workerLock.Lock()
					if workerErr == nil {
						workerErr = err
					}
					workerLock.Unlock()
					break
				}

				results <- treeEntry{path: job.path, mode: job.mode, sha256: sha256}
			}
		}()
	}

	go func() {
		for _, job := range gitTree {
			jobs <- job
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	tree := make([]treeEntry, 0, len(gitTree))
	for result := range results {
		s.addProgress(1)
		tree = append(tree, result)
	}

	if workerErr != nil {
		return nil, workerErr
	}

	if len(tree) != len(gitTree) {
		panic("input and output sizes did not match")
	}

	sortTree(tree)
	return tree, nil
}