DLRN internals

This document aims at describing the inner workings of DLRN, so a new contributor can get up to speed as quickly as possible.

Main concepts

The following basic concepts are used in DLRN. You’ll need to get used to them if you want to understand the code:

  • Source Git: DLRN will always take the source code to build the package from a Git repository, regardless of the Source0 entry in the spec file.
  • Distgit: spec files are assumed to be present in a Git repository. DLRN has a driver-based mechanism to allow different options for the distgit location, see the Drivers section.
  • Project: each project corresponds with a package to be built. A package may define a number of subpackages in the spec file, but a single source RPM file is always created. The DLRN driver mechanism allows us to have different sources of information for the project list, such as rdoinfo or a single git repo.
  • Commit: a commit is the main abstraction used by DLRN. It aggregates all information related to each package built, such as: - Project name - Hash of the commit from the source git - Hash of the commit from the distgit - Build status (successful or not) - Name of the rpms

Directory structure

The DLRN codebase is structured as follows:

  • doc/: project documentation
  • scripts/: several useful scripts, used by CI and DLRN itself, plus some other miscellaneous files.
    • build_rpm.sh: script that calls mock to build rpm files (see the Building packages section).
    • submit_review.sh: script to open a Gerrit review after a build failure (see the Error reporting section).
    • centos.cfg and fedora.cfg: base mock configurations for CentOS and Fedora builders.
  • dlrn/: main DLRN code
    • build.py: build functions, described in detail in the Building packages section.
    • config.py: general configuration management.
    • db.py: database-related code.
    • notifications.py: error reporting functions.
    • purge.py: dlrn-purge command, used to reduce the disk usage on long-running instances.
    • reporting.py: reporting functions, create simple HTML reports of the repository status.
    • repositories.py: functions required to clone a git repo and get information from it.
    • rpmspecfile.py: basic rpm spec file parsing to be able to get package names and dependencies.
    • rsync.py: synchronizes yum repositories between servers, used to have a multi-node architecture.
    • shell.py: main file, reads command-line arguments and launches the build process.
    • utils.py: miscellaneous utilities.
    • api/: DLRN API code, described in detail in its own page.
    • drivers/: modular drivers for project listing and distgit location, described in the Drivers section.
    • migrations/: Alembic scripts for database maintenance.
    • stylesheets/: contains a CSS file used by the reporting module.
    • templates/: contains Jinja2 templates, also used by the reporting module.
    • tests/: unit tests.

High level algorithm

When DLRN is run, the following (simplified) sequence of events is executed:

fetch information for available projects
for each project
    find last processed commit
    refresh source git
    find last commit in source git and distgit
    if any commit is later than last_processed_commit
        add commit to list of commits to be processed
for each commit_to_be_processed
    build package
    create yum repo with built package and the latest versions of every other package
    if errors
        report via e-mail or Gerrit review if configured
    store commit in DB
    generate HTML report

Configuration

DLRN uses a simple INI file for its configuration. Most config options are located in the [DEFAULT] section and read during startup. Only driver-specific options have their own section.

The dlrn/config.py file defines a ConfigOptions class, that will create an object including all parsed options.

Drivers

Drivers are derived from the PkgInfoDriver class (see dlrn/drivers/pkginfo.py), and are used to:

  • Define a list of projects (packages) to be built
  • Define the source and distgit repos for each project
  • Fetch the new commits for each project’s source and distgit repos
  • Pre-process spec files, if needed

Each driver must provide the following methods:

  • getpackages(). This method will return a list of dictionaries. Each individual dict must contain the following mandatory keys (others are optional):
    • ‘name’: package name
    • ‘upstream’: URL for source repo
    • ‘master-distgit’: URL for distgit repo
    • ‘maintainers’: list of e-mail addresses for package maintainers
  • getinfo(). This method will return a list of commits to be processed for a specific package.
  • preprocess(). This method will run any required pre-processing for the spec files.
  • distgit_dir(). This method will return the distgit repo directory for a given package name.

You can check the code of the existing rdoinfo and gitrepo drivers to see their implementation specifics. If you create a new driver, you need to add the project name to the projects.ini configuration file, and if you need any new options, be sure to add them to a driver-specific section (see the Configuration section for details).

Building packages

The package build logic is included in build.py. There we have several functions:

  • build(). This is the function called externally. It gathers some configuration options and parameters, then calls build_rpm_wrapper to launch the build process and returns a list with the built rpms.
  • build_rpm_wrapper(). This wrapper function prepares the mock configuration file to be used during the build using the configuration. It will also add the most current repository to the mock configuration, so we can use packages in the current repository as dependencies during the build. Finally, it will spawn a Bash script, build_rpm.sh, which is in charge of the last step in the build chain.

The build_rpm.sh script takes care of running the mock command to create the package(s) from source. Mock requires a source RPM (SRPM) as input, so some additional magic is done inside it. Specifically:

  • The script tries to determine a version and release number for the package. This version number should be compatible with the Fedora guidelines, and allow upgrades from and to packages from stable releases, which is not always easy. We use the following algorithm:
    • For Python projects, take the output from python setup.py version. Most OpenStack projects use PBR, which gives us proper pre-versioning after a tagged release.
    • For Puppet projects, we take the version from the metadata.json or Modulefile files, if available, and increase the .Z version if there are any commits after the tagged release.
    • For other projects, we take the version number from the latest git tag.
    • If everything fails, default to version 0.0.1.
    • The release number is always 0.<date>.<upstream source commit short hash>.
  • A tarball is generated using python setup.py sdist for Python projects, and tar for any other project. Then, the spec file is updated to use this tarball as Source0, and a source RPM is created.
  • Finally, mock is executed to build the final RPM, using the configuration created previously in build_rpm_wrapper.

Hashed yum repositories

Each build is stored on a separate directory. A hashed structure is used for the directories, such as cd/af/cdaf2c77d974d5e794909313dceb3554be69a42e_4b1619fe. In this structure, cdaf2c77d974d5e794909313dceb3554be69a42e is the commit hash for the source git repo, and 4b1619fe is the short hash for the distgit commit. The first two directory levels (cd/af) are taken from the commit hash.

Post-build actions

After a package is built, we need to create a package repository with the latest version for every package in the project list. The post_build() function in shell.py takes care of that. The idea behind this is that the repo for each build will contain the most current version of each package to date.

To minimize the amount of storage used for each repo, DLRN does not copy the packages to the current hashed directory. Instead, post_build() iterates through the list of packages, finding the RPMs for their latest successful builds, and symlinks them in the current hashed directory.

It is probably easier to understand with an example:

  • Initially, we only have source commit 010b0a and distgit commit 020202 for project foo, then its hashed repo will look like:

    01/0b/010b0a_020202/foo-<version>.el7.centos.noarch.rpm
    
  • Then, we build project bar, with source commit 030303 and distgit commit 040404. Its hashed repo will be:

    03/03/030303_040404/bar-<version>.el7.centos.noarch.rpm
    03/03/030303_040404/foo-<version>.el7.centos.noarch.rpm -> ../../../01/0b/010b0a_020202/foo-<version>.el7.centos.noarch.rpm
    

    And the same process will be followed for every new package.

Error reporting

DLRN allows two different ways to notify build errors, both included in notifications.py:

  • A notification e-mail, sent using the sendnotifymail() function. The mail recipient list is taken from the maintainers project property.
  • A Gerrit review. This option makes use of a utility script submit_review.sh and the configured options in options.ini to create the review. It also adds the project maintainers to the generated review.

API internals

The API is described in detail in its own documentation.