This post describes how to handle files that are used as assets by jobs and pipelines defined on a common gitlab-ci repository when we include those definitions from a different project.

Problem description

When a .gitlab-ci.yml file includes files from a different repository, their contents are expanded and the resulting code is the same as the one generated when the included files are local to the repository.

In fact, even when the remote files include other files everything works right, as they are also expanded (see the description of how included files are merged for a complete explanation), allowing us to organise the common repository as we want.

As an example, suppose that we have the following script on the assets/ folder of the common repository:

dumb.sh
#!/bin/sh
echo "The script arguments are: '$@'"

If we run the following job on the common repository:

job:
  script:
    - $CI_PROJECT_DIR/assets/dumb.sh ARG1 ARG2

the output will be:

The script arguments are: 'ARG1 ARG2'

But if we run the same job from a different project that includes the same job definition the output will be different:

/scripts-23-19051/step_script: eval: line 138: d./assets/dumb.sh: not found

The problem here is that we include and expand the YAML files, but if a script wants to use other files from the common repository as an asset (configuration file, shell script, template, etc.), the execution fails if the files are not available on the project that includes the remote job definition.

Solutions

We can solve the issue using multiple approaches; I’ll describe two of them:

  • Create files using scripts
  • Download files from the common repository

Create files using scripts

One way to dodge the issue is to generate the non-YAML files from scripts included in the pipelines using HERE documents.

The problem with this approach is that we have to put the content of the files inside a script in a YAML file. If the content uses characters that the shell expands (remember, we are using HERE documents), we have to escape them, which is error prone, or encode the whole file into base64 or something similar, which makes maintenance harder.

As an example, imagine that we want to use the dumb.sh script presented in the previous section and call it from the same PATH as in the main project. In the examples we use the same folder; in practice we can create a hidden folder inside the project directory or use a PATH like /tmp/assets-$CI_JOB_ID to leave things outside the project folder and make sure that there are no collisions if two jobs are executed in the same place (e.g. when using an ssh runner).

To create the file we will use hidden jobs to write our script template and reference tags to add it to the scripts when we want to use them.

Here we have a snippet that creates the file with cat:

.file_scripts:
  create_dumb_sh:
    - |
      # Create dumb.sh script
      mkdir -p "${CI_PROJECT_DIR}/assets"
      cat >"${CI_PROJECT_DIR}/assets/dumb.sh" <<EOF
      #!/bin/sh
      echo "The script arguments are: '\$@'"
      EOF
      chmod +x "${CI_PROJECT_DIR}/assets/dumb.sh"

Note that to make things work we’ve added 6 spaces before the script code and escaped the dollar sign.
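As a possible alternative to escaping every dollar sign, POSIX shells leave the body of a HERE document untouched when the delimiter is quoted. The following is a minimal sketch of that behaviour outside of GitLab; the temporary directory stored in ASSETS_DIR is a hypothetical stand-in for ${CI_PROJECT_DIR}/assets and the OUTPUT variable exists only to inspect the result:

```shell
#!/bin/sh
# ASSETS_DIR is a hypothetical stand-in for "${CI_PROJECT_DIR}/assets".
ASSETS_DIR="$(mktemp -d)"

# Quoting the delimiter ('EOF') turns off parameter expansion in the body,
# so $@ is written to the file literally and needs no backslash.
cat >"$ASSETS_DIR/dumb.sh" <<'EOF'
#!/bin/sh
echo "The script arguments are: '$@'"
EOF
chmod +x "$ASSETS_DIR/dumb.sh"

# Run the generated script and keep its output for inspection.
OUTPUT="$("$ASSETS_DIR/dumb.sh" ARG1 ARG2)"
echo "$OUTPUT"
```

When the same lines are placed inside a YAML block scalar they still have to be indented like the rest of the script, and this only helps with shell expansion; files with binary content would still need an encoding such as base64.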

To do the same using base64 we replace the .file_scripts snippet with this:

.file_scripts:
  create_dumb_sh:
    - |
      # Create dumb.sh script
      mkdir -p "${CI_PROJECT_DIR}/assets"
      base64 -d >"${CI_PROJECT_DIR}/assets/dumb.sh" <<EOF
      IyEvYmluL3NoCmVjaG8gIlRoZSBzY3JpcHQgYXJndW1lbnRzIGFyZTogJyRAJyIK
      EOF
      chmod +x "${CI_PROJECT_DIR}/assets/dumb.sh"

Again, we have to indent the base64 version of the file using 6 spaces (all lines of the base64 output have to be indented), and to make changes we have to decode and re-encode the file manually, making it harder to maintain.
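The manual decode and re-encode step can be sketched as a small round trip. This is only an illustration under some assumptions: WORK_DIR is a hypothetical scratch directory, and `base64 -d` is the decoding flag used by the GNU and BusyBox tools (other implementations may spell it differently):

```shell
#!/bin/sh
# WORK_DIR is a hypothetical scratch directory for this sketch.
WORK_DIR="$(mktemp -d)"

# Recreate the plain version of dumb.sh (quoted delimiter, no escaping).
cat >"$WORK_DIR/dumb.sh" <<'EOF'
#!/bin/sh
echo "The script arguments are: '$@'"
EOF

# Encode the file; tr removes the line wrapping that some base64
# implementations add, leaving a single line.
ENCODED="$(base64 <"$WORK_DIR/dumb.sh" | tr -d '\n')"
echo "$ENCODED"

# Decode it again to check that the round trip is lossless.
printf '%s\n' "$ENCODED" | base64 -d >"$WORK_DIR/decoded.sh"
cmp "$WORK_DIR/dumb.sh" "$WORK_DIR/decoded.sh" && echo "round trip OK"
```

The encoded value is what has to be pasted, indented, between the HERE document delimiters each time the original file changes.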

With either version we just need to add a !reference before using the script. If we add the call in the first lines of the before_script, we can use the generated file in the before_script, script or after_script sections of the job without problems:

job:
  before_script:
    - !reference [.file_scripts, create_dumb_sh]
  script:
    - ${CI_PROJECT_DIR}/assets/dumb.sh ARG1 ARG2

The output of a pipeline that uses this job will be the same as the one shown in the original example:

The script arguments are: 'ARG1 ARG2'

Download files from the common repository

As we’ve seen, the previous solution works but is not ideal, as it makes the files harder to read, maintain and use.

An alternative approach is to keep the assets in a directory of the common repository (in our examples we will name it assets) and prepare a YAML file that declares some variables (e.g. the URL of the templates project and the PATH where we want to download the files) and defines a script fragment to download the complete folder.

Once we have the YAML file we just need to include it and add a reference to the script fragment at the beginning of the before_script of the jobs that use files from the assets directory and they will be available when needed.

The following file is an example of the YAML file we just mentioned:

bootstrap.yml
variables:
  CI_TMPL_API_V4_URL: "${CI_API_V4_URL}/projects/common%2Fci-templates"
  CI_TMPL_ARCHIVE_URL: "${CI_TMPL_API_V4_URL}/repository/archive"
  CI_TMPL_ASSETS_DIR: "/tmp/assets-${CI_JOB_ID}"

.scripts_common:
  bootstrap_ci_templates:
    - |
      # Downloading assets
      echo "Downloading assets"
      mkdir -p "$CI_TMPL_ASSETS_DIR"
      wget -q -O - --header="PRIVATE-TOKEN: $CI_TMPL_READ_TOKEN" \
        "$CI_TMPL_ARCHIVE_URL?path=assets&sha=${CI_TMPL_REF:-main}" |
        tar --strip-components 2 -C "$CI_TMPL_ASSETS_DIR" -xzf -

The file defines the following variables:

  • CI_TMPL_API_V4_URL: URL of the common project; in our case we are using the project ci-templates inside the common group. Note that the slash between the group and the project is escaped, which is needed to reference the project by name; if we don’t like that approach we can replace the URL-encoded path with the project id, e.g. we could use a value like ${CI_API_V4_URL}/projects/31.
  • CI_TMPL_ARCHIVE_URL: Base URL to use the gitlab API to download files from a repository. We will add the arguments path and sha to select which sub path to download and from which commit, branch or tag (we will explain later why we use CI_TMPL_REF; for now just keep in mind that if it is not defined we will download the version of the files available on the main branch when the job is executed).
  • CI_TMPL_ASSETS_DIR: Destination of the downloaded files.
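To see why the download fragment calls tar with `--strip-components 2`, the extraction can be tried against a locally built archive. This is a sketch under the assumption that the downloaded archive is laid out as a top-level directory containing assets/ and then the files; the directory name ci-templates-main-assets and the WORK_DIR and DEST_DIR variables are hypothetical:

```shell
#!/bin/sh
# All names below are hypothetical; the top-level directory name of a real
# GitLab archive may differ, only its depth matters for this sketch.
WORK_DIR="$(mktemp -d)"
ARCHIVE_ROOT="ci-templates-main-assets"
DEST_DIR="$WORK_DIR/dest"

# Build a local archive laid out as <top-level dir>/assets/<files>.
mkdir -p "$WORK_DIR/$ARCHIVE_ROOT/assets" "$DEST_DIR"
printf '#!/bin/sh\necho dumb\n' >"$WORK_DIR/$ARCHIVE_ROOT/assets/dumb.sh"
tar -C "$WORK_DIR" -czf "$WORK_DIR/archive.tar.gz" "$ARCHIVE_ROOT"

# Extract from standard input, dropping the two leading path components.
tar --strip-components 2 -C "$DEST_DIR" -xzf - <"$WORK_DIR/archive.tar.gz"
ls "$DEST_DIR"
```

With both leading components removed, the files land directly in the destination directory, which is why the jobs can call the script as a direct child of the assets directory.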

And uses variables defined in other places:

  • CI_TMPL_READ_TOKEN: token that includes the read_api scope for the common project, we need it because the tokens created by the CI/CD pipelines of other projects can’t be used to access the api of the common one.

    We define the variable on the gitlab CI/CD variables section to be able to change it if needed (e.g. if it expires).

  • CI_TMPL_REF: branch or tag of the common repo from which to get the files. We need it to make sure we are using the right version of the files, e.g. when testing we will use a branch and on production pipelines we can use fixed tags to make sure that the assets don’t change between executions unless we change the reference.

    We will set the value on the .gitlab-ci.yml file of the remote projects and will use the same reference when including the files to make sure that everything is coherent.
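How these variables combine into the download URL can be checked with plain shell expansion. In this sketch the API base URL and the v1.0.0 tag are hypothetical values (on a runner GitLab sets CI_API_V4_URL itself), and URL_DEFAULT and URL_PINNED are names introduced only for the illustration:

```shell
#!/bin/sh
# Hypothetical API base; on a runner GitLab provides CI_API_V4_URL itself.
CI_API_V4_URL="https://gitlab.example.com/api/v4"
CI_TMPL_API_V4_URL="${CI_API_V4_URL}/projects/common%2Fci-templates"
CI_TMPL_ARCHIVE_URL="${CI_TMPL_API_V4_URL}/repository/archive"

# Without CI_TMPL_REF the ${VAR:-default} expansion falls back to main.
unset CI_TMPL_REF
URL_DEFAULT="${CI_TMPL_ARCHIVE_URL}?path=assets&sha=${CI_TMPL_REF:-main}"
echo "$URL_DEFAULT"

# With a (hypothetical) tag the same expression pins the download to it.
CI_TMPL_REF="v1.0.0"
URL_PINNED="${CI_TMPL_ARCHIVE_URL}?path=assets&sha=${CI_TMPL_REF:-main}"
echo "$URL_PINNED"
```

The only part that changes between the two URLs is the sha argument, so the projects that include the templates control the version of the assets through CI_TMPL_REF alone.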

This is an example YAML file that defines a pipeline with a job that uses the script from the common repository:

pipeline.yml
include:
  - /bootstrap.yml
stages:
  - test
dumb_job:
  stage: test
  before_script:
    - !reference [.scripts_common, bootstrap_ci_templates]
  script:
    - ${CI_TMPL_ASSETS_DIR}/dumb.sh ARG1 ARG2

To use it from an external project we will use the following gitlab ci configuration:

.gitlab-ci.yml
include:
  - project: 'common/ci-templates'
    ref: &ciTmplRef 'main'
    file: '/pipeline.yml'

variables:
  CI_TMPL_REF: *ciTmplRef

Where we use a YAML anchor to ensure that we use the same reference when including and when assigning the value to the CI_TMPL_REF variable (as far as I know we have to pass the ref value explicitly to know which reference was used when including the file, the anchor allows us to make sure that the value is always the same in both places).

The reference we use is quite important for the reproducibility of the jobs: if we don’t use fixed tags or commit hashes as references, each time a job that downloads the files is executed we can get different versions of them.

For that reason it is not a bad idea to create tags on our common repo and use them as references on the projects or branches that we want to behave as if their CI/CD configuration was local (if we point to a fixed version of the common repo, everything works almost the same as having the pipelines directly in our repo).

But while developing pipelines, using branches as references is a really useful option: it allows us to re-run the jobs that we want to test and they will download the latest versions of the asset files on the branch, speeding up the testing process.

However, keep in mind that the trick only works with the asset files; if we change a job or a pipeline in the YAML files, restarting the job is not enough to test the new version, as the restart uses the same job that was created with the existing pipeline.

To try the updated jobs we have to create a new pipeline using a new action against the repository or executing the pipeline manually.

Conclusion

For now I’m using the second solution and, as it is working well, my guess is that I’ll keep using that approach unless GitLab itself provides a better or simpler way of doing the same thing.