This post describes how to handle files that are used as assets by jobs and pipelines defined on a common gitlab-ci repository when we include those definitions from a different project.
Problem description
When a .gitlab-ci.yml file includes files from a different repository its contents are expanded and the resulting code is the same as the one generated when the included files are local to the repository.
In fact, even when the remote files include other files everything works right, as they are also expanded (see the description of how included files are merged for a complete explanation), allowing us to organise the common repository as we want.
As an example, suppose that we have the following script in the assets/ folder of the common repository:
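#!/bin/sh
echo "The script arguments are: '$@'"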
If we run the following job on the common repository:
job:
  script:
    - $CI_PROJECT_DIR/assets/dumb.sh ARG1 ARG2
the output will be:
The script arguments are: 'ARG1 ARG2'
But if we run the same job from a different project that includes the same job definition, the output will be different:
/scripts-23-19051/step_script: eval: line 138: d./assets/dumb.sh: not found
The problem here is that we include and expand the YAML files, but if a script wants to use other files from the common repository as assets (configuration files, shell scripts, templates, etc.), the execution fails if those files are not available on the project that includes the remote job definition.
Solutions
We can solve the issue using multiple approaches; I’ll describe two of them:
- Create files using scripts
- Download files from the common repository
Create files using scripts
One way to dodge the issue is to generate the non-YAML files from scripts included in the pipelines using HERE documents.
The problem with this approach is that we have to put the content of the files inside a script on a YAML file, and if it uses characters that can be replaced by the shell (remember, we are using HERE documents) we have to escape them (error prone) or encode the whole file into base64 or something similar, making maintenance harder.
As an example, imagine that we want to use the dumb.sh script presented in the previous section and we want to call it from the same PATH as in the main project (in the examples we are using the same folder; in practice we can create a hidden folder inside the project directory or use a PATH like /tmp/assets-$CI_JOB_ID to leave things outside the project folder and make sure that there will be no collisions if two jobs are executed in the same place, i.e. when using a ssh runner).
To create the file we will use hidden jobs to write our script template and !reference tags to add it to the scripts where we want to use it.
Here we have a snippet that creates the file with cat:
.file_scripts:
  create_dumb_sh:
    - |
      # Create dumb.sh script
      mkdir -p "${CI_PROJECT_DIR}/assets"
      cat >"${CI_PROJECT_DIR}/assets/dumb.sh" <<EOF
      #!/bin/sh
      echo "The script arguments are: '\$@'"
      EOF
      chmod +x "${CI_PROJECT_DIR}/assets/dumb.sh"
Note that to make things work we’ve added 6 spaces before the script code and escaped the dollar sign.
To do the same using base64 we replace the previous snippet with this one:
.file_scripts:
  create_dumb_sh:
    - |
      # Create dumb.sh script
      mkdir -p "${CI_PROJECT_DIR}/assets"
      base64 -d >"${CI_PROJECT_DIR}/assets/dumb.sh" <<EOF
      IyEvYmluL3NoCmVjaG8gIlRoZSBzY3JpcHQgYXJndW1lbnRzIGFyZTogJyRAJyIK
      EOF
      chmod +x "${CI_PROJECT_DIR}/assets/dumb.sh"
Again, we have to indent the base64 version of the file using 6 spaces (all lines of the base64 output have to be indented) and to make changes we have to decode and re-encode the file manually, making it harder to maintain.
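As a reference, a possible round trip using GNU coreutils looks like this (the file path is illustrative; we assume we are editing the script in a checkout of the common repository):

# Decode the current payload to recover the script
echo 'IyEvYmluL3NoCmVjaG8gIlRoZSBzY3JpcHQgYXJndW1lbnRzIGFyZTogJyRAJyIK' \
  | base64 -d >assets/dumb.sh
# ... edit assets/dumb.sh ...
# Re-encode it without line wrapping to paste it back into the YAML file
base64 -w0 assets/dumb.sh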
With either version we just need to add a !reference before using the script; if we add the call on the first lines of the before_script we can use the created file in the before_script, script or after_script sections of the job without problems:
job:
  before_script:
    - !reference [.file_scripts, create_dumb_sh]
  script:
    - ${CI_PROJECT_DIR}/assets/dumb.sh ARG1 ARG2
The output of a pipeline that uses this job will be the same as the one shown in the original example:
The script arguments are: 'ARG1 ARG2'
Download files from the common repository
As we’ve seen, the previous solution works, but it is not ideal as it makes the files harder to read, maintain and use.
An alternative approach is to keep the assets in a directory of the common repository (in our examples we will name it assets) and prepare a YAML file that declares some variables (i.e. the URL of the templates project and the PATH where we want to download the files) and defines a script fragment to download the complete folder.
Once we have the YAML file we just need to include it and add a reference to the script fragment at the beginning of the before_script of the jobs that use files from the assets directory, and the files will be available when needed.
The YAML file we just mentioned could look like the following minimal sketch (the hidden job and fragment names are illustrative and we assume that curl and tar are available on the runner image):
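variables:
  # URL of the common project using the gitlab API (slash escaped as %2F)
  CI_TMPL_API_V4_URL: "${CI_API_V4_URL}/projects/common%2Fci-templates"
  # Base URL to download an archive of a sub path of the repository
  CI_TMPL_ARCHIVE_URL: "${CI_TMPL_API_V4_URL}/repository/archive"
  # Destination of the downloaded files
  CI_TMPL_ASSETS_DIR: "/tmp/assets-${CI_JOB_ID}"

.tmpl_scripts:
  download_assets:
    - |
      # Download the assets/ folder of the common repository; when
      # CI_TMPL_REF is not defined the main branch is used
      mkdir -p "${CI_TMPL_ASSETS_DIR}"
      curl -sf --header "PRIVATE-TOKEN: ${CI_TMPL_READ_TOKEN}" \
        "${CI_TMPL_ARCHIVE_URL}?path=assets&sha=${CI_TMPL_REF:-main}" |
        tar -xzf - --strip-components=2 -C "${CI_TMPL_ASSETS_DIR}"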
The file defines the following variables:
- CI_TMPL_API_V4_URL: URL of the common project; in our case we are using the project ci-templates inside the common group (note that the slash between the group and the project is escaped, as that is needed to reference the project by name; if we don’t like that approach we can replace the URL encoded path by the project id, i.e. we could use a value like ${CI_API_V4_URL}/projects/31).
- CI_TMPL_ARCHIVE_URL: Base URL to use the gitlab API to download files from a repository; we will add the arguments path and sha to select which sub path to download and from which commit, branch or tag (we will explain later why we use the CI_TMPL_REF variable; for now just keep in mind that if it is not defined we will download the version of the files available on the main branch when the job is executed).
- CI_TMPL_ASSETS_DIR: Destination of the downloaded files.
And it uses variables defined in other places:
- CI_TMPL_READ_TOKEN: a token that includes the read_api scope for the common project; we need it because the tokens created by the CI/CD pipelines of other projects can’t be used to access the API of the common one. We define the variable on the gitlab CI/CD variables section to be able to change it if needed (i.e. if it expires).
- CI_TMPL_REF: branch or tag of the common repo from which to get the files (we need that to make sure we are using the right version of the files; i.e. when testing we will use a branch, and on production pipelines we can use fixed tags to make sure that the assets don’t change between executions unless we change the reference). We will set the value on the .gitlab-ci.yml file of the remote projects and will use the same reference when including the files to make sure that everything is coherent.
This is an example YAML file that defines a pipeline with a job that uses the script from the common repository (a minimal sketch; we assume the previous file is stored as /templates/common.yml and this one as /pipelines/dumb-pipeline.yml on the common repo):
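include:
  - local: /templates/common.yml

dumb_job:
  before_script:
    - !reference [.tmpl_scripts, download_assets]
  script:
    - ${CI_TMPL_ASSETS_DIR}/dumb.sh ARG1 ARG2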
To use it from an external project we will use the following gitlab ci configuration (again a sketch, using the illustrative project and file names from above):
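include:
  - project: "common/ci-templates"
    ref: &ciTmplRef "main"
    file: "/pipelines/dumb-pipeline.yml"

variables:
  CI_TMPL_REF: *ciTmplRef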
Where we use a YAML anchor to ensure that we use the same reference when including the file and when assigning the value to the CI_TMPL_REF variable (as far as I know we have to pass the ref value explicitly to know which reference was used when including the file; the anchor allows us to make sure that the value is always the same in both places).
The reference we use is quite important for the reproducibility of the jobs: if we don’t use fixed tags or commit hashes as references, each time a job that downloads the files is executed we can get different versions of them.
For that reason it is not a bad idea to create tags on our common repo and use them as references on the projects or branches that we want to behave as if their CI/CD configuration was local (if we point to a fixed version of the common repo everything is going to work almost the same as if we had the pipelines directly in our repo).
But while developing pipelines, using branches as references is a really useful option: it allows us to re-run the jobs that we want to test and they will download the latest versions of the asset files on the branch, speeding up the testing process.
However, keep in mind that this trick only works with the asset files; if we change a job or a pipeline on the YAML files, restarting the job is not enough to test the new version, as the restart uses the same job definition created with the current pipeline.
To try the updated jobs we have to create a new pipeline, either by performing a new action against the repository (i.e. pushing a new commit) or by executing the pipeline manually.
Conclusion
For now I’m using the second solution and, as it is working well, my guess is that I’ll keep using that approach unless GitLab itself provides a better or simpler way of doing the same thing.