Using dependencies from PyPI
Using PyPI packages (aka “pip install”) involves two main steps: installing the third-party packages, and using them as dependencies.
Installing third party packages
Using bzlmod
To add pip dependencies to your MODULE.bazel file, use the pip.parse extension and call it to create the central external repo and the individual wheel external repos. Also include the toolchain extension in your MODULE.bazel, as shown in the first bzlmod example above.
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
For more documentation, including how the rules can update or create a requirements file, see the bzlmod examples under the examples folder or the documentation for the @rules_python//python/extensions:pip.bzl extension.
Note
By default, we use a host-platform-compatible toolchain to set up pip dependencies. During the setup phase we create some symlinks, which can be inefficient on Windows with the default settings. If you have admin privileges, you can improve performance by adding the following .bazelrc option:
startup --windows_enable_symlinks
This enables symlinks on Windows and speeds up the bootstrap of the hermetic host Python interpreter on that platform. Linux and OSX users should see no difference.
Using a WORKSPACE file
To add pip dependencies to your WORKSPACE, load the pip_parse function and call it to create the central external repo and the individual wheel external repos.
load("@rules_python//python:pip.bzl", "pip_parse")
# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
)
# Load the Starlark macro, which will define your dependencies.
load("@my_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()
Vendoring the requirements.bzl file
In some cases you may not want to generate the requirements.bzl file as a repository rule while Bazel is fetching dependencies. For example, if you produce a reusable Bazel module such as a ruleset, you may want to ship the requirements.bzl file rather than make your users install the WORKSPACE setup that generates it. See https://github.com/bazelbuild/rules_python/issues/608.
This is the same workflow as Gazelle, which creates go_repository rules with update-repos.
To do this, use the “write to source file” pattern documented in https://blog.aspect.dev/bazel-can-write-to-the-source-folder to put a copy of the generated requirements.bzl into your project. Then load the requirements.bzl file directly rather than from the generated repository. See the example in rules_python/examples/pip_parse_vendored.
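The “write to source file” pattern usually comes down to a small program launched with bazel run that copies a generated file back into the source tree. Below is a minimal Python sketch of such a program. It assumes it is started by bazel run, which sets the BUILD_WORKSPACE_DIRECTORY environment variable to the workspace root. The function names and argument layout are hypothetical and not part of rules_python, and the upstream pip_parse_vendored example may take a different approach.

```python
"""Hypothetical helper: copy a generated requirements.bzl into the source tree."""

import os
import shutil
import sys
from pathlib import Path


def vendor_file(generated: str, workspace_dir: str, dest_rel: str) -> Path:
    """Copy `generated` to `workspace_dir/dest_rel` and return the destination."""
    dest = Path(workspace_dir) / dest_rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(generated, dest)
    return dest


def main(argv: list[str]) -> int:
    # `bazel run` exports BUILD_WORKSPACE_DIRECTORY; elsewhere it is unset.
    workspace_dir = os.environ.get("BUILD_WORKSPACE_DIRECTORY")
    if workspace_dir is None:
        print("run this via `bazel run`", file=sys.stderr)
        return 1
    if len(argv) != 3:
        print("usage: vendor <generated file> <path relative to workspace>",
              file=sys.stderr)
        return 2
    dest = vendor_file(argv[1], workspace_dir, argv[2])
    print(f"wrote {dest}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

In practice you would wrap this in a py_binary whose data includes the generated requirements.bzl, then bazel run it after each lock-file update. The target name and destination path are up to you.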
Requirements for a specific OS/Architecture
In some cases you may need different requirements files for different OS and architecture combinations. The requirements_by_platform attribute of the pip.parse extension and the pip_parse repository rule supports this. The dictionary keys are labels of the requirements files. Each value is a string of comma-separated target (os, arch) tuples such as linux_x86_64, and the tuples may use * wildcards, as in osx_*.
For example:
# ...
requirements_by_platform = {
    "requirements_linux_x86_64.txt": "linux_x86_64",
    "requirements_osx.txt": "osx_*",
    "requirements_linux_exotic.txt": "linux_exotic",
    "requirements_some_platforms.txt": "linux_aarch64,windows_*",
},
# For any standard platform that rules_python has toolchains for and that is
# not listed above, fall back to the following requirements file.
requirements_lock = "requirements_lock.txt",
If the same platform is listed more than once, rules_python raises an error, because the mapping of requirements files to (os, arch) tuples must be unambiguous.
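The mapping rules above can be made concrete with a minimal Python sketch. It expands a requirements_by_platform dictionary into one requirements file per platform, supports * wildcards, rejects duplicates, and fills in a default file for the remaining known platforms. The KNOWN_PLATFORMS tuple and the function name are hypothetical. This illustrates the described behaviour and is not rules_python's implementation.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical platform list; the real set rules_python supports may differ.
KNOWN_PLATFORMS = (
    "linux_aarch64",
    "linux_x86_64",
    "osx_aarch64",
    "osx_x86_64",
    "windows_x86_64",
)


def expand_requirements_by_platform(
    mapping: dict[str, str],
    default_file: Optional[str] = None,
    platforms: tuple[str, ...] = KNOWN_PLATFORMS,
) -> dict[str, str]:
    """Return {platform: requirements_file}, raising on ambiguous mappings."""
    result: dict[str, str] = {}
    for req_file, spec in mapping.items():
        for pattern in spec.split(","):
            pattern = pattern.strip()
            if "*" in pattern:
                # A wildcard such as "osx_*" expands to every matching platform.
                matches = [p for p in platforms if fnmatchcase(p, pattern)]
            else:
                # Literal names are kept even if they are not in `platforms`.
                matches = [pattern]
            for platform in matches:
                if platform in result:
                    raise ValueError(
                        f"{platform} maps to both {result[platform]} and {req_file}"
                    )
                result[platform] = req_file
    if default_file is not None:
        # Known platforms that no entry claimed fall back to the default file.
        for platform in platforms:
            result.setdefault(platform, default_file)
    return result
```

With the example mapping above, osx_* claims both macOS entries and windows_* claims windows_x86_64. Adding "linux_*" to another file would then fail, because linux_x86_64 would be claimed twice.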
An alternative is to use the per-OS requirements attributes:
# ...
requirements_windows = "requirements_windows.txt",
requirements_darwin = "requirements_darwin.txt",
# For the remaining platforms (in practice, only Linux), use this file.
requirements_lock = "requirements_lock.txt",
)
pip rules
Because pip_parse and pip.parse are executed at evaluation time, Bazel has no information about the Python toolchain. It therefore cannot enforce that the interpreter used to invoke pip matches the interpreter used to run py_binary targets. By default, pip_parse uses the system command "python3". To override this, pass the python_interpreter or python_interpreter_target attribute to pip_parse. The pip.parse bzlmod extension uses the hermetic Python toolchain for the host platform by default.
You can have multiple pip_parse calls in the same workspace, or use the pip extension multiple times when using bzlmod. Each call creates external repos that have no relation to the others, so the same wheels may be downloaded several times.
As with any repository rule, you may want pip_parse to re-execute to pick up a non-hermetic change to your environment (e.g., updating your system python interpreter). You can force it to re-execute by running:
bazel sync --only [pip_parse name]
Using third party packages as dependencies
Each extracted wheel repo contains a py_library target representing the wheel’s contents. There are two ways to access this library. The first uses the requirement() function defined in the central repo’s //:requirements.bzl file. This function maps a pip package name to a label:
load("@my_deps//:requirements.bzl", "requirement")
py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        ":myotherlib",
        requirement("some_pip_dep"),
        requirement("another_pip_dep"),
    ],
)
requirement() exists to insulate users from changes to the underlying repository names and label strings. However, those labels are now widely used directly, so they cannot easily change anyway.
On the other hand, using requirement() has several drawbacks; see this issue for an enumeration. If you don’t want to use requirement(), you can use the library labels directly instead. For pip_parse, the labels have the following form:
@{name}//{package}
Here name is the name attribute that was passed to pip_parse. package is the pip package name, with characters that are illegal in Bazel label names (e.g. -, .) replaced with _. If you need to rename name from “old” to “new”, you can run the following buildozer command:
buildozer 'substitute deps @old//([^/]+) @new//${1}' //...:*
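The label scheme above can be sketched in Python. The text only says that characters illegal in labels, such as - and ., become _. This sketch also lowercases the name and collapses runs of -, _ and . in the style of PEP 503 name normalization; that extra step is an assumption here. The second function reproduces the regular expression from the buildozer command. Both function names are hypothetical.

```python
import re


def pip_label(hub_name: str, package: str) -> str:
    """Build an @{name}//{package} label for a pip package (sketch)."""
    # Assumption: PEP 503-style normalization (lowercase, runs of -_. -> "_").
    normalized = re.sub(r"[-_.]+", "_", package).lower()
    return f"@{hub_name}//{normalized}"


def rename_hub(label: str, old: str, new: str) -> str:
    """Rewrite @old//pkg to @new//pkg, like the buildozer substitution above."""
    return re.sub(rf"^@{re.escape(old)}//([^/]+)$", rf"@{new}//\1", label)
```

For example, pip_label("my_deps", "sphinxcontrib-serializinghtml") yields @my_deps//sphinxcontrib_serializinghtml, the same shape as the labels in the cycle error shown later on this page.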
Entry points
If you would like to access entry points, see the py_console_script_binary rule documentation. That rule helps you create a py_binary target for a particular console script exposed by a package.
‘Extras’ dependencies
Any ‘extras’ specified in the requirements lock file are automatically added as transitive dependencies of the package. For example, if useful_dep is listed in the lock file with an extra, you would still just put requirement("useful_dep") or @pypi//useful_dep in your deps.
Consuming Wheel Dists Directly
If you need to depend on the wheel dists themselves, for instance to pass them to some other packaging tool, you can get a handle to them with the whl_requirement macro. For example:
load("@pypi//:requirements.bzl", "whl_requirement")
filegroup(
    name = "whl_files",
    # srcs (not data) so the wheels are the filegroup's default outputs.
    srcs = [
        # This is equivalent to "@pypi//boto3:whl"
        whl_requirement("boto3"),
    ],
)
Creating a filegroup of files within a whl
The whl_filegroup rule is an easy way to extract the files you need from a whl file without modifying the BUILD.bazel contents of the whl repositories generated via pip_repository. Use it similarly to the filegroup above. See the API docs for more information.
Advanced topics
Circular dependencies
Sometimes PyPI packages contain dependency cycles. For instance, a particular version of sphinx depends on sphinxcontrib-serializinghtml, which in turn depends on sphinx. (As of 2024-06-02, the latest version of sphinx no longer has this cycle.) When you use them as requirement()s, as in:
py_binary(
    name = "doctool",
    ...
    deps = [
        requirement("sphinx"),
    ],
)
Bazel reports an error, because it does not support cycles in the build graph:
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
//:doctool (...)
@pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
| @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
| @pypi_sphinx//:pkg (...)
| @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
The experimental_requirement_cycles argument lets you work around these issues by specifying groups of packages that form cycles. pip_parse then transparently fixes the cycles for you and provides the cyclic dependencies simultaneously.
pip_parse(
    ...
    experimental_requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ],
    },
)
pip_parse supports fixing multiple cycles simultaneously, but the cycles must be distinct. For instance, apache-airflow has dependency cycles with a number of its optional dependencies, so those optional dependencies must all be part of the airflow cycle:
pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-postgres",
            "apache-airflow-providers-sqlite",
        ],
    },
)
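When deciding which packages belong in a group, it can help to treat each group as a strongly connected component of the dependency graph. Every package in a component can reach every other one, and components never overlap, which matches the requirement that cycles be distinct. Below is a minimal Python sketch using Tarjan's algorithm over a hand-written dependency map. rules_python does not ship this helper, and you would have to gather the real dependency edges yourself, for example from package metadata.

```python
def cycle_groups(deps: dict[str, list[str]]) -> list[list[str]]:
    """Return the groups of packages that form dependency cycles.

    Uses Tarjan's strongly-connected-components algorithm. Only components
    with more than one package (real cycles) are returned, each sorted by name.
    Recursion depth grows with the longest dependency chain.
    """
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    groups: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, []):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop all of its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                groups.append(sorted(component))

    for package in deps:
        if package not in index:
            visit(package)
    return groups
```

Each returned list has the shape of a value in experimental_requirement_cycles. The dictionary key ("sphinx", "airflow" in the examples above) names the group.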
Alternatively, you can resolve the cycle by removing one leg of it. For example, apache-airflow-providers-sqlite is “baked into” the Airflow package, but apache-airflow-providers-postgres is an optional feature. Listing apache-airflow[postgres] in your requirements.txt would expose a cycle via the extra. Instead, you can depend on apache-airflow and apache-airflow-providers-postgres separately as requirements. Bazel rules that need only apache-airflow can then take it as a dependency, and rules that explicitly want to mix in apache-airflow-providers-postgres can do so.
Another option is to use rules_python’s patching features to remove one leg of the dependency manually. For instance, you could make apache-airflow-providers-postgres not explicitly depend on apache-airflow, or perhaps apache-airflow-providers-common-sql.
Bazel downloader and multi-platform wheel hub repository
The bzlmod pip.parse call can pull information from PyPI (or a compatible mirror) and ensures that the Bazel downloader is used to download the wheels. This lets you authenticate with the mirror through a credential helper, and it also ensures that distribution downloads are cached. Because it avoids pip altogether, dependency fetching is also much faster.
You enable this with the experimental_index_url attribute of pip.parse and related attributes, as shown in examples/bzlmod/MODULE.bazel.
With this feature enabled, the pip extension evaluation prints the indexes it accesses, similar to the following:
Loading: 0 packages loaded
currently loading: docs/
Fetching module extension pip in @@//python/extensions:pip.bzl; starting
Fetching https://pypi.org/simple/twine/
This does not mean that rules_python is fetching the wheels eagerly. It is calling the PyPI server for the Simple API response, which lists all available source and wheel distributions. From that list, it selects the right distributions based on the sha256 values in your requirements_lock.txt file. The compatible distribution URLs are then written to the MODULE.bazel.lock file. Currently, to use the lock file with this rules_python feature, you have to set the environment variable RULES_PYTHON_OS_ARCH_LOCK_FILE=0. This setting will become the default in the next release.
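The selection step can be illustrated with a short Python sketch. It collects the --hash=sha256:... values pinned in a lock file and keeps only the index entries whose digest appears there. The function names and the (filename, sha256) input shape are hypothetical. A real Simple API response and rules_python's handling of it are more involved.

```python
import re

# Matches pip-style hash pins such as "--hash=sha256:<64 hex chars>".
HASH_RE = re.compile(r"--hash=sha256:([0-9a-f]{64})")


def pinned_hashes(lock_text: str) -> set[str]:
    """Collect every sha256 digest pinned in a requirements lock file."""
    return set(HASH_RE.findall(lock_text))


def select_distributions(
    index_entries: list[tuple[str, str]], lock_text: str
) -> list[str]:
    """Keep the filenames of (filename, sha256) entries whose hash is pinned."""
    wanted = pinned_hashes(lock_text)
    return [name for name, digest in index_entries if digest in wanted]
```

Any index entry whose hash is not pinned, such as an older release of the same project, is simply dropped, so only the pinned distributions end up as candidate URLs.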
Fetching the distribution information from PyPI tells rules_python which whl should be used on which target platform. It determines this by parsing the whl filename according to the PEP 600 and PEP 656 standards. You can configure this behaviour with the following publicly available flags:
--@rules_python//python/config_settings:py_linux_libc for selecting the Linux libc variant.
--@rules_python//python/config_settings:pip_whl for selecting the whl distribution preference.
--@rules_python//python/config_settings:pip_whl_osx_arch for selecting the macOS wheel preference.
--@rules_python//python/config_settings:pip_whl_glibc_version for selecting the glibc version compatibility.
--@rules_python//python/config_settings:pip_whl_muslc_version for selecting the musl version compatibility.
--@rules_python//python/config_settings:pip_whl_osx_version for selecting the macOS version compatibility.
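To see what these flags choose between, it helps to look at what a wheel filename encodes. The Python sketch below splits a filename into its platform tags and interprets the PEP 600 manylinux_X_Y_arch tags, the PEP 656 musllinux_X_Y_arch tags, the legacy manylinux aliases, and macosx_X_Y_arch tags. Roughly speaking, the libc kind and version are what the libc and glibc/musl flags select among, and the macOS version and architecture relate to the two osx flags. This is a simplified illustration, not rules_python's parser. The packaging library's parse_wheel_filename handles many more cases.

```python
import re
from typing import NamedTuple, Optional


class Platform(NamedTuple):
    os: str                                    # "linux", "osx", "windows", "any"
    arch: str
    libc: Optional[str] = None                 # "glibc" or "musl" on Linux
    version: Optional[tuple[int, int]] = None  # libc or macOS version


# PEP 600 keeps the older manylinux tags as aliases of glibc versions.
_LEGACY_MANYLINUX = {"manylinux1": (2, 5), "manylinux2010": (2, 12),
                     "manylinux2014": (2, 17)}
_VERSIONED = re.compile(r"(manylinux|musllinux|macosx)_(\d+)_(\d+)_(.+)")


def parse_platform_tag(tag: str) -> Platform:
    m = _VERSIONED.fullmatch(tag)
    if m:
        kind, major, minor, arch = m.groups()
        version = (int(major), int(minor))
        if kind == "macosx":
            return Platform("osx", arch, version=version)
        libc = "glibc" if kind == "manylinux" else "musl"
        return Platform("linux", arch, libc, version)
    legacy, _, arch = tag.partition("_")
    if legacy in _LEGACY_MANYLINUX:
        return Platform("linux", arch, "glibc", _LEGACY_MANYLINUX[legacy])
    if tag == "win32":
        return Platform("windows", "x86")
    if tag.startswith("win_"):
        return Platform("windows", tag[len("win_"):])
    if tag == "any":
        return Platform("any", "any")
    raise ValueError(f"unsupported platform tag: {tag}")


def wheel_platforms(filename: str) -> list[Platform]:
    """Parse the platform tags of a wheel filename (simplified)."""
    # name-version[-build]-python-abi-platform.whl
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a wheel filename: {filename}")
    # Compressed tag sets such as "manylinux_2_17_x86_64.manylinux2014_x86_64"
    # are separated by dots.
    return [parse_platform_tag(t) for t in parts[-1].split(".")]
```

For example, a wheel tagged manylinux_2_17_x86_64.manylinux2014_x86_64 parses to two identical glibc 2.17 entries, because manylinux2014 is an alias for glibc 2.17.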
Credential Helper
The “use Bazel downloader for Python wheels” experimental feature includes support for the Bazel Credential Helper.
Your Python artifact registry may provide a credential helper for you. Refer to your index’s docs to see if one is provided.
See the Credential Helper Spec for details.
Basic Example:
The simplest form of a credential helper is a bash script that accepts an argument and prints JSON to stdout. Google Artifact Registry, for example, uses ‘Basic’ HTTP Auth and does not provide a credential helper that conforms to the spec. For a service like that, the script might look like this:
#!/bin/bash
# cred_helper.sh
ARG=$1 # but we don't do anything with it as it's always "get"
# formatting is optional
echo '{'
echo ' "headers": {'
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]'
echo ' }'
echo '}'
Configure Bazel to use this credential helper for your Python index example.com:
# .bazelrc
build --credential_helper=example.com=/full/path/to/cred_helper.sh
Bazel will call this file as cred_helper.sh get and use the returned JSON to inject headers into every HTTP(S) request it sends to example.com.
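If a shell script is awkward to maintain, the same helper can be written in Python. The sketch below follows the request and response shape of the Credential Helper Spec as we understand it. Bazel runs the helper with the get argument, writes a JSON request containing the uri to stdin, and reads a JSON object with a headers map from stdout. The INDEX_USER and INDEX_PASSWORD environment variable names are hypothetical, and whether such variables reach the helper depends on how Bazel launches it, so verify that before relying on them. For reference, the token in the shell example above is the Basic encoding of test:123£.

```python
#!/usr/bin/env python3
"""Hypothetical credential helper that emits an HTTP Basic auth header."""

import base64
import json
import os
import sys


def basic_auth_header(user: str, password: str) -> str:
    """Encode user:password (UTF-8) as an HTTP Basic Authorization value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_response(user: str, password: str) -> dict:
    # Header values are lists, matching the shell example above.
    return {"headers": {"Authorization": [basic_auth_header(user, password)]}}


def main(argv: list[str]) -> int:
    if len(argv) < 2 or argv[1] != "get":
        print("usage: cred_helper.py get", file=sys.stderr)
        return 2
    # The request carries the target URI, e.g. {"uri": "https://example.com/..."}.
    # This sketch reads it but returns the same credentials for every URI.
    json.load(sys.stdin)
    response = build_response(os.environ["INDEX_USER"], os.environ["INDEX_PASSWORD"])
    json.dump(response, sys.stdout)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

You would register it the same way as the shell version, for example build --credential_helper=example.com=/full/path/to/cred_helper.py, after making the file executable.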