Apache Spark Configuration Management

Apache Spark comes with a wide range of configuration properties that can be supplied through a single properties file, e.g. spark.properties, or passed on the command line as arguments to spark-submit, pyspark and spark-shell.

Charmed Apache Spark improves on this capability by enabling a set of hierarchical configuration layers that are merged and overridden according to a precedence rule.

Each layer may also be linked to a particular component of the Charmed Apache Spark solution. For more information about components, see the components overview page.

Using the different layers appropriately lets you organize and centralize configuration definitions consistently for groups, single users, single environments and sessions. The sections below summarize the hierarchical levels of configuration. The final configuration is resolved by merging the layers from top to bottom: when a property is defined at more than one level, the value from the later source overrides the value from the earlier ones.
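The merge rule can be illustrated with a minimal Python sketch. It is not the Charmed Apache Spark implementation: the function name resolve_configuration and the property values are hypothetical, and each layer is assumed to be already loaded as a plain dictionary.

```python
from typing import Dict, Iterable

Conf = Dict[str, str]


def resolve_configuration(layers: Iterable[Conf]) -> Conf:
    """Merge layers in the given order; later layers override earlier ones."""
    resolved: Conf = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


# Hypothetical values, one dictionary per layer, listed from top to bottom.
group = {"spark.executor.instances": "2", "spark.eventLog.enabled": "true"}
user = {"spark.executor.instances": "4"}
environment = {"spark.executor.memory": "2g"}
session = {"spark.executor.memory": "4g"}

final = resolve_configuration([group, user, environment, session])
print(final)
```

In this sketch the user layer overrides the group value of spark.executor.instances, the session layer overrides the environment value of spark.executor.memory, and spark.eventLog.enabled passes through from the group layer unchanged.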

Group configuration

Group configurations are centrally stored as secrets in K8s and managed by the spark-integration-hub-k8s charm, which handles their whole lifecycle: creation, modification and deletion. Please refer to this how-to guide for more information on using the spark-integration-hub-k8s charm to set up group configurations. These configurations are valid across users, machines and sessions.

User configuration

User configurations are also centrally stored as secrets in K8s, but they are managed by the user through the spark-client snap and/or the spark8t Python library. For more information, please refer to the documentation for the spark-client snap and for the spark8t Python library. They are valid across machines and sessions.

Environment configuration

Environment configurations are stored in your local environment and can apply to multiple Apache Spark users launched or used from the same machine. They are valid across users and sessions. These configurations may be stored in:

  • static properties files specified via environment variable SPARK_CLIENT_ENV_CONF
  • $SNAP_DATA/etc/spark8t/spark-defaults.conf

The file specified by the environment variable takes precedence.
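The following Python sketch shows one way to read the two environment sources. It rests on several assumptions that the text above does not confirm: SPARK_CLIENT_ENV_CONF is treated as a single file path, "takes precedence" is modelled as a per-property override of the snap defaults file, and the parser accepts only simple "key=value" or "key value" lines. The helper names read_properties and environment_conf are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Mapping


def read_properties(path: Path) -> Dict[str, str]:
    """Parse a simple properties file; a missing file yields an empty dict."""
    conf: Dict[str, str] = {}
    if not path.is_file():
        return conf
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split once, on the first '=' or the first run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def environment_conf(env: Mapping[str, str]) -> Dict[str, str]:
    """Load the snap defaults first, then let the env-var file override them."""
    conf: Dict[str, str] = {}
    snap_data = env.get("SNAP_DATA")
    if snap_data:
        defaults = Path(snap_data) / "etc" / "spark8t" / "spark-defaults.conf"
        conf.update(read_properties(defaults))
    override = env.get("SPARK_CLIENT_ENV_CONF")
    if override:
        conf.update(read_properties(Path(override)))
    return conf


# Demonstration with temporary files and hypothetical values.
with tempfile.TemporaryDirectory() as tmp:
    defaults_file = Path(tmp) / "etc" / "spark8t" / "spark-defaults.conf"
    defaults_file.parent.mkdir(parents=True)
    defaults_file.write_text("spark.executor.memory=1g\nspark.executor.cores=2\n")
    custom_file = Path(tmp) / "custom.conf"
    custom_file.write_text("# hypothetical override file\nspark.executor.memory 4g\n")
    demo = environment_conf(
        {"SNAP_DATA": tmp, "SPARK_CLIENT_ENV_CONF": str(custom_file)}
    )
print(demo)
```

Passing the environment as a mapping rather than reading os.environ directly keeps the sketch easy to try without touching the real environment; in practice os.environ could be passed in.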

Session configuration

Session configurations are provided as CLI arguments to the spark-client command, and they are only valid for the related command/session. CLI configurations may be provided by:

  • Single Property specified using parameter(s) --conf <key>=<value>
  • Properties Files specified using parameter(s) --properties-file

Single properties take precedence over properties files.
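The precedence between the two session sources can be sketched in Python as follows. This is an illustration under stated assumptions, not the spark-client argument parser: the function session_conf, the load_file callback and the file name job.conf are hypothetical, and only the --conf and --properties-file parameters named above are recognized.

```python
import argparse
from typing import Callable, Dict, List


def session_conf(
    argv: List[str], load_file: Callable[[str], Dict[str, str]]
) -> Dict[str, str]:
    """Build the session layer: properties files first, then single properties."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--conf", action="append", default=[])
    parser.add_argument("--properties-file", action="append", default=[])
    # Unrecognized arguments (e.g. the application file) are ignored here.
    args, _ = parser.parse_known_args(argv)

    conf: Dict[str, str] = {}
    for path in args.properties_file:
        conf.update(load_file(path))
    for item in args.conf:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected <key>=<value>, got {item!r}")
        conf[key] = value  # a single property overrides any file value
    return conf


# Hypothetical file content, served from memory instead of disk.
fake_files = {"job.conf": {"spark.executor.memory": "2g", "spark.app.name": "demo"}}
argv = [
    "--properties-file", "job.conf",
    "--conf", "spark.executor.memory=8g",
    "app.py",
]
session = session_conf(argv, fake_files.__getitem__)
print(session)
```

Here spark.executor.memory comes from the --conf argument even though job.conf also defines it, while spark.app.name is taken from the file because no single property overrides it.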

Last updated 17 days ago.