Monday, February 8, 2021

Stages of Despair in Configuration

Configuration. It's just a text file, right?

If you shrugged and agreed, you can probably stop reading. You are at Stage 0 of configuration despair and can live in blissful ignorance until you encounter higher stages. 

Morbidly curious or enlightened by experience? Please read on.

Here is an example of the order of configuration resolution in a "mature" framework, excerpted from the Spring framework in Java:

Spring Boot uses a very particular PropertySource order that is designed to allow sensible overriding of values. Properties are considered in the following order:

  1. Devtools global settings properties on your home directory (~/.spring-boot-devtools.properties when devtools is active).
  2. @TestPropertySource annotations on your tests.
  3. properties attribute on your tests. Available on @SpringBootTest and the test annotations for testing a particular slice of your application.
  4. Command line arguments.
  5. Properties from SPRING_APPLICATION_JSON (inline JSON embedded in an environment variable or system property).
  6. ServletConfig init parameters.
  7. ServletContext init parameters.
  8. JNDI attributes from java:comp/env.
  9. Java System properties (System.getProperties()).
  10. OS environment variables.
  11. RandomValuePropertySource that has properties only in random.*.
  12. Profile-specific application properties outside of your packaged jar (application-{profile}.properties and YAML variants).
  13. Profile-specific application properties packaged inside your jar (application-{profile}.properties and YAML variants).
  14. Application properties outside of your packaged jar (application.properties and YAML variants).
  15. Application properties packaged inside your jar (application.properties and YAML variants).
  16. @PropertySource annotations on your @Configuration classes.
  17. Default properties (specified by setting SpringApplication.setDefaultProperties)

Wow. Spring is intended to be near-universal in its use cases for enterprise Java, and that "stack" of configuration value resolution isn't (solely) a product of exuberant coding. A large part of it likely evolved from Spring being used in (10000s? 100000s? millions?)... a LOT of applications.

To underline: "A SENSIBLE OVERRIDING OF VALUES"

A lot of that is Java-specific, but let's quickly examine some of the more universal sources of configuration in that stack:

- a configuration file (yaml/json/properties/toml/xml) in the base dir of your app, or the home dir of your account, or in the common UNIX path for config settings (/etc/<your_app>). Oh, which one has precedence?

- when executing automated testing, you probably want some test-specific configuration to override base settings. 

- command line arguments: invocation of the program with directly specified arguments/configuration on the command line should override any other sources ... right? 

- OS environment variables: oh right, those are like command line args. They should probably override or at least contribute, right?

- environment-specific configuration: similar to testing-only configuration, you'll have configuration specific to local dev, dev, stage, preprod, integration, acceptance, and production. While production config will likely only exist on one box, other environments may coexist on the same box so you don't waste infrastructure.

- formats: I debated making this its own stage (or including it in stage 2), but configuration can be in properties files, json, yaml, toml, (ugh) xml, to say the least. All depending on programmer (usually) or end user (for "nice" programmers) preference. 
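To make the precedence question concrete, here is a minimal sketch of one possible answer: command line over environment over config file over built-in defaults. The `MYAPP_` prefix, the key names, and the `resolve` helper are all hypothetical, and the ordering is an assumption for illustration, not a rule.

```python
from collections import ChainMap


def resolve(defaults, file_cfg, env, cli_args, prefix="MYAPP_"):
    """Return a layered view where earlier maps win: CLI > env > file > defaults."""
    # Keep only environment variables carrying our (hypothetical) prefix,
    # and normalise MYAPP_DB_HOST -> db_host.
    env_cfg = {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }
    # Drop CLI options that were not supplied (None) so they do not mask lower layers.
    cli_cfg = {key: value for key, value in cli_args.items() if value is not None}
    # ChainMap searches its maps left to right and returns the first hit.
    return ChainMap(cli_cfg, env_cfg, file_cfg, defaults)


config = resolve(
    defaults={"db_host": "localhost", "db_port": "5432"},
    file_cfg={"db_host": "db.internal"},
    env={"MYAPP_DB_PORT": "6543", "HOME": "/home/me"},
    cli_args={"db_host": None, "db_port": None, "verbose": "true"},
)

print(config["db_host"])  # taken from the file layer
print(config["db_port"])  # taken from the environment layer
print(config["verbose"])  # taken from the command line layer
```

Even this toy version has to make decisions (what counts as "not supplied"? which env vars belong to the app?) that a real framework has to document somewhere.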

If you are nodding your head at all of that, you are at stage 1 of configuration despair. Uh-oh, that's ONLY stage 1?

Stage 2: <whispers>secretsssss</whispers>

Some of the more important parts of app configuration are secrets: passwords, keys, etc. Sure, you can plop those in a text file somewhere, chmod 600, and rely on base UNIX security, but your client/company security, uh, overlords may have different requirements. The secrets might live in AWS Secrets Manager, in HashiCorp Vault, in some plain old LDAP or database, or behind a web service.

Suddenly a source isn't a plain old text file or plain old cli args or plain old env vars. Your configuration subsystem may involve database drivers, webservice invocations, cloud interfaces, LDAP, decryption, etc. 

Welcome to Stage 2... retrieval of configuration uses arbitrary computing and really needs to be Turing-complete. Wait, that's just stage 2? Yes, it gets worse.
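One common way to cope is to let a configuration value be a reference that some pluggable piece of code resolves. The sketch below is an assumption-laden illustration: the `scheme:reference` syntax, the `RESOLVERS` registry, and the function names are all made up here, and only environment variables and files are wired in. A real vault, LDAP, or cloud resolver would be registered the same way, using whatever client library you actually have.

```python
import os
from pathlib import Path


def from_env(ref):
    """Resolve 'env:NAME' by reading an environment variable."""
    return os.environ[ref]


def from_file(ref):
    """Resolve 'file:/path' by reading a (hopefully chmod 600) file."""
    return Path(ref).read_text().strip()


# Hypothetical registry: scheme name -> callable that fetches the value.
RESOLVERS = {"env": from_env, "file": from_file}


def resolve_value(raw, resolvers=RESOLVERS):
    """Return the secret behind a 'scheme:reference' value, or the value itself."""
    scheme, sep, ref = raw.partition(":")
    if sep and scheme in resolvers:
        return resolvers[scheme](ref)
    # No known scheme: treat the value as a plain literal.
    return raw


os.environ["DEMO_DB_PASSWORD"] = "s3cret"  # stand-in for a real secret
print(resolve_value("env:DEMO_DB_PASSWORD"))
print(resolve_value("just a plain value"))
```

Note what just happened: the configuration loader now executes arbitrary code per value, which is exactly the Stage 2 problem.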

Stage 3: Datatype Complexity

Wait, isn't all that above a big mound of complexity? Yes. Alas, there are so MANY kinds of complexity in IT.

An exhausted veteran of IT systems may notice that everything we retrieved from configuration in the previous stages of despair was, implicitly, ... single values. Do computers ONLY take single values as input? Oh right. They take lists of values. They take maps/dicts/associative arrays. Oh crap, they take entire tables of values (CSV), trees of values (deep json/yaml/xml). In the worst case: full-on graphs of data objects.


Don't believe me? Ever written Ansible / Terraform / Chef? Heard of "configuration languages"? Ever templated your configuration files? Those template engines, of course, have an evaluation language. These all exist to help build out the configuration trees and graphs of data.

Oh crap, your configuration data, which at a minimum is now a tree, will probably have templating and/or cross-referencing values (basically almost an object graph). Your configuration has ... configuration. It's gone recursive.
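Here is a minimal sketch of what cross-referencing values can look like, assuming a made-up `${dotted.key}` syntax. The `interpolate` and `lookup` helpers are hypothetical and not taken from any particular tool. Notice that the moment values can point at other values, you also need cycle detection.

```python
import re

# Hypothetical reference syntax: ${some.dotted.key}
_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(tree, dotted):
    """Walk a nested dict using a dotted key such as 'db.host'."""
    node = tree
    for part in dotted.split("."):
        node = node[part]
    return node


def interpolate(tree, value, _seen=()):
    """Expand ${...} references inside value, recursing through dicts and lists."""
    if isinstance(value, dict):
        return {k: interpolate(tree, v, _seen) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(tree, v, _seen) for v in value]
    if not isinstance(value, str):
        return value

    def expand(match):
        key = match.group(1)
        if key in _seen:
            chain = " -> ".join(_seen + (key,))
            raise ValueError(f"circular reference: {chain}")
        # The referenced value may itself contain references, so recurse.
        return str(interpolate(tree, lookup(tree, key), _seen + (key,)))

    return _REF.sub(expand, value)


raw = {
    "env": "stage",
    "db": {
        "host": "db.${env}.example.internal",
        "url": "postgres://${db.host}:5432/app",
    },
}
config = interpolate(raw, raw)
print(config["db"]["url"])
```

That is a tiny evaluation language living inside your config values, and it already needs its own error handling.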

We're not done.

Unlike single-level, single-value "precedence", you'll now have some tougher problems:

- If a configuration key's value is a map, does it overlay/combine/union its keys with the lower precedence maps (which can be very useful for DRY of common configuration / values)?
- If a configuration key's value is a list, does it replace, append, union, intersect, etc. with the lower precedence lists of values?
- For overlaid maps, how do you remove/undo/cancel the value of a lower-precedence map key that you don't want to be active? 
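The three questions above can be sketched as one recursive merge. Everything here is an illustrative assumption: the `merge` function, the `list_strategy` names, and the `REMOVE` tombstone are invented for this example. In an actual config file the tombstone would have to be some reserved marker value, which is its own design problem.

```python
# Hypothetical tombstone: a higher-precedence map sets a key to REMOVE
# to cancel the value inherited from a lower-precedence map.
REMOVE = object()


def merge(low, high, list_strategy="replace"):
    """Overlay high on top of low; high wins wherever the two cannot be combined."""
    if isinstance(low, dict) and isinstance(high, dict):
        merged = dict(low)
        for key, value in high.items():
            if value is REMOVE:
                merged.pop(key, None)
            elif key in merged:
                merged[key] = merge(merged[key], value, list_strategy)
            else:
                merged[key] = value
        return merged
    if isinstance(low, list) and isinstance(high, list):
        if list_strategy == "append":
            return low + high
        if list_strategy == "union":
            return low + [item for item in high if item not in low]
        if list_strategy != "replace":
            raise ValueError(f"unknown list strategy: {list_strategy}")
        return high
    # Scalars, or mismatched types: the higher-precedence value simply wins.
    return high


base = {
    "logging": {"level": "INFO", "file": "/var/log/app.log"},
    "plugins": ["auth", "metrics"],
}
override = {
    "logging": {"level": "DEBUG", "file": REMOVE},
    "plugins": ["metrics", "tracing"],
}

print(merge(base, override))
print(merge(base, override, list_strategy="union")["plugins"])
```

Note that a single global `list_strategy` is already too crude: in practice different keys tend to want different strategies, so the merge rules themselves end up needing configuration.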

Welcome to Stage 3! Actually, between the recursive configuration and data combination, I probably could have split this one into more stages.

Still with me? Onto stage 4!

Stage 4: Filesystem overlays

This really is just a variant of the previous three stages, but it does exist...

Ever heard of Docker? You know how Docker has that nice feature where an image can combine several "layers" of filesystems into an overall filesystem for your container? Know what that's doing?

Oh right, you're resolving configuration.

That's a POWERFUL technique.

But really it runs into all the previous stages' complexities: you'll have to determine the order of those filesystem overlays. Those "filesystems" may be subdirectories on the build machine, or they may involve source control system pulls, sftp, secured systems, S3 buckets.

Oh <deity or savior of your cultural preference>, what happens when there's a file at the same path in multiple overlaid filesystems? Generally you'd just say the highest-precedence file wins, but technically you may need to support diff/merge/parsing logic. What if you need to cancel out a file? How do you specify a "remove file" instruction?
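As a rough sketch of "highest-precedence file wins, plus a way to remove a file", here is an in-memory model where each layer is a dict of path to content. The `.wh.` marker is borrowed from the whiteout-file convention used in OCI image layers, but the rest (the `flatten` function, the layer representation, the paths) is a simplification invented for illustration, not how any real overlay filesystem is implemented.

```python
import posixpath

# A file named ".wh.<name>" in a layer means "delete <name> from the layers below".
WHITEOUT_PREFIX = ".wh."


def flatten(layers):
    """Combine {path: content} layers, lowest precedence first, into one view."""
    result = {}
    for layer in layers:
        # Pass 1: apply this layer's whiteouts to what the lower layers provided.
        for path in layer:
            directory, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                for existing in list(result):
                    # Remove the target itself, or everything under it if it is a directory.
                    if existing == target or existing.startswith(target + "/"):
                        del result[existing]
        # Pass 2: add this layer's real files; same path means this layer wins.
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                result[path] = content
    return result


base = {
    "etc/app/app.yaml": "level: INFO\n",
    "etc/app/debug.yaml": "trace: true\n",
}
prod = {
    "etc/app/app.yaml": "level: WARN\n",
    "etc/app/.wh.debug.yaml": "",
}

print(flatten([base, prod]))
```

This version replaces whole files. Merging the contents of `app.yaml` across layers instead would drag in all of Stage 3 again.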

Yes, those are edge cases. But they exist....