Tuesday, October 15, 2019

Configuration inheritance and how it locks you in

So, a quick definition of "configuration inheritance". It's not a new technique or a recent revolution; it has appeared practically everywhere since the olden days. Basically, configuration inheritance is a base configuration/environment/set of settings that can be overridden by more specific cases/situations/users/etc.

So you start with, say, this base environment; we'll call it .rootrc:

   USERNAME=system
   MACHINE_NAME=some_server
   HOME_DIR=/home/system

But then you are actually running as user DUDE, who inherits the base system's environment and overrides or adds some other settings in the /home/DUDE/.bashrc file:

  USERNAME=DUDE
  HOME_DIR=/home/DUDE
  BACKUP_LOC=/mnt/backups/DUDE
 
But then DUDE uses GitHub, which applies and adds its own settings from the .github file:

  USERNAME=dudeman
  GITPASSWORD=githubpassword
 
This example lives entirely inside a UNIX environment, highlighting that the pattern has existed since the late 1970s, and probably earlier. It shows up all over programming, software, and systems management. For a more extreme example, here is the Spring Boot settings hierarchy:

Spring Boot uses a very particular PropertySource order that is designed to allow sensible overriding of values. Properties are considered in the following order:
  1. Devtools global settings properties on your home directory (~/.spring-boot-devtools.properties when devtools is active).
  2. @TestPropertySource annotations on your tests.
  3. properties attribute on your tests. Available on @SpringBootTest and the test annotations for testing a particular slice of your application.
  4. Command line arguments.
  5. Properties from SPRING_APPLICATION_JSON (inline JSON embedded in an environment variable or system property).
  6. ServletConfig init parameters.
  7. ServletContext init parameters.
  8. JNDI attributes from java:comp/env.
  9. Java System properties (System.getProperties()).
  10. OS environment variables.
  11. A RandomValuePropertySource that has properties only in random.*.
  12. Profile-specific application properties outside of your packaged jar (application-{profile}.properties and YAML variants).
  13. Profile-specific application properties packaged inside your jar (application-{profile}.properties and YAML variants).
  14. Application properties outside of your packaged jar (application.properties and YAML variants).
  15. Application properties packaged inside your jar (application.properties and YAML variants).
  16. @PropertySource annotations on your @Configuration classes.
  17. Default properties (specified by setting SpringApplication.setDefaultProperties).

... I can hear Harry Caray going "holy cow!" That hierarchy has lots of sensible features involving testing, dev vs. stage vs. production environment settings, ad hoc overrides, defaults, etc. There's reasonable justification for such a complicated stack, including evolution over time and hard-learned lessons.

Really, what this data pattern amounts to is a series of maps / dictionaries / key-value files stacked on top of each other. A lookup starts at the top of the stack (the most specific layer) and descends until it finds a value for the given key.
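Python's standard library happens to ship this exact structure as collections.ChainMap, which searches a sequence of mappings in order and returns the first hit. Here is a minimal sketch, assuming the three files from the UNIX example have already been parsed into dicts (the parsing is left out); the small lookup function is a hand-rolled illustration that makes the top-down descent explicit:

```python
from collections import ChainMap

# The three layers from the UNIX example, already parsed into dicts.
rootrc = {"USERNAME": "system", "MACHINE_NAME": "some_server", "HOME_DIR": "/home/system"}
bashrc = {"USERNAME": "DUDE", "HOME_DIR": "/home/DUDE", "BACKUP_LOC": "/mnt/backups/DUDE"}
github = {"USERNAME": "dudeman", "GITPASSWORD": "githubpassword"}

# Most specific layer first: lookups start here and descend toward .rootrc.
stack = [github, bashrc, rootrc]


def lookup(stack, key):
    """Return the value from the first (most specific) layer that defines key."""
    for layer in stack:
        if key in layer:
            return layer[key]
    raise KeyError(key)


# ChainMap performs the same top-down search over its maps, left to right.
env = ChainMap(*stack)

print(lookup(stack, "USERNAME"))  # dudeman (set by .github)
print(env["HOME_DIR"])            # /home/DUDE (set by .bashrc)
print(env["MACHINE_NAME"])        # some_server (only .rootrc defines it)
```

One design detail worth noticing: writes to a ChainMap only ever go to its first map, which mirrors how a user normally edits only their own, most specific file.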

You see it in the above examples, in Kubernetes/Docker image overlays, in practically every infrastructure management tool, and in dozens of other places.

Why? It's a good tool for centralizing common properties at one level and overriding them in specific situations, so you can manage settings across a variety of environments, machines, situations, or use cases.

It also enables a key aspect of software development and systems management: iterative/evolutionary development, as new requirements, systems, and interfaces are added over time, or, in the case of pure software development, as successive releases support increasing complexity.

But what seems like a solid foundation is really a teetering tower once you get to five layers of configuration inheritance. The illusion of management simplicity completely falls apart when significant reconfiguration happens at the lower levels of the "stack of maps", because every layer above may silently depend on, or silently override, whatever changed. And that is one of the key ways organizations get locked into systems and software packages.

Here's the key feature that almost every one of these configuration inheritance / stacked-map systems lacks: a basic meta-mapping feature that explains where each value comes from. Take our first example of UNIX environment variables, which can be displayed in any "context" with the env command:

    USERNAME=dudeman
    GITPASSWORD=githubpassword
    HOME_DIR=/home/DUDE
    MACHINE_NAME=some_server
    BACKUP_LOC=/mnt/backups/DUDE

So my suggestion is that a key facility should be a metaenv command that prints each value along with where it came from:

    .github::USERNAME=dudeman
    .github::GITPASSWORD=githubpassword
    .bashrc::HOME_DIR=/home/DUDE
    .rootrc::MACHINE_NAME=some_server
    .bashrc::BACKUP_LOC=/mnt/backups/DUDE

This command would show, for each value, which of the three files (in this example) assigned it or last overrode it, so that if there is a problem, bug, or unexpected value, someone can walk up the chain.
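The bookkeeping behind such a command is small. Below is a minimal Python sketch of the proposed metaenv; it is not an existing tool, and the function names metaenv and format_metaenv, along with the "(overrides ...)" suffix, are hypothetical choices for illustration. It records every layer that defines a key, not just the winner, so the whole chain can be walked:

```python
# Hypothetical sketch of the proposed metaenv: not an existing tool or library.


def metaenv(layers):
    """Given (source, settings) layers, most specific first, map each key to
    every (source, value) pair that defines it, in precedence order."""
    chain = {}
    for source, settings in layers:
        for key, value in settings.items():
            chain.setdefault(key, []).append((source, value))
    return chain


def format_metaenv(chain):
    """Render 'source::KEY=value' lines, noting any values the winner shadows."""
    lines = []
    for key, defs in chain.items():
        source, value = defs[0]  # first entry comes from the most specific layer
        line = f"{source}::{key}={value}"
        if len(defs) > 1:
            shadowed = ", ".join(f"{s}={v}" for s, v in defs[1:])
            line += f"   (overrides {shadowed})"
        lines.append(line)
    return lines


layers = [
    (".github", {"USERNAME": "dudeman", "GITPASSWORD": "githubpassword"}),
    (".bashrc", {"USERNAME": "DUDE", "HOME_DIR": "/home/DUDE", "BACKUP_LOC": "/mnt/backups/DUDE"}),
    (".rootrc", {"USERNAME": "system", "MACHINE_NAME": "some_server", "HOME_DIR": "/home/system"}),
]

for line in format_metaenv(metaenv(layers)):
    print(line)
```

Running it prints the same five values as the listing above (in a slightly different order), plus what each winning value shadows:

    .github::USERNAME=dudeman   (overrides .bashrc=DUDE, .rootrc=system)
    .github::GITPASSWORD=githubpassword
    .bashrc::HOME_DIR=/home/DUDE   (overrides .rootrc=/home/system)
    .bashrc::BACKUP_LOC=/mnt/backups/DUDE
    .rootrc::MACHINE_NAME=some_server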

Unfortunately, once you get to multiple users, applications, servers, environments, etc., you form a huge tree of different inheriting situations, much like the huge inheritance hierarchies of OOP software and elsewhere.

The real problem is that trees are very hard to visualize with tools, especially command-line tools. But it is possible, and that is another class of tool that does not accompany these systems but should.

Because if you apply a change to a file that a lot of other files inherit from, it would be nice to visualize all the impacted "downstream" files, and ALSO which downstream files are NOT affected because they override the values you changed.
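Here is a minimal Python sketch of that downstream view, assuming a simple tree where each config file has child files that inherit from it. The ConfigNode model, the impact function, and the file names and keys (base.rc, team.rc, TIMEOUT, REGION, etc.) are all hypothetical. The key idea is that once some file overrides a changed key, everything below that file is shielded from the change:

```python
from dataclasses import dataclass, field


@dataclass
class ConfigNode:
    """One config file in an inheritance tree (hypothetical model)."""
    name: str
    settings: dict
    children: list = field(default_factory=list)


def impact(node, changed_keys):
    """For every descendant of node, return the subset of changed_keys it
    still inherits from node (i.e. keys that no file in between overrides)."""
    result = {}

    def walk(parent, inherited):
        for child in parent.children:
            # Keys this child sets itself no longer flow down from above.
            remaining = inherited - set(child.settings)
            result[child.name] = remaining
            walk(child, remaining)

    walk(node, set(changed_keys))
    return result


base = ConfigNode("base.rc", {"TIMEOUT": "30", "REGION": "us-east"})
team = ConfigNode("team.rc", {"REGION": "eu-west"})
service_a = ConfigNode("service_a.rc", {"TIMEOUT": "60"})
service_b = ConfigNode("service_b.rc", {})
legacy = ConfigNode("legacy.rc", {"TIMEOUT": "10"})
base.children = [team, legacy]
team.children = [service_a, service_b]

# Suppose someone edits TIMEOUT and REGION in base.rc.
for name, keys in impact(base, {"TIMEOUT", "REGION"}).items():
    if keys:
        print(f"{name}: affected by {', '.join(sorted(keys))}")
    else:
        print(f"{name}: NOT affected (overrides every changed key)")
```

In this made-up tree, the edit reaches team.rc and service_b.rc only through TIMEOUT and reaches legacy.rc only through REGION, while service_a.rc is untouched: it overrides TIMEOUT itself and gets REGION from team.rc.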
