Tuesday, October 15, 2019

Configuration inheritance and how it locks you in

First, a quick definition of "configuration inheritance". It's not a new technique or a recent revolution; it has appeared practically everywhere since the olden days. Basically, configuration inheritance means a base configuration (an environment, a set of settings) that can be overridden for more specific cases, situations, users, etc.

So you start with, say, this base environment; we'll call it .rootrc:

   USERNAME=system
   MACHINE_NAME=some_server
   HOME_DIR=/home/system

But you are actually running as user DUDE, who inherits the base system's environment and overrides or adds some settings in the /home/DUDE/.bashrc file:

  USERNAME=DUDE
  HOME_DIR=/home/DUDE
  BACKUP_LOC=/mnt/backups/DUDE
 
But then DUDE uses GitHub, which applies and adds its own settings from the .github file:

  USERNAME=dudeman
  GITPASSWORD=githubpassword
 
This example lives entirely inside a UNIX environment, which shows that the pattern has existed since the late 1970s, and probably before then. It shows up all over programming, software, and systems management. To show you a more extreme example, here is the Spring settings hierarchy:

Spring Boot uses a very particular PropertySource order that is designed to allow sensible overriding of values. Properties are considered in the following order:
  1. Devtools global settings properties on your home directory (~/.spring-boot-devtools.properties when devtools is active).
  2. @TestPropertySource annotations on your tests.
  3. properties attribute on your tests. Available on @SpringBootTest and the test annotations for testing a particular slice of your application.
  4. Command line arguments.
  5. Properties from SPRING_APPLICATION_JSON (inline JSON embedded in an environment variable or system property).
  6. ServletConfig init parameters.
  7. ServletContext init parameters.
  8. JNDI attributes from java:comp/env.
  9. Java System properties (System.getProperties()).
  10. OS environment variables.
  11. A RandomValuePropertySource that has properties only in random.*.
  12. Profile-specific application properties outside of your packaged jar (application-{profile}.properties and YAML variants).
  13. Profile-specific application properties packaged inside your jar (application-{profile}.properties and YAML variants).
  14. Application properties outside of your packaged jar (application.properties and YAML variants).
  15. Application properties packaged inside your jar (application.properties and YAML variants).
  16. @PropertySource annotations on your @Configuration classes.
  17. Default properties (specified by setting SpringApplication.setDefaultProperties).
... I can hear Harry Caray going "holy cow". That hierarchy has lots of sensible features that involve testing, dev vs stage vs production environment settings, the ability to do ad hoc overrides, defaults, etc. There's reasonable justification for such a complicated stack, including evolution over time and hard-learned lessons.

Really, this data pattern is a series of maps / dictionaries / key-value files stacked upon each other. A lookup starts at the top of the stack and descends until it finds a value for the given key.
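
To make the "stack of maps" concrete, here is a minimal sketch in Python. It assumes the three files from the example above have already been parsed into dictionaries; the names rootrc, bashrc, github and the helper effective are mine, purely for illustration. Python's standard library happens to ship this exact pattern as collections.ChainMap.

```python
from collections import ChainMap

# The three layers from the .rootrc / .bashrc / .github example,
# assumed to be already parsed into plain dictionaries.
rootrc = {"USERNAME": "system", "MACHINE_NAME": "some_server", "HOME_DIR": "/home/system"}
bashrc = {"USERNAME": "DUDE", "HOME_DIR": "/home/DUDE", "BACKUP_LOC": "/mnt/backups/DUDE"}
github = {"USERNAME": "dudeman", "GITPASSWORD": "githubpassword"}

# ChainMap searches its maps left to right, so the most specific layer goes first.
env = ChainMap(github, bashrc, rootrc)


def effective(stack):
    """Flatten the stack into the single dict a command like env would display."""
    return dict(stack)


if __name__ == "__main__":
    print(env["USERNAME"])      # found in the top layer (.github)
    print(env["MACHINE_NAME"])  # falls all the way through to .rootrc
```

Note that the flattened result keeps the winning values but forgets which layer each one came from.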

You see it in the above examples, in Kubernetes/Docker image overlays, in practically every infrastructure management tool, and in dozens of other places.

Why? This is a good tool for centralizing common properties at one level and overriding them in specific situations, in order to manage settings across a variety of environments, machines, situations, or use cases.

It also enables a key aspect of software development and systems management: iterative/evolutionary development. New requirements, systems, and interfaces get added to something over time or, in the case of pure software development, successive releases support increasing complexity.

But what seems like a solid foundation really is a teetering tower once you get five layers of configuration inheritance. What provides the illusion of management simplicity completely falls apart when significant reconfiguration occurs at the lower levels of the "stack of maps". And that is a key way in which organizations get locked into systems and software packages.

Here's the key feature that almost every one of these configuration inheritance / stacked maps systems lacks: a basic meta-mapping feature that explains where each value comes from. Take our first example of UNIX environment variables, which can be displayed in any "context" with the env command:

    USERNAME=dudeman
    GITPASSWORD=githubpassword
    HOME_DIR=/home/DUDE
    MACHINE_NAME=some_server
    BACKUP_LOC=/mnt/backups/DUDE

So my suggestion is that a key facility should be a metaenv command, which would print:

    .github::USERNAME=dudeman
    .github::GITPASSWORD=githubpassword
    .bashrc::HOME_DIR=/home/DUDE
    .rootrc::MACHINE_NAME=some_server
    .bashrc::BACKUP_LOC=/mnt/backups/DUDE

This command would show, for each value, which of the three files (in this example) assigned or overrode it, so that if there is a problem, bug, or unexpected value, someone can walk up the chain.
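
Since no such command exists, here is a rough sketch of what it might do, written in Python. Everything in it (the function names metaenv and format_metaenv, the LAYERS list, the source::KEY=value output format) is hypothetical and just mirrors the listing above. It assumes the files are already parsed into dictionaries and ordered most specific first.

```python
# Hypothetical layers, most specific first, mirroring the example files.
LAYERS = [
    (".github", {"USERNAME": "dudeman", "GITPASSWORD": "githubpassword"}),
    (".bashrc", {"USERNAME": "DUDE", "HOME_DIR": "/home/DUDE",
                 "BACKUP_LOC": "/mnt/backups/DUDE"}),
    (".rootrc", {"USERNAME": "system", "MACHINE_NAME": "some_server",
                 "HOME_DIR": "/home/system"}),
]


def metaenv(layers):
    """Resolve stacked settings while remembering where each value came from.

    Returns {key: (winning_layer, value, [layers whose value was overridden])}.
    """
    resolved = {}
    for name, settings in layers:
        for key, value in settings.items():
            if key not in resolved:
                # First (most specific) layer to define the key wins.
                resolved[key] = (name, value, [])
            else:
                # A less specific layer also set this key, but lost.
                resolved[key][2].append(name)
    return resolved


def format_metaenv(layers):
    """Render one 'source::KEY=value' line per key, sorted by key."""
    return [f"{src}::{key}={value}"
            for key, (src, value, _) in sorted(metaenv(layers).items())]


if __name__ == "__main__":
    print("\n".join(format_metaenv(LAYERS)))
```

The third element of each tuple is the "walk up the chain" part: for USERNAME it lists .bashrc and .rootrc, the files whose values were shadowed.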

Unfortunately, once you get to multiple users, applications, servers, environments, etc., you form a huge tree of inheriting situations, much like the huge inheritance hierarchies of OOP software.

The real problem with that is that trees are very hard to visualize with tools, especially command line tools. But it is possible, and that is another class of tool that does not accompany these systems, but should.

If you apply a change to a file that many other files inherit from, it would be nice to visualize all the impacted "downstream" files, and ALSO which downstream files are NOT affected because they override the values you changed.
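
A sketch of that kind of impact report, again in Python and again entirely hypothetical: it assumes each file inherits from exactly one parent, and it invents a second user (GAL) who is not in the example above, just so there is more than one branch in the tree. The names PARENT, SETTINGS, and impact are mine.

```python
# Hypothetical inheritance tree: each file names the one file it inherits from.
PARENT = {
    ".rootrc": None,
    "DUDE/.bashrc": ".rootrc",
    "DUDE/.github": "DUDE/.bashrc",
    "GAL/.bashrc": ".rootrc",  # invented second user, for illustration only
}

# Settings defined directly in each file (not inherited ones).
SETTINGS = {
    ".rootrc": {"USERNAME": "system", "MACHINE_NAME": "some_server",
                "HOME_DIR": "/home/system"},
    "DUDE/.bashrc": {"USERNAME": "DUDE", "HOME_DIR": "/home/DUDE",
                     "BACKUP_LOC": "/mnt/backups/DUDE"},
    "DUDE/.github": {"USERNAME": "dudeman", "GITPASSWORD": "githubpassword"},
    "GAL/.bashrc": {"USERNAME": "GAL"},
}


def impact(changed_file, key, parent=PARENT, settings=SETTINGS):
    """Split the files downstream of changed_file into (affected, shielded).

    A downstream file is shielded if it, or any file between it and
    changed_file, defines its own value for key.
    """
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    affected, shielded = [], []

    def walk(node, overridden):
        for child in children.get(node, []):
            child_overridden = overridden or key in settings.get(child, {})
            (shielded if child_overridden else affected).append(child)
            walk(child, child_overridden)

    walk(changed_file, False)
    return sorted(affected), sorted(shielded)


if __name__ == "__main__":
    print(impact(".rootrc", "HOME_DIR"))
    print(impact(".rootrc", "MACHINE_NAME"))
```

In this made-up tree, changing HOME_DIR in .rootrc only reaches GAL, because DUDE's .bashrc overrides it and shields everything beneath it. Changing MACHINE_NAME reaches every downstream file.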

Monday, October 7, 2019

NeXTSTEP OS fits in processor cache

Back when I was buying a 486 PC for college, I remember thumbing through Computer Shopper. Somewhere in those days someone took the cache memory used for processors and built a computer's entire main memory out of it. It was a lot more expensive!

Since then, with the massive growth in silicon devoted to the L2/L3 caches of modern processors, I think back on the operating systems of those days and wonder: "what if we could fit our OS and its apps in cache?"

Old Windows 3.1 or MS-DOS was not a pleasant experience compared to modern operating systems; almost no one can deny that. But NeXTSTEP? The color variants that came out after NeXT gave up on its black and white cubes... that is basically a modern UI, and arguably better than modern Linux GUIs.

https://www.youtube.com/watch?v=TIrTh80Z8jw

Old NeXT cubes ran on 16-64 megabytes of RAM and 25-33 megahertz of CPU, and 400 MB to 1 GB of disk. I can't find good numbers on the color variant that ran on x86, but I think our machines at college ran that very nicely on about 32 megabytes of RAM.

The prime curve of Moore's Law has produced quite the hardware bounty. Old timers like me bemoan a seemingly bloated software layer that has grown over the decades. The current Ryzen processors boast 80 megabytes of cache, and of course RAM is dollars a gigabyte, and SSD storage is getting reasonable.

Holy crap, NeXT would run entirely in cache. OK, swap might start getting into RAM. Modern RAM comes in the tens of gigabytes, much bigger than the hard disk on those machines. It basically looks like a modern OS, with a bit less anti-aliasing on the fonts and icons. The productivity app set is practically identical. It had TCP/IP networking, an HTML 1.0 web browser, spreadsheets, mail, etc.

With the end of big leaps from Moore's Law, if we want more performance, we'll need to optimize the software stack. I think a really cool project would be to start with the basis of NeXTSTEP circa 1995-1997 and very carefully add in only what is needed. I remember the Mach kernel being buggy, but that can be swapped out.