The Wrong Patterns

Almost every organisation I visit in this DevOps world is obsessed with ‘technical debt’ – and consumed with guilt, hatred and loathing. Well, perhaps that’s a bit harsh, but…

I find the obsession frustrating because the term is all too often nebulous: either the debt can’t be fixed, or it leads to an incorrect hypothesis about where the problems lie.

These are some of the warning signs I see people focus on, often incorrectly, and which lead to frustration and an inability to move forward.

I am keen to help anyone avoid these technical debt mistakes.

1. Code Quality
Trying to ‘fix’ code quality across an entire legacy code base is not sensible.

* Languages, skills and coding techniques have changed – live with it.
* Define standards, write style and convention guides, and test code going forward; maintain this approach, and apply the standards sensitively to legacy code bases
* Don’t point a static code analysis tool at your entire code base and expect anything but pain
* Don’t try to refactor all your code to modern standards
* Fixing legacy issues should be driven by metrics such as defect rate and bug counts, and by the need to add new functionality
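
A minimal sketch of what a metrics-driven shortlist might look like. The field names, the 5-defects-per-KLOC threshold and the sample figures are all assumptions made for illustration; substitute whatever your defect tracker and backlog actually record.

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    # All fields are hypothetical; map them to your own tracker's data.
    name: str
    lines_of_code: int      # assumed to be greater than zero
    defects_last_year: int
    planned_changes: int    # upcoming features that touch this module


def defects_per_kloc(module: ModuleStats) -> float:
    """Defects per thousand lines of code."""
    return module.defects_last_year / (module.lines_of_code / 1000)


def refactoring_candidates(modules, min_defect_rate=5.0):
    """Keep modules with a high defect rate or planned feature work,
    ordered by defect rate, worst first. Everything else is left alone."""
    chosen = [
        m for m in modules
        if defects_per_kloc(m) >= min_defect_rate or m.planned_changes > 0
    ]
    return sorted(chosen, key=defects_per_kloc, reverse=True)


modules = [
    ModuleStats("billing", 20_000, 160, 0),
    ModuleStats("reports", 50_000, 20, 0),
    ModuleStats("auth", 10_000, 10, 2),
]
for m in refactoring_candidates(modules):
    print(m.name, round(defects_per_kloc(m), 1))
```

The point of the filter is the exclusion: stable legacy code with no planned work never makes the list, however old-fashioned it looks.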

2. Run-book
Paper run-books should be burned and organisations with no run-book deserve to suffer pain.

Failing to use a build pipeline|toolchain or run-book automation is a cardinal sin; don’t settle for people running complex IT release processes interactively.
* Tools to automate processes have existed for more than a decade; many are very good
* root|Administrator logging on to a system after development should be a reason for dismissal – perhaps a bit harsh, but only perhaps
* Trust the efficacy of anything done by humans about as much as you would trust a 1,000-person game of Chinese whispers

3. Staff turnover
You can’t stop staff turnover and you don’t want to.
Many organisations I work with have ‘someone’ who has to be retained – scary.

* It is inevitable that staff will come and go and also that skills and experience will change
* Train your staff well and measure performance and conformity.
* Avoid the hero culture. Ensure that you apply the same principles to knowledge as you do to compute resilience i.e. N+1
* If you have critical systems with critical staffing requirements make plans to mitigate your risk. Automate, train, replace, apply staffing RAID1, RAID10
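
Applying N+1 to knowledge can start with something as simple as the sketch below: list who can actually operate each system and flag anything covered by fewer than two people. The system and staff names are invented for illustration, and the coverage map is assumed to come from your own skills matrix.

```python
def single_points_of_knowledge(coverage, minimum=2):
    """Return the systems that fewer than `minimum` people can operate.

    `coverage` maps a system name to the set of people able to run it.
    """
    return sorted(
        system for system, people in coverage.items()
        if len(people) < minimum
    )


# Hypothetical skills matrix.
coverage = {
    "payroll": {"alice"},
    "web": {"alice", "bob"},
    "mainframe": set(),
}
at_risk = single_points_of_knowledge(coverage)
print(at_risk)
```

Anything on that list is where the automate, train, replace effort should go first.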

4. Corporate memory/inertia
We can’t do that because we are too large|have compliance|……
Being unprepared to make improvements because of corporate memory is just plain lazy.

How often do I hear ‘You can’t do that’ or ‘that won’t work here’ or similar?
* Don’t settle for laziness; create a culture to challenge convention
* Become the skeptical optimist; advocate the ‘we can do better’ approach and aim to prove it
* But don’t try to change the whole environment in one fell swoop
* Make changes, strategically, but don’t do so without sponsorship

5. A relatively young industry
Whichever way you look at it IT/IS is a young industry; search for learnings from other sectors, specialisms. Take some of our own medicine! How many other industries has IT changed? Yet IT refuses to automate itself.

* It is surprising how little we have experimented
* It is surprising how few IT people are prepared to take risks
* Doing the same thing in the same way many times will yield the same results; if you want to change things you have to change the way you do things
* Accept learnings from others, go out and find experts – borrow their ideas

6. Old technology
The rate of change in our industry is significant and it is increasing.
Don’t assume that any new technology will solve the problems of mankind and bring peace and harmony. Containers are great but they are not heaven-sent.

* With economic controls in place we must accept that technology will often be old, but if it meets the needs of the business it has purpose
* Accept old technology; it can still be automated – you don’t have to Dockerize everything

7. The cloak of invisibility
The number of times I hear the words ‘we don’t have any data’.
Organisations which try to make improvements without evidence should fail.
Apply the scientific method, for crying out loud.

How often do we make changes without any data?
* We tune a server, or focus on performance coding when we don’t have any indication that there is a problem or where it lies
* We state that we can reduce cost when we don’t understand what something costs today
* We over-engineer because someone decides a service must meet certain unspecified requirements
* The business does not have any defined KPIs and therefore doesn’t know what is important to it

8. The fragile artifact
Ignore the fragile artifact at your peril.
* I spent time in the IT department of an airport once and the ground handling system was the fragile artifact.
* Go near it and it might fail – so, people didn’t go near it
* When it failed everyone kept their head down
* Stop!!! – focus on the fragile artifact; fix it, fix the process around it, fix the team supporting it or turn it off!

9. A plethora of standards
We have standardised our Windows|RDBMS|release pipeline|…….., but we are a large organisation and we have 20 standards for each……

* So you haven’t standardised at all
* You can set a baseline (a single standard), and standards can be extended
* Perhaps you are in IT and your standards don’t meet the needs of the business. Collaborate – fix the problem; don’t settle for a second-rate fudge
* Don’t tell me you can’t standardise – you simply haven’t considered the problem properly, understood the patterns and worked out how to abstract them
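
One way to picture ‘a single baseline that can be extended’ is the sketch below. The setting names, their values and the idea of ‘locked’ keys are assumptions for illustration, not a description of any particular tool: a business unit may add to or tune the baseline, but it may not switch off the parts that make it a standard.

```python
# Hypothetical baseline; the keys and values are illustrative only.
BASELINE = {
    "os": "windows-server",
    "patch_window_days": 30,
    "antivirus": True,
}


def build_standard(baseline, extension, locked=("antivirus",)):
    """Layer a business-unit extension over the single baseline.

    Keys named in `locked` may not be changed by an extension.
    """
    for key in locked:
        if key in extension and extension[key] != baseline[key]:
            raise ValueError(f"extension may not override locked setting: {key}")
    # Extension values win for everything that is not locked.
    return {**baseline, **extension}


web_standard = build_standard(
    BASELINE, {"patch_window_days": 7, "web_server": "iis"}
)
print(web_standard)
```

Twenty teams then share one baseline and twenty small, reviewable extensions, rather than twenty unrelated standards.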

10. Enterprise Architects
Enterprise architects who attend seminars, produce Visio diagrams and talk in platitudes, but can’t build systems and fix technical problems themselves, do not add value to the business.

* I spent some time with an airline where the EA talked about his errors in selecting a certain provider only to find that the solution he had bought did not do what it claimed; if he had spent an hour trying to use the managed service before signing the approval he would have worked this out and found a better solution
* EAs need to be helping the organisation build and manage solutions
* EAs need to define, measure and manage standards
* EAs need to be driving, owning and living change

Work out what technical debt is and what it means to you – then, and only then, can we fix the problems.

DevOps in the enterprise (continued)

I have already posited that one of the biggest challenges for the enterprise is sheer scale: scale in terms of tin, services, process and people. This has a serious impact on transformation.

With hundreds or thousands of servers I can assure you that no-one has a complete map of the assets let alone the configuration of a global infrastructure. Sure, configuration management databases (CMDB) exist but they are no solution.
Centralised IT teams, distributed according to an ITIL model, have been bolstered by shadow IT which operates out of business units, diluting knowledge and completeness.

I know that many industries are forced to comply with regulation. Many have defined standards to try to lower cost and achieve ‘compliance’. But I fear that many realise that their previous efforts have only gone so far. I have long held the opinion that if a privileged user is able to access a system, it can no longer be guaranteed to comply with standards; this, in and of itself, is one of the reasons why Infrastructure as Code and immutable systems are so compelling.

And with this in mind the starting point for introducing DevOps to a legacy estate has to be a Discovery process.

Discovery is the first phase or, in more traditional IT automation circles, a passive deployment, an approach which I first used in 2005.

HP, IBM, CA etc. will sell you Discovery tools but can you also use the emerging DevOps options?

Tools that are based on Promise Theory do not do passive, although this may change and CFEngine comes close already; their goal is to implement a representation of the desired state from a policy. You guessed it: Chef and Puppet are really based on this principle, so you have a challenge.

Discovery using traditional tools relies on approaches that are not popular in the enterprise (nmap, fingerprinting and the like, together with agents), so they may work but you have no certainty.

You tackle the problem by breaking down the scale: select a small footprint of Unix|Windows boxes, a specific application or some other logical divide.

You must first look for, and find, patterns and analyse these so you can find the most suitable target and start from there.
The second phase is to identify the relationships which link patterns; in a computing world these become relationships between load balancers, or the connection from a web server to its application server.
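
The two phases above can be sketched with nothing more than a script. In this minimal example the host records, the (os, role) fingerprint and the connection list are all assumed inputs, standing in for whatever your discovery data actually contains.

```python
from collections import defaultdict


def find_patterns(hosts):
    """Phase one: group host names by an identical (os, role) fingerprint."""
    patterns = defaultdict(list)
    for host in hosts:
        patterns[(host["os"], host["role"])].append(host["name"])
    return dict(patterns)


def link_patterns(hosts, connections):
    """Phase two: reduce host-to-host connections to role-to-role relationships."""
    role_of = {host["name"]: host["role"] for host in hosts}
    return sorted({(role_of[src], role_of[dst]) for src, dst in connections})


# Hypothetical discovery output.
hosts = [
    {"name": "lb01", "os": "appliance", "role": "lb"},
    {"name": "web01", "os": "linux", "role": "web"},
    {"name": "web02", "os": "linux", "role": "web"},
    {"name": "app01", "os": "linux", "role": "app"},
]
connections = [
    ("lb01", "web01"), ("lb01", "web02"),
    ("web01", "app01"), ("web02", "app01"),
]

patterns = find_patterns(hosts)
# The most common pattern is a reasonable first target.
first_target = max(patterns, key=lambda key: len(patterns[key]))
print(first_target, link_patterns(hosts, connections))
```

A real estate will need a richer fingerprint than two fields, but the shape of the analysis stays the same: group, count, then connect the groups.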

Visual / mapping tools are a great way to start this discovery if they are available, but they need to be able to exploit the discovery techniques described above. Failing that, a good engineer can assimilate this information using scripts, spreadsheets and the like.

Note that Discovery takes time and will delay the implementation and adoption of your DevOps tools. This time will however be a very good investment!!!

If you have no tested content ready to deploy immediately, then leave deploying the agent for later, unless you can benefit from low-level infrastructure data like CPU count, RAM etc. If you have an agent running and someone inadvertently attaches a policy, you will have issues.

DevOps and what you won’t hear

DevOps is about Automation, the ‘A’ in the CAMS acronym so you really need to understand the following to make a success of it.

Data, Data, Data – metrics and instrumentation are key to planning, execution and understanding progress

Necessity is the mother of invention; if it ain’t broke don’t fix it (PowerShell or bash might be the right answer)

Legacy businesses may be able to use existing automation tools as effectively as emerging tools for automation

Choreographing a service is more important than automating a server

Redressing the failures of historical behaviour is hard, nay impossible; tread carefully with legacy

Open source changes rapidly; keeping current needs a process

Workflow ideas are emerging but confused in the Configuration Management community

You write positive actions when building something. Removing something needs a positive action too. Is an immutable infrastructure practical for you?

Physical infrastructure is harder than virtualisation which is harder than cloud

Community doesn’t replace your own expertise

It is all about infrastructure as code – consider your need for a UI and how your user experience will benefit | suffer

You can test from the inside out or the outside in; which is better is yet to be decided