Digital Factories – Accelerating Innovation Throughput
In its in-depth article, McKinsey explores the dynamics and successes of the ‘Digital Factory’ model, a “construction site” where change happens.
This describes a new approach to organizing digital teams, built on a self-organizing structure rather than rigid departmental hierarchies. It is being pioneered by the likes of Scotiabank in Canada and Skyscanner in Scotland, who operate a culture of ‘tribes and squads’.
Ultimately, the purpose and benefit of building this capability is to accelerate the throughput of new digital innovations, enhancing the organization’s capacity for Digital Transformation.
Software testing is a core capability pivotal to the success of this approach.
McKinsey describes the KPI improvements the Digital Factory enables:
“We see reductions in management overhead of 50 percent for technology teams in the DF, 70 percent in the number of business analysts needed to write technology requirements, and, as test automation becomes the norm, a drop of 90 percent in the number of testers.
Finally, we see top engineering talent performing at eight times the level of their peers, as measured with metrics such as code commits.”
On DZone, Cynthia Dunlop emphasizes the insight central to this article: traditional enterprise software testing methodology is no longer adequate. It is too slow and change-limiting to support the rapid digital innovation businesses now need to aspire to:
“This brings us to an inflection point: given the increased cadence and complexity of software delivery that the business now demands, traditional testing is not capable of de-risking (e.g., thoroughly testing) every release candidate. As the latest test automation research recommends, we must reinvent testing…and soon. It’s not simply a matter of more tools or different tools. Reinventing testing is a deeper transformation involving people, process, and technologies.”
CIO.com repeats this warning, stressing the need to modernize the software testing function as a strategic enabler of Digital Transformation:
“The bottom line is that if you don’t treat testing as a strategic initiative that’s imperative to your digital success, your lunch is going to get eaten by your competitors.”
DevOps.com even reported that enterprises like Merck experienced an exodus of valuable developers specifically because they stuck with traditional legacy software testing approaches.
As ever, technology alone isn’t the answer; people and organizational change are critical too.
Going Cloud Native – Mastering Chaos
The strain on traditional software testing methods increases sharply as organizations wholly embrace the latest development practices, notably the shift to the ‘Cloud Native’ paradigm, one based on the use of containers, microservices and high frequency releases via Continuous Deployment. Netflix is the poster child of this new world.
To date, enterprise architecture has been a domain based on the fundamental precept of a mostly static environment: a fixed set of applications, changed relatively infrequently and maintained in one or more strictly controlled data centres.
In contrast, Netflix now operates a global infrastructure spanning multiple AWS zones, executing thousands of inter-operating microservices, continually spawning new ones and auto-scaling Cloud infrastructure. Managing it is a process of ‘Mastering Chaos’.
As they repeatedly describe, testing is fundamental to this. They integrate the lessons they’ve learned into best practices, such as canary analysis and staged deployments, which are applied automatically in their Continuous Deployment life-cycle for new code through their use of Spinnaker. Cloud guru David Linthicum makes the point that Cloud Native efforts won’t succeed without a suitable test automation capability like this.
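As a rough illustration of what canary analysis involves, the Python sketch below compares the error rate of a small canary deployment against the current baseline and decides whether to promote it, roll it back, or wait for more traffic. This is not how Spinnaker or Netflix implement it; the Metrics class, the canary_verdict function, the 1.5x tolerance and the minimum request count are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Request counts observed for one deployment (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when a deployment has seen no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Metrics, canary: Metrics,
                   tolerance: float = 1.5, min_requests: int = 100) -> str:
    """Return 'promote', 'rollback', or 'wait' for a canary deployment.

    tolerance and min_requests are illustrative assumptions, not values
    taken from any real canary analysis tool.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge
    # Allow the canary to be at most `tolerance` times worse than baseline.
    if canary.error_rate > baseline.error_rate * tolerance:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = Metrics(requests=10_000, errors=50)   # 0.5% errors
    healthy = Metrics(requests=500, errors=3)        # 0.6% errors
    broken = Metrics(requests=500, errors=25)        # 5.0% errors
    print(canary_verdict(baseline, healthy))
    print(canary_verdict(baseline, broken))
```

A real pipeline would typically compare many metrics over a time window and apply statistical tests rather than a single ratio, but the staged decision (observe a small slice of traffic, then promote or roll back) is the same idea.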
What is especially notable is that they don’t just apply testing to the process of writing and deploying new software; they also rigorously test the whole system.
In the Mastering Chaos presentation, Josh describes how they use techniques like Failure Injection Testing to simulate the failure of microservices. In the Microservices at Netflix Scale presentation, at 36:00, Ruslan demonstrates how they tested the failure of an entire AWS region.
Netflix have termed this approach ‘chaos engineering’. In short, they assume the system will fail, and they proactively simulate and test for it. At 11:00 Ruslan describes how they apply these principles in action, such as their use of Chaos Monkey for automating failure testing.
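To make the idea of failure injection concrete, here is a minimal Python sketch, assuming a hypothetical recommendations service and a homepage that depends on it. It does not use Chaos Monkey or any Netflix tooling; the wrapper simply makes a configurable fraction of calls fail so that the caller’s fallback behaviour can be exercised. All names and the 30 percent failure rate are invented for illustration.

```python
import random


class ServiceUnavailable(Exception):
    """Raised when a (simulated) downstream service fails."""


def recommendations_service(user_id: int) -> list[str]:
    # Hypothetical downstream microservice call.
    return [f"title-{user_id}-{n}" for n in range(3)]


def with_failure_injection(func, failure_rate: float, rng: random.Random):
    """Wrap func so a fraction of calls fail, simulating an outage."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ServiceUnavailable(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def homepage(user_id: int, fetch_recommendations) -> list[str]:
    """Caller under test: must degrade gracefully, never crash."""
    try:
        return fetch_recommendations(user_id)
    except ServiceUnavailable:
        return ["popular-1", "popular-2"]  # static fallback content


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the experiment is repeatable
    flaky = with_failure_injection(recommendations_service, 0.3, rng)
    results = [homepage(uid, flaky) for uid in range(1000)]
    fallbacks = sum(1 for r in results if r[0].startswith("popular"))
    print(f"{fallbacks} of {len(results)} requests served from fallback")
```

The point of the exercise is the assertion it makes about the caller: every request still returns something useful even while the dependency is failing.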
In other words, Netflix applies testing from top to bottom, start to finish, across their entire environment, including but not limited to their software development life-cycle. Given the principle of ‘infrastructure as code’, they know that failures can occur at any point within the overall environment, not just in the code they write.
For organizations seeking to emulate this transformation, a number of methods and tools can be considered, including of course the components Netflix have open sourced.
The Cloud Native QA guide echoes the Netflix philosophy, notably that “It is important to not only design for failure but test for recovery”. It recommends a series of QA practices for adopting the same type of culture as Netflix, including various ways to test for failure and recovery as they do, and the use of tools such as OpenTracing.
Fernando Mayo explores the modernization of testing for this new microservices world, highlighting practices like property-based testing, fuzz testing and mutation testing, which can help detect a wider range of defects in an automated way.
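As an example of the first of these practices, the Python sketch below hand-rolls a very small property-based test using only the standard library. Rather than checking a few hand-picked inputs, it generates hundreds of random strings and asserts a property that must hold for all of them: decoding a run-length encoded string returns the original. The encoder and the check_round_trip helper are invented for illustration; dedicated libraries such as Hypothesis automate input generation and failure minimization far more thoroughly.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode text into (character, count) pairs."""
    encoded: list[tuple[str, int]] = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            # Same character as the previous run: extend that run.
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials: int = 500, seed: int = 0) -> None:
    """Property: decoding an encoded string returns the original."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 30)
        # A small alphabet makes repeated runs likely.
        text = "".join(rng.choice("abc") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, f"failed for {text!r}"


if __name__ == "__main__":
    check_round_trip()
    print("round-trip property held for all generated inputs")
```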
On LinkedIn, Shachar Landshut also proposes a framework for testing microservices, escalating from testing individual services through integration testing and ultimately to the chaos engineering approach that Netflix uses.
Cloud Native Digital Factories
These two approaches, Digital Factories and Cloud Native DevOps, are two halves of an overall jigsaw: they combine to make possible very high velocity digital innovation.
The most significant transformation that Netflix describe is the move away from centralizing functions like QA and testing. In the Microservices at Netflix Scale presentation, Ruslan explains that the topmost priority for Netflix, over reliability and efficiency, is digital innovation, and that they identified centralized organizational stage gates like these as choke points that directly limited this ambition.
So, hand in hand with transitioning to a microservices architecture, they also devolved QA testing to each individual service team, empowering them to take full life-cycle ownership of the services they created and managed.
Enterprise businesses serious about their ambition to master Digital Transformation face a challenging scope of change: their departments, tools and procedures for software development are long embedded and entrenched. However, they now live in a world where companies like Netflix are on the other side of this journey, having spent over seven years undergoing the change.
The best practices for emulating this approach are now well understood, so some if not many of their competitors will already be underway with their own transformations, building a competitive advantage that will be very difficult to catch up with.
Because these practices are well understood, however, these enterprises too can harness them to be the one doing the eating, not the one being eaten.