There is no doubt that the most important part of almost any business is its staff. People may think this is changing with new technologies, but even fully online businesses with a tiny team won't get very far (or anywhere, actually) without the team that builds and maintains their platform. Cloud transformation is another perfect example of this. Without the right team in place, success is pretty much impossible and painful failure beckons.
There's quite a lot of opinion out there on how to set up these teams, what skills are required and how best to get them. I thought I'd add my opinion into the mix. This is by no means the finished product or something that will fit all businesses, but it is something I've had some success with both internally and externally.
The concept of a Cloud Centre of Excellence (CCoE) is quite widely (although not wholly) agreed across the market, and for the most part, I agree with it. However, the roles people may wish to include in this team vary hugely. I must admit I'm not particularly wedded to the name CCoE, but everything else I can think up is really gimmicky, so I will stick with it in this blog.
The first question for a CCoE is who takes final responsibility for it: the CTO or the CIO? My recommendation is that it should be one of these two roles, reporting to the CISO, CFO and other C-level executives on specific areas of governance and operation. The reason is that this change is company-wide, so it needs to be viewed from the highest point possible by someone who still understands the technological challenges and benefits of the transformation.
The next piece of the jigsaw is starting the team, and for many organisations, this requires breaking down boundaries that have existed for years. In your initial steering team (the building block you will build on moving forward), you need people from Infrastructure, Applications and Security/Compliance, someone with experience of IT budget ownership, someone from service management, and developers if they are separate from the application team. This will help you get a workable footprint for managing legacy/mode 1 applications while still being able to move forward with DevOps for cloud-native deployments. Having someone who understands the current financial modelling around IT helps you work out how cloud computing can be balanced with it, which will reduce resistance from finance and lead to a smoother transition.
So now we've found the potential people to kick the CCoE off with, what roles are they and others going to take moving forward? We break it into 8 simple roles.
Here is a brief breakdown of the roles, who might be suitable now and how they'd need to develop to be successful in this role moving forward:
Site Reliability Engineer
Definition: An SRE is responsible for maintaining a specific set of services to deliver an end result to the customer. The system will have been built predominantly using automation and Infrastructure as Code (IaC), and as such, they are not only responsible for keeping servers on, but more importantly, they make sure unhealthy servers are automatically replaced meaning no service disruption. The focus of an SRE is maintaining an SLO (service level objective) rather than a specific uptime SLA.
Who? Developers or SysAdmins who aren't afraid of code.
How do they get there? The path is slightly different depending on which background they've come from, but key to their development is learning IaC tooling and having good platform knowledge. Generally, Developer/SysAdmin/Architecture certifications will all be relevant.
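To make the SLO focus in the SRE definition a little more concrete, here is a minimal Python sketch of one common way to track an SLO: an error budget. The function name, the 99.9% target and the request counts are all hypothetical and chosen purely for illustration; they are not taken from any particular monitoring tool.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the agreed success ratio (hypothetical example: 0.999).
    A negative result means the SLO has been breached for this window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")
```

The point of the sketch is the framing rather than the arithmetic: the team agrees how much failure is acceptable, and the SRE manages against that allowance instead of chasing a raw uptime figure.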
Infrastructure Engineer
Definition: Infrastructure Engineers and Site Reliability Engineers have a lot in common. Both roles automate the build of infrastructure, and both monitor and maintain it. The difference between them is simply the scope of responsibility: Infrastructure Engineers are focused on the platforms and shared services supporting the rest of the organisation.
Who? Traditional Infrastructure or Platform teams are well placed for this.
How do they get there? Certifications and experience, working closely with the Architect team on the big picture.
SecOps
Definition: A critical ongoing function for delivering long-lasting value to the business in any platform or migration that's planned in the cloud. The agreed building blocks must be secure, and any variance from them must be understood from the very start. The SecOps function should be involved at the design and review phase of every project and be considered part of the team rather than separate from it.
Who? Traditional security teams or, if one doesn't exist, the most security-minded architects or SysAdmins.
How do they get there? If they are in the traditional security team, the main change is the experience of being included throughout the process and being seen as an enabler. The book "The Phoenix Project" has some great examples of this.
Architect
Definition: The architect role is twofold: firstly, to understand the environment built for particular workloads, and secondly, to keep integrating all environments and designs into the wider knowledge base for the overall cloud environment. Basically, they are responsible for the big picture, and to do this they must understand the smaller pieces as well. It's critical to note the security exceptions required and the communications between environments, and to have them logged in code and in the design documentation.
Who? Solution Architects and Enterprise Architects can work together in these roles.
How do they get there? Generally through hands-on experience and by working through the certifications available from the hyperscalers.
Network Engineer
Definition: Networking stops being about wires and physical switches very quickly in the cloud world (although some of these will remain in your LAN/WAN) and becomes almost entirely theoretical, but it is still as important as it ever was. Your SDN (software-defined network) remains the absolute foundation of your environment and your security. The network engineer must continue to understand and log topology while making sure the environments are secure by design. It's worth noting where the lines are drawn: Network Engineers are more about the fabric connecting all the environments together than about any single environment. They help advise on the local and are responsible for the global.
Who? Traditional network engineers.
How do they get there? I personally don't think this is a big jump for anyone who is CCNA accredited. SDN is far more theoretical, but the software that runs networks has always sat above the wires to add intelligence anyway.
Project Manager
Definition: The Project Manager remains much the same: the go-to contact for the project and the person responsible for making sure the work stays on course and delivers the end business value as expected. Personally, I'm a big fan of Agile, using Lean methodologies such as Kanban to support it.
Who? Project Manager.
How do they get there? This really comes down to them working closely with the technical team and immersing themselves in the cloud; the role has certainly become more technical. Probably get them to read "The Phoenix Project" too.
Finance
Definition: Finance (as has been and will be discussed more in this series) is critical to a successful cloud transformation project. Finance will be challenged with new types of cost models, with continued variance and, in many cases, numbers that are harder to reconcile. In my opinion, the change needs to be from seeing IT as a cost to seeing it as an enabler, so that varying costs are not evaluated simply against the costs of previous months but against the value delivered to the business. If you're able to create a meaningful correlation between, say, IT costs and website sales, then the varying cost in itself becomes less of a problem. Finance should be kept involved in all projects during the design/costing stages and should understand any changes during the project that may impact cost. As well as this, they should work with the team on continuous FinOps and long-term governance.
Who? Finance people with experience of flexible costs or Business Analysts.
How do they get there? Hands-on experience is by far the best way.
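As a rough illustration of tying cost to value, as suggested in the Finance definition, the Python sketch below divides monthly cloud spend by the number of website sales in the same month. The function name and every figure are hypothetical; a real FinOps process would pull these numbers from billing exports and sales reporting.

```python
def cost_per_sale(monthly_cost: dict[str, float],
                  monthly_sales: dict[str, int]) -> dict[str, float]:
    """Return cloud cost per website sale for each month present in both inputs."""
    result = {}
    for month, cost in monthly_cost.items():
        sales = monthly_sales.get(month, 0)
        if sales > 0:  # skip months with no sales to avoid dividing by zero
            result[month] = cost / sales
    return result


# Hypothetical figures: spend rises by 50%, but sales rise faster.
costs = {"Jan": 10_000.0, "Feb": 15_000.0}
sales = {"Jan": 2_000, "Feb": 3_750}

for month, unit_cost in cost_per_sale(costs, sales).items():
    print(f"{month}: {unit_cost:.2f} per sale")
```

In this made-up example the bill goes up month on month while the cost per sale goes down, which is the kind of conversation that is far easier to have with finance than one about the raw invoice alone.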
Service Engineer
Definition: There will be some workloads that remain mode 1. As such, a traditional service desk with service engineers will be a great function for these slow-moving workloads, where a DevOps approach would be inefficient.
Who? Traditional Support Staff.
How do they get there? No major change, but it's a good idea to start getting them involved in mode 2 as well so they can learn.
It is important to understand that you will not be able to train all of your engineers in these roles at the same time. A big part of "how do they get there" that I've not mentioned above, but which applies in all cases, is what I call Knowledge Osmosis: getting the team working collaboratively, with shadowing where appropriate. One of the easiest ways to start spreading knowledge is to follow the train-the-trainer approach.
With this team in place, with understanding and input from across key areas of both the business and the traditional IT function, you should be in a good place to take the next steps. Over the next few articles it will become obvious how integral this team will be. As such, it's important to spend time getting this foundation right from the start.
Stephen Old, Claranet UK