I know this has been done many times before, but I still see random people on Reddit asking this question, and it’s clear to me that almost no-one who has written up job descriptions for any remotely ‘devopsy’ related job posting ever understood what DevOps really is, or rather what it can be.
DevOps – The ‘Technical’ Definition In a Nutshell
The technical way to explain DevOps is to talk about software delivery and cycle times. Essentially it involves a cycle whereby developers can quickly make code changes that get built, tested, and released in a nearly fully automated fashion. After the release and use by the customer, in theory then there is also observability and tooling to know how well the code performs and how the users are using it, allowing the developer team(s) along with the business to know as quickly as possible what the results of the code changes were.
The tooling used to build, test, and release your software changes is called ‘CI/CD’ tooling.
The ‘CI’ part here stands for continuous integration which is a fancy way of talking about having code changes test themselves with automation. The end result is that when someone makes a change to code and is ‘done’ with their changes, any testing needed to ensure these changes will work in the application should already be done, via scripts and tools that have already been made to accomplish this task. This might involve anything and everything from code functionality tests, end-to-end testing with other dependent applications, security tests, build tests, and on and on. The sky’s the limit on this stuff and how much testing is done really is determined by the teams and the organization putting out the product.
The ‘CD’ part of this stands for continuous delivery. The idea behind this is that once the testing is done and is ‘all OK’, the code changes to the application get put out there automatically for the end user to use. Often times teams will have manual approvals in place, so this may not happen 100% without human intervention, but the ‘idea’ at least is that if your testing is good enough, then the changes should make it to the consumer right away.
Observability and feedback has many facets. There might be performance monitoring in place to detect errors or issues with speed and usage. There might also be other measurements like how the user uses the new features, or even focus group or other testing to get feedback from the consumer back into your development process. The market is always right! If a change is made to an application and it either breaks things in some way or the users dislike the change, then we need this feedback to tell us to either fix our changes or undo them altogether.
Over time this cycle is supposed to help us produce better software, faster. The automation in place for testing and deploying changes speeds up processes that used to take hours, or even days of time from developers and QA teams who tested changes to applications.
DevOps – The Philosophy
Years ago, software development processes, especially in larger organizations, were very different than they are today. Silos and separation of job duties ruled the day, and changes to software and applications tended to be made in larger batches. An example of this might be a monthly, or even a quarterly release cycle, where bug fixes and feature changes and additions were worked on for a while and then released all at once. Developers wrote code and checked that it worked to a point, then handed the changes off to a team of QA testers who would test out these changes in various ways, sometimes even by manually clicking buttons in applications to be sure things ‘worked’. Operations teams were then handed the new code to run and were expected to ‘get it working’, whatever that entailed. When ‘release day’ came around, it was an all-hands-on-deck affair, where anything that could go wrong, often did.
Since each team had its own set of priorities, knowledge areas, and timetables, things didn’t always happen as quickly or easily as desired. When applications stopped working, it would be a struggle to get enough of the right people together to figure out how to solve the issue, and often times more effort would be spent trying to point the finger at another team than actually solving the issue. Problems often fell through the cracks because no team technically would claim ownership.
The DevOps philosophy is about fixing that whole mess, and it’s really about improving how real people interact and work together. There are many ways to approach this, of course. A few general ideas are listed below. They ‘seem’ simple, but keep in mind that these ideas are cultural as much as organizational and adoption can be hindered if a company’s culture is opposed to them.
- Teams that own applications, not pieces of the puzzle. If a team is responsible for building, testing, releasing and ensuring that their own code runs correctly then the ‘buck’ so to speak, has nowhere else to go. You build it, you run it, you fix it, essentially.
- Less of a divide between developers, testers, and operations. Your teams might even have people all on the same team with varying degrees of specialty in those areas.
- More leeway for teams or people with different backgrounds to work on development, testing, deployment, and operations, together. A mixing of backgrounds can produce a better outcome, and better collaboration and team spirit when things need to get done.
- Automating things that can be automated. I include this as the idea and practice of automation has been critical to the success of the DevOps philosophy over the years, speeding up processes and allowing changes to be made to code much faster than ever before. If something is slow or hard to scale, maybe it needs to become more automated!
- Communication. Making sure that free communication exists between individuals and teams is important. The whole idea of ‘submit a ticket and we will get back to you’ is a non-starter here. Ticketing is important to track and prioritize work, but more direct communication brings people closer together and can improve understanding and team cooperation. It can also help align priorities across teams and improve engagement.
- Accountability. If the person or team who made a change isn’t also accountable for the outcome of that change, and cannot fix that change themselves, then the process of fixing issues or making things better for the customer gets slower and more convoluted. Finger pointing and ‘not my problem’ starts to become the norm, and the product quality can suffer as a result.
The basic idea here, of course, is that producing a better outcome for the business and the customer needs to take priority and guide the way that teams and people work together. If you are doing something you think is ‘DevOps’ but the end result is worse, you need to go back to the drawing board and make it better. The principles also extend outside the boundary of just ‘technology’ as well!
DevOps – Examples In Practice
The Custom Internal Finance App
Team A works on an internal application that is needed by accounting and other departments in the company. Team B built and maintains the servers that the application runs on. Team B has many other responsibilities and often takes a long time to deploy Team A’s changes to their servers, resulting in a slowdown inside the business for some of the finance related departments.
A Possible ‘DevOps’ Solution:
Meetings are scheduled between the two teams at the behest of someone higher up in the organization. Team A really doesn’t want much to do with the servers and team B has too many other priorities. Eventually someone realizes that perhaps there is a way for the work to get done with less human intervention. Team A creates an automated testing process and, along with someone on team B, an automated deployment method is created that takes code changes team A approves of and directly rolls it out onto a set of test servers. Team A can then test the changes out on live servers and also has a way of automatically promoting those changes to the real servers when desired. The deployment process created by this collaboration ends up being useful to speed up similar processes elsewhere in the company as well.
The Ticketing System Roadblock
Team A is charged with handling requests that come in to a ticketing system. All the other teams are held hostage by the timing and priorities of team A and often, upper management needs to step into the fray to push their own tickets to the top of the pile. Around the company, the ticketing system is known as a ‘black hole’ for requests. Everyone, of course, feels that their requests should be top priority!
A Possible ‘DevOps’ Solution:
It is decided that tickets need to have a priority assigned to them for tracking purposes. In addition a field is added to the ticketing system where requestors must add to the ticket what is blocked by this ticket not being done quickly, i.e. what they are enabled to do when this ticket is completed. Team A sees this extra communication and begins to better understand the needs of the people requesting tickets. This helps them automatically prioritize tickets in a more understanding way. In addition, someone starts to notice that certain tickets and requests that block the work on other teams can be automated, and extra systems are added so certain types of requests can be handled automatically for the users, greatly speeding up the ticket handling process. Team A now has less of a backlog and can focus on automating more things and getting one-off requests done at a faster pace.
More DevOps Problem Examples…
The Log Spamming Code Change
Team A is responsible for maintaining a long-standing and generally unchanging application inside the company. Team B handles the logging stack and servers for all applications and has noticed that a recent change to Team A’s service is now generating millions of log lines per week and costing the company extra money and resources.
A Possible ‘DevOps’ Solution:
Team A has been fairly siloed, and is the only team handling their own code changes, but agrees to let other teams submit code changes to their repo directly as pull requests. Team A has been working on their testing automation recently and is fairly confident that code changes made by external teams would be fine if the tests passed. Once team B is now allowed to access the code of team A and suggest changes via a pull request, someone on team B who has always been interested in getting more coding experience decides to tackle the problem. They find that a recent change to a library inside team A’s code is the culprit, and they submit a pull request to make a fix. Members on Team A approve of the change and test results and the code is deployed back into production. In the future, collaboration between the two teams is higher, and team A now includes some members of team B on some of their design meetings. Team A now has a fast track to feedback from team B when considering major changes to their service. Other teams also start to suggest and make changes to team A’s codebase, each team finding ways to make the legacy application better for their own use-cases. Other processes are improved throughout the business as a result.
The Load Bearing Infrastructure Engineer
Several teams A and B use a custom, purpose built environment to run their code. The environment was built and is maintained by a single member of the company’s infrastructure team (Bob), and few other people know enough about this environment to do anything with it. Bob takes great pride in his knowledge of the custom environment, but often his changes to the environment and long term plans for it clash with what the software teams need for their running services. Bob is a member of the infra team, so not only has other priorities outside of this custom environment but has his own take on what matters and what doesn’t. Things come to a head when a highly skilled lead developer threatens to quit over the issues.
A Possible ‘DevOps’ Solution:
Management decides that it may be possible for Bob to become a member of more than one team and have multiple managers. Bob is now asked to attend meetings for a number of software teams, while his meetings with the rest of the infrastructure team are scaled back. His duties for the infrastructure team are limited to lessen his workload but he is now asked to also take on requests from the different software teams using his custom environment. The change comes with a new title for him as well, and he gets to learn a new set of skills from the software teams on top of that, making him feel more valuable. Initially this change is difficult, but as Bob attends more meetings and gets to know the software teams using his environment, he begins to see himself as a member of their teams, and his desire to come up with solutions to their common problems increases. His specialized knowledge also spreads out to the new teams he is working with and the overall experience of everyone involved is improved.
DevOps – Benefits and Goals
Possible Benefits Of Implementing DevOps Practices and/or Tooling
- Quicker time to market for new features, fixes and changes in your application
- Better feedback loops from the end-users back into your development and design processes, resulting in your product being more reliable and useful overall
- Smaller, faster iterations on product changes allowing you to pivot quickly if feedback shows your ideas were not a good fit for the end users.
- Better collaboration between teams and more aligned focus on business goals over individual team priorities
- Reduction of roadblocks, freeing up your teams to ‘get things done’, whether via fewer siloes, platform tooling, or automation
- Better scaling of processes with automation and cross team collaboration on architecture and implementation design
- A faster, more fluid flow of ideas and knowledge between teams and individuals focused on common goals leading to more innovation
- Better morale and engagement for teams that were previously blocked, siloed or not communicating with other teams or the rest of the business
DevOps Implementation Strategies
- Automate testing and build processes to the point that a proposed code change to a service is automatically fully tested enough to know that it will work when it is approved. This is an ideal, of course, but 80% coverage is much better than zero. Continue to improve testing every time an issue is found in production that wasn’t tested for, if it makes sense to do so.
- Consider architectures that allow DevOps practices to succeed, if they make sense. Microservices and containers isn’t for everybody but it can help in many cases with cycle time, ease of testing, portability and reliability. You will have to consider how to design things in a way that allows for some of your other desired changes to work as well.
- Automate deployment processes such that either approved and tested changes are automatically deployed or that at the very least all one has to do is check a box somewhere for a code change that has been tested to roll out to the end user directly.
- Ensure that every metric you need to measure on your service to know it is working ‘to spec’ is being measured. Try to set this up in a way that allows the developers to add the monitoring themselves, and/or have automatic templates that they can adjust. If there is no feedback on how something runs, then how does anyone know whether their changes are making things better or worse?
- Use additional tooling for end user experience monitoring, error monitoring, or anything else that makes sense in your case. Direct customer feedback is also good. Make sure all information on how a service performs or is accepted is fed back to the same teams that design and maintain the service.
- Build security into everything you do. Automate security scans, updates and tests, and anything else as far to the ‘left’ (in the code itself and not after the fact) as you can. Security is just as much a part of proper code and system design as anything else is, so make sure you don’t put security in yet another isolated silo or forget to make everyone responsible for its implementation.
- Everything as code mindset. Easier said than done, but collaborating on code and making improvements to programmatic processes is easier than trying to do the same thing with a manual process.
Cultural and Organizational Strategies
- You wrote it, you fix it. Try to make sure that accountability for fixing an issue gets back to the same people that create that same issue in the first place.
- Having different teams is inevitable, but they don’t need to have constraining, defined roles. If two teams work on products that interact closely, consider having them become part of a larger team, or have cross team meetings, especially when designing new features or considering large changes.
- Consider putting a wider range of talents (and the permissions that go with them) on teams where it makes sense. The more a team can get done without having to wait on something outside of themselves, the faster they can get things done.
- In some cases, actually embedding ops engineers on development teams may make sense to help keep the different sides of the fence aligned on longer term goals and needs.
- Less ‘gatekeeping’ and more ‘self service’ where it makes sense.
- Foster a culture of direct communication and cross-team collaboration where possible.
- Consider having teams work on projects together towards the same business goals. For example, if the business wants to save on cloud costs, don’t just assign cloud engineers to the task, but find a way to have them collaborate with some of the software teams on the task, or create ‘virtual cross teams’ who meet less frequently and only work on that one project. Each team may have their own solution to the problems being worked on, and a better outcome may result.
- When problems occur, try to get them prevented at the source, as far into the beginning of the software lifecycle as possible. The last thing you want is an ops team manually ‘fixing’ issues and not feeding this back into the development cycle. Both sides can and should come together to offer fixes to the problem at their source, and this won’t occur if neither side allows the other to make or suggest changes. Priorities differ, so maybe your ops people can help themselves out by making changes to the application code with approval, and vice versa if changes need to be made in the operational environment that the developers would like to see. There are practical limits to this of course, but everyone should feel empowered to help improve things and not be at the mercy of someone else’s priorities.
- Document your code, document your processes. This is an often missed part of trying to open up processes and break down silos. You cannot allow other teams to have more autonomy to do what they need to do to make things better if they don’t have a clue how to do it.
DevOps Is Never ‘Done’
The thing about this kind of DevOps is that it is never done. There is always something you can do to improve how people work together in your organization, even if it is just within your own team. Don’t just make changes willy nilly and hope things are ‘better’. Find out. Ask around, see where people are getting stuck, and identify why. This is an iterative process, just like developing software, but on a larger scale.