Debate: What is the Role of an Operations Team in Software Development Today? [Updated May 10th]

Apr 30, 2010

[Last update: May 10th, 05:00 GMT - final notes and summary added]

In the last several years, with the rise of such phenomena as Cloud Computing and DevOps, there has been some debate about the role of the traditional Operations team as it is often found in today's software development shops. InfoQ will explore this debate further, to get an understanding of the different aspects which are involved and the tradeoffs of each approach.

A big question which lies at the core of this debate is:

Who should be responsible for the management, monitoring and operation of a production application?

To start off the discussion around this debate, we have asked for input from Bjorn Freeman-Benson, Director of Engineering at New Relic (a provider of SaaS-based application performance management tools), and Carlos Armas, the lead editor of InfoQ's Operations community. However, this is just the beginning of the debate - as the discussion grows and evolves InfoQ will update this article with the latest discussion, and the discussion on Twitter is also being tracked by following the #roleofops tag. We want you to participate in this debate, so please feel free to send a tweet with the #roleofops tag in it, send us an email at feedback@infoq.com with your input, or leave a comment on this post to add your voice to the debate.

Bjorn Freeman-Benson: There has been a lot of talk, blogging, and general powerpointing about the topic of the changing role of operations. A lot of that communication has included a plea to bring development and operations closer together. They may call it Dev-Ops or another catchphrase, but it's always about how Development and Operations Management need to communicate more effectively to manage production applications. While my colleagues here at New Relic and I agree with that generally, we think companies should consider a more definitive step toward more effective operations management - that is, consider making each development team responsible for the deployment and performance of their own applications.

Let's consider some reasons why this is not as unusual or as radical as it may sound.

First, who knows more about an application than the team that created it? Leading up to the moment of deployment, either in the datacenter or in a Cloud, the application development team has conferred with the business app owners; designed, architected, and coded the application; selected, tested, and integrated the various components of the application (app servers, OS, databases, integration middleware, etc.); created prototypes; run functional tests and perhaps load and scalability tests; demonstrated the capability to the business; and finally got the app ready for production. How can weeks or months of collective knowledge possibly be transferred to the Operations team?

Shouldn't the dev team make the critical deployment choices - whether to scale up or scale out the hardware hosts, whether to virtualize hosts, what the optimal CPU and memory are, etc.? Shouldn't the dev team decide what performance monitoring and logging they need? Shouldn't the dev team monitor, deal with alerts, handle performance and availability incidents, and deal with the cranky calls from the business when the app goes wacky? (Why should Operations have all that fun?) I know this approach can work - we use it ourselves at New Relic to manage one of the busiest SaaS applications in the world. Ironically, our SaaS application is used by nearly 4,000 dev teams to manage their production applications.

Carlos Armas: It is attractive to think that because a team designed and built a system it is the best prepared to operate, monitor, and scale it. Extrapolating that logic, I would contend the development team should be doing the company accounting because software developers are good with numbers, or cleaning up the office after hours and taking out the recyclables to the containers outside since they care for the environment. Doing so ignores a practice that is several hundred years old: division of labour. The fact that the development team knows the application better than anyone else is not a good reason to give them responsibility for operation and maintenance once the code is released to production.

I will assume the role of a business owner, and give you three reasons why I do not want this to happen to my business:

1) Financial: A software developer's average salary is consistently higher than an operations support engineer's average salary. As a business owner, why would I want my most expensive team to be performing operational tasks which other teams can do in a more financially effective way? As any good ScrumMaster in any Agile team would tell us, removing obstacles and making hidden work visible is one of their primary tasks, so the development team can more efficiently develop code. Why would I willingly bring obstacles and extra work to the team to reduce its velocity?

2) Quality: I believe in checks and balances, because nobody is perfect. When a team is responsible for the full lifecycle of the application, the customer is the one who ultimately suffers the consequences: a bad user experience. When the team that develops the code is the one that decides whether its quality is good enough to release, there's an increased risk of releasing imperfect code. Owning the full lifecycle makes people complacent, and the "we can always fix it in production" syndrome kicks in.

As a business owner, I would instead prefer to keep a system of checks and balances, so our customers are well served and reward us for it:

I would not want the development team responsible for doing QA, but accountable to the QA team for the quality of the code
I would not want the QA team to release and operate the application in production, but instead be accountable to the operations team for thoroughly testing the application so issues are detected earlier and bad product rejected back to development before it hits customers
I would not want the operations team to assume the role of constraining the frequency at which we release new functionality to customers, but be accountable for application availability and performance, and be responsible for releasing and operating the code, as well as feeding back bugs and imperfections that made it out despite the inspection and test process

Essentially, I would want quality to be a shared responsibility, with teams accountable to each other for very specific areas and roles. And I would want the relationship among the teams to be one of positive conflict fuelling continuous quality improvement, with the customer as the only end in mind.

3) Opportunity cost: Who is going to develop new features to improve the application when the software team is busy operating it? Who is going to fix software bugs? Will developers leave in the middle of a next-generation design session with the product team because a pager went off alerting them of an Amazon EC2 instance failure?

Bjorn Freeman-Benson: Secondly, what really is the role of operations when apps are deployed in the cloud? More and more web applications are being deployed onto public or private cloud infrastructures. At New Relic, more than 40% of our customers have apps deployed in a cloud, and we expect that group to be in the majority by the end of 2010. Granted, many of these are smaller companies without the legacy datacenters or data integration worries that larger enterprises have to deal with. But every day we sign up a new customer who represents a large organization with a traditional datacenter, and whose application is deployed at Amazon or another public cloud.

So, in these cases, what role does Ops play? They are not responsible for cloud hardware, networks, telecom. They don't make the choice about infrastructure monitoring. The only thing belonging to their company in the cloud is the application code itself. Everything else is the responsibility of the cloud infrastructure provider. So, to put it bluntly, when my app is deployed in a cloud, who needs Ops? The Dev team must be responsible for the successful performance of cloud based apps. If not them, no one will.

Carlos Armas: Interestingly, it seems as if folks expect systems in the cloud to manage themselves, which is a mistake. Let's go back a few years to illustrate the point and talk about managed services.

Managed services as we know them today began when hosting providers realized they could potentially go beyond offering hardware and bandwidth to their customers, and include systems administration and application management in their portfolio. It turned out to be a tricky business, and it was very hard to deliver with consistency and good quality. Only a few providers got it right, and at a very high price.

From the above perspective, cloud computing pushes system administration and application management back to the customer. (We do not intend to define cloud computing here, it probably deserves a different debate :) )

So what is Ops going to do in the cloud? Systems administration, to begin with. Application deployment, monitoring, issue escalation and response. 24x7 on-call support does not change because the systems are in the cloud. System administration (be that Unix, Linux, Windows, etc.) is an activity developers do not do well because it is not their area of expertise, just as system administrators are not good software developers, or good marketing and communication executives.

However, the composition of the operations team changes gradually as (and if) cloud computing becomes pervasive. You mentioned hardware, telecom, and networks previously. Obviously the demand for those skills will migrate from the cloud customer's organization to the cloud provider. The system administrator role will remain with the customer, and will be just as badly needed as before (possibly more so), since the provider no longer manages the OSes of your virtual servers for you. (And remember, as a business owner I want the software engineers to develop code, and my ScrumMaster to prevent them from doing hidden work while also removing other obstacles in their way.)

We could also argue that in such a scenario the Operations team ceases to exist, and the operations engineers become part of the software development team, or another team. That could be possible, and in fact in small organizations it currently happens. However I do not think we are concerned about the organization chart here, but more about the role of operations and development within the context of emerging hosting technology.

Essentially, my perspective is that the software development team should not be responsible and accountable for operations tasks not because they are not able to do it, but because it makes no sense financially, organizationally and business-wise.

Follow-up added April 16th

Bjorn Freeman-Benson: Before I provide another "proof point" in my argument that developers should be responsible for production operations for their applications, let me respond to a couple of Carlos' points. Carlos, I know you are being facetious when you say "[maybe] the development team should be doing the company accounting because software developers are good with numbers." No, I am not saying developers are capable of doing everyone else's job. But, we are not talking about having developers use operations skills that are way outside of a developer's capabilities. Accounting is pretty foreign to developers. But configuring hardware, deploying the software stack, managing the server capacity for a given application, establishing backup schedules - none of these Ops tasks are beyond the capability of the typical developer.

Further, when we are talking about Cloud, you say "...it seems as if folks expect systems in the cloud to manage themselves, which is a mistake." No, clearly cloud infrastructures will not manage themselves. But we are saying the sys-admin work associated with the cloud is done by the cloud provider. So I am saying that if we are running IT in a company whose apps are deployed in the cloud, why do WE need sys-admins? We need the developers to do most of the application management save what the cloud team does. If one uses one of the cloud platforms - from RightScale, Heroku, Stax, Gigaspaces, EngineYard, and others - most of the sys-admin-related work is done by the platform, not by human sys-admins.

Now let me give you another reason why Developers should take responsibility for the application in production. Who is better able to find the root cause of performance problems than the team that wrote the code? Let's say the Operations team is alerted, either by a performance monitoring tool or by the inevitable angry phone call from the business people, that an application's performance is lagging. What can the typical operations staffer do with that information? If they are extremely lucky they will be able to isolate the cause of the problem to some piece of failing infrastructure. Frankly that is relatively rare. Usually the various infrastructure monitoring tools Ops uses are showing all "green lights." More often than not, the problem lies inside the application. Who knows more about what is happening inside that VM than the development team? Most ops people are not developers, cannot read code and would not be able to track down a problem whose root cause was buried within the application layer. Why have a middle man in operations get alerted simply so he can turn the problem over to the dev team anyway? Just have the developers get the alerts and jump on the problem sooner! If it's their code that is the problem they should be the ones remediating the problem. The secondary consequence of this system is that developers become a bit more diligent about the code they push into production, knowing they have to live with the results.
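
The routing Bjorn proposes can be sketched in a few lines. This is a minimal illustration, assuming each alert arrives already tagged with the layer where the symptom was detected; the `Alert` type, the layer names, and the rotation names are hypothetical and are not part of New Relic or any other monitoring product.

```python
from dataclasses import dataclass

# Hypothetical layer tags; a real monitoring tool would supply its own taxonomy.
APPLICATION_LAYERS = {"app", "transaction", "db_query"}
INFRASTRUCTURE_LAYERS = {"host", "network", "storage"}


@dataclass
class Alert:
    service: str  # application that raised the alert
    layer: str    # where the monitoring tool located the symptom
    message: str


def route(alert: Alert) -> str:
    """Return the (hypothetical) on-call rotation to page for this alert."""
    if alert.layer in APPLICATION_LAYERS:
        # Problems inside the code go straight to the team that wrote it,
        # with no operations middle man in between.
        return f"dev-oncall:{alert.service}"
    if alert.layer in INFRASTRUCTURE_LAYERS:
        return "ops-oncall"
    # Unknown layer: page both teams rather than guess.
    return "dev-and-ops-oncall"


if __name__ == "__main__":
    print(route(Alert("checkout", "app", "p95 latency above 2s")))
```

The sketch depends entirely on alerts being classified correctly: an infrastructure symptom that was actually caused by a code change would still land on the ops rotation.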

Follow-up added April 19th

Carlos Armas:

Yes, I was in a very facetious mood. :)

I find humour to be a wonderful vehicle in facilitating communication and understanding. The more I read your comments, the more it feels as if our ideas are not as opposite as they look to the casual observer or at first sight.

I agree developers are able to learn and perform many Ops tasks, definitely. Keep in mind I still want them to be concerned with writing code in the first place, as I wear my business owner hat. I will go back to this point as I address a very important point you make below.

We have a very fluid, and evolving concept here: "cloud computing". If you ask me to define it, I will probably run the other way. (I do not want to attract the attention of the hordes of modern-day cloud computing evangelists).

Suffice it to say that cloud computing is very broad: it covers services that require minimal or possibly no operations professional support (Heroku, et al.). It also includes services such as Amazon's EC2, Rackspace's RackspaceCloud, Opsource's OpsourceCloud (to name a few) where there's a substantial amount of Ops work involved, depending on the kind of application to support.

There is a strong case to be made for a SaaS provider focused on a very specific service to have a homogeneous team that keeps the service ticking from concept to delivery (which might be the case with New Relic), with razor-sharp focus on delivering the application.

One possible contrasting example would be a company that decides it doesn't want to spend a lot of capital in powering a development environment and moves its development infrastructure to EC2. Fast provisioning, quick turnaround, what's not to like? There are many other examples that come to mind.

So the moral of the story here is, "it depends".

With regard to your point about root-cause analysis, let me start my counterpoint by highlighting a couple of your statements which sounded like music to my ears:

"The secondary consequence of this system is that developers become a bit more diligent about the code they push into production, knowing they have to live with the results"
"Why have a middle man in operations get alerted simply so he can turn the problem over to the dev team anyway?"

As I see it, if Dev starts doing Ops tasks:

Developers would be more diligent, as they would have to live with the results of the code pushed to production
This would resolve the middle-man syndrome (Ops) caught in-between, not able to fix the problem yet accountable for the failure

Let me add my own grievances, based on direct experience:

Why is it that, in a number of companies (apparently a large number), Ops is perceived (and acts) as an entity that blocks change and adds red tape to workflow to the point that it is almost impossible to be agile and release code to production?
Why does Operations consistently stonewall Development?
Why is it that Development circumvents strict company policies, and ends up buying cloud-computing services (because Ops did not provide the service in the first place)?
Why is Ops penalized for missing an SLA target if the failure was not related to faulty infrastructure or processes, but failing code?
Why isn't Ops taking advantage of the rapid, flexible deployment capabilities of cloud services? Is it that the "guardian" mentality of '70s computing practices is still alive?

Something seems clear to me: there's conflict now, but since I am not versed in the history of information technology practice, it is hard for me to tease out the reasons why we got here in the first place. My untested and simplistic theory is that change is bad for operations, while software development is change. So there's a "primal" contradiction that needs to be managed wisely.

Let me put my business hat back on, as I run cost numbers and try to provide a process to manage (wisely?) the conflict. Let me express it in Agile user-story statements - as a business owner, I want:

Software engineers not doing the company accounting (didn't you see that coming? :) )
Operations engineers primarily focused on 24/7 service availability
Software engineers primarily focused on service improvement
A zero-wall policy between development and operations
Operations engineers as core members of Agile teams
Software engineers regularly rotating through on-call duties for third-level escalations
- In pain, there is learning
Development and Operations sharing responsibility for application availability and latency
- I want to be blunt here: missed target, no bonus for either team; I do not care who broke it. Corollary: in financial pain, there's learning
Operations engineers required to learn the core, essential parts of the service application layer
- To the point they can help set up trend monitoring and predict failure build-up scenarios

Essentially, I still want my software development team to write code and build new features that amaze our customers, and not be distracted by anything else (including cloud computing). I still want my Operations team (either a multi-engineer team or a part-time remote sysadmin guy) to be tuned up and extremely responsive to the team that builds the stuff our customers want. I want my software development team to be accountable for the code they wrote and deemed fit to release to our customers. And I want my Operations to learn the application layer to the point they can call out bug vs. infrastructure anomaly (and as the icing on the cake, stop complaining about change).

Follow-up added April 20th

Bjorn Freeman-Benson: Carlos, thanks for clarifying your position. In brief, I agree with most of your observations about how Ops is perceived by Devs ("change-blockers") and I would add how Dev is perceived by Ops ("a bunch of friggin' cowboys.") And I see we have some interesting reader comments about our debate. I have also gotten some feedback offline (thanks @markimbriaco and @randybias) to the effect that my position comes across as black or white, and militantly against an Ops function. I didn't mean it to, and hope I didn't portray Ops people as completely unnecessary or incompetent. I am not claiming that companies have no need for an operations function; that is clearly not the case.

My position and my perspectives are focused on applications, and who has responsibility for them. After all, what is IT for, if not for application development and operations?

Let me use this posting to clarify and to react to some of Carlos' comments above.

First, all the discussions I have heard and read by ops people (and a good one to read is by one of our commenters, John Willis) tell us that no one knows better than the ops people themselves that the role of operations is changing pretty dramatically. Carlos, your comments show that, too. They can see that their datacenters and applications are different than they were only a few years ago. It used to be that a datacenter contained a hodge-podge of proprietary technologies - a mainframe, some AS/400s, an RS/6000, some DEC minis, and some Wintel servers that "those web guys" used, plus a bunch of storage devices which needed frequent care and feeding. There are still datacenters with this kind of variety, and for the teams running them, I think less may be changing. However, it's not uncommon today for a datacenter to be composed of 1000 Linux/Tomcat blades, all nearly clones of one another. It's also not uncommon for nearly all the applications to be web-based (Java, .Net, Ruby, PHP), and for those datacenters to have fewer management tools to learn and fewer proprietary systems to support. Cloud computing takes this picture to an extreme. So in more and more cases, the role of the ops team is being simplified by this standardization and commoditization. It's our contention that the picture I paint is becoming the norm rather than the exception.

Even in the case of the highly standardized datacenter (my 1000 blades example) there is still a role for operations. There are numerous jobs that need specialized knowledge - database administration, capacity planning, data backup and restore, disaster recovery planning, power management, telecom management, and a lot more. The people who perform these functions do so for the whole datacenter (or for the cloud provider if that is who employs them.)

The crux of my argument is that responsibility for application management should largely reside with the application development team, not with the operations team. And Carlos, (this will be a shocker) I completely agree with your observation that:

My untested and simplistic theory is that change is bad for operations, while software development _is_ change. So there's a 'primal' contradiction that needs to be managed wisely.

It is my contention that developers and architects are better prepared to make deployment, monitoring, and incident management decisions than the ops team because of their intimate knowledge of the application architecture and language. In the case of application management, a separation of responsibilities between Ops and Dev is less efficient. It's less clear who is responsible to the business for the success of the app. And finally, by putting ongoing management of apps squarely on the job description of the developers, your application quality will improve. Developers will no longer be allowed to hand off a poorly coded app to Ops and walk away from the ensuing mess.

I like your Agile story approach to "What the business owner wants" but I would like to hear some reader comments before I comment on those.

Follow-up added April 21st

Carlos Armas: As much as I would love to believe the role of the operations team is being simplified, (and I wish it were), I see the opposite happening.

The operations role, in my opinion, has been misunderstood and later minimized over the last 15 years or so. Not too surprising, because in large part it was the fault of Ops.

It started in mainframe times, when MIS (Management Information Systems) departments took on the role of "priests of the computing temple". The adjudicators of "machine time" behind the glass walls operated with the principles of rite, secrecy, and separation. Too good for scrutiny, too in control to challenge in the realms of the business playfield.

Times have changed. As I see it, the simplest parts of the job have been slowly fading away. We no longer segregate /bin and /usr/bin onto fast and slow hard disks, or nurture and pamper that 12GB-memory Sun E4500 that took the place of a deity in the datacenter. I no longer remember the last time I used a crimping tool to make my own cables (thank heavens!). I also cringe and contort when I have to compile something because apt or yum will give me a slightly older version that won't cut it.

I would say that the physical tasks of operations have long disappeared from our job description, and have been pushed down and away to starting/supporting roles. On the other hand, our job has become increasingly complex. The multi-server homogeneous datacenter (even the virtual, 'cloudy' one) brought a different, higher level of headaches and complexity. With Kickstart, Puppet, and other automated deployment mechanisms came what I call "the atomic risk". A simple typo in /etc/sudoers on a single server might have been easy to fix; now the multiplier effect of automation lets the error spread to thousands of servers within minutes, if not seconds.
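
One common way to contain this "atomic risk" is a staged rollout: apply a change to a small batch of servers, check them, and halt at the first failure instead of fanning out to the whole fleet. The sketch below is tool-agnostic and is not how Kickstart or Puppet themselves behave; `apply_and_check` is a hypothetical callable standing in for whatever applies the change and verifies a host (for a sudoers edit, that check might run `visudo -c`), and the default batch size of 5 is an arbitrary assumption.

```python
from typing import Callable, List, Tuple


def staged_rollout(
    hosts: List[str],
    apply_and_check: Callable[[str], bool],
    batch_size: int = 5,
) -> Tuple[List[str], List[str]]:
    """Roll a change out batch by batch, stopping after the first bad batch.

    Returns (changed_hosts, failed_hosts). Hosts after the failing batch are
    never touched, which bounds how far a typo can spread.
    """
    changed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # Every host in the current batch gets the change before we decide.
        failed = [host for host in batch if not apply_and_check(host)]
        changed.extend(batch)
        if failed:
            return changed, failed  # halt: remaining hosts keep the old config
    return changed, []


if __name__ == "__main__":
    fleet = [f"web-{i:03d}" for i in range(120)]
    # Simulate a bad change that breaks web-003.
    changed, failed = staged_rollout(fleet, lambda host: host != "web-003")
    print(len(changed), failed)  # 5 ['web-003']
```

With 120 servers and a broken change, only the first batch of 5 is affected rather than all 120; the trade-off is a slower rollout when the change is good.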

Our daily challenges have changed from "why is the compilation bombing?" to "how do I cajole my puppet module that deploys app Y to 120 servers to install release X, but not release X+1 before it's ready, so I do not end up with alpha-quality code in our production instances?" In that sense, I love how the 'constraints of the physical world' are becoming less of an obstacle in a cloud-provider environment. Negotiation, procurement, logistics, racking and configuration are done beforehand. That's progress. The trend began well before cloud computing, and is definitely welcome. Yet my job is definitely not getting simpler, even though automation now helps me get far more done.

Let's put it this way: it got more complex, but now I have much better tools to assist me. And, as a segue to my next point: I am grateful to the developers that brought such automation tools to life, and that is precisely the reason I want them to keep developing new ideas, and not managing deployed applications :)

I agree with your thoughts about developers and architects being better prepared to make deployment/monitoring/incident management decisions in principle. There is no doubt in my mind developers and architects know better than anyone else what they built.

As for who is responsible to the business for the success of the app, I still have the perspective (maybe an old-fashioned one) that an application is part of the service ecosystem, and can't survive without an infrastructure foundation that supports it -- even in cloud environments, the infrastructure needs management, and I would rather have the folks who are specialized in that area performing those tasks. I guess this is a matter of viewpoints, we will likely agree to disagree here.

Now, going back to developers having app production management responsibility in their job description, as you mentioned: I like it! Immensely. "Give me your tired, your young". Rotate your developers to Ops team positions so they get first-hand experience of the needs and challenges of delivering a consistent, 24/7 SLA-backed user experience, while instilling the app-level knowledge. Same with newly hired developers. In reciprocity (retaliation?) I will rotate my ops engineers into your Scrum teams as members, so they experience first-hand the 'removal of obstacles', frustration with delays and red tape, etc. This has the added benefit of taking down the (artificial) walls between the teams, so there's no more "us vs. them".

The above would ensure that I keep the developers doing what I need them to do (business owner hat on): building new functionality, but helping transfer the knowledge so the apps can be supported in production efficiently.

Follow-up added April 27th

Bjorn Freeman-Benson: Well, this has been an interesting week. Sorry I haven't posted in a few days. We deployed a significant new feature on our SaaS tool and we kicked off our next round of development. We now include production profiling in our app performance monitoring tool. We push something new at least weekly and do ad-hoc patching several times a week. Keeps us busy. Let me also say that at New Relic we also have a very good sys-admin named Bayard Carlin. I know, you probably thought from my comments that we had no ops people. But no, we have one. He is also our internal IT department serving our employees' needs. I will talk more about Bayard in my next post.

In looking over the comments from some of the readers, I saw several really good remarks that I would like to highlight and react to. In my next post, I will summarize what we have learned from all of your feedback and from Carlos' insights.

First, David Sims commented "It is indeed good for our developers to be deeply involved in technical support, as it leads to a better product that they produce. However, like Carlos pointed out, as a small business owner, I know it's not always the best use of resources for a developer to answer questions that a skilled support engineer can handle." I agree with both of those points. We have seen that product quality consistently improves with developer involvement in production operations. We also agree with David that it is a challenge for the development team to devote time to operations when they have new code to write. However if David means operations work is not "high value" enough, I dispute that. I am not assigning a value to the work Ops does that is any less than the work Dev does. Just different.

Second, Geva Perry's analysis of the impact Cloud Computing is having and will yet have on the role of operations is very valuable thinking that we should expand on, maybe in another thread one day. At New Relic we have some apps in the cloud and others in a more traditional hosting environment. We have lots of customers, though, deployed in all kinds of cloud environments and we hear from some of them about how they struggle with the new and different demands of that deployment option.

Third, I agree and disagree with John Allspaw's comments. I disagree when he says that (Cloud) automation will not appreciably reduce the amount of Ops people. I think it's an inevitable trend. I agree that in most larger organizations, there will remain an Ops organization and success will be measured by the degree to which they learn to collaborate, not the degree to which they obliterate the other.

Fourth, I like Sellers Smith's "signs of a healthy operations environment." I think he is on the right track. I still favor shifting more responsibility for application and service level success to the developers so that there is less emphasis on the hand-offs between Dev and Ops and more on building apps and app platforms with the end in mind. Think of the Design for Maintainability movement in industrial engineering and consumer products and you will see where I am going.

Next post I will summarize what we have learned and solicit your comments.

Follow-up added May 10th

Bjorn Freeman-Benson: This will be my last posting on the debate, though I may jump back in as a commenter if I see some more great comments like those below. First of all, this has been a great experience for me and for my colleagues at New Relic. This has stimulated some interesting internal discussions (see Bayard Carlin's comment below for an example.) We have heard from customers, partners, and other friends in the business. It's clear from these discussions and from your comments in the debate that having an Ops function is not going away for most organizations of any size. There are too many jobs cutting across all of IT needing compliance, governance and standardization that cannot be left to individual app teams. So let's assume you are going to have an Ops function if you are employed by a larger company. Startups and small companies may (and, in our opinion, should) successfully blend Dev and Ops into a single role.

In this posting, I would like to summarize our thinking on both the Role of Operations and on some ways Dev and Ops can work more harmoniously, productively and efficiently.

The role of IT operations has changed significantly over the past 10 years or so, and in the past 2 years the pace of change has accelerated dramatically. On this, I think Ops people and most others in IT can agree. As more applications have moved to a distributed model using web technologies, application complexity has increased at the same time that application development cycle times have decreased. What a bind Operations finds itself in! Ops teams need deeper and more varied skills to manage more complex application environments, even as agile methodologies, Cloud deployment platforms, high-performance IDEs like Eclipse, and new application frameworks like Rails, Spring, JEE, Grails, and .NET enable ever-faster application development.

To succeed today, the Ops team will need to adapt to a faster pace of deployment and to a continuous ratcheting up of complexity. In my opinion the role will call for one set of skills that are solely within the Ops team (because they apply across the whole enterprise) and another set of skills that are shared with the Development teams. Shared skills are those where Dev takes the lead but Ops works side by side (more on that later).

These skills reside within the Ops team:

Hardware and network configuration and deployment
OS and firmware maintenance
Application stack maintenance (app servers, frameworks, plugins, etc.)
Capacity planning
Storage and backup management
Disaster planning
Security and access administration
Telecom management

And these skills are shared with Development (with Dev taking the lead and responsibility):

Application deployment
Application, Network, and Infrastructure monitoring
Log management
Database design and administration
Incident management and troubleshooting
Application performance management
Service level management and reporting

The best thing an Ops team leader can do for his people is to provide them with opportunities to cross train in those skills in which Development takes the lead.

Finally, here are some simple recommendations for improving the Ops job:

Move most Ops staff into cross-functional application teams consisting of developers, Ops, DBAs, and business analysts (product managers)
Physically co-locate Dev and Ops people into cross-functional units
Assign cross-functional success metrics that make an entire team responsible for on-time, on-target application delivery and performance
Make sure the whole team understands how to monitor performance, gather critical data, and interpret performance metrics in a critical situation
Involve developers in the selection and customization of application and database monitoring solutions
Insist that the whole team meet with their business-side customer counterparts at least monthly to review progress and goals

Thanks for following our debate. It's been a lot of fun. Special thanks to Carlos Armas for challenging our positions and taking on the role of opponent.


This content is in the DevOps topic