An Examination of the Enterprise risks of building your Cloud stack

An Examination of the risks of building your enterprise stack

This article covers the kinds of risks that CIOs, CTOs and others concerned with the long-term issues of using any technology need to look at. These are risks Enterprises normally assess as part of technology adoption processes and PMO processes.

Enterprises look at these things because, historically, bad things have happened when people didn't look carefully at how they were deciding to use technology.

There was some initial resistance to open source in Enterprises, mainly over licensing. One concern was that someone would sue, claiming the source code had been stolen. That never materialized. Another was that a company might use source code in a way the license didn't permit and be open to lawsuits. Many open source projects have since moved to permissive licenses such as MIT or Apache that allow virtually any use of the code. These risks have therefore nearly disappeared, but there are other risks of using open source that also have to be looked at.

There are tools to help you verify that the source code you are using is licensed properly so that nobody in the company is doing anything that leads the company to unwanted lawsuits.
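As a rough illustration of what such a tool does, here is a minimal sketch that checks declared licenses against an approved list. The allowlist, the function name find_license_problems and the sample package names are all assumptions made up for this example; real tools also inspect the source files themselves rather than trusting declared metadata.

```python
# Hypothetical allowlist: the licenses your legal team has approved.
APPROVED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def find_license_problems(declared, approved=APPROVED_LICENSES):
    """Return {package: license} for every package whose declared
    license is missing or not on the approved list."""
    problems = {}
    for package, license_name in declared.items():
        if not license_name or license_name not in approved:
            problems[package] = license_name or "UNKNOWN"
    return problems


# Made-up inventory of components and their declared licenses.
inventory = {
    "fast-queue": "MIT",
    "report-lib": "GPL-3.0",
    "old-parser": None,
}
print(find_license_problems(inventory))
```

A check like this is only useful if it runs every time a dependency is added or upgraded, not once at adoption time.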

With the decline of this open source risk, many organizations now have almost the opposite scenario: a nearly open policy of "use anything, we don't care." The risks of using open source today have less to do with whether you will be sued and more to do with whether you will find that your project doesn't work, is delayed substantially or ultimately becomes unmanageable because you used the wrong open source.

The use of open source in Enterprises is huge now. Some are beginning to question this. There are 100,000 open source projects, your organization may end up using literally thousands of them, and you have no idea what future risk and inefficiency you have taken on by letting everybody do anything they want.

Here are some of the risks you may not have thought to consider when using open source or really any technology in your business.

Selection risk – Are you looking at the best components in the category?

For every category of open source there are many choices. If you look at NoSQL databases there are at least a dozen choices, including MongoDB, Aerospike, Cassandra and others. The file storage area has many choices too, including Ceph, Rook, EFS, Cinder, OpenZFS and many more.

There are many places to find open source projects and it is easy to miss important projects that are up and coming or revolutionary. Did you find the best component out there? Did you miss something that turns out to be the up and coming best product?

If you select a component with inferior performance or features, does this hobble you compared to a competitor?

Company risk – Are you looking at the risk that the technology is stale or going away?

A well known risk that large companies look at is whether the company they are looking to buy a product from will be around to help if there is a problem.

With open source that problem is somewhat mitigated by the fact the source code is always available.

For open source, the activity and size of the community is the equivalent measure, replacing the size or financials of the company. If a component is open source but the community is not active you will have a tough time if anything breaks or needs to be changed. Even if you select a company or open source project that has a good following and is well serviced today, things can change.

Thus many companies look for multiple sources, or at least an alternative source, in case the supplier proves problematic. Choosing a project that is a perfect fit but later turns out to be a problem to support is a huge risk. If you are a giant company this can be mitigated because if you have the source code you can ultimately take over supporting it. That's not desirable but at least with open source that is always an option.
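The community-activity measure mentioned above can be turned into a simple check. The sketch below is only an assumption about how one might do it: the function name community_health, the 180-day staleness threshold and the minimum of 3 contributors are illustrative choices, not industry standards.

```python
from datetime import date


def community_health(last_commit, active_contributors, today,
                     stale_after_days=180, min_contributors=3):
    """Classify a project as 'active', 'at risk', or 'stale'.

    The thresholds are illustrative defaults, not industry standards.
    """
    days_quiet = (today - last_commit).days
    if days_quiet > stale_after_days:
        return "stale"
    if active_contributors < min_contributors:
        return "at risk"
    return "active"


# Made-up example: last commit two months ago, one active contributor.
print(community_health(date(2024, 1, 1), 1, today=date(2024, 3, 1)))
```

Because things can change after you select a component, a check like this is worth repeating periodically for everything you depend on.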

Technology risk – Does the technology actually solve the problem?

Do you know if the component you have selected will actually solve the problem you want it to? You read the documentation. You read the comments. It seems right, but when you use it you don't get the performance others get. Maybe your use of the technology is more special than you realized. Maybe you write more than you read and the product turns out not to be as good at writes as reads or some special use causes it to crash constantly.

Without using the component with your other pieces you don’t know if it actually works with your other components to solve the problem you want it to. It may perform badly in your scenario because you didn’t understand that it doesn’t work well for the use case you have.

This is actually a very common problem. It may turn out you need to configure the product in a special way to get the performance you need. Information on these open source projects is usually meager, the expertise you need is hard to find, and there is frequently nobody you can hammer on the head or threaten if you can't easily get the answer.
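One way to reduce this risk is to try the component with your own mix of reads and writes before committing to it. The sketch below is a minimal harness under stated assumptions: the function name run_mixed_workload, the 80% write ratio and the key space of 1000 keys are invented for illustration, and a plain dict stands in for the component being evaluated.

```python
import random
import time


def run_mixed_workload(store_get, store_put, operations=10_000,
                       write_ratio=0.8, seed=42):
    """Time a read/write mix and return per-operation-type totals."""
    rng = random.Random(seed)
    totals = {"reads": 0, "writes": 0,
              "read_seconds": 0.0, "write_seconds": 0.0}
    for i in range(operations):
        key = f"key-{rng.randrange(1000)}"
        is_write = rng.random() < write_ratio
        start = time.perf_counter()
        if is_write:
            store_put(key, i)
        else:
            store_get(key)
        elapsed = time.perf_counter() - start
        if is_write:
            totals["writes"] += 1
            totals["write_seconds"] += elapsed
        else:
            totals["reads"] += 1
            totals["read_seconds"] += elapsed
    return totals


# Stand-in store: a plain dict. Replace these two callables with thin
# wrappers around the client of the component under evaluation.
fake_store = {}
result = run_mixed_workload(fake_store.get, fake_store.__setitem__)
print(result)
```

The point of passing the get and put operations in as callables is that the same workload can be replayed against each candidate, so the numbers are comparable.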

Integration risk – Will the technology integrate with other components easily?

Some technologies only work with specific other products. If there is no well defined API or interface that is a standard you may have to write your own integration code to make 2 things work together that weren’t designed to work together.

In some cases two products are incompatible and can’t be made to work together, in which case you need to abandon one of them and choose a different one.

It could be that subtle distinctions in the API or interface make something else not work. You may not know in advance all the components you will be using and thus you think it integrates fine with the first 5 things, but it has problems with the next 5 things. Most architectures today consist of 30 components or more. The integration of this many components ultimately has issues.

Integration risk is one of the most common risks and its mitigation is in most cases simply writing some integration code or finding someone else who has found a way to integrate the components. However, this risk is also the most common reason for delays.

Longer term, as you take on more and more technology and the complexity grows, the interdependence of these technologies becomes a concern. This is sometimes called the spaghetti architecture problem. It is to some extent unavoidable, although a careful understanding of standards and interdependencies helps reduce the brittleness that develops.

One technique for reducing this integration risk, which I created 30 years ago, is called a message bus. By integrating everything to a message bus, the complexity of integrating everything to everything else went down. This made it easier to substitute or add new components because all you needed was an interface to the bus. This is still a widely used approach, but the new bus is HTTP.

However, HTTP only covers the underlying protocol for communication and the syntax of the data. It doesn't describe the functions above that, the ones that say what the components actually do.
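The arithmetic behind the message bus argument is easy to check. The sketch below assumes the worst case, where every component has to talk to every other component directly; the function names are made up for this example.

```python
def point_to_point_integrations(components):
    """Every component integrated directly with every other one."""
    return components * (components - 1) // 2


def bus_integrations(components):
    """Each component integrated once, with the bus."""
    return components


for n in (5, 10, 30):
    print(n, point_to_point_integrations(n), bus_integrations(n))
```

For a 30-component architecture that is 435 possible pairwise integrations against 30 bus integrations. Real architectures rarely connect every pair, so treat the first number as an upper bound.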

Performance risk – Does the component perform adequately to meet your needs?

You may have heard great things about the performance of a product, but it may still fail to meet your needs.

Sometimes if you turn on options like guaranteed delivery, run with a certain replication factor or use certain storage services, the component will not work well or will not be as fast as others.

You may find you can write the data fast but certain types of queries are very slow. There may be scaling issues where a component can't be replicated or grow to meet your demand.

This is also a very common risk and ultimately may require rewriting and changing architecture. There is no way to know if something will scale until you try it.

Security risk – Are there security gotchas, including malware, poor coding practices and inappropriate licenses that make your use of the component illegal?

This is the risk I talked about initially regarding licensing but is broader.

If you don’t scan the open source to see whether it contains malware or meets your security guidelines, you could be opening yourself to vulnerabilities. This risk is ongoing because vulnerabilities change, and as upgrades happen new vulnerabilities are frequently exposed.

Scans don’t always work, and there is always a risk that code has a severe vulnerability. About a year ago a problem was discovered in Node.js that created massive problems for its users, almost imperiling the project.

This is a risk you take and you need to keep up to date. However, if a project is not well supported you may find it difficult to get a fix in a timely way leaving you open to vulnerabilities for long periods. This is probably one of the most severe risks you can take.
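The ongoing checking described above amounts to comparing what you have installed against a list of known-bad versions. The sketch below uses entirely made-up advisory data and package names, and the function name vulnerable_dependencies is hypothetical; in practice the advisory data would come from a vulnerability database feed.

```python
# Hypothetical advisory data: package -> versions known to be vulnerable.
# In practice this would come from a vulnerability database feed.
ADVISORIES = {
    "web-framework": {"2.0.0", "2.0.1"},
    "image-lib": {"1.4.2"},
}


def vulnerable_dependencies(installed, advisories=ADVISORIES):
    """Return sorted (package, version) pairs with a known advisory."""
    return sorted(
        (name, version)
        for name, version in installed.items()
        if version in advisories.get(name, set())
    )


# Made-up list of what is currently deployed.
installed = {"web-framework": "2.0.1", "image-lib": "1.5.0", "json-tool": "0.9"}
print(vulnerable_dependencies(installed))
```

The check has to be rerun whenever either side changes: when you upgrade a component, and when new advisories are published.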

Support risk – Is the component supported well should a problem occur?

This is one of the most important risks. Many open source projects have companies that provide support for them. You may choose not to purchase the support and use the "upstream" version as it is called or the "community version" but this will have risks. Usually the cost for support of open source products is quite small compared to typical Enterprise license deals.

Companies used to make the argument that paying for support was a way to make sure that the company providing them with critical technology stayed in business and kept supporting them. Mature companies realize that other companies can't give away everything for free and prosper. They don't begrudge those companies their livelihood, because if a supplier ultimately fails then the "cost savings" were imaginary. Enterprise companies sometimes referred to this as a hidden cost of open source.

In any case you can buy support for many open source projects. However, that doesn't eliminate the risk completely.

One of the complexities of this open source world is the number of components. You may think the problem is in component A. However, when you contact component A's support they say no, it is component B that is at fault. This kind of "he said, she said" blame game is common, enormously stressful and difficult to manage.

This is why enterprises frequently buy from companies that have "more of a solution." The fewer pieces they have to buy, the fewer possibilities for cross-blaming. You can point to vendor A and say, fix it and if you don't I will end the contract.

The existence of 30 or more components in today's architecture results in a lot of finger-pointing. This is why some companies will use outside vendors to take responsibility for their cloud technologies.

This is related to the company risk: without a good community supporting a project and without a company backing it, you may find it hard to get support. Sometimes a component needs to be modified to meet new standards or fix security issues. Without the right support you may find yourself in trouble.

Training risk – Do you have a minimum skill or competence in the component?

Across the many groups in your company you may be using 3 or more components in a certain category. In fact, you may be using a dozen comparable components. What this means is that different groups are doing something similar but not the same. When you move someone from one group to another they may have to be trained from scratch on the new product because it does the same job in a significantly different way.

Big enterprises usually limit the vendors they use for a component because they want to gain in house expertise in the product. This allows their internal engineers and others to be more valuable across the company. You can solve problems faster and people in other groups can get help from different groups. If you allow every group to pick a different component for the same function then you will have a broad smattering of experience on everything but no deep experience on any one thing. This leaves you vulnerable and makes everything more expensive.

There are lots of consequences of this. Companies have generally been well aware of this risk in the past, but with the open adoption of open source you no longer know whether you are gaining experience in any specific product. You could be in a precarious position, depending on a component and unable to find anybody inside the organization who knows how to fix it when something goes wrong.

You may find that when you lose an individual who was the only one who knew a particular product, you have suddenly become very vulnerable. Having an unlimited number of open source projects will inevitably lead to lots of these problems, and you may be forced to consolidate the components you use. This can be very expensive and risky. It is better to choose up front a small set of components from each category and qualify those components.
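The exposure described above can be found before someone leaves. The sketch below assumes you keep some kind of skills inventory; the function name single_expert_components, the names of the people and the assignment of people to components are all invented for illustration.

```python
def single_expert_components(expertise):
    """Return components that fewer than two people know well.

    `expertise` maps a component name to the set of people who can
    support it.
    """
    return sorted(
        component
        for component, people in expertise.items()
        if len(people) < 2
    )


# Made-up skills inventory for illustration.
expertise = {
    "Cassandra": {"ana", "raj", "li"},
    "Aerospike": {"tom"},
    "Ceph": set(),
}
print(single_expert_components(expertise))
```

Any component this flags is one resignation away from being unsupported inside the company, which makes it a candidate for training or consolidation.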

There is also the issue of whether the component is documented well enough for anyone to gain expertise.

A component or product may sound awesome and perform awesomely but be poorly documented. People may have a lot of trouble using it properly.

This is a risk many Enterprises are ignoring today. They allow groups all over the company to choose all kinds of technology, creating dependencies and special knowledge ratholes that can come back and bite them.

Summary

Reading about all these risks may make you nervous especially if you are a corporate executive who worries about the future. Open source has enabled a huge advancement in productivity, efficiency and agility. It is hard to say no to those advances. You don't have to but you must be aware of the risks you are taking.

The proliferation of many different open source projects throughout companies has become standard, but some companies are trying to limit the components architects can pick from so that the organization as a whole develops expertise in a couple of products.

Limiting the choices also means being able to consolidate support issues and to reduce risk by consciously applying some of the criteria in the other risk categories I describe above.

Also, if you don't vet components for the risks listed above, you may find that 5 years down the road a number of the components you are using in critical services are not supported, have vulnerabilities or otherwise need to be fixed or changed, and you end up having to do this yourself at great cost. Re-engineering a product is rarely a good thing. It is disruptive, far more costly than most companies realize, and an expense they didn't plan on.

Most companies hate to change something that is working. So, they don't upgrade, they don't re-engineer. They will let something get crusty and brittle and learn to live with the limitations rather than spend the money to fix it. Thus in 5 or 10 years, if you've used a wide variety of components throughout your company, you may find yourself with vastly less agility than if you had consolidated on fewer, more carefully selected components.

You don’t want to stop your teams from using whatever they need to work productively and fast, but you also don’t want to create enormous costs down the road, costs that engineers who are just trying to get the job done frequently don't anticipate.

At the minimum you should have engineers think about the long term consequences of using some product and think about the issues I’ve brought up above.
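One lightweight way to make engineers think through these issues is a scorecard over the risk categories discussed above. The sketch below is only an assumption about how that could look: the weights, the 1 to 5 scale and the candidate's scores are invented for illustration, and you would set your own.

```python
# Risk categories from this article, each with an illustrative weight.
WEIGHTS = {
    "selection": 1, "company": 2, "technology": 2, "integration": 2,
    "performance": 2, "security": 3, "support": 3, "training": 1,
}


def weighted_risk(scores, weights=WEIGHTS):
    """Combine per-category scores (1 = low risk, 5 = high risk)
    into a single weighted average."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"unscored categories: {sorted(missing)}")
    total = sum(scores[name] * weight for name, weight in weights.items())
    return total / sum(weights.values())


# Made-up scores for one candidate component.
candidate = {
    "selection": 2, "company": 1, "technology": 2, "integration": 3,
    "performance": 2, "security": 4, "support": 5, "training": 3,
}
print(round(weighted_risk(candidate), 2))
```

The single number matters less than the fact that the function refuses to produce one until every category has been scored, which forces each risk to be considered at least once.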

Ideally there would be a resource for us all to look at to find answers. There is no single source for such information unfortunately.

Here is a list of several companies that provide some information on these topics and help you select components for your enterprise stack:

The New Stack
Agile Stacks, Inc
StackShare
Stack Overflow
Stack Exchange
DZone
