In this brief article, we’ll explain what Docker Swarm is, why we use it in our non-orchestrated container environments, and what you should keep in mind if you start using it.
The features described here correspond to Docker version 17.03.0-ce (Docker Community Edition).
What is a service?
A service is the definition of the tasks that must be executed on the cluster nodes; a task, in turn, is the smallest unit of deployment (planning) in a cluster. This definition follows a declarative service model: it states the desired final state rather than what to do, how to do it or when to do it, which belong to the imperative programming paradigm.
What is Docker Swarm?
Docker Swarm (referred to from now on as Swarm) is Docker’s native clustering and container orchestration tool. It pools several Docker hosts together and exposes them as a single virtual Docker host.
The same Docker API is used to interact with Swarm, with no need to install anything else. When we initialize a new swarm (cluster) or join nodes to a swarm, the Docker Engine runs in swarm mode.
Unlike Docker Compose, where every container launched is tied to a single host, Swarm distributes those same services evenly across the available hosts. This distribution across multiple hosts drastically changes the deployment paradigm to one of distributed systems of containers, which raises a series of technical questions to solve.
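To make the idea of an even distribution concrete, here is a minimal sketch of a "spread" placement: each new replica goes to the node that currently runs the fewest tasks. The function name `spread` and the node names are hypothetical, and the sketch deliberately ignores the resources, constraints and node availability that a real scheduler would also take into account.

```python
from typing import Dict, List


def spread(replicas: int, nodes: List[str]) -> Dict[str, int]:
    """Place `replicas` tasks one by one on the least-loaded node.

    Simplified illustration only: ties go to the node listed first.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    load = {node: 0 for node in nodes}
    for _ in range(replicas):
        # Pick the node that currently runs the fewest tasks.
        target = min(nodes, key=lambda node: load[node])
        load[target] += 1
    return load


placement = spread(5, ["node-1", "node-2", "node-3"])
print(placement)  # {'node-1': 2, 'node-2': 2, 'node-3': 1}
```

With Docker Compose on a single host, the equivalent "placement" would put all five containers on the same machine.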
Swarm consists of a set of Docker Engines managed by a Leader Manager, which makes the management (resources) and planning (containers) decisions. There are two types of nodes: Workers and Managers. Workers (Agents) run the workloads, while Managers plan and manage the swarm.
To maintain a consistent global state of the swarm in the event of failure, the Managers implement the Raft Consensus Algorithm, so that if the Leader Manager dies, another Manager can be elected Leader and restore the consistent state. For this reason, and to guarantee high availability, it is necessary to have several Managers.
If for some reason there are not enough Managers to reach quorum [(N / 2) + 1 members in agreement, where N is the number of Managers and the division is rounded down], the cluster suspends any new planning task while keeping current jobs running.
In addition, all swarm nodes take part in the ingress routing mesh, a network whose main purpose is to accept incoming connections from the outside world on the ports published by services and to balance that load among the containers running the corresponding service.
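The quorum rule is simple arithmetic, and a short sketch can help when deciding how many Managers to run. The function names below are hypothetical helpers, not part of any Docker API; they only encode the (N / 2) + 1 formula with integer division.

```python
def quorum(managers: int) -> int:
    """Minimum number of Managers that must agree: floor(N / 2) + 1."""
    if managers < 1:
        raise ValueError("a swarm needs at least one Manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many Managers can be lost while the rest still reach quorum."""
    return managers - quorum(managers)


def can_plan(total_managers: int, reachable_managers: int) -> bool:
    """True when enough Managers are reachable to accept new planning tasks."""
    return reachable_managers >= quorum(total_managers)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that four Managers tolerate no more failures than three, which is why an odd number of Managers is the usual choice.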
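As a rough mental model of the routing mesh, the sketch below keeps one rotating list of task addresses per published port and hands each incoming connection to the next task. The class `IngressSketch`, the addresses and the plain round-robin policy are assumptions made for illustration; this is not how the engine is implemented internally.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class IngressSketch:
    """Toy model: one round-robin iterator of task addresses per published port."""

    def __init__(self) -> None:
        self._backends: Dict[int, Iterator[str]] = {}

    def publish(self, port: int, tasks: List[str]) -> None:
        """Register the tasks that back a service published on `port`."""
        if not tasks:
            raise ValueError("a published service needs at least one task")
        self._backends[port] = cycle(tasks)

    def route(self, port: int) -> str:
        """Return the task that should receive the next connection on `port`."""
        return next(self._backends[port])


mesh = IngressSketch()
mesh.publish(8080, ["10.0.0.2:80", "10.0.0.3:80"])
print([mesh.route(8080) for _ in range(3)])
# ['10.0.0.2:80', '10.0.0.3:80', '10.0.0.2:80']
```

The point to retain is that the client only knows the published port; which container ends up serving the request is decided inside the swarm.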
Swarm’s main features are:
- Planning that makes efficient use of resources and balances the load across all of them.
- Health checking: mechanisms to check the state of our application and, if necessary, migrate failed components, including moving some or all of the containers from a failed host to a healthy one.
- Scaling the number of running tasks of a given service up or down according to our definition.
- Service discovery, load balancing, multi-host networking and internal DNS.
- Rolling updates.
- Desired state reconciliation, which preserves the state we define in the manifest file associated with our solution.
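Desired state reconciliation, the last item in the list above, can be pictured as a comparison between what the manifest declares and what is actually running. The following is a minimal sketch under that assumption; `reconcile` and the service names are hypothetical, and a real orchestrator runs this comparison continuously and per task rather than once over simple counters.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return the (action, service, count) steps that move `running` towards `desired`."""
    actions: List[Tuple[str, str, int]] = []
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)  # replicas declared in the manifest
        have = running.get(service, 0)  # replicas observed in the cluster
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


desired_state = {"web": 3, "db": 1}
observed_state = {"web": 1, "cache": 2}
print(reconcile(desired_state, observed_state))
# [('stop', 'cache', 2), ('start', 'db', 1), ('start', 'web', 2)]
```

We only declare the target (three `web` replicas, one `db`); the steps needed to get there are derived, which is the essence of the declarative model.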
Why migrate our host-affinity environments to distributed environments?
Consider the wide variety of use cases in which a multitude of related containers is deployed, such as a continuous integration environment: it is not practical to use a single host for deployment, nor to deploy manually. Solutions deployed on a single host have a single point of failure with a high probability of occurrence: the host becoming unhealthy for any reason.
In that case, unless we have a mechanism to provision contingency environments, our solution will no longer offer the service for which it was created.
To mitigate this vulnerability, it is necessary to have a multi-node environment, in which we can distribute our services and replicas.
We also want our applications to be Cloud Ready in order to make deployment more flexible. This means they must meet the quality attributes expected of distributed Cloud architectures, such as Availability, Reliability, Security, Scalability, and Elasticity.
What should we consider when migrating our host-affinity environments to distributed environments?
Depending on the characteristics of our solutions and the environment in which they are deployed, we must take some of the following points into account:
- The handling of stateful containers, with high availability or an automatic migration feature. In these environments, coupling a volume to a particular container ties its data to one node, so an interruption in that node can cause the consequent loss of data.
- Interaction with external load balancers or reverse proxies. With Swarm, our services move between hosts and containers without our knowledge (as it should be), and our applications listen on non-standard ports and, in some cases, on random ports. A dynamic mechanism for name resolution is therefore necessary.
- Good centralized log management is indispensable.
- Automated cleanup of dangling elements, especially in highly dynamic environments such as QA or CI.
- Internal and external security.
- DevOps: It is mandatory to actively monitor solutions that provide production services (directly or indirectly). It is important that each service can expose its internal state, regardless of the state of the container in which it lives, so that monitoring is automated with no need for an Operator. Intraway has a monitoring solution called Sentinel, which can collect metrics from each container and every deployed application. It is configurable and ensures efficient, automated control.
Companies that work with Docker understand the enormous advantages that distributed systems of containers provide and want to bring those competitive advantages into production, QA and CI.
Unfortunately, all that glitters is not gold: in addition to solving the technical problems, we need the right human resources, and especially multidisciplinary teams that work with the technology on a daily basis.
Swarm is one of Docker’s subprojects with the greatest growth potential, and although there are technical issues on which there is still no general agreement, it is already feasible to start incorporating this technology, at least in environments with less exposure to faults.
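The automated cleanup of dangling elements mentioned in the list above could look roughly like the sketch below. It relies on the `prune` subcommands of the Docker CLI (available since Docker 1.13); the function `clean_dangling` and its `dry_run` flag are hypothetical, and the commands act only on the node where they run, so in a swarm this would have to be scheduled on every node.

```python
import shutil
import subprocess
from typing import List

# Prune subcommands of the Docker CLI; --force skips the confirmation prompt.
PRUNE_COMMANDS: List[List[str]] = [
    ["docker", "container", "prune", "--force"],  # stopped containers
    ["docker", "image", "prune", "--force"],      # dangling images
    ["docker", "network", "prune", "--force"],    # unused networks
    ["docker", "volume", "prune", "--force"],     # unused local volumes
]


def clean_dangling(dry_run: bool = True) -> List[str]:
    """Run the prune commands on the local node, or only list them when dry_run is set."""
    if not dry_run and shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    handled = []
    for command in PRUNE_COMMANDS:
        handled.append(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return handled


for line in clean_dangling(dry_run=True):
    print(line)
```

Keeping the dry run as the default is a deliberate choice: pruning volumes is destructive, so it is worth reviewing the list before enabling it in a QA or CI job.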