Kubernetes Service Type LB for On Prem Deployments

Rob Mengert
Published in ITNEXT · 6 min read · Mar 21, 2022


This article was inspired by this blog post.

Introduction

Deploying a Kubernetes service of type load balancer in a public cloud environment is very simple. A manifest like the one below is applied to a cluster, then “magic happens” and the service has a load balancer set up in front of it.

Applied Service in Cloud Environment
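
A minimal sketch of such a manifest is below. The service name matches the one used in the rest of this section; the selector label and ports are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080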

After this manifest is applied, the cloud provider asynchronously configures a load balancer and then publishes information about it back to the service definition (documentation here). The output of kubectl get svc my-service -o yaml may look as follows after the load balancer is configured:

Service After LB is Configured in Cloud Environment
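
Trimmed down to the interesting part, the status stanza written back by the cloud provider looks something like this:

status:
  loadBalancer:
    ingress:
      - ip: 192.0.2.127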

The address of the load balancer in this case is 192.0.2.127 which fronts service my-service. The cloud provider writes this value back to the service.

On Premise Deployments — Calico + MetalLB

With Kubernetes being the project around which a large percentage of cloud native development occurs, it’s no shocker that this process is seamless in public cloud environments. What about for on premise clusters? There are some solutions out there that can be integrated with load balancer vendors, one of which is Citrix.

For open source options, the combination of Calico and MetalLB can also provide service type load balancer functionality for on premise clusters, and that combination is the focus of the rest of this article.

Calico

Calico is one of the most prevalent CNI solutions in the Kubernetes ecosystem. More generally, it is a networking and security solution for container, VM, or host-based workloads.

Calico’s use of BGP in a Kubernetes context is critical for this solution. As such, the solution is only valid if the network north of the on premise cluster is running BGP and the top-of-rack (TOR) switches are the gateway for the Kubernetes nodes. These TOR switches must also have dynamic peering enabled; this allows nodes to be added to or removed from the cluster without touching the switch configuration, which is what makes horizontal scaling possible. The last item that must be supported on the network side is equal cost multi-path (ECMP) routing, since the network will be receiving the same route from potentially tens of nodes. The rest of this article assumes these items are in place but will not provide any switch configuration.

MetalLB

MetalLB is a load balancer implementation for bare metal Kubernetes clusters and by itself can provide service type load balancer functionality for on premise clusters… so why not just use it whole hog? For production grade deployments, there are some serious limitations which are documented here. In short: slow failover and a traffic bottleneck in L2 mode, and a potential collision with the CNI's BGP speakers in BGP mode.

So what function does MetalLB serve in this solution? The MetalLB controller can be used independently of the speakers. The controller can perform the address assignment for services of type load balancer and thus, only that component will be used.

The Solution

This section will walk through each part of the entire solution except for the TOR switch configuration.

Logical Topology


The TOR switches north of the Kubernetes worker nodes sit in BGP autonomous system 65116 while the worker nodes themselves sit in 64700. This makes them eBGP peers, which means the TOR switches will readvertise any routes learned from the worker nodes to their other neighbors. The nodes peer with the TOR switches in VLAN10. The switches are the default gateway for this network, which is the only interface on the Kubernetes nodes.

While it’s possible to have the TOR switches advertise routes to the worker nodes, this topology is intentionally simple and the nodes use a static default route to send traffic off the node. The nodes, meanwhile, advertise the 192.168.5.0/24 range, which will be used to assign load balancer IPs within the cluster.

Each node in the cluster will advertise this range whether or not it is hosting a pod that sits behind a service of type load balancer, and ECMP routing on the TOR switches will get traffic to the nodes. This is one area that could be optimized, but this article is focused on just getting traffic flowing first.

Calico Configuration

Assuming the TOR switches have peer authentication enabled, the first step is setting up a secret for the Calico peers. From the Calico documentation, this password must be created as a Kubernetes secret in the same namespace as the calico/node pods and referenced in the peer configuration.

$ kubectl create -n kube-system secret generic tor-peer --from-literal=tor-pw=<<PASSWORD>>

The next step is configuring the BGP peer which must be done using the calicoctl utility.

Calico BGP Peer Configuration
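
A sketch of what bgppeer.yaml could look like for this topology is below. The peer IP and the node label used in the selector are assumptions; the secret reference matches the one created above and the AS number matches the TOR switches:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-peer
spec:
  # Address of the TOR switch in VLAN10 (assumption for this example)
  peerIP: 10.10.10.1
  # Autonomous system of the TOR switches
  asNumber: 65116
  # Only form this peering on worker nodes (the exact label is cluster-specific)
  nodeSelector: "!has(node-role.kubernetes.io/master)"
  password:
    secretKeyRef:
      name: tor-peer
      key: tor-pw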

The nodeSelector parameter ensures that the BGP peer is only formed on worker nodes and not ETCD or master nodes. In order to apply this configuration, use the calicoctl utility as depicted below:

$ ./calicoctl create -f bgppeer.yaml 
Successfully created 1 'BGPPeer' resource(s)
$

The next step is to update the BGP configuration of Calico. Specifically, the cluster needs to be configured with an updated BGP policy to advertise the load balancer IP range to the TOR switches. This is required in order to inform the rest of the network how to reach this space.

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 64700
  serviceClusterIPs:
    - cidr: 10.43.0.0/16
  serviceLoadBalancerIPs:
    - cidr: 192.168.5.0/24
  listenPort: 179
  communities:
    - name: bgp-service-community
      value: 64700:300:100
  prefixAdvertisements:
    - cidr: 192.168.5.0/24
      communities:
        - bgp-service-community
        - 64700:120
This configuration is applied using the calicoctl utility as well. Once it is in place, the 192.168.5.0/24 subnet should appear in the routing table of the TOR switches.
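
A quick sanity check from the cluster side is calicoctl node status, run directly on a node, which prints the BGP peer table and shows whether each session to the TOR switches is Established:

$ sudo ./calicoctl node status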

MetalLB Configuration

Though Calico handles the BGP piece, there is still the matter of handing out load balancer addresses when they are requested in the cluster. The MetalLB controller can perform this function. Again, it is important that the speaker component of MetalLB NOT be installed. This could potentially clash with Calico as they are both BGP processes.

For this demo cluster, the installation option by manifest was used (raw manifest here) and all of the speaker components were commented out.

The only additional configuration required is a config map that MetalLB will read to understand what it should be doing in the cluster.

The load balancer subnet of 192.168.5.0/24 is put in place under the addresses parameter. The controller will read in this configuration and hand out addresses.
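
At the time this was written, MetalLB read its configuration from a ConfigMap named config in the metallb-system namespace. A sketch of that config map for this cluster might look like the following (the pool name is arbitrary, and since no speakers are deployed the protocol setting has no effect on traffic here):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 192.168.5.0/24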

Testing the Solution

With this configuration deployed, the next step is to test it. A simple HTTP echo server will be used in a deployment along with a service configuration of type load balancer to expose the deployment outside of the cluster.

Deployment
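
A sketch of the deployment is below. The image is a placeholder; any HTTP echo container that listens on 80 and 443 will do, and the labels just need to match the service selector:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-echo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-echo-server
  template:
    metadata:
      labels:
        app: http-echo-server
    spec:
      containers:
        - name: http-echo-server
          # Placeholder image: substitute any HTTP/HTTPS echo server
          image: example/http-echo:latest
          ports:
            - containerPort: 80
            - containerPort: 443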

Note, the container exposes both 80 and 443 so the deployment and service will do the same. The testing will only focus on HTTP.

Service
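
The service mirrors the deployment's ports and requests type LoadBalancer. Something like the following lines up with the name and ports seen in the output further down (the selector is assumed to match the deployment labels above):

apiVersion: v1
kind: Service
metadata:
  name: http-server-svc
spec:
  type: LoadBalancer
  selector:
    app: http-echo-server
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443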

Once the service is deployed, MetalLB should hand out an address to the service. This can easily be confirmed by looking at the services in the default namespace of the cluster.

$ kubectl get svc -n default
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
http-server-svc   LoadBalancer   10.43.236.6   192.168.5.0   80:32761/TCP   11d
kubernetes        ClusterIP      10.43.0.1     <none>        443/TCP        63d
$

Sure enough, 192.168.5.0 has been assigned as the load balancer address. (Note that this is a completely valid address to use for this purpose; the 192.168.5.0 address would only be off limits, as the network address of the subnet, if the 192.168.5.0/24 range were instead being used to provide network access to hosts.)

The next step is to test the echo server.

$ curl -X PUT -H "Arbitrary:Header" -H "Host:test.example.com" -d aaa=bbb http://192.168.5.0
{
  "path": "/",
  "headers": {
    "user-agent": "curl/7.29.0",
    "host": "test.example.com",
    "accept": "*/*",
    "arbitrary": "Header",
    "content-length": "7",
    "content-type": "application/x-www-form-urlencoded"
  },
  "method": "PUT",
  "body": "aaa=bbb",
  "fresh": false,
  "hostname": "test.example.com",
  "ip": "::ffff:10.42.61.64",
  "ips": [],
  "protocol": "http",
  "query": {},
  "subdomains": [],
  "xhr": false,
  "os": {
    "hostname": "http-echo-server-5c5877dfbb-8k7fs"
  },
  "connection": {}
}
$

Success! From here, there is still work to do to get a production grade deployment working. Obviously a DNS record needs to be set up for the load balancer IP. If the applications being exposed in the cluster are solely HTTP(S) based, additional load balancer services can be stood up for the ingress controller.

Thanks for Visiting Medium

I hope you found this article useful. Thanks for visiting Medium.
