Handling OOB Network Changes

In this post I would like to showcase how Ansible Content Collections can be used to build powerful abstractions. Collections are a distribution format for Ansible content that can include playbooks, roles, modules and plugins. For this post, let us address an Infrastructure as Code (IaC) use case: network configuration management of BGP. We will walk through examples for both Cisco IOS and Arista EOS devices.

First, let us define a data-model that encapsulates the vendor-agnostic configuration.

bgp_global:
    as_number: '65000'
    bgp:
        log_neighbor_changes: true
        router_id:
            address: 192.168.1.1
    neighbor:
    -   activate: true
        address: 10.200.200.2
        remote_as: 65001
bgp_address_family:
    address_family:
    -   afi: ipv4
        neighbor:
        -   activate: true
            address: 10.200.200.2
        network:
        -   address: 10.25.25.0
            mask: 255.255.255.0
        -   address: 10.25.26.0
            mask: 255.255.255.0
        -   address: 10.100.100.0
            mask: 255.255.255.0
        -   address: 10.200.200.0
            mask: 255.255.255.0
        -   address: 172.16.0.0
        -   address: 192.168.1.1
            mask: 255.255.255.255

As you might have observed, this data model matches exactly the input expected by the <vendor>_bgp_global and <vendor>_bgp_address_family modules in the IOS and EOS Collections (for example, cisco.ios.ios_bgp_global and arista.eos.eos_bgp_address_family). For a team embracing IaC, any update to the BGP configuration is made by updating this data model.

[Figure: OOB network blog 1]

As you see in this workflow, the last-known-good configuration is typically stored in a Source of Truth (SoT) database of some sort after deployment, maybe even another Git repository.
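The post-deployment archive step can be sketched as follows. This is a minimal illustration, not the workflow's actual backup job: the directory layout under /archived_configs and the file name running.cfg are assumptions, and committing and pushing the result to the SoT repository is left out.

```yaml
---
# Hypothetical backup play; paths and file names are illustrative assumptions.
- name: Archive running configs after a successful deployment
  hosts: rtr1,rtr2
  gather_facts: false
  tasks:
    - name: Back up the IOS running configuration
      cisco.ios.ios_config:
        backup: true
        backup_options:
          filename: running.cfg
          dir_path: "/archived_configs/{{ inventory_hostname }}/config"
      when: ansible_network_os == 'cisco.ios.ios'

    - name: Back up the EOS running configuration
      arista.eos.eos_config:
        backup: true
        backup_options:
          filename: running.cfg
          dir_path: "/archived_configs/{{ inventory_hostname }}/config"
      when: ansible_network_os == 'arista.eos.eos'
```

Writing each device's configuration to a fixed path keeps the SoT history readable: every commit shows exactly what changed on which device.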

 

Inevitable Out of Band Changes

During steady-state operations, this model works great and gives organizations the power of building vendor-agnostic data models. There might be reasons, however, that force an operations team member to make a manual change to the running configuration of the devices. This might be due to an outage caused by a flapping link or a misbehaving IGP neighbor, for example. It might even be because a newer team member is more familiar with the CLI and made the change directly on the device. Whatever the reason, at this point the running configuration of the production network is out of sync with the SoT. Now, if another change were introduced via the IaC workflow, it might overwrite the manual changes, since the IaC data model is expected to capture the desired configuration of the devices! If this leads to an outage, you are left troubleshooting how a well-tested, reviewed IaC workflow caused it, and setting aside time for some employee education.
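To make the overwrite risk concrete, here is a minimal sketch that assumes the deployment task uses the resource module's replaced state. The playbooks in this post do not set a state, so the modules fall back to their default, merged. Under the usual resource-module semantics, replaced makes the device's BGP global configuration match the data model, so a setting added by hand on the CLI and missing from the model would be removed on the next run.

```yaml
# Illustrative task only; the state value is an assumption, not taken from the post.
- name: Enforce the BGP data model on IOS
  cisco.ios.ios_bgp_global:
    config: "{{ bgp_global }}"
    # replaced: settings on the device that are absent from bgp_global are removed
    state: replaced
  when: ansible_network_os == 'cisco.ios.ios'
```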

 

The Solution

With the power of Ansible Collections, you can build intelligence into abstractions that help avoid such a problem. If you are new to Collections, I would recommend you start with this blog post that introduces a hands-on approach to Collections. Let us break down the problem statement in such a way that we can use a plug-and-play Collection to achieve the following goals:

  1. During normal operations, the Collection should not interfere with configuration deployment.
  2. Detect when there is a mismatch between the running configurations and the last-known SoT configuration.
  3. Alert the operator and provide a path for reconciliation before deployment.

A Collection illustrating such an abstraction is provided here. For the remainder of this blog, let’s look at how this demo repository helps satisfy the three goals listed above.

First, let’s take a look at a playbook that deploys the configuration in the introduction:

---
- name: Deploy IaC configs to BGP routers
  hosts: rtr1,rtr2
  gather_facts: no
  tasks:
    - name: Update the Arista BGP configs
      arista.eos.eos_bgp_global:
        config: "{{ bgp_global }}"
      when: ansible_network_os == 'arista.eos.eos'

    - name: Update the Arista BGP address family configs
      arista.eos.eos_bgp_address_family:
        config: "{{ bgp_address_family }}"
      when: ansible_network_os == 'arista.eos.eos'

    - name: Update the IOS BGP configs
      cisco.ios.ios_bgp_global:
        config: "{{ bgp_global }}"
      when: ansible_network_os == 'cisco.ios.ios'

    - name: Update the IOS BGP address family configs
      cisco.ios.ios_bgp_address_family:
        config: "{{ bgp_address_family }}"
      when: ansible_network_os == 'cisco.ios.ios'

We have a play with four tasks that deploy the configuration, invoking the appropriate vendor module as needed.

NOTE: This example is using the new BGP resource modules. Check out the previous blog Rohit Thakur wrote here.

As creators of the caretaker Collection in this scenario, we strive to make the Collection a plug-and-play addition to existing playbooks such as the one above. I am going to add the caretaker role (referenced as namespace.collection.role) to the previous playbook:

---
- name: Deploy IaC configs to BGP routers
  hosts: rtr1,rtr2
  gather_facts: no
  pre_tasks:
    - name: Invoke Caretaker
      include_role:
        name: redhattelco.caretaker.caretaker
      vars:
        sot_repo: "git@github.com:redhattelco/caretaker-sot.git"
        running: "/running_configs/{{ inventory_hostname }}/running.cfg"
        archived: "/archived_configs/{{ inventory_hostname }}/config/running.cfg"

  tasks:

    - name: Update the Arista BGP configs
      arista.eos.eos_bgp_global:
        config: "{{ bgp_global }}"
      when: ansible_network_os == 'arista.eos.eos'

    - name: Update the Arista BGP address family configs
      arista.eos.eos_bgp_address_family:
        config: "{{ bgp_address_family }}"
      when: ansible_network_os == 'arista.eos.eos'

    - name: Update the IOS BGP configs
      cisco.ios.ios_bgp_global:
        config: "{{ bgp_global }}"
      when: ansible_network_os == 'cisco.ios.ios'

    - name: Update the IOS BGP address family configs
      cisco.ios.ios_bgp_address_family:
        config: "{{ bgp_address_family }}"
      when: ansible_network_os == 'cisco.ios.ios'

Note the pre_tasks section, where we invoke the caretaker role using its FQCN (fully qualified collection name), redhattelco.caretaker.caretaker. We also pass three variables:

  1. sot_repo – the Git repository that serves as our SoT
  2. running – path to the running configuration file
  3. archived – path to the archived configuration file

Now, when this playbook is run, the caretaker role is invoked first and compares the running configuration with the most recent copy of the archived configuration from the SoT. If there is a difference, the role fails, so the deployment tasks do not run for that device.
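The comparison described above could be sketched with a few tasks like the ones below. This is not the caretaker role's actual implementation, only a minimal illustration of the idea. It assumes the SoT repository is cloned to /archived_configs on the control node, reuses the sot_repo and archived variables passed to the role, and assumes the devices accept the command show running-config.

```yaml
---
# Hypothetical drift check; the real caretaker role may work differently.
- name: Clone the last-known-good configs from the SoT
  ansible.builtin.git:
    repo: "{{ sot_repo }}"
    dest: /archived_configs   # assumed checkout location on the control node
  delegate_to: localhost
  run_once: true

- name: Fetch the current running configuration from the device
  ansible.netcommon.cli_command:
    command: show running-config
  register: running_output

- name: Fail when the device has drifted from the SoT
  ansible.builtin.assert:
    that:
      - running_output.stdout == lookup('ansible.builtin.file', archived)
    fail_msg: "Out-of-band change detected on {{ inventory_hostname }}"
```

A plain string comparison is deliberately strict. Device output often contains volatile lines such as timestamps, so a practical check would likely need to filter those out before comparing.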

 

Playbooks within Collections

Did you know that Ansible Content Collections also allow you to package in opinionated playbooks? As creators of intelligent abstractions, this presents a very powerful tool for us. As seen in the previous section, caretaker now refuses to deploy configuration if it detects a difference between the running configurations and the SoT. The Collection further provides playbooks for the user so they can choose to restore the configuration from the SoT to the devices. View our playbooks here.

caretaker_collection/playbooks
├── ansible.cfg
├── eos.yml
├── ios.yml
├── junos.yml
└── sync_to_devices.yml

0 directories, 5 files

Here, sync_to_devices.yml gives the user a pre-written playbook that handles updating the full configuration on EOS-, IOS- and Junos-based endpoints. This allows them to construct a workflow within Red Hat Ansible Automation Platform that might look as follows:

[Figure: OOB Network blog 2]

This achieves all three goals we set out to meet with our abstraction. During normal deployment, Caretaker does not detect a difference between the running and SoT configurations; the deployment therefore completes and the SoT is updated with the latest running configurations. If there is a difference, Caretaker causes the first Job Template to fail, thereby alerting a human to help the automation with conflict resolution (similar to a git merge conflict). If the operator chooses to sync the SoT configuration to the devices, the workflow invokes the sync_to_devices.yml playbook provided by the Collection to update the devices before applying the change originally requested, and then saves the latest configuration to the SoT. Alternatively, if the operator decides to update the SoT with the running configurations of the devices, they can choose that path instead, possibly reusing the configuration backup job to do so.
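For the restore path, a per-platform play could look roughly like the sketch below for EOS. The playbooks shipped in the Collection may be written differently; this only illustrates the idea, and it assumes the archived variable points at the same file layout used in the earlier playbook. It relies on the replace: config option of arista.eos.eos_config, which loads the file given in src as the device's complete configuration.

```yaml
---
# Hypothetical restore play; not copied from the Collection's eos.yml.
- name: Restore the SoT configuration to EOS devices
  hosts: rtr1,rtr2
  gather_facts: false
  vars:
    # Assumed to match the path passed to the caretaker role
    archived: "/archived_configs/{{ inventory_hostname }}/config/running.cfg"
  tasks:
    - name: Replace the running configuration with the archived copy
      arista.eos.eos_config:
        src: "{{ archived }}"
        replace: config
      when: ansible_network_os == 'arista.eos.eos'
```

A full replace like this discards the out-of-band change, which is why the workflow leaves the decision to the operator rather than syncing automatically.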

 

Conclusion

Collections are the turbo-charged batteries of the Ansible ecosystem. This content distribution mechanism has given automation engineers a powerful tool for building abstractions that can even leverage other Ansible Collections. Through this post we walked through a Collection that handles a specific network automation use case, one that is increasingly important as organizations begin to embrace IaC. We saw how easy it is to create value-added Collections that can be used in a plug-and-play fashion to extend existing automation workflows.

 

Where do I go next?

If you want to learn more about the Red Hat Ansible Automation Platform and network automation, you can check out these resources:

Originally posted on Ansible Blog