Friday, August 8, 2014

Configuring Overlay Transport Virtualization (OTV) using Multicast as its Control-Plane



Hi Everybody,

Today I'm going to share my notes (finally, after a long dormant period) on a Data Center Interconnect technology: OTV.

Initially this technology was only available on the Nexus 7000 (and Nexus 7700), but customer demand for it has been huge and Cisco has answered, so OTV can currently be deployed on the following platforms:
  • Nexus 7000 & Nexus 7700 families
  • ASR 1000 family
  • CSR 1000v family

That is great news, right? ;) In this post I'm going to use the CSR1K to demonstrate OTV. Since the CSR1K runs IOS-XE, the following example can also be deployed on the ASR1K. Just make sure you are running a recent enough IOS-XE release (the minimum is XE 3.5 for a multicast OTV control plane, or XE 3.9 for a unicast OTV control plane), and that you have the appropriate license to run the OTV feature :)

The basic idea of OTV is to provide L2 connectivity across data centers, so that we can stretch our L2 data center network across different sites. Why do we need that? Well, one of the major platforms used for the web and application tiers in the DC is VMware, which has a feature called vMotion that lets us move a VM instance from, say, DC1 to DC2 without configuring it from scratch. We can do that easily with vMotion, but the requirement is that the VM stays in the same subnet / VLAN on both sides.

Several technologies are available to address this requirement, such as L2TPv3, EoMPLS, or the VPLS family, but they simply extend the subnet; they were not designed with Data Center Interconnect (DCI) in mind. One potential issue is that STP gets flooded from DC1 to the other DCs. Another is that if DC1 suffers a broadcast storm, it will simply 'move' to the other DCs, right?

That is why Cisco invented OTV. OK, let's jump to the diagram.

In my lab I have two data centers, let's call them Left DC (DC1) and Right DC (DC2). We want to extend VLAN 10 and VLAN 20 from DC1 to DC2 and vice versa. R1-R2-R3 are multicast capable and can deliver either ASM or SSM, and we assume this multicast part has already been configured.

So our focus here is to provide the DCI between DC1 and DC2 using CSR1K1 and CSR1K2. Note that when I used GNS routers as the transport, the interface between the ED and the first-hop router always failed after running for about a minute, so I used a direct link between CSR1K1 and CSR1K2 to carry the overlay, using GigabitEthernet3 as the join interface.

First, we configure CSR1K1 and CSR1K2 to reach each other over the directly connected interface:
CSR1K1
!
ip multicast-routing distributed
ip pim ssm default
!
interface GigabitEthernet3
 description P2P with another ED
 mtu 1542
 ip address 10.10.20.10 255.255.255.0
 ip pim passive
 ip igmp version 3
 negotiation auto
 no shut
!
end
CSR1K2
!
ip multicast-routing distributed
ip pim ssm default
!
interface GigabitEthernet3
 description P2P with another ED
 mtu 1542
 ip address 10.10.20.20 255.255.255.0
 ip pim passive
 ip igmp version 3
 negotiation auto
 no shut
!
end

Note that OTV adds 42 bytes of header overhead, so make sure to account for this when configuring OTV. OTV sets the DF bit to one, which means that if the encapsulated packet is larger than the MTU somewhere along the transport path, it cannot be fragmented and OTV will fail completely :( Keep this in mind especially when connecting through a service provider: make sure you buy a higher MTU in order to use OTV. From the host perspective, in my case the hosts will not exceed an MTU of 1500 bytes.
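The MTU arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the only numbers taken from this post are the 42-byte overhead and the 1500-byte host MTU.

```python
from typing import Iterable

# 42 bytes of OTV encapsulation overhead, as stated in this post.
OTV_OVERHEAD_BYTES = 42


def required_transport_mtu(host_mtu: int, overhead: int = OTV_OVERHEAD_BYTES) -> int:
    """Smallest transport MTU that carries a full-size host frame plus OTV overhead."""
    return host_mtu + overhead


def path_supports_otv(host_mtu: int, path_mtus: Iterable[int]) -> bool:
    """True if every hop MTU is large enough.

    Because OTV sets the DF bit, a single hop that is too small
    breaks the overlay instead of fragmenting the packet.
    """
    needed = required_transport_mtu(host_mtu)
    return all(mtu >= needed for mtu in path_mtus)


if __name__ == "__main__":
    print(required_transport_mtu(1500))           # the value used for 'mtu' on Gi3
    print(path_supports_otv(1500, [1542, 9000]))  # every hop is big enough
    print(path_supports_otv(1500, [1542, 1500]))  # one hop is too small
```

With 1500-byte hosts this gives 1542, which is the 'mtu 1542' configured on GigabitEthernet3 above.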
 


Now we have to specify the OTV parameters, which include the following attributes:
  • Site bridge domain, or site VLAN in the case of Nexus. I used VLAN 1 as the site VLAN for DC1 and VLAN 2 for DC2. The site VLAN comes into play when we deploy more than one OTV router, or Edge Device (ED), in a site for multihoming or redundancy purposes: one of the EDs then becomes the Authoritative Edge Device (AED), much like the DR in an OSPF or PIM deployment.
  • Site identifier (different DC = Different Site Identifier value, same DC = same value)
CSR1K1
!
otv site bridge-domain 1
!
otv site-identifier 0000.0000.0001
end
CSR1K2
!
otv site bridge-domain 2
!
otv site-identifier 0000.0000.0002
end

  • Overlay interface, which holds the control-plane multicast group used to propagate OTV hello and update messages, the data-plane multicast group range used to encapsulate the native multicast traffic of our DCs, and the join interface, which is the interface whose IP address is used as the source when encapsulating the extended VLANs between DCs
CSR1K1
!
interface Overlay1
 no ip address
 otv control-group 239.255.10.20
 otv data-group 232.10.20.0/24
 otv join-interface GigabitEthernet3
!
end
CSR1K2
!
interface Overlay1
 no ip address
 otv control-group 239.255.10.20
 otv data-group 232.10.20.0/24
 otv join-interface GigabitEthernet3
!
end
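As a quick sanity check on the group addresses, here is a small Python sketch using only the standard ipaddress module. It assumes, as in this lab, that the control group should be an ASM multicast group and that the data groups should sit inside 232.0.0.0/8, the default SSM range enabled by 'ip pim ssm default'. The function name is mine, not a Cisco tool.

```python
import ipaddress

# Default SSM range (what 'ip pim ssm default' enables).
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def check_overlay_groups(control_group: str, data_group: str) -> list:
    """Return a list of problems found; an empty list means the groups look sane.

    Assumption for this sketch: control group is ASM, data groups are SSM.
    """
    problems = []
    ctrl = ipaddress.ip_address(control_group)
    data = ipaddress.ip_network(data_group)

    if not ctrl.is_multicast:
        problems.append("control group is not a multicast address")
    if ctrl in SSM_RANGE:
        problems.append("control group falls inside the SSM range")
    if not data.subnet_of(SSM_RANGE):
        problems.append("data group range is outside the SSM range")
    return problems


if __name__ == "__main__":
    # Values from the Overlay1 configuration above.
    print(check_overlay_groups("239.255.10.20", "232.10.20.0/24"))
```

Both EDs must of course be given the same control group and data group range, as in the two Overlay1 configurations above.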

  • The VLANs that we want to extend. This is a little different from the Nexus 7000: on IOS-XE it is more cumbersome, because we have to deal with L3 ports and service instances instead of a native 802.1Q trunk port as on the Nexus switch. In our lab we are going to extend VLAN 10 and VLAN 20 in both data centers. Please note that DC1 uses VLAN 1 (the native VLAN) as its site VLAN, that is, a VLAN that is not extended but used for the internal control plane within the DC, while DC2 uses VLAN 2 as its site VLAN, so the site VLAN should not be included on the Overlay interface. On NX-OS we simply put 'otv site-vlan [vlan-number]' under global config and 'otv extend-vlan [vlan-range(s)]' for extending the VLANs, while on IOS-XE we have to create something like a 'bridge-domain', which is essentially like building some sort of VPLS, eh :p
CSR1K1
!
interface Overlay1
 service instance 10 ethernet
  encapsulation dot1q 10
  bridge-domain 10
 !       
 service instance 20 ethernet
  encapsulation dot1q 20
  bridge-domain 20
 !
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 service instance 1 ethernet
  encapsulation untagged
  bridge-domain 1
 !
 service instance 10 ethernet
  encapsulation dot1q 10
  bridge-domain 10
 !
 service instance 20 ethernet
  encapsulation dot1q 20
  bridge-domain 20
 !
 no shut
!
end
CSR1K2
!
interface Overlay1
 service instance 10 ethernet
  encapsulation dot1q 10
  bridge-domain 10
 !       
 service instance 20 ethernet
  encapsulation dot1q 20
  bridge-domain 20
 !
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 service instance 2 ethernet
  encapsulation dot1q 2
  bridge-domain 2
 !
 service instance 10 ethernet
  encapsulation dot1q 10
  bridge-domain 10
 !
 service instance 20 ethernet
  encapsulation dot1q 20
  bridge-domain 20
 !
 no shut
!
end

  • The final step is to bring up the Overlay interface itself with 'no shut'
CSR1K1
!
interface Overlay1
 no shut
!
end
CSR1K2
!
interface Overlay1
 no shut
!
end

 
Great, we are done with the configuration, so now it is time for verification. First, we can make sure that our router is AED capable in each DC and that the two routers have an OTV adjacency with one another:
CSR1K1#show otv detail
Overlay Interface Overlay1
 VPN name                 : None
 VPN ID                   : 2
 State                    : UP
 AED Capable              : Yes
 IPv4 control group       : 239.255.10.20
 Mcast data group range(s): 232.10.20.0/24
 Join interface(s)        : GigabitEthernet3
 Join IPv4 address        : 10.10.20.10
 Tunnel interface(s)      : Tunnel0
 Encapsulation format     : GRE/IPv4
 Site Bridge-Domain       : 1
 Capability               : Multicast-reachable
 Is Adjacency Server      : No
 Adj Server Configured    : No
 Prim/Sec Adj Svr(s)      : None
 OTV instance(s)          : 0
 FHRP Filtering Enabled   : Yes
 ARP Suppression Enabled  : Yes
 ARP Cache Timeout        : 600 seconds

CSR1K1#show otv adjacency
Overlay 1 Adjacency Database
Hostname                       System-ID      Dest Addr       Up Time   State
CSR1K2                         001e.bdf0.a000 10.10.20.20     00:01:01  UP  

Note that by default the OTV configuration on IOS-XE is optimized for FHRP, so that each DC keeps an active FHRP gateway for its own subnet. On NX-OS we have to do this manually with a bunch of commands, which is quite complex :(

OTV uses IS-IS to carry MAC address information from one AED to another; the MAC addresses simply become an additional TLV in IS-IS. Note that the IS-IS process runs automatically in the background once we configure OTV, so we don't need to do anything unless we want authentication.
CSR1K1#show otv isis database detail

Tag Overlay1:
IS-IS Level-1 Link State Database:
LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime      ATT/P/OL
CSR1K1.00-00        * 0x00000020   0x106C        1150              0/0/0
  Area Address: 00
  NLPID:        0xCC 0x8E
  Hostname: CSR1K1
  Metric: 10         IS-Extended CSR1K2.01
  Layer 2 MAC Reachability: topoid 0, vlan 10, confidence 1
    0000.0000.0004 0000.0c07.ac01
  Layer 2 MAC Reachability: topoid 0, vlan 20, confidence 1
    0000.0c07.ac01 000c.2968.32db
CSR1K2.00-00          0x0000001F   0xA6CC        1166              0/0/0
  Area Address: 00
  NLPID:        0xCC 0x8E
  Hostname: CSR1K2
  Metric: 10         IS-Extended CSR1K2.01
  Layer 2 MAC Reachability: topoid 0, vlan 10, confidence 1
    0000.0000.0005
  Layer 2 MAC Reachability: topoid 0, vlan 20, confidence 1
    0000.0000.0005
CSR1K2.01-00          0x00000001   0x0F2D        491               0/0/0
  Metric: 0          IS-Extended CSR1K2.00
  Metric: 0          IS-Extended CSR1K1.00

From the above verification, we can see that OTV uses an IS-IS Level-1 adjacency between the routers to propagate MAC addresses in its TLVs. DC1, represented by CSR1K1, has two MACs on VLAN 10 (0000.0000.0004, which is R4, and 0000.0c07.ac01, which is the HSRP default gateway, R1) and two MACs on VLAN 20 (0000.0c07.ac01, which is the HSRP default gateway/R1, and 000c.2968.32db, which is the server), while DC2, represented by CSR1K2, has one MAC on each of VLAN 10 and VLAN 20 (0000.0000.0005, which is R5, with a separate subinterface per VLAN in different VRFs). Why don't we see the MAC of the HSRP default gateway from DC2?!? If DC1 received the HSRP gateway MAC from DC2, one or more of the routers in the data centers would become Standby or Listen, which can lead to suboptimal routing: for example, R5 in DC2 would have to cross over to the active HSRP router in DC1 for its L3 routing, yikes... But in IOS-XE this is optimized 'automatically', COOL!!!
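To make the FHRP filtering idea concrete, here is a rough Python sketch of it. It is not how IOS-XE implements the feature (that is not shown in this post); it only assumes the well-known HSRP version 1 virtual MAC format 0000.0c07.acXX and drops those MACs from the list an ED would advertise to the remote site. All function names are mine, and other FHRPs are left out on purpose.

```python
# HSRP version 1 virtual MACs look like 0000.0c07.acXX (XX = group number).
HSRP_V1_PREFIX = "00000c07ac"


def normalize_mac(mac: str) -> str:
    """Strip common separators so Cisco dotted and colon formats compare equal."""
    return mac.replace(".", "").replace(":", "").replace("-", "").lower()


def is_hsrp_v1_vmac(mac: str) -> bool:
    """True if the MAC is an HSRPv1 virtual MAC."""
    plain = normalize_mac(mac)
    return len(plain) == 12 and plain.startswith(HSRP_V1_PREFIX)


def macs_to_advertise(local_macs):
    """Keep only the MACs that should be announced to the other data center."""
    return [mac for mac in local_macs if not is_hsrp_v1_vmac(mac)]


if __name__ == "__main__":
    # VLAN 10 MACs learned locally in DC1, taken from the output above.
    vlan10_dc1 = ["0000.0000.0004", "0000.0c07.ac01"]
    print(macs_to_advertise(vlan10_dc1))
```

The effect is that each DC keeps its own active gateway, because the HSRP virtual MAC never becomes reachable through the overlay.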

If we take a look at CSR1K2, it does not have the HSRP MAC from DC1 for either VLAN 10 or VLAN 20:
CSR1K2#show otv isis database detail

Tag Overlay1:
IS-IS Level-1 Link State Database:
LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime      ATT/P/OL
CSR1K1.00-00          0x00000021   0x26E2        1160              0/0/0
  Area Address: 00
  NLPID:        0xCC 0x8E
  Hostname: CSR1K1
  Metric: 10         IS-Extended CSR1K2.01
  Layer 2 MAC Reachability: topoid 0, vlan 10, confidence 1
    0000.0000.0004
  Layer 2 MAC Reachability: topoid 0, vlan 20, confidence 1
    000c.2968.32db
CSR1K2.00-00        * 0x00000021   0x21C2        1162              0/0/0
  Area Address: 00
  NLPID:        0xCC 0x8E
  Hostname: CSR1K2
  Metric: 10         IS-Extended CSR1K2.01
  Layer 2 MAC Reachability: topoid 0, vlan 10, confidence 1
    0000.0000.0005 0000.0c07.ac01
  Layer 2 MAC Reachability: topoid 0, vlan 20, confidence 1
    0000.0000.0005 0000.0c07.ac01
CSR1K2.01-00        * 0x00000001   0x0F2D        1092              0/0/0
  Metric: 0          IS-Extended CSR1K2.00
  Metric: 0          IS-Extended CSR1K1.00

And now we can see OTV doing MAC-in-IP routing: the AEDs in each data center keep a kind of 'MAC routing table'. If the destination MAC is local, the frame is sent natively using Ethernet encapsulation; if the destination MAC is in a different DC, the frame is encapsulated in IP. This natively prevents L2 chatter such as STP, MAC flooding, etc. from extending to the other DC, another cool thing ;)
CSR1K1#show otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
       SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

 Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
 0    10   10     0000.0000.0004 40    BD Eng Gi2:SI10 <- Local MAC, Encapsulate natively
 0    10   10     0000.0000.0005 50    ISIS   CSR1K2 <- Not-Local MAC, Encapsulate using IP
 0    10   10     0000.0c07.ac01 40    BD Eng Gi2:SI10
*0    10   10     0000.0c07.ac01 50    ISIS   CSR1K2
 0    20   20     0000.0000.0005 50    ISIS   CSR1K2
 0    20   20     0000.0c07.ac01 40    BD Eng Gi2:SI20
*0    20   20     0000.0c07.ac01 50    ISIS   CSR1K2
 0    20   20     000c.2968.32db 40    BD Eng Gi2:SI20

8 unicast routes displayed in Overlay1

----------------------------------------------------------
8 Total Unicast Routes Displayed



CSR1K2#show otv route

Codes: BD - Bridge-Domain, AD - Admin-Distance,
       SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

 Inst VLAN BD     MAC Address    AD    Owner  Next Hops(s)
----------------------------------------------------------
 0    10   10     0000.0000.0004 50    ISIS   CSR1K1 <- Not-Local MAC, Encapsulate using IP
 0    10   10     0000.0000.0005 40    BD Eng Gi2:SI10 <- Local MAC, Encapsulate natively
 0    10   10     0000.0c07.ac01 40    BD Eng Gi2:SI10
 0    20   20     0000.0000.0005 40    BD Eng Gi2:SI20
 0    20   20     0000.0c07.ac01 40    BD Eng Gi2:SI20
 0    20   20     000c.2968.32db 50    ISIS   CSR1K1

6 unicast routes displayed in Overlay1

----------------------------------------------------------
6 Total Unicast Routes Displayed
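The forwarding decision behind the 'show otv route' tables above can be sketched in Python. The entries are copied from the CSR1K1 output; the data structure and function names are mine, and the rule that the lowest admin distance wins is an assumption I am reading from the output, where the AD 50 duplicates are flagged as backup routes.

```python
from typing import NamedTuple, Optional


class MacRoute(NamedTuple):
    vlan: int
    mac: str
    admin_distance: int
    owner: str      # "BD Eng" = learned locally, "ISIS" = learned from a remote ED
    next_hop: str


# Entries copied from 'show otv route' on CSR1K1.
CSR1K1_ROUTES = [
    MacRoute(10, "0000.0000.0004", 40, "BD Eng", "Gi2:SI10"),
    MacRoute(10, "0000.0000.0005", 50, "ISIS", "CSR1K2"),
    MacRoute(10, "0000.0c07.ac01", 40, "BD Eng", "Gi2:SI10"),
    MacRoute(10, "0000.0c07.ac01", 50, "ISIS", "CSR1K2"),
    MacRoute(20, "0000.0000.0005", 50, "ISIS", "CSR1K2"),
    MacRoute(20, "0000.0c07.ac01", 40, "BD Eng", "Gi2:SI20"),
    MacRoute(20, "0000.0c07.ac01", 50, "ISIS", "CSR1K2"),
    MacRoute(20, "000c.2968.32db", 40, "BD Eng", "Gi2:SI20"),
]


def best_route(routes, vlan: int, mac: str) -> Optional[MacRoute]:
    """Pick the matching entry with the lowest admin distance (assumed rule)."""
    matches = [r for r in routes if r.vlan == vlan and r.mac == mac]
    return min(matches, key=lambda r: r.admin_distance) if matches else None


def forwarding_action(routes, vlan: int, mac: str) -> str:
    """Describe what the ED does with a frame for this destination MAC."""
    route = best_route(routes, vlan, mac)
    if route is None:
        return "no OTV route"  # handling of unknown MACs is outside this sketch
    if route.owner == "ISIS":
        return "encapsulate in IP towards " + route.next_hop
    return "forward natively out " + route.next_hop


if __name__ == "__main__":
    print(forwarding_action(CSR1K1_ROUTES, 10, "0000.0000.0004"))
    print(forwarding_action(CSR1K1_ROUTES, 10, "0000.0000.0005"))
```

So R4's own MAC stays on the local service instance, while a frame for R5 gets wrapped in IP and sent to CSR1K2.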

The final verification is that R4, which acts as a host in VLAN 10 in DC1, can reach R5 (VRF VLAN10), which acts as a host in VLAN 10 in DC2.
R4#ping 192.168.10.5 

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.5, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 12/22/28 ms

R4#sh ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.10.4            -   0000.0000.0004  ARPA   FastEthernet0/0
Internet  192.168.10.5           20   0000.0000.0005  ARPA   FastEthernet0/0
Internet  192.168.10.254         55   0000.0c07.ac01  ARPA   FastEthernet0/0

R4 can also still reach outside networks through its default gateway, which in our lab is the HSRP virtual IP address:
R4#ping 3.3.3.3

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.254, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/28/40 ms

Saving the best for last, I managed to capture the traffic while R4 pinged R5. Note that I took this capture while using the old GNS setup, so I had to grab it before the GNS routers connected to the CSR1K errored out, which happens within about a minute. Painful, but worth the result ;)


So the bottom line is that OTV is a feature for efficiently extending subnets or VLANs between data centers in different locations, with a natively built-in, optimized control-plane mechanism. It can use a multicast group for its control plane, as we've seen in this example, which means the network between DC1 and the other DCs must be multicast aware.

I hope this has been informative, and I'd like to thank you for reading :)