I operate servers where we use BGP (EVPN - VXLAN). The architecture is similar to the one described by Vincent Bernat. The hypervisors act as routers for the VMs.

When I designed the addressing plan, I used IPv4/IPv6 networks in /24 and /64 respectively with the gateway VIPs positioned on the first available IP: 10.0.0.1/24 and fdc0:3489:e0fd::/64.

This topology has been working perfectly for 2 years. Recently, we had to deploy a dedicated VM for interconnecting two VRFs. We use iBGP sessions for route exchange between a VM and a hypervisor.

[Figure: hv-vm-bgp-sessions — iBGP sessions between the VM and the hypervisors]

In our case, the session is properly established and the exchanged routes are correct. I enable IPv6 forwarding on the VM (net.ipv6.conf.all.forwarding=1). At this precise moment, the BGP session drops. What happened?

First debugging step, analyzing the daemon logs on the VM:

Dec 25 09:42:23 firewall bgpd[10797]: [PXVXG-TFNNT] %ADJCHANGE: neighbor fdc0:3489:e0fd:: in vrf vrf1 Down Peer closed the session
Dec 25 09:42:23 firewall bgpd[10797]: [PXVXG-TFNNT] %ADJCHANGE: neighbor fdc0:3489:e0fe:: in vrf vrf2 Down Peer closed the session
Dec 25 09:42:24 firewall bgpd[10797]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Dec 25 09:42:24 firewall bgpd[10797]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected

The error seems clear enough: our BGP neighbor appears to be unreachable. Let's try pinging it:

root@firewall:~# ip vrf exec vrf1 ping fdc0:3489:e0fd::
PING fdc0:3489:e0fd:: (fdc0:3489:e0fd::) 56 data bytes
64 bytes from fdc0:3489:e0fd::1: icmp_seq=1 ttl=64 time=0.025 ms

First clue: the response is suspiciously fast, and the reply comes from the VM's own address (fdc0:3489:e0fd::1) rather than from the hypervisor.

I decide to check the route used by my kernel to reach this IP:

root@firewall:~# ip route get fdc0:3489:e0fd:: vrf vrf1
anycast fdc0:3489:e0fd:: from :: dev vrf1 table vrf1 proto kernel src fdc0:3489:e0fd::1 metric 0 pref medium

We notice an anycast route. I had never seen this route type before, but it behaves much like a local route: the kernel delivers packets for this address locally instead of forwarding them.

In summary, as soon as forwarding is enabled, the kernel adds an anycast route for the first IPv6 address of each of our networks.
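That first address is no accident: the Subnet-Router anycast address is simply the subnet prefix with an all-zero interface identifier, i.e. the very first address of the network. A quick sketch with Python's `ipaddress` module (using the addresses from our plan) shows the collision with our gateway VIP:

```python
import ipaddress

# Our VM network, as defined in the addressing plan.
net = ipaddress.IPv6Network("fdc0:3489:e0fd::/64")

# The Subnet-Router anycast address is the prefix with a zero interface ID,
# which is the network's first address -- exactly the IP our gateway VIP uses.
print(net.network_address)  # fdc0:3489:e0fd::
```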

While digging through the kernel code, I come across dev_forward_change.

We will use the kernel's function tracer (ftrace) to understand the implementation. I plan to write a separate article about this tool later.

Let’s check if a function tracer is available.

root@firewall:/sys/kernel/debug/tracing# grep -i dev_forward_change available_filter_functions
dev_forward_change

It’s available, so we can enable tracing and then activate forwarding.

root@firewall:/sys/kernel/debug/tracing# echo dev_forward_change > set_ftrace_filter
root@firewall:/sys/kernel/debug/tracing# echo function > current_tracer
root@firewall:/sys/kernel/debug/tracing# echo 1 > tracing_on
root@firewall:/sys/kernel/debug/tracing# sysctl -w net.ipv6.conf.all.forwarding=1
root@firewall:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 6/6   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
          sysctl-2147    [001] ...1.  7568.153838: __ipv6_dev_ac_inc <-dev_forward_change
          sysctl-2147    [001] ...1.  7568.153875: __ipv6_dev_ac_inc <-dev_forward_change
          sysctl-2147    [001] ...1.  7568.153957: __ipv6_dev_ac_inc <-dev_forward_change
          sysctl-2147    [001] ...1.  7568.153981: __ipv6_dev_ac_inc <-dev_forward_change
          sysctl-2147    [001] ...1.  7568.154053: __ipv6_dev_ac_inc <-dev_forward_change

This confirms that the dev_forward_change function is indeed called, so we can keep reading the code. A few lines down, we find the call to addrconf_join_anycast.

Here’s the function that decides which IP to steal from our hypervisor:

11
static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
{
        struct in6_addr addr;

        if (ifp->prefix_len >= 127) /* RFC 6164 */
                return;
        ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
        if (ipv6_addr_any(&addr))
                return;
        __ipv6_dev_ac_inc(ifp->idev, &addr);
}

In this function, we see that if the interface address belongs to a prefix shorter than /127 (RFC 6164 excludes point-to-point /127 links) and the prefix is not the unspecified address, the kernel calls the __ipv6_dev_ac_inc function with the device and the subnet prefix.

The __ipv6_dev_ac_inc function adds this anycast route using the ip6_route_info_create function.
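The kernel logic above can be paraphrased in a few lines of Python — a sketch to make the decision explicit, not a translation of kernel code. Given an interface address and its prefix length, it returns the Subnet-Router anycast address the kernel would join, or None when the kernel skips the prefix:

```python
import ipaddress

def join_anycast_prefix(addr: str, prefix_len: int):
    """Mimic addrconf_join_anycast(): return the Subnet-Router anycast
    address the kernel would join, or None when it skips the prefix."""
    if prefix_len >= 127:  # RFC 6164: no Subnet-Router anycast on /127 links
        return None
    iface = ipaddress.IPv6Interface(f"{addr}/{prefix_len}")
    anycast = iface.network.network_address  # prefix + all-zero interface ID
    if anycast == ipaddress.IPv6Address("::"):  # ipv6_addr_any() check
        return None
    return anycast

print(join_anycast_prefix("fdc0:3489:e0fd::1", 64))   # fdc0:3489:e0fd::
print(join_anycast_prefix("fdc0:3489:e0fd::1", 127))  # None
```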

After understanding this process and finding the right keywords, I found a series of articles like Daryll Swer’s and Karl Auer’s. Finally, if you want to learn more, I recommend reading RFC 4291.

In summary, enabling IPv6 forwarding on our VM had an unexpected effect: the kernel automatically configured the “Subnet-Router” anycast address (defined in RFC 4291), which nicely short-circuited our BGP session by taking over the address we were using.

To this day, I have never seen anyone intentionally use the properties of this address, and this little-known Linux behavior can be a source of errors.

Fortunately, the solution is quite simple: avoid using the first address of the network (the all-zeros :: address of the subnet) for anything that must answer as a host. A specific address such as ::1 works perfectly fine.
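To avoid repeating this mistake, an addressing plan can be checked mechanically. Here is a hypothetical helper (not part of our tooling) that flags an address colliding with the Subnet-Router anycast address:

```python
import ipaddress

def is_subnet_router_anycast(address: str, prefix_len: int) -> bool:
    """True when the address is the first address of its subnet,
    i.e. the Subnet-Router anycast address defined in RFC 4291."""
    iface = ipaddress.IPv6Interface(f"{address}/{prefix_len}")
    return iface.ip == iface.network.network_address

print(is_subnet_router_anycast("fdc0:3489:e0fd::", 64))   # True  -> avoid
print(is_subnet_router_anycast("fdc0:3489:e0fd::1", 64))  # False -> safe
```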