GCP Internal LBs, How dirty?
If you ever decide to tcpdump on a node behind a GCP internal load balancer you will be surprised to see that the packets aren't destined for the interface IP on your VM; the IP headers keep the destination address of the load balancer. Let me draw it:
I have a normal NLB passing traffic to an instance group with a bunch of dnsmasq nodes, nothing out of the ordinary.
The issue is that if you tcpdump on the dnsmasq nodes, you will find that the packets aren't destined for the node; the interfaces aren't in promiscuous mode, yet the packets are still accepted and answered:
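A capture along these lines is enough to see it (the interface and the port filter are just my setup, adjust to yours):
root@dnsmasq:~# tcpdump -eni any udp port 53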
The dnsmasq node has this address:
root@dnsmasq:~# ip -4 -br a | grep ens4
ens4 UP 10.1.16.238/32
No alias, nothing else. Now if I tcpdump a request:
11:35:47.356361 In aa:aa:aa:aa ethertype IPv4 (0x0800), length 100: 10.10.1.52.55173 > 10.1.16.240.53: 24391+ [1au] A? www.pepinos.com. (56)
So 10.10.1.52 is the user (me); I am querying the LB (forwarding rule) IP (10.1.16.240) and that traffic is forwarded to the dnsmasq instance. So far so good.
But how does the dnsmasq instance respond to a packet that isn't destined for it?
I've looked into the syscalls of dnsmasq and I get pretty much the same:
root@dnsmasq1:~# strace -fffffffff -e trace=recvmsg,sendmsg -p 5454 2>&1 | grep pepinos
recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(53408), sin_addr=inet_addr("10.10.1.52")}, msg_namelen=28->16, msg_iov=[{iov_base="\2F\1 \0\1\0\0\0\0\0\1\3www\7pepinos\3com\0\0\1\0"..., iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("ens4"), ipi_spec_dst=inet_addr("10.1.16.240"), ipi_addr=inet_addr("10.1.16.240")}}], msg_controllen=32, msg_flags=0}, 0) = 56
sendmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(53408), sin_addr=inet_addr("10.10.1.52")}, msg_namelen=16, msg_iov=[{iov_base="\2F\201\200\0\1\0\2\0\0\0\1\3www\7pepinos\3com\0\0\1\0"..., iov_len=76}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=0, ipi_spec_dst=inet_addr("10.1.16.240"), ipi_addr=inet_addr("20.86.0.0")}}], msg_controllen=28, msg_flags=0}, 0) = 76
So it is clear: the dnsmasq node has no problem replying to an address that it does not own, the kind of thing you might chalk up to promiscuous mode, but this box isn't set up to do anything fancy (let alone promiscuous mode).
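Part of the answer is on the application side: dnsmasq binds to the wildcard address and uses IP_PKTINFO (you can see it in the strace above) to learn which destination address each query arrived on, so it can reply from that same address. Something like this confirms the wildcard bind (output roughly from my node, yours may differ):
root@dnsmasq1:~# ss -ulpn | grep dnsmasq
UNCONN 0 0 0.0.0.0:53 0.0.0.0:* users:(("dnsmasq",pid=5454,fd=4))
But that only covers the reply; it still doesn't explain why the kernel accepts a packet for 10.1.16.240 in the first place.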
eBPF?
Initially I thought it could be some eBPF program attached via XDP rewriting the IP headers, but that should be visible from userland when you strace, and it was not the case.
ip link|grep xdp -i
returned nothing. I've also looked with bpftool and still nothing.
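For completeness, the bpftool checks were along these lines (nothing attached anywhere, no programs loaded):
root@dnsmasq1:~# bpftool net show
root@dnsmasq1:~# bpftool prog list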
Aliases?
No, nothing in the box; the dnsmasq node has only one interface with a single address.
Documentation?
There are some lazy mentions of what is happening, but at no point is it documented what happens on the instance:
So that's exactly what we see; the question is how the node responds to a packet that isn't destined for it.
Virtio?
GCP instances come preloaded with a lot of tools and drivers; they use virtio for the network drivers, and that is a hook where the hypervisor could control some of the traffic coming in:
Creating instances that use the Compute Engine virtual network interface
Internal TCP/UDP Load Balancing overview | Google Cloud
So there's some data there, and if you lsmod or systemctl list-unit-files you'll see the bloat:
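Roughly what I mean, if you want to poke at it yourself (output omitted; the exact module and unit names depend on the image):
root@dnsmasq1:~# lsmod | grep -i virtio
root@dnsmasq1:~# systemctl list-unit-files | grep -i google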
But I still think this is too complex; messing with drivers and/or eBPF to do something so silly and simple.
Routes:
Routes would be something that you could potentially use to divert traffic to yourself, so I started to look in the normal places:
root@dnsmasq:~# ip route
default via 10.1.16.1 dev ens4 proto dhcp src 10.1.16.238 metric 100
10.1.16.1 dev ens4 proto dhcp scope link src 10.1.16.238 metric 100
Nothing there. Remember that we're looking for traces of 10.1.16.240 (which is the address of the load balancer).
I remembered that there is the fib_trie, which also displays local routes, but before that, let's say: show me the routes I need to get to 10.1.16.240:
root@dnsmasq1:~# ip route get to 10.1.16.240
local 10.1.16.240 dev lo src 10.1.16.240 uid 0
cache <local>
Wait, what? Is it a local route? So I can ping you? (this LB is only passing UDP traffic)
root@dnsmasq1:~# ping -c1 10.1.16.240
PING 10.1.16.240 (10.1.16.240) 56(84) bytes of data.
64 bytes from 10.1.16.240: icmp_seq=1 ttl=64 time=0.020 ms
The plot thickens. So somehow this route has been injected into the fib_trie:
root@dnsmasq1:~# grep 10.1.16.240 /proc/net/fib_trie
|-- 10.1.16.240
|-- 10.1.16.240
AH!!!!
root@dnsmasq1:~# ip route ls table local | grep 240
local 10.1.16.240 dev ens4 proto 66 scope host
So the local routing table is a very special table that is maintained by the kernel, and it is where local unicast addresses and IP aliases end up (http://linux-ip.net/html/routing-tables.html).
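And the local table is consulted before anything else; on a box with no extra policy rules, ip rule makes that obvious:
root@dnsmasq1:~# ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default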
So that's it. But who triggers this?
I'm not gonna go a long way about this, but:
They've got a daemon on your instance.
Here’s the repo:
GoogleCloudPlatform/compute-image-packages
And more importantly, here's the magic:
Look at the args… sound familiar? It's running ip route!
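From memory, what the daemon ends up executing is equivalent to this (the forwarded IP and the interface come from the metadata server, and proto 66 is the same marker we saw in the local table above):
root@dnsmasq1:~# ip route add to local 10.1.16.240/32 scope host dev ens4 proto 66
Route type local means the kernel treats the address as its own, and iproute2 puts such routes in the local table by default, no table argument needed.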
Let’s test this in a local env!
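Pick an address the box clearly doesn't own, say 1.2.3.4, and check how the kernel would route it right now (the gateway and interface below are just whatever my home box has):
root@x1:~# ip route get 1.2.3.4
1.2.3.4 via 192.168.1.1 dev wlp2s0 src 192.168.1.10 uid 0
    cache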
So answering for that address won't work, obviously; I don't own the IP 1.2.3.4 (I wish I did).
But let’s do what the daemon does by hand!
ip route add local 1.2.3.4 dev lo
Perfect:
root@x1:~# ip route ls table local | grep 1.2.3.4
local 1.2.3.4 dev lo scope host
And now!
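Same check as before, now against the freshly added local route (timings are from my box, but the shape is the point: the reply comes straight off lo):
root@x1:~# ip route get 1.2.3.4
local 1.2.3.4 dev lo src 1.2.3.4 uid 0
    cache <local>
root@x1:~# ping -c1 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=0.025 ms
Which is exactly how GCP gets the node to accept traffic for the forwarding rule IP without ever touching the interface configuration.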
That's the mystery solved!