Connect Working Group session
24 October 2017
11 a.m.
REMCO VAN MOOK: Good morning everyone. Find your seats. Yes, I am intentionally loud, so you all wake up. If you are not awake, go find yourself some coffee and come back in. If you are looking for Address Policy, you are too late, because that's already finished.
All right, let's get this show going.
Welcome to the Connect Working Group in the main room. This room has four emergency exits, two in the back and two over on that side. And there is one over there, but that means running over me and I have no idea where that goes.
Welcome. We have got a lovely agenda that I'd like to see on the screen.
Okay. I suggest we just start with this presentation then. So, welcome to Connect. There is an agenda; if you don't believe me, go find it online. We'll put it on the screen later.
My name is Remco, I am one of the co‑chairs of this Working Group, the other co‑chair is the lovely Florence who is sitting in the front who has flown in last night from 11 time zones away to be here. So, please be gentle with her. And then the first item on the agenda is the Working Group Chair selection.
The bit of navel gazing that we have to do now and then to make sure this is an open and democratic process. So here we go.
So, Working Group Chair selection. Believe it or not, this Working Group has now been around for three years. Amazing. And we are still sitting in one room, that's the most amazing bit. We have covered the Working Group Chair selection stuff twice. Once at RIPE 69, where this was my slide, and apparently there is a red squiggly, there is a typo in there. That was uncontested at RIPE 69, but we never actually adopted it as the selection process. So, in order to make up for that, and because I had completely forgotten about this slide, we adopted the following at RIPE 72, and this is a call for interested parties to be made on the Working Group mailing list at least every three years or whenever needed. You can read it yourself.
Since it has been three years, Florence and I decided that we're just going to, well go through the process which means that based on it, we'll do a call for interested parties on the mailing list this week. I am probably going to send it out tomorrow. Then two weeks later we will announce the full list of candidates and have a discussion on the list. And then after two weeks of discussion, which is going to be fruitful and plenty, Florence and I will make a decision on who gets to succeed us. No feedback from you means that Florence and I get to decide whatever the hell we want. And still lots of feedback from you still means that Florence and I get to decide whatever the hell we want, but that's a different story.
So, just as a bit of comfort, I just had a chat with Florence and we'll both put our names in the hat again. So if you are happy with us running the show for another while, we'll be happy to do that. If you'd like to relieve us from our duties, that's also fine.
So, this is the timeline for that. October 25th, call for nominations. November 8th, announcing the list of candidates. November 8 through 21 is the discussion phase. And on November 22nd, we'll publish our decision, at which point the Connect Working Group will have two new Working Group Chairs, or newly appointed Working Group Chairs, and that's pretty much it for this agenda item. Any suggestions, remarks, questions, people ready to stand up already and put themselves forward as a candidate? I didn't think so.
Right. Okay. On to the next agenda item, and by now I could really use an agenda slide because my memory is not that good ‑‑ good enough.
Next up is a presentation about detecting peering infrastructure outages in the wild.
VASILEIOS GIOTSAS: First of all, I want to thank the RACI for supporting me and giving me the opportunity to be here. I will present our work on detecting outages at peering infrastructures by parsing publicly available routing information. In particular I will focus on two types of infrastructure, Internet Exchange Points and carrier-neutral co-location facilities, both of which are important. Both of these infrastructures support hundreds of thousands of peerings, so their availability needs to be almost constant. The operators try to provide SLAs of four nines, or even five nines in some cases, meaning a maximum of around five minutes of downtime per year. And although this infrastructure is very well maintained and provisioned, outages still happen for various reasons, such as power cuts, human errors and misconfigurations. We have seen from news reports that when these incidents happen, the impact on operator services can be severe; we have had disruptions of banking services, of communications, even navigation services have been affected. Therefore, it is very important to have the ability to detect and monitor such outages, to improve our situational awareness and develop the appropriate risk assessment and mitigation techniques.
However, the current state today is a bit manual: network operators largely rely on the infrastructure providers to notify them when something goes wrong. And if they are not notified for any reason, they resort to asking each other on mailing lists whether they experienced the same problems in the same locations, which is obviously not the best way to react in a timely manner in the face of failures.
And we have this problem because the ecosystem is very complicated and these infrastructures are symbiotic. In this figure here we see the topology of France-IX, a large IXP in Paris, which has distributed its peering fabric across multiple facilities all over the Paris metropolitan area, and when something fails in this setup, it's hard to distinguish the epicentre of the outage from the cascading effects. To make things more complicated, we have practices such as remote peering that extend the reach of IXPs far beyond their local markets. Here we see on the map the footprint of the Amsterdam Internet Exchange, which is global, and again when something goes wrong the implications can be tricky to predict. So we tried to tackle these challenges with the following research goals. First of all, we want to develop a technique that will automate the detection of outages at co-location facilities and IXPs in a timely manner.
We want to be able to distinguish the source of an outage from the cascading effects.
And have the ability to track the evolution of the outage in order to determine how long it lasts, what is the impact on the routing paths and what is the geographic scope of the affected networks.
And these goals are far from trivial. I'll try to explain the challenges through an example which is based on an actual incident. So we have four ASes here, interconnected through different facilities and one IXP, and when everything works normally, AS 2, which is our vantage point, reaches AS 0 through Facility 2 and the IXP. Now, when Facility 2 fails at some point, the AS reroutes the path through Facility 1, and the first thing to note here is that the AS path remains exactly the same. So we need to have the ability to observe the paths between ASes not only at the AS level but also at the infrastructure level. But even if we have this ability, we may still not be able to distinguish exactly the source of an outage. In this example here, our vantage point at AS 2 observes that both Facility 2 and the IXP disappeared from the path, so the outage could be in either of these infrastructures. To distinguish exactly the source of the outage, we need to combine and correlate the views from multiple vantage points. Again, we need a large number of measurements because we need to continuously monitor the routing system in order to compare the paths before, during and after an outage. So we have these three requirements, which are seemingly contradictory because of the limitations of our measurement data sources. On the one hand, we have BGP measurements, let's say from RouteViews, which are lightweight, so they allow us to continuously monitor the paths through multiple vantage points, but the paths are at the AS level. On the other hand, we have traceroute measurements, let's say through RIPE Atlas, which give us a level of detail that we can map to IXPs and their facilities. But they are expensive, and especially for the scope and the scale of this study, the cost of these types of measurement campaigns is prohibitive. So we need to combine the lightweight BGP measurements with the traceroute data. And here we observe that, yes, BGP is an information-hiding protocol, but the AS path is not the only BGP attribute that encodes topological data. We also have the communities attribute, which is optional, but it's becoming increasingly popular; in this RIPE meeting, for instance, we saw earlier presentations about blackholing or graceful BGP shutdown. And one very popular use case of communities is the annotation of the ingress point where a prefix has been received. So if, let's say, AS 2 uses a community to annotate the ingress point where it receives a path from AS 1, we can reconstruct the infrastructure-level hops, and by gradually doing this, collecting data from multiple vantage points, we can collect the traversed infrastructures for all the ASes that we want to study. And if we monitor how these communities change, we can capture path changes at the infrastructure level even when the AS-level paths remain exactly the same.
Now, one of the problems is that communities are not standardised. It's possibly the only BGP attribute without standardised semantics, but fortunately many operators provide public documentation in their WHOIS records or on their websites, so we developed some natural language parsing tools to process this documentation, and we have created a dictionary of over 3,000 community values that provide ingress point annotations at three levels of granularity: at the IXP level, the facility level, and a third one. You can see it here on the map. And although these communities are used by fewer than 500 ASes, these ASes are quite large; they are present at hundreds of facilities and IXPs and they have thousands of links. So we can map at least one traversed facility for half of the IPv4 paths and one third of the IPv6 paths. And we can track one fourth of the facilities in BGP, and importantly 98% of the facilities with more than 20 members, which are the most prominent ones.
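To make the annotation idea concrete, here is a minimal Python sketch of the lookup step, assuming a hand-built community dictionary. The community values, facility names and data layout are invented for illustration; they are not values from the speaker's dataset.

```python
# Minimal sketch: annotate a BGP path with ingress infrastructure using a
# (hypothetical) dictionary of community values, in the spirit of the
# approach described above. All values below are made up.

# community value -> (granularity, location) learned from public documentation
COMMUNITY_DICT = {
    "64500:101": ("facility", "Facility A, Paris"),
    "64500:102": ("facility", "Facility B, Paris"),
    "64500:200": ("ixp", "Example-IX Paris"),
}

def annotate_ingress(as_path, communities):
    """Return the infrastructure(s) where the route entered the annotating AS.

    as_path     -- list of ASNs as seen in the BGP update, e.g. [64500, 64501]
    communities -- list of community strings attached to the update
    """
    hits = [COMMUNITY_DICT[c] for c in communities if c in COMMUNITY_DICT]
    # Even if the AS path stays the same, a change in these annotations
    # reveals a path change at the facility/IXP level.
    return {"as_path": as_path, "ingress": hits}

# Example: same AS path before and after, but the ingress facility changed.
before = annotate_ingress([64500, 64501], ["64500:101", "64500:200"])
after = annotate_ingress([64500, 64501], ["64500:102", "64500:200"])
print(before["ingress"], "->", after["ingress"])
```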
So by having this dictionary, we can start doing the actual detection. Let's say that we want to monitor Facility 2. The first step is that we collect all the BGP paths that are annotated with communities of Facility 2. On the slide we see the number of such paths across time, and we monitor these paths for fluctuations. Fluctuations mean that we try to find when we have changes in the communities attribute that indicate deviations away from our facility. And when we have time periods with multiple concurrent community changes, then we have an indication that something happened at this particular location. This can be, let's say, a partial outage, because the paths don't go all the way down to zero but we have a significant fluctuation, but it may also be some change in peering between large ASes or some major routing policy change. So we trigger an investigation process, which is a very involved process. I invite you to read the paper for all the details; I have a link at the end of this presentation.
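As a toy illustration of the detection trigger just described, the sketch below counts, per time bin, how many monitored paths moved their community annotation away from the watched facility. The event format, bin size and threshold are illustrative assumptions, not the values used in the paper.

```python
from collections import defaultdict

def detect_fluctuations(events, watched_facility, bin_seconds=300, threshold=10):
    """events: iterable of (timestamp, path_id, ingress_facility) observations,
    ordered by time. Returns start times of bins that warrant investigation."""
    last_seen = {}                    # path_id -> last ingress facility seen
    moves_per_bin = defaultdict(int)  # time bin -> paths that left the facility

    for ts, path_id, facility in events:
        prev = last_seen.get(path_id)
        if prev == watched_facility and facility != watched_facility:
            moves_per_bin[int(ts // bin_seconds)] += 1
        last_seen[path_id] = facility

    # Many concurrent moves in one bin hint at an outage (or de-peering /
    # policy change), so those bins get handed to the investigation process.
    return [b * bin_seconds for b, n in sorted(moves_per_bin.items()) if n >= threshold]
```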
But initially we run a few targeted measurements, guided by the BGP hints, and we study the affected paths and how many ASes and how many links are involved. Through this process we can determine the root cause of the signal, and if it's an outage we continue monitoring the community values, and we find that the outage has been resolved when the majority of the paths return back to their original location.
So this whole process is more complicated because the community for the location that we monitor may be affected by outages in facilities further away. For instance, here we see that an IXP's paths may be affected by outages in one of its affiliated facilities. And we find that signal by studying the paths not in an aggregated manner but at the granularity of the separate facility-to-facility co-locations, as I show on the right-hand side. Again, this is a more complicated process, but it allows us to accurately pinpoint the source of the outage. Essentially, this technique enables us to denoise the BGP activity and separate the normal BGP dynamics from those caused by an outage. So here we see an outage that happened at France-IX at the point of the grey bar, with the blue line showing BGP updates and the red and purple lines showing state changes, and as you can see, at the point of the outage we don't get a strong signal that something happened. But when we filter the paths based on the communities that annotate France-IX as the ingress point, we have a very strong signal, which allows us to trigger the investigation process and localise the outage accurately.
So we have applied this technique to five years of historical BGP data and we were able to find almost 160 outages, most of which were not widely reported. We tried to validate them through a series of means such as status reports from network operation centres, direct communication with operators and social media posts. We found that we have pretty good accuracy. The few misses that we had were actually fibre cuts very close to the infrastructures, and we missed only a few partial outages that did not result in a signal of high magnitude.
In 70% of the cases where we had failures in facilities, the uptime went below the promised five nines, and in 5% it even went below three nines of uptime, with very clear implications; I don't want to expand on that. We also used active traceroutes to assess the geographical scope and the impact on end-to-end latencies. So here we have an outage of the AMS-IX Internet Exchange in 2015. We see that, due to remote peering, over half of the affected links are in a different country, and 20% are even in a different continent. And we observe a 50% increase in the paths with latencies over 100 milliseconds, which obviously affects the upper-layer applications.
So, to conclude. We developed a new technique that allows us to timely and accurately detect outages at co-location facilities and IXPs. We have seen that the majority of these outages are not widely reported, and we have also seen evidence that remote peering, let's say, globalises and amplifies the impact of localised events. And lastly, we hope that through this approach we provide a valuable tool to operators and researchers to help them improve accountability and transparency and develop new mitigation techniques to improve the peering ecosystem. Thank you very much. I am ready to accept your questions.
(Applause)
FLORENCE LAVROFF: All right. Any questions? Second time... last time? Okay, all right. Thank you.
Let's move on to the next item on our agenda this morning, which is about IXP automation with IXP manager and NAPALM from Barry O'Donovan from INEX. Thank you, Barry.
BARRY O'DONOVAN: Thank you, Florence. As Florence said, I am Barry O'Donovan, I am from INEX, where I work on the operations team, and I am also a developer of IXP Manager. What I'm talking about here today is how we have started using IXP Manager at INEX to automate the peering LAN.
So for anyone who doesn't know what IXP Manager is, it's a tool that we built at INEX many years ago to help manage and provision IXPs. It's released as open source software and it's in use by about 40 or more IXes around the world that we know of. We have recognised the value of automation at INEX for many years: over 12 years ago, when we first started the route server service where we do strict layer 3 ACL filtering, that was automated from the beginning using IXP Manager, and that's kind of where IXP Manager became a real thing at INEX.
Up until recently, though, we had never started automating at the network device level, and that was mostly down to tool chaining. I am sure everyone in this room has blood on their hands from trying to pipe some form of Perl configuration script through RANCID to a network device, only to find out it has failed after a firmware upgrade because they've changed the format of the log-in prompt, or the other things that always go wrong when you are dealing with that kind of tool chain.
But it's now 2017, and a lot of the frameworks that have been in use for server automation have moved into the network device domain, and I'm thinking of Ansible and, in our case, SaltStack. To help this, a lot of network systems have APIs available, so you don't need to push configuration over SSH and parse the output. Also, at INEX we have been growing; we have a lot more points of presence and a lot more networking devices. There are four people on the INEX operations team, but we operate as two FTE, two full-time equivalents, and to maintain that ratio as we expand, we need to move more towards automation rather than a lot of manual repetitive tasks.
So we split this automation project into two phases. The first phase aligned with a forklift upgrade we were doing of INEX LAN1 at the beginning of 2017, where we were swapping out networking kit in several different points of presence. Our Phase 1 goal was to configure all of the IXP edge ports straight out of IXP Manager, and that's where we had all the kind of details like speed, whether the port is VLAN tagged or not, LAG ports, layer 2 filters and so on. Part of that process was obviously validating all of that data, because when you have a manual process of changing IXP Manager and manually going to a switch, you are going to find there are gaps in the data you have.
The other thing we were moving from was a layer 2 spanning tree topology to a layer 3 routed core with VXLAN. At the time we were using 8 by 10 gig LAGs everywhere. Some of them have been replaced with hundred gig links, but there are still a lot of 8 by 10 LAGs around the network, and when we sat down and did a sort of back of the envelope calculation on that, we realised that when you are using layer 3 for VXLAN with ECMP, there are about 200 individual configurations you have to do. So there are 200 configuration units, each comprising a layer 2 interface configuration, a layer 3 interface configuration and then your BGP configuration, so 200 individual versions of that.
So, in Phase 1 we put that into a Perl script. We used that to generate the config, and the idea was to come up with how that data needs to be represented in IXP Manager.
Phase 2, which has gone on through 2017, has been to use Phase 1 to fully understand the process, to try and come up with a data model that's abstract enough so it's useful for all IXes, not just INEX, and then to eventually by the end of this year to release it all as Open Source in IXP manager.
When you are looking at automation, there are typically three approaches: OpenFlow, YANG, and then vendor APIs. To cut a long story short, we went with vendor APIs, and that was enabled by a package called NAPALM, which abstracts how you interact with individual devices.
So when looking at our project, we looked at what kit we have and what we were going to support in terms of automation. At INEX we have a couple of old Brocade devices, two different families, that are basically end of life. They are 5 to 7 or 8 years old and they are mostly edge devices that just connect 1 gig UTP connections. They are not going to support automation and we have no plan to look at those. In terms of the early life and pre-deploy stage, we have Arista and Cumulus; they now have full support in the automation process. We have Extreme on LAN2; it's mid-life. NAPALM has had a not-yet asterisk on it, but that's about to be fixed because LINX have contracted someone to write a NAPALM plug-in for Extreme. It will only ever have partial support; we'll talk about why that is later, but it's to do with the way Extreme configure their devices.
So, this is the typical data flow for a traditional network device, in our case an Arista switch. The first three boxes are basically IXP Manager: we have a UI where we can put in core ports and edge ports, and that all goes into the database. We have written an export controller that gathers up all that information, and then a presenter that will spit it out as YAML, which is what Salt usually uses for configuration. You can also put a .json on the exact same URL and get that same information as JSON, so both are supported.
We have effectively four REST endpoints for pulling this information; we may refactor that, but right now this seems to fit. And how it works is, for a particular device that you are interested in configuring, you put in a REST request to IXP Manager, with that device ID, to four different endpoints: here are all the VLANs you need to configure on this device; here are all the layer 2 interfaces you need to configure; the layer 3 interfaces you need to configure; and, if you are doing a routed core, here is all the BGP information you need.
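As a hedged sketch of that pull, the Python below fetches the four per-device datasets over REST. The base URL, endpoint names, device handle and API-key header are placeholders invented for illustration; the real paths are in the IXP Manager documentation.

```python
import requests

# Placeholder values: adjust to your IXP Manager instance and API key.
BASE = "https://ixpmanager.example.com/api/v4/provisioner"
HEADERS = {"X-IXP-Manager-API-Key": "changeme"}

def fetch_device(handle):
    """Fetch the four per-device datasets (VLANs, L2/L3 interfaces, BGP)."""
    data = {}
    for endpoint in ("vlans", "layer2interfaces", "layer3interfaces", "bgp"):
        # Appending .json to the same URL returns JSON instead of YAML,
        # as described in the talk.
        url = f"{BASE}/{endpoint}/switch/{handle}.json"
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        data[endpoint] = resp.json()
    return data

if __name__ == "__main__":
    print(fetch_device("swi1-example-1").keys())
```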
This is a simple example of an edge port that comes out of IXP Manager for the layer 2 interfaces. You can see the name is swp2; if you are familiar with Cumulus, you'll recognise that. It's an edge port, it has a description, it is VLAN tagged, auto-negotiation is enabled, and it's a 10 gig port. It is a member of a LAG, a LAG with unique index 1 on this particular device, but it's not the LAG master. This port needs to be tagged on one VLAN, which happens to be the secondary peering LAN at INEX, and then you have the MAC address or addresses you need to put a layer 2 ACL on.
Conversely, this is a core port; it happens to be a layer 2 spanning tree port. You just say yes, it's in the spanning tree, irrespective of the flavour of spanning tree you are using; that will come later. It has a cost, and then some of the other information is the same. The only bit that's slightly different here is that there are two VLANs that have to be tagged. There is no information about MAC addresses because there are no layer 2 filters on core ports.
So that was the IXP Manager bit. The next bit is Salt. The first question is: why Salt? Which is totally not a religious war. Ansible seems to be the go-to place for people who are configuring network devices. As Nick said, we had a rationale that resulted in a sound engineering decision, which boiled down to: at INEX, we love SaltStack. We have been using it for years to configure servers. But one of the things, when we were thinking about automation, that gave us permission to use or to consider using SaltStack was Mircea Ulinic from Cloudflare, who gave a presentation at RIPE and also at NANOG about how Cloudflare have automated their network configuration using Salt, and he has also written all of the NAPALM Salt integration modules to give it feature parity with Ansible.
So, in the process, you have all your data as YAML out of IXP Manager, which goes into the Salt master; we then use Salt templating to turn that into configuration. Salt is written in Python, so it uses Jinja templating. This will look familiar to you. It's important to say you do not need to be a programmer to implement this. When it comes to templating, the kind of control structures you need really boil down to if, else if, and iteration, both demonstrated on this tiny snippet. So it's easy; you do not need to be a programmer to generate this type of configuration. All this is doing is taking the YAML information for the layer 2 interfaces, iterating over it, and then for each interface setting it to the default and then building up the configuration anew, leading to a complete configuration file.
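A minimal sketch of that templating step is below, rendering a Jinja template over YAML interface data with Python. The YAML keys and the CLI syntax are illustrative, not INEX's actual data model or template.

```python
import yaml
from jinja2 import Template

# Illustrative layer 2 interface data, shaped like the YAML export described above.
L2_YAML = """
layer2interfaces:
  - name: swp2
    description: "Customer Example-AS64500"
    speed: 10000
    vlan_tagged: true
    vlans: [12]
  - name: swp3
    description: "Core to swi2"
    speed: 100000
    vlan_tagged: true
    vlans: [10, 12]
"""

# Iterate over the interfaces and build the configuration anew each time.
TEMPLATE = Template("""
{% for iface in layer2interfaces %}
interface {{ iface.name }}
   description {{ iface.description }}
   {% if iface.vlan_tagged %}switchport mode trunk
   switchport trunk allowed vlan {{ iface.vlans | join(',') }}{% else %}switchport mode access{% endif %}
{% endfor %}
""".strip())

data = yaml.safe_load(L2_YAML)
print(TEMPLATE.render(**data))
```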
Some vendors have certain intricacies; you kind of wonder what Arista were thinking when they decided how to configure an interface speed, rather than just speed and an integer, so we have had to work with this construct to match what they have. But the main point of this construct is that it shows that, with the YAML information you get out of IXP Manager and a little bit of templating, you can match whatever config is required on the target platform.
And while I might knock Arista on that, what I will say is that they do brilliantly what is an absolute requirement for automation: idempotent, session-based configuration merge. What that means is, if you look at that config snippet of how we configure our routed VXLAN core, the configuration that gets sent to the Arista starts off with 'no router bgp'. If you were doing that on a non-session-based device, like say an old Cisco IOS, you'd blow up your entire network. Because we can start off by saying 'no router bgp', we effectively start with a clean config, build it up exactly how we want it, and then the configuration merge will only look at the differences after the fact.
Another key element here that enables the automation for us is that IXP Manager has the state of the network as we want it right now. It doesn't have a concept of memory, of "oh, you need to add in this BGP session, but you removed this interface so you have to delete these other BGP sessions or interfaces". That's not how it works. The way the IXP Manager automation tool chain works is it defaults everything, builds up your config, and only the changes then get merged through. That's why idempotent, session-based configuration merge is critical.
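As a toy illustration of this "no memory" model: always render the full desired configuration from the database, then let a diff work out what actually changes; on the device itself this is what the session-based configuration merge does. The configs below are made up; difflib only stands in for the vendor-side merge.

```python
import difflib

def config_changes(running, desired):
    """Return a unified diff between the running and the desired config."""
    return "\n".join(difflib.unified_diff(
        running.splitlines(), desired.splitlines(),
        fromfile="running-config", tofile="desired-config", lineterm=""))

running = "interface swp2\n description old customer\nrouter bgp 64500\n neighbor 192.0.2.1\n"
desired = "interface swp2\n description Customer Example-AS64500\nrouter bgp 64500\n"
print(config_changes(running, desired) or "no changes")
```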
Modelling problems are the kind of thing why we had a Phase 1 / Phase 2 approach, and LAGs are an excellent example of this. On some devices, like Brocade, you configure the LAG on interface one and the slaves become inaccessible and unconfigurable. Extreme does something else, Arista something else. The lesson here was to make sure the YAML structure was flexible enough to support all of this.
So, at this stage, we have everything in IXP Manager, we have our YAML presentation, it's in the Salt master and the Salt master has generated the configs; the next thing is getting that onto the device. Salt is a master/minion, master/slave way of working. We use what's called a Salt proxy, so for every switch there is a very small process running that talks to Salt as if it were the device, and anything that Salt says to it, it then passes on to the device. And between the Salt proxy and the device is the appropriate NAPALM plug-in. So for Arista that will talk to the device over Arista's API; for an old Cisco device it will open an SSH session.
NAPALM support. This is taken straight from NAPALM's website. For proper automation you really need all unqualified yeses on this, and you can see a lot of vendors are lacking greatly, so it's certainly something to raise if you are talking to your vendors. The idempotent, session-based, atomic configuration merge really requires those five features, and without them you have a much more complex automation task.
Cumulus is a bit different. When it comes to Cumulus, it's Linux running on a switch. You can just apt-get install a Salt minion; there is no NAPALM part to it. It's tied straight into the Salt master: generate your config templates and let Salt push them up to the device.
The other bit that's different is that, unlike your typical network operating system where you have one long configuration file, on Cumulus everything is like in Linux: if you want to edit an interface, there is a file for that; if you want to change BGP, it's /etc/frr/frr.conf. You just edit the individual configuration files.
Here is a very quick example of how we push this out. If I want to update the configuration on a given switch, I just refresh the Salt pillar, and what that means in Salt terms is that it tells Salt to pull all the information over REST from IXP Manager and update its local state database. The second step is to push that config to the switch, and we'd normally say test equals true. What you can see here is Salt will come back and say: if you didn't have test equals true, here are the changes I was going to make. And if you are happy with that, you can run it without test equals true.
For Free Range Routing it works exactly the same. Here is an example for frr.conf, and the only difference here is we have Salt configured to watch certain configuration files; if it sees this configuration file has changed, it's going to reload the FRR daemon.
Phase 1 results: it all went perfectly. All IXP ports are configured through IXP Manager. The core is now configured through IXP Manager. It handled the forklift upgrade successfully. And it's more reliable and safer.
Phase 2 is really just trying to package this up for proper release in IXP Manager; the code is all there, we just need to do a review. I guess the biggest gap at the moment is documentation and creating suggested operational workflow procedures. My colleague Nick is allergic to documentation. This is Nick building himself up to be in the frame of mind to sit down and write documentation by finding anything else he can possibly do.
Operations then: configure IXP Manager, manually log into switches, find your mistakes years later. Now: configure IXP Manager, refresh the pillar in Salt and deploy via SaltStack. This will all be released in IXP Manager and we'll have a dedicated Git repo as well for all this. Thank you very much.
(Applause)
REMCO VAN MOOK: All right. Any questions for Barry? No one? Well, in that case, job well done. Thank you very much.
Next up is Falk, but before he starts I would like to remind all of you that you are very welcome to rate the presentations on the RIPE 75 website. It's very helpful for us as Chairs to see what you like and what you don't like, and it helps us decide what we bring to you at the next meetings.
So, with that said, up next is Falk from Deutsche Telekom; he is going to talk about new levels of cooperation between eyeball ISPs and OTTs/CDNs. Falk.
FALK VON BORNSTAEDT: Welcome everybody. I would like to start by thanking Florence and the Working Group Chairs, because they helped me out when I was stuck in the hurricane in Cuba and didn't have Internet. You can guess it was quite a hard time for me without Internet, and she helped me far more than I would ever expect from a Working Group Chair who does this in her free time.
Also, thanks to the stenographer, I really admire their work. So, now let's start with the presentation.
STENOGRAPHER: Thanks.
FALK VON BORNSTAEDT: This is how we see the Internet. Here we have the different OTTs that are sending traffic through a cloud and content delivery structure to the eyeball operator, and on the end side you have the users. So, as of today, there is no transparency: they know what they are sending here, but they don't know what happens in the eyeball network. So far we considered this a business secret. I would never admit that maybe Frankfurt is congested while Hamburg and Munich are still free, because we are selling our network and there never, ever is congestion. That's true for most parts of the network, but there may be some situations, like an IOS upgrade, for example, or some singular events, where we might have congestion. So far we treated this as a business secret. So we have thought: shouldn't we be a little bit more flexible in sharing information about what's happening inside our network?
So this presentation is about what could we do to share a little bit more of insights into our network to the outside world.
This is the situation we are having. Deutsche Telekom is focused very much on eastern Europe; we moved everything out of the UK, for example. In many cases we own the mobile and the fixed operator, and sometimes only the mobile operator or sometimes only the fixed one. You don't need to read this one; these are the AS numbers we are dealing with, and it's like herding cats to work together with these people because they are quite independent. They used to buy 90% of their upstream outside of the group, so in the last years we focused a bit and streamlined this so that they now buy 90% of their transit within the Deutsche Telekom group, so we can paint the countries magenta.
What we are doing now: we had a zoo of IT tools, and we are trying to have the same IT tools in all the NatCos, and what I'm presenting here is one kind of tool where we do the traffic measurements within the group.
This is an example of the tool we are using, so we can look at flows in a very detailed way. It looks complicated, and I couldn't show you a bigger graph because our customers consider this also a business secret: how much Netflix traffic we have, for example. Netflix wouldn't be amused if we shared this kind of knowledge here, so I have to make it very abstract. If you want to see the tool you can approach me, I can launch it and give you a deeper insight; we have a very flexible way of looking into the traffic.
We have been using Peakflow before, and we are still using it in parallel, but this tool has a much easier user interface, so you can put in here the source, the hand-over AS, the next hop AS and the destination AS. This one is a bit more complicated: we also put the routers in there, because it helps us to better steer the traffic. It is used now by more than 50 people in our network, and non-expert users are finally able to manage with this kind of tool.
But the goal is to give more transparency to the whole situation.
What triggered us was an incident we had together with a big OTT, who, of course, do this kind of traffic engineering on their side as well. But Europe is a little bit different: the OTTs come from the US where they have a big continent. In Europe there are more borders, and sometimes, for example, a foreign country is better positioned to serve the traffic than the country itself. The example here is North Rhine-Westphalia, which is very close to Amsterdam; you can serve it from Amsterdam and of course you can serve it from Frankfurt. What we saw was that the mechanisms of the OTT always thought this is a different country, so we should serve North Rhine-Westphalia from Frankfurt, even when Frankfurt was congested and we had low utilisation in Amsterdam. The traditional tools of traffic engineering led to a non-optimal situation where traffic would simply come not from Amsterdam but from Frankfurt, which was already full. So we thought: whose mistake is it? Is it our mistake? Is it their mistake? It's just an optimisation process, and we can do better. And this is why we introduced the tool, and there were two things we had to change. We had to change our mindset: the mindset for my colleagues and myself was, for many years, that what's inside our network we don't share with external people. But we went over this barrier, and the situation tomorrow will be like this: we will serve North Rhine-Westphalia from Amsterdam and from Frankfurt and get the right situation.
This shows the situation in a little bit more detail. This is our network and we have paths; you can hardly see them. The green one is a free path, and there are red paths, which are the congested paths, for example. So the CDN could send traffic this way, but this is not a good idea because these two paths are full. But how should the CDN know that this green path is still empty and deliver the content on this path? So, what we are doing is maintaining a real-time map of our own network and our own capacities, and this network state information is offered to CDNs and OTTs. The whole thing was developed in a start-up from Deutsche Telekom which is called BENOCS, and we have done it with the start-up. But what's happening now is not only theory; we are in real life now, not only in a lab environment, we offer this to the first OTT and we are testing it in real life. The motivation is threefold.
The user will be happier because they get better quality; of course it's also good for the OTT if the whole thing gets better; and we can save some network cost. I don't want to share real load factors here, but just as an example: if we would run our core network at 30%, together with this tool we are able to run it at 33%. These are not the real values, which I'm not allowed to share, but it gives you an idea that you can run the network a little bit hotter. And in a big network like ours, 1 or 2% higher utilisation yields millions less in investment.
So, what I showed you here is what we call the Flow Director. It can be used by many companies, and I put here some examples of companies who might be able to use it. The second element is the analytics, where you can really look deeper into the traffic flows, which is good for sales, or good for peering coordinators to see where the network is full or where we still have capacity.
So, what have we really been changing? Why is Deutsche Telekom opening up? I have said it before, I will just repeat it: we have been too restrictive and too reactive in the past. It's not a good idea just to have a closed shop and say, okay, the others can move but we don't share details from our own network. So, we learned about how our network works, we cooperated with CDNs and OTTs, and we really think it's the way forward to cooperate with OTTs and CDNs. The product status: the product is ready, but we have only a first OTT connected so far. The analytics tool works within Deutsche Telekom and within Magyar Telekom, the Hungarian subsidiary, as well. And we are preparing for the implementations in the EU.
So, it's operational since July 2017 and please feel free to approach me if you are from an OTT and would like to go deeper into the whole thing.
So, the idea is initially we thought we would sell the whole product that was also motivation to do it, but I think we get so much out of a better cooperation that we are willing to give it away for free to the OTTs, because we make the benefit in our own network and in better customer experience.
Thank you very much for your attention.
(Applause)
REMCO VAN MOOK: All right. Thank you Falk, that was very interesting. Any questions for Falk? You are being awfully quiet this morning. I think ‑‑ oh, Aaron.
AUDIENCE SPEAKER: I am Aaron. Just a question of interest. Did you try, for example, to use a well-known attribute like the multi-exit discriminator to adjust with your flow analytics, to see the difference in performance between that and a more specific OTT implementation?
FALK VON BORNSTAEDT: Ruediger could answer this better. Of course we are using the well-known attributes to steer the traffic, but that doesn't give you an insight. Here we are the active part: we would really share insight into our network, which we didn't do so far. So the best person to address is Ruediger; he is unfortunately not in the room.
AARON HUGHES: Fair enough, I'll ask him off‑line.
REMCO VAN MOOK: All right. Any further questions for Falk? Going once... twice... thank you very much, Falk.
Next up we have a lightning talk from Christoph from DE‑CIX, welcome Christoph.
CHRISTOPH DIETZEL: So, Hi, I am Chris, I am with DE‑CIX, so actually I am head of R&D at DE‑CIX and also a student at INET and today I'd like to talk about inferring BGP blackholing activity in the Internet.
So basically, the idea is, we know that blackholing is there, and it has been there for, let's say, decades. But no one really looked into to what extent it is really used, like, are there trends which correlate. I mean, there is a lot going on in DDoS mitigation and protection, but we don't really have a clue as of today what in detail is going on, and we performed a measurement study to get some insights on this.
So basically, the motivation is quite simple, and I think in this audience it's quite easy for me to make this point. There are really huge DDoS attacks going on in the Internet, and it's not just that some end-user services, such as some websites, get DDoSed and sort of blocked; we also see, for example in the second picture here, the OVH case, that infrastructure really gets taken down by DDoS attacks, and that's why we actually need some sufficient measures to counter DDoS attacks. I mean, since we don't really want to do traffic scrubbing at an IXP, right, not touching anything beyond layer 2 or maybe a bit of layer 3, we are still left with blackholing.
And since blackholing, or let's say the triggering mechanism, meaning announcing a prefix for blackholing, got standardised recently, we thought okay, it's about time to really get into the details and see whether networks really use it on a large scale or not so much. Blackholing used to be implemented a bit differently everywhere, so each ISP or IXP might have used its own community or maybe a next hop to announce blackholing. But since a year or something like that, we have this standardised community defined in the RFC, and since this situation changed, we are even more curious: now that it's standardised, how is the usage across the Internet?
And here are the typical scenarios of how blackholing is used. Either it's used in the context of an ISP, where we have a rather simple upstream transit relation, meaning I announce to my business partner, to my upstream provider, that traffic that is directed towards my AS, towards some prefix I own, gets blackholed, and this is done basically as defined in the two RFCs, using the standardised community. And this community also applies to blackholing at IXPs, which basically is the same, but across the route server you can also announce it to several peers at the same time. So, the same scenario, but with impact on multiple ASes at the same time.
So basically, when we started off, our research goals were: since DDoS attacks are getting more and more important, doing more harm not just to the Internet but also to our society, we wanted to understand the Internet-wide adoption of blackholing. Is it a useful tool, or is the industry not really using it but rather going for scrubbing services, or for solutions which just apply within their own AS, building their own solutions to counter DDoS attacks? We also wanted to profile the targets using blackholing, so, what seems to be a typical victim of DDoS attacks; and we would also like to understand blackholing practices, meaning for how long a prefix is announced, how long it is used, and whether we can really correlate well-known DDoS attacks, or huge DDoS attacks reported in the media, with some behaviour of network operators. And we would also like to understand the efficacy to some extent. So, therefore we performed a measurement study, looked at about two years of control plane data, and also verified the findings of that.
This is basically a picture that tries to cover how our system works, but for a lightning talk let's not get into the nasty details but rather understand the big picture. The idea is that there is this standardised community, but there are also a ton of other communities defined by specific ASes which offer blackholing, and what we did was to really crowdsource this information, getting it from operators' websites; it really depends where they specify the communities used for blackholing. The idea was that we want to have a global, say, dictionary, which has each AS in it that offers blackholing, with the corresponding community, so that we can have an Internet-wide understanding of who is using blackholing and when someone is using it.
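As a small sketch of the inference step this dictionary enables: given the communities attached to an announcement, decide whether it signals blackholing. The per-AS community values below are invented; only 65535:666 is the standardised BLACKHOLE community from RFC 7999.

```python
BLACKHOLE_WELL_KNOWN = "65535:666"  # RFC 7999 well-known BLACKHOLE community

# Crowdsourced dictionary: AS offering blackholing -> its blackholing community.
# These entries are hypothetical examples, not real operator values.
PROVIDER_BLACKHOLE_COMMUNITIES = {
    64500: "64500:666",   # hypothetical ISP-specific value
    64510: "64510:9999",  # hypothetical IXP route-server value
}

def is_blackholed(communities):
    """communities: list of 'ASN:value' strings attached to an announcement."""
    if BLACKHOLE_WELL_KNOWN in communities:
        return True
    known = set(PROVIDER_BLACKHOLE_COMMUNITIES.values())
    return any(c in known for c in communities)

print(is_blackholed(["64500:666", "64500:100"]))  # True
print(is_blackholed(["64500:100"]))               # False
```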
So, the high-level picture is: we analysed two years of BGP data, using publicly available BGP streams or feeds, such as BGPmon and the RIPE RIS data, which is very valuable for our research, and we saw, over the past two years, an increase of about 100%. So, the usage of blackholing Internet-wide doubled, and this is just from the provider perspective, right, the ASes actually offering blackholing. And if we focus on the user group, the ASes really using blackholing where we see something announced for blackholing, we see a 600% increase across the same time.
So, when we started off with our research in December 2014, we saw about 485 different prefixes, and the increase goes to almost 5,000, and we observed 160,000 different prefixes in total, unique ones, not really looking into the on/off announce and withdraw patterns, but across the entire period.
And indeed, in the beginning we were wondering whether there is a correlation with well-known events, and indeed we found such things. If we focus on this plot, we can really see the attack on the Russian government, so there was really an increase, and just to remind you, we are only focusing on the control plane here, so this does not necessarily mean that there was a lot of traffic, but there could have been a lot of traffic as well. So, we saw the attack on the Russian government. But we also saw the event of the Olympic games in Brazil, where it was publicly reported that there was really a huge number of DDoS attacks and we saw corresponding announcements of blackholing, and the Krebs on Security event was also visible within our data.
So, if we focus on the blackholing provider ASes, who is actually offering blackholing and who has those communities publicly available so that we could look into them for our research, we found that the majority is in the US, Russia and central Europe. Also, 184 ASes out of 242 are transit or access networks. So, the majority is really those guys who have the services or the end users, and the minority is CDNs or other types of networks.
And only about 10% are IXPs. So, actually, out of the 242, the number of IXPs is quite low, just about 10%.
And we can also see in this plot that actually there is a long tail, like ASes that announce more than 1,000 prefixes, but it's only 20 that announce more than 1,000 unique different prefixes.
So, actually, there is quite a huge number of small ones, but there is also a long tail, even though the long tail is not too high.
So from our perspective, one would probably expect that there are networks that really heavily use blackholing, or even offer a service such as DDoS scrubbing with blackholing as a fallback, but we see that it's not really the case; there are just 20 that announce more than 1,000.
So, if we focus on the blackholing users, the ASes that actually make use of the blackholing services, we see the same pattern as before; it correlates somehow with the providers. But apart from Russia, the US and central Europe, we also see quite a number in Brazil and also in Ukraine. And we see that among the users of blackholing the group of CDNs is quite dominant: 18% of the users account for 43% of the prefixes, meaning, okay, this service is there for the entire community, but actually it's mostly used by content providers, and 18% of the users account for the majority of announced prefixes that we can observe.
And we also have a longer long tail here, meaning that there was quite a number of networks that used it extensively and the number was bigger than the provider side.
And when we looked into the data, we found that it's mostly small cloud providers and hosters using blackholing; it's quite diverse. You have the big players really using it heavily, but there are also the smaller players who make use of it once in a while, and a really huge number of smaller providers.
So, I mean, blackholing is one thing, but on the other hand you want to get some understanding of who, or which service, is actually using blackholing. This is also just a measurement, which means it might be a bit biased, but we found open ports on just 60% of the hosts we addressed, the hosts we tried to measure with our active measurements; so just 60% responded, so we just had 60% open host ports. And we found that HTTP was quite dominant with more than 50%, and of those 53%, 61% replied on the HTTP path, which means that the dominant service using blackholing, or the business of the ISPs using blackholing, is basically HTTP, which somehow is not too surprising: if we look at the Internet, we know that the majority of services use HTTP, right.
But there was also an interesting set of protocols and replies on different ports, which brought us to this plot, where we see obviously HTTP and HTTPS, but also things such as Telnet or NTP. And on the other plot, here, we can see the AS distance to the AS that has been used for blackholing: is it really my neighbour or is it an AS which is further away? We can see that sometimes the blackholing provider AS is not on the BGP path at all, or it's the direct peer, and we see that, in general, the blackholing provider ASes are quite close, so not really far away in terms of the number of AS hops.
And so, as the numbers tell us here, like, first hop is about 10%, and at least one hop is 30%, but we see that it's even up to 6 hops away. So, that someone is announcing blackholing to an AS which is like 6 hops away on the AS path.
So, when we focused a bit more on the actual blackholing events, we found that the duration of a blackholing announcement across the Internet is mostly rather short, like less than an hour. But we also found quite a number of significant durations, up to two years. I mean, we just observe the behaviour on the Internet, not really knowing what the reasons for it were, and if we really focus on the long tail, we can assume that it's probably not really used to mitigate a DDoS attack, so it might be that some ASes use it for a different purpose in this scenario.
But we see that the majority use it rather like one hour, 12 hours, one day, so rather short, and especially here for this group and this group.
What we can also see is that the duration in hours across the entire period can be really, really long, and we still have the long tail here again. So, there is a number of events that we observed for a really long time, but as we can see, the majority are really short. So it appears to us that blackholing is really used to mitigate DDoS attacks, and not as a filtering service, but really as a last resort DDoS mitigation technique.
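To make the duration analysis concrete, here is a toy sketch of how event durations could be derived from a stream of control-plane observations: pair each blackhole announcement of a prefix with the matching withdrawal (or community removal) and take the time difference. The update format and timestamps are illustrative assumptions, not the study's actual pipeline.

```python
def blackholing_durations(updates):
    """updates: iterable of (timestamp, prefix, is_blackholed) ordered by time.
    Returns a list of (prefix, duration_seconds) for completed events."""
    active = {}      # prefix -> start timestamp of the current blackholing event
    durations = []
    for ts, prefix, is_blackholed in updates:
        if is_blackholed and prefix not in active:
            active[prefix] = ts
        elif not is_blackholed and prefix in active:
            durations.append((prefix, ts - active.pop(prefix)))
    return durations

sample = [(0, "203.0.113.7/32", True), (1800, "203.0.113.7/32", False)]
print(blackholing_durations(sample))  # [('203.0.113.7/32', 1800)]
```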
So at some point we understood how the Internet-wide adoption and usage of blackholing looks, but we still wanted to verify our findings with active measurements, because the control plane is one thing and we also wanted to understand the data plane. Therefore, we performed measurements, looked into traceroute data, and it actually just confirmed our findings from the control plane.
So, basically, we see here the IP-level hop difference, again how long the path is during and after the event, and we understood that it sort of increased, the path got longer during those events, but this was just not significant enough at this point. And again, for the AS-level hop difference, there is a difference basically, but we expected it to be more significant when we started off with our research. And we again see that the content providers are dominant in this picture of how far away the AS that is being used for blackholing is.
So, let me just conclude. We performed the first Internet-wide study of the state of adoption of blackholing, and we saw a significant increase of blackhole usage across the past two years; probably if someone looked at the past five to ten years, we would really see that it increased a lot. Maybe it's partly because DDoS is a hot topic right now, and therefore a lot of networks understood that blackholing is there, it's usually free of charge, and that's why a lot of networks are using it as of today who probably did not use it not too long ago. And since there is a need right now, we also see a rise in the number of prefixes and users, and as of today we see about 400 ASes using blackholing and about 5K prefixes announced for blackholing per day.
And one thing is that blackholing may be sufficient to get rid of DDoS or attack traffic, and it appears to me that it is sort of a last resort, but if we really want to do better, we might want to have more fine-grained blackholing, so, say, get rid of UDP traffic or a specific port only. What we learned is that blackholing is used, but we believe that in the future it might be even more useful for operators, for us, to have it far more fine-grained.
And this was just a lightning talk, so I just tried to give you the high‑level findings but if you really want to understand all the details, next week we are going to present the entire paper at the Internet Measurement Conference in London and then the paper will be available with all the details, and I am happy to take any questions. Thank you.
(Applause)
FLORENCE LAVROFF: Thank you Chris for the detailed insights. Questions?
GERT DÖRING: If I may. Gert Döring. User of BGP blackholing every now and then. You have seen people announce 1,000 prefixes, and maybe other people announce prefixes for over half a year. If I do that, I basically burn lots of IP addresses, so I could just return them to the registry if I'm not intending anyone to reach them. So, have you actually tried reaching the most heavy hitters to get a bit more insight on what they are doing and why they are doing it for such a long time?
CHRISTOPH DIETZEL: Yeah. So actually this is quite a tricky one. So I approached people, and the first reaction was like, why do you care about what we do? Why do you look into how we manage networks? And I was like, okay, we are using publicly available data; it's not that we have some sort of hidden vantage point. And one guy was just sort of acknowledging that he configured blackholing for the first time, played around with it, and just sort of forgot about it. So --
GERT DÖRING: That's sort of the thing I would expect to see, so that's why I'm so curious.
CHRISTOPH DIETZEL: So my answer is basically we didn't do it on a large scale, since it's hard to really get answers if you reach out to the community and ask people why they use blackholing the way they use it. But I have a few samples, like people saying, okay, we forgot about it, or basically we used to get a lot of DDoS attacks on this specific prefix and that's why we switched on blackholing, announced it for blackholing, and it remained there and at some point we didn't care too much. I mean, whether we want to give this address space back to the RIRs, or even to, let's say, RIPE or something, I think that's a different question. At the end of the day, it's hard to really say why they have such long announcements.
GERT DÖRING: Understood. But the thing is really if I am not willing to receive packets on a given IP address for half a year, then why have that IP address configured on anything anyway?
CHRISTOPH DIETZEL: Right. One could argue that, yes.
AUDIENCE SPEAKER: Martin Levy from Cloudflare. Thank you very much for this; as a heavy user of blackholing communities, I see value in this going forward. But I want to take you back to the registry part of your talk, and my question is: if that effort goes forward, should blackholing be the only community that is built into a registry, although it has an RFC now, or could we, in fact, generalise that? Because as an Internet community, we actually use, or should use, a large amount of community control within our connections to various backbones, and either documenting that or building consistency into that, over and above the no-export and the other minor ones in the RFCs, would that be useful? Well, sorry, I believe that's useful. Therefore, is this effort in any way capable of expanding or bringing in other players in order to increase that registry?
CHRISTOPH DIETZEL: Yes... so, I'm not sure about the registry part, but in general, I am on the same side. I think we should standardise communities because that eases the usage for, for instance, Cloudflare or other players, so I'm really a big fan of: let's standardise the usage of communities and the meaning of communities across networks. Part of the research, which was probably not really shown in this presentation, is that we spent a lot of time on this project on a simple question: hey AS, you are offering blackholing, but I need to find out which communities actually signal blackholing. So, right, we spent months and weeks understanding which ISP or which AS is using blackholing and which is the community. Even though the standardised community is there, not everyone has adopted it yet. I believe there is a ton of topics, let's say latency, or signalling location information with communities, and I really believe that it would make our life a lot easier if we would define and standardise more communities for each use case.
AUDIENCE SPEAKER: I agree with you 100%, of course. Going back to Barry's talk, using automation tools gets you around some of this complexity, although it adds enormous configuration to your setup. My closing comment, and maybe it's more general, is: if something like this goes ahead, it would be really good to expand it over and above just a blackhole community, because this is an industry‑wide, backbone‑wide requirement. Come find me afterwards, that's my offer.
CHRISTOPH DIETZEL: Maybe ‑‑ yeah, okay, I will see you afterwards. But let me make this comment: you are a user of blackholing, so you are the right one to ask. If we were to offer, let's say as an IXP, more fine‑grained blackholing ‑‑ I mean, tell me if you want to get rid of UDP traffic, or certain UDP or TCP ports ‑‑ what would be your preferred way of triggering this? The first question might be, are you interested in such a thing at all? And if you are, what would be your preferred way? And if that is communities, would you like to see those communities standardised?
MARTIN LEVY: A perfect topic for another talk as opposed to the microphone.
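As an aside, and not something the speakers named: per‑protocol and per‑port drop rules of the kind Christoph describes are roughly what BGP FlowSpec (RFC 5575, since updated by RFC 8955) signals, as an alternative to a plain blackhole community. A minimal Python sketch of such a rule as plain data; the field names are illustrative and not the FlowSpec wire format.

from dataclasses import dataclass

@dataclass
class DropRule:
    dst_prefix: str    # the prefix under attack
    ip_protocol: str   # "udp" or "tcp"
    dst_ports: tuple   # destination ports to drop; empty means all

# "Get rid of UDP traffic on port 123 towards 192.0.2.0/24" (documentation prefix).
rule = DropRule(dst_prefix="192.0.2.0/24", ip_protocol="udp", dst_ports=(123,))
print(rule)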
CHRISTOPH DIETZEL: Okay. Ruediger?
RUDIGER VOLK: For many years no new communities were defined, and then there was actually quite a rush. Well, the place to really discuss this is the IETF Working Group; actually, I have been looking forward to getting more standardised stuff there. On the other hand, I am of the opinion that the set of potentially standardisable functions is actually fairly limited. The interactions that are signalled with communities between networks are in most cases fairly specific, and standardising them does not make a lot of sense even if they are very similar. However, with the large communities around now, we actually have code space where we could do interesting things: any party that gets an AS number can do their own registry of suggested standardised communities, and those can be used between any two other parties that like the semantics.
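The code space Rüdiger refers to is BGP large communities (RFC 8092): three 32‑bit fields, conventionally read as global‑administrator‑ASN:function:parameter, so any AS holder can publish its own registry of functions. A minimal Python sketch under that reading; the registry below uses a documentation ASN (64496) and is purely hypothetical.

# Hypothetical function registry published by the holder of AS 64496.
HYPOTHETICAL_REGISTRY_AS64496 = {
    1: "blackhole",           # e.g. 64496:1:0
    2: "prepend-to-peer",     # e.g. 64496:2:<peer-asn>
    3: "do-not-announce-to",  # e.g. 64496:3:<peer-asn>
}

def parse_large_community(text):
    """Split 'global-admin:function:parameter' into ints and look up the function name."""
    ga, func, param = (int(part) for part in text.split(":"))
    if ga == 64496:
        meaning = HYPOTHETICAL_REGISTRY_AS64496.get(func, "unknown")
    else:
        meaning = "unknown"
    return ga, func, param, meaning

print(parse_large_community("64496:1:0"))       # (64496, 1, 0, 'blackhole')
print(parse_large_community("64496:3:64511"))   # (64496, 3, 64511, 'do-not-announce-to')

Any two parties that agree on the semantics of such a published registry can use it between themselves, which is the point being made about not needing IANA for every case.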
CHRISTOPH DIETZEL: Okay, yeah, but I think that ‑‑ I mean, just because it's possible with large communities and anyone can do it as they please, it still might make sense to use them the same way across different parties, just to make life easier for us as users, right? The issue is, of course I can configure whatever I want, but at some point someone has to look up my configuration ‑‑
RUDIGER VOLK: Well, the thing is, when you are talking and thinking about standardised communities at the moment, you are essentially pointing to communities that are registered by IANA.
CHRISTOPH DIETZEL: Yeah.
RUDIGER VOLK: What keeps you from opening the AS registry for communities that are the repertoire of Euro‑IX?
CHRISTOPH DIETZEL: Okay. Fair point. Okay. Thank you.
FLORENCE LAVROFF: Thanks. So, to respect the time of the session, I would recommend that if you have other questions, you take them offline directly with Chris.
(Applause)
All right. And now we move on to one of the last topics of this agenda, which is the connect update. We wanted to make room at the end of this agenda for any updates which are relevant for our Working Group. So if you have any news or highlights to share with us about any topic related to interconnection in our great RIPE region at the moment, have your say, come to the mic ‑‑ anything related to IXP creation or other exciting news, feel free to let us know. If you have videos that you think are interesting, also just let us know.
So as you can see here on this slide, we have already received a couple of recommendations, some of them made on the Connect Working Group mailing list. So as you can see here, we have The Secret of Depeering, which was also presented a couple of times in this Working Group a number of years ago. Just to continue with my list of examples here, Interconnection Agreements at Scale: Secret or Simple, by Marty Hannigan and a couple of other usual suspects. They are all named here.
Another white paper/article that was recommended to us is The Future of Peering: Opportunities and Obstacles; and of course we have here a video from MENOG 17 that was also recommended to us by some of the members when we compiled the agenda for this Working Group. So if there is anything that you wish to see added to this list, please let us know, come to the mic.
If you don't have anything for us now, that's also fine. We can, of course, have that conversation later on the Connect Working Group mailing list, and I hope that we will have more feedback for next time so that we can make this a recurring thing for the next Working Group session ‑‑ if, of course, it is still Remco and I presenting next time.
Two last points that I would like to mention for this update. Tonight is voting time, so please have your say.
And also, don't forget to rate the presentations. That's always very useful for us to have your feedback about what you think is useful and what you think is less useful for this Connect Working Group.
And that's it. If you don't have anything to add to that, then I will give back the mic to Remco for general feedback and for closure.
REMCO VAN MOOK: Thank you, Florence. So, coming to the end of this session, as we are still pretty new at this, as always we'd very much like to hear your opinion about what you thought of today's session, what direction you'd like us to take with this Working Group, what kind of presentations you'd like to see more of, what kind of presentations you'd like to see less of, anything like that. Please give us a shout now, or talk to us in the hallways, or send us an e‑mail.
Any feedback right now? I don't see anyone running up. I do see people walking. Oh dear!
MARTIN LEVY: I would like to commend the co‑chairs for the lack of IXP updates within this session.
REMCO VAN MOOK: I'll take that as a compliment, thank you. Anyone else? Well, then I'd like to close with thanking our lovely scribe Rumy and our chat monitor Sandra for doing a splendid job once again. And with that, I'll release you to lunch. Thank you for attending.
(Lunch break)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.