Anti‑Abuse Working Group
25th October 2017
At 11 a.m.:
BRIAN NISBET: Hello. How are we all this morning? If people could take their seats. And if Peter and Rudiger could sit down and stop talking to each other.
Hello, good morning. Welcome to the super edgy and exciting RIPE 75 version of the Anti‑Abuse Working Group. I am your co‑chair, Brian Nisbet. Tobias unfortunately, after making travel plans, had some stuff happen and can't be with us this week. But he is ‑‑ Tobias Knecht is the other co‑chair of the Working Group.
So we have quite a bit to get through today, so hence we are starting. Taking you all away from the absolutely gorgeous food outside, I have got to say the standard of catering has been wonderful.
So, first off, the session is being recorded, the session is being stenoed, will have minutes, everything you say will be and can be taken down in evidence against you, etc., and thank you indeed to the RIPE NCC, scribe and minute taker and our wonderful stenographers without whom I would have no idea what I say every meeting.
The remote participants who may get involved, if you do wish to say something in the microphone, please remember to state your name and some suitably comical or made‑up affiliation.
So, what else? We have minutes from RIPE 74. They were sent to the list, there weren't any comments from the list, so at this point I am assuming it's all good. This is your last chance to make any comments about that. And no one is rushing to the microphone, so I am going to deem the minutes to be approved.
We had one addition to the agenda, which is from Jan Zorz because BCOP are looking at doing some things about mail servers and IPv6 and anti‑spam and things like that, he will give a very brief introduction to that at the end. Are there any additions, questions, changes, etc., people want to make in regards to the agenda? Nope. Okay, great. That's that then.
So I have ten minutes here to talk about recent list discussion, and recent list discussion on the anti‑abuse mailing list is a very variable thing. Sometimes there is nothing, sometimes there is lots. I don't particularly have anything I really want to reference here, I mean you can all see the list the same as I can. Obviously, there is conversation around the policy proposal 2017‑02, which we are about to have a presentation and discussion on. There were a few interesting links sent on and a few bits of discussion but there is nothing I particularly want to reference. What I will say at this point, and I think this is important, given the year we have had, so to speak, is that the RIPE community code of conduct has been extended to the whole community, not just the meetings. That includes the mailing lists. Some of the content on the mailing list in recent times was not perhaps the best kind of content. It was neither a pleasure nor a privilege to have to deal with that, but I think what we did in relation to removing two people from the mailing list was required. And I think had the code of conduct been extended to the mailing list at that point in time it would have been a clear action to take. Equally, nobody seemed to object to those actions and in fact, I got substantial positive feedback on those actions, so I am going to consider that the Working Group agreed with me.
Are there any other pieces of recent list conversation that anyone wishes to talk about here or raise or discuss? Nope. Cool. Great.
We are getting loads of time back on this.
So, the next piece up is a presentation from Greg and Herve on the live policy proposal, which is 2017‑02.
HERVE CLEMENT: So hello, so perhaps we will very shortly introduce ourselves. I am Herve Clement from Orange, particularly ‑‑ very concerned about the completeness and reliability of databases and registries.
GREGORY MOUNIER: Good morning everybody. My name is Greg Mounier, I work for the European Cybercrime Centre at Europol, and some of you know me a little bit. I have presented a couple of times on how law enforcement uses the RIPE database to support their investigations, and we are very happy today to talk to you about a recent proposal for policy change that we have submitted to the mailing list.
HERVE CLEMENT: Okay. So we are here, as Greg recalled, for the presentation of the proposal we have made in the frame of the Anti‑Abuse Working Group. This is proposal 2017‑02, regular abuse‑c validation, and we will just explain what it is about, the context, and what the goal of this proposal is. Great.
So, why this proposal, and in which context does this proposal take place? Generally we, as the RIPE community, are committed to improving trust and safety in the IP space. So in general, it's something I recalled just previously. We are responsible and we try to have accurate and very detailed information in the RIPE database. Why? Because it creates a trusted environment for network operators, for instance. It can help the Internet to function at all levels, and help us attribute many activities.
And including that, the point is, we have to be cautious: we also have to have valid contact points for a certain number of issues, because we have to ensure the security and reliability of our networks, of course. We have to ensure accountability of IP resource holders. Another thing is to ensure the public can resolve abusive practices, so if there are practices they experience that have to be resolved, they can resolve them, and to ensure the effectiveness of existing abuse reporting systems.
How did we get there? So there was a situation: in 2012 there was acceptance of a policy proposal, which was proposal 2011‑06, as I can remember, and it was the source of the RIPE‑563 document, which says that it is mandatory to have an abuse‑c contact attribute. So this is contact information for automatic and manual reports of abusive behaviour. And the abuse‑c is part of the RIPE community accountability model.
But in our view, there is a gap in this document and decision because, so it's great that this field was created, but there is no validation mechanism, so it can lead to out‑of‑date and inaccurate information and to contact points that are not valid. That, in our view, undermines the objectives of RIPE‑563. And the RIPE NCC also receives hundreds of reports of invalid contact information, so I think regularly. And even if you read the archived 2011‑06 proposal, you can see at the end that data accuracy issues can be, will be, one day, tackled in a separate policy proposal, so that is the policy proposal we are presenting to you. So this is the 2017‑02 proposal. The objective is to complement the document RIPE‑563, as data accuracy issues are not in the scope of RIPE‑563.
So what are the objectives of the proposal? To increase the technical reach of the abuse‑c contact, and to reduce the likelihood of unresponsive abuse‑c contacts. And how to do that? It is to mandate the RIPE NCC to validate the abuse mailbox, which is part of the abuse‑c contact, at least once a year, and to mandate the RIPE NCC to follow up with the resource holder in case the contact is found invalid, and Greg will explain a little, and to resolve the issue in the most flexible manner. It's the very last resort: in the case of non‑cooperation, of no answer, the RIPE NCC at the very end can trigger RIPE‑676, which is related to the closure of members and the deregistration of resources. And just because I was talking about ‑‑ it's something we have been aware of for a long time now, so even in the remarks, before the creation of the abuse‑c attribute, there was a presence of abuse mail addresses, designed for security problems and mail.
GREGORY MOUNIER: So the point, of course, of the policy proposal is indeed to validate these abuse at Orange, for those who are not familiar with the RIPE database. So we launched the proposal on the mailing list approximately four or five weeks ago and we had lively discussions, so thanks to all of you who contributed, Peter, Malcolm and the rest, it was a really interesting discussion. So basically we have tried to summarise a little bit and to group the various comments we got. The first one was really, I would say, fairly standard, we get that fairly often in a community: it's too costly and bureaucratic, too aggressive, too much regulation, we have too much. So okay, well, points taken. Nonetheless, there is indeed RIPE‑563 and it feels that we need to complement it.
"I do what I want with my resources, registration is property theft": that was an eye‑opener for me on the way of thinking of some part of the community, really, coming from the public sector. There was an interesting comment as well. We got some constructive feedback: what about looking for alternatives, maybe there is a policy or a procedure which is already in place? For instance, we have got the Assisted Registry Check, ARC, so Andrei is at the back of the room, and so we talked to the RIPE NCC, we looked into the procedure and I will come back to it a bit later. Another interesting point made by Malcolm was the auto response: is that a valid response from the RIPE perspective? We looked into it as well. What will be the concrete procedure to validate? Again, that is a very valid question. And what about the legacy resource holders?
So I will just go through a number of those. The two first points, I think that is more difficult than anything. Looking at the alternative, ARC: the point of ARC is really to help the members to strengthen registry data and it covers four areas, it doesn't really cover the abuse‑c contact. And also, in terms of figures, ARC reviews are conducted and carried out for about 25 or 30% of the resources per year, that is the objective for 2018, so it doesn't cover the whole membership every year. So we think, and in cooperation and discussion with the RIPE NCC it feels, that ARC is not designed for validation of the abuse‑c contact.
Then, in terms of procedure, we wanted to have a really light approach, so we think that the RIPE NCC is best placed and qualified to assess the type of resources, financial and human, that would be necessary to do this type of validation process. So we thought it was for them to come up with a procedure that would match the interests of the community but also match their resources. So they are going to come up with an impact analysis in four weeks and we would expect they would come up with a detailed procedure that would be fit for the community. Nonetheless, there are a number of principles we would like to see included in this procedure, but again this is open for discussion, so we would like to propose some steps that should be followed. So the first one could be: you validate the abuse mailbox at least once a year by e‑mail, or at least there should be some human interaction saying "I am monitoring this e‑mail address". If there is no response within 14 days, subject to discussion with the community, and that includes bounce backs and automatic responses, the RIPE NCC would consider that as unresponsive and it would be marked as invalid. Then we open a dialogue, so the RIPE NCC will engage with the LIR or the resource holder and require them to correct it. It is really a dialogue: let's look into the issue, it might be a technical problem, it might just be a lack of dedication or anything. Then for us it seems that there has to be a deadline, because you can't go on in a dialogue with the RIPE NCC and waste resources for two years and at the end have no result. To be discussed: two months, three months. If there is an obvious unwillingness to cooperate and to resolve the issue, then the RIPE NCC would be mandated to trigger RIPE‑676. So that is really our main procedure and steps.
If we look at auto response, this is probably one of the most thorny issues. It would be valid, we think, if it leads to human interaction. Someone who says "yes, I have received your e‑mail, RIPE NCC, and this is being monitored", that is good for us. If this is a ticket system, for instance, "yes, we have received your e‑mail, your request has been logged and here is a ticket", why not? It could be potentially valid, if it leads again to human interaction within 14 days, if somebody follows up and says "your ticket was logged and we looked into the issue, and, yes, RIPE, it is valid, it is being monitored". What about a form? If you have to fill in a form, "what is your problem? Let me know", and so on. That is for the RIPE NCC to decide: do they want to hire people to fill in a form, why not? If they have got the resources, no problem. My gut feeling would say that for the abuse‑c there wouldn't be any form, you have to send back an e‑mail to say "yes, I have seen your e‑mail and I am monitoring it". That is only my personal opinion. Legacy resource holders, they would of course not be directly impacted, but our assumption is, if you are a legacy resource holder you are also committed, like any other member of the community, to the same objectives of safety, accountability and trust in the IP space, therefore you would establish your abuse contact and you would monitor it.
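The steps proposed above can be sketched as a small state check. This is only an illustration: the presenters explicitly leave the real procedure to the RIPE NCC, the 14‑day window is described as subject to discussion, and the 90‑day dialogue deadline, the `AbuseContact` class, the `status` function and the status names are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

RESPONSE_WINDOW = timedelta(days=14)    # "14 days", subject to discussion
DIALOGUE_DEADLINE = timedelta(days=90)  # assumption: "two months, three months"


@dataclass
class AbuseContact:
    """Hypothetical record of one yearly validation round."""
    mailbox: str
    validation_sent: date
    human_reply: Optional[date] = None  # bounces and auto-replies do not count
    issue_resolved: bool = False        # set once the follow-up dialogue succeeds


def status(contact: AbuseContact, today: date) -> str:
    """Return where the contact stands in the proposed procedure."""
    elapsed = today - contact.validation_sent
    if (contact.human_reply is not None
            and contact.human_reply - contact.validation_sent <= RESPONSE_WINDOW):
        return "valid"
    if elapsed <= RESPONSE_WINDOW:
        return "pending"
    if contact.issue_resolved:
        return "valid"
    if elapsed <= RESPONSE_WINDOW + DIALOGUE_DEADLINE:
        return "invalid-dialogue"       # marked invalid, dialogue with the holder
    return "escalate"                   # last resort: closure procedure


if __name__ == "__main__":
    contact = AbuseContact("abuse@example.net", date(2018, 1, 1))
    print(status(contact, date(2018, 2, 1)))
```

Keeping both deadlines as named constants reflects that both values were presented as open for discussion.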
That is the only answer we could come back with.
Version 2: we are waiting for your feedback, looking forward to the discussion on the mailing list. We will work on a new version taking your feedback into consideration. We will finalise version 2 and then the RIPE NCC will create an impact analysis on the basis of that version and will kick off the review phase. In conclusion, speak out. Have you ever been frustrated because you couldn't reach somebody because you didn't have a valid abuse‑c? Are you also frustrated because of the ‑‑ free rider attitudes? You enter ‑‑ some people are not ready to clean up or take the necessary steps to actually fight abuse. I think that is frustrating. Do you think this proposal is common sense, that it won't resolve everything but it's a step in the right direction? If so, contribute to the conversation and send an e‑mail to the Anti‑Abuse Working Group mailing list. Thank you very much.
BRIAN NISBET: So I am pretty much going to close the lines now. We have a limited amount of time. I will also, because we don't do a lot of policy here, remind people that the concrete discussion takes place on the mailing list. Input, absolutely, please, discussion. But the mailing list is the location where the policy is discussed. So we are going to start with Jan.
JAN ZORZ: Speaking as a random guy from the community. I like the idea of the automatic responder. I like the idea that we would give the RIPE NCC the mandate to be able to validate. I am not sure about the attack on everybody every year. I think that, you know, if there is no problem, why create more work? Why don't we go and say, if there is a complaint and nobody is responding from there, then we give the RIPE NCC the mandate to go and actually check and start the procedure, because probably, I don't know, over 90% of the contacts are pretty much okay. I am speculating. But, you know, I would say let's take action if there is an actual problem. Thank you.
RUEDIGER VOLK: Deutsche Telekom. Looking at the stuff right now I found points I did not discuss before the official discussion started with Greg. Well, okay, first off, I repeat my old point that the most helpful thing for actually guiding people to put in useful abuse‑c would be to get some guidelines document out of this Working Group, it could actually produce something.
BRIAN NISBET: I have got to stop you right there, we have answered this question on more than one occasion. We have sent things to the mailing list. I have asked you if what we have sent is useful and you have given no response. So, if you keep on saying that, you have got to ‑‑ you have got to give some reciprocity.
RUEDIGER VOLK: I have not seen anything useful that I could use in my context.
BRIAN NISBET: Okay.
RUEDIGER VOLK: Just to go beyond that. Looking at the presentation text right now, I feel, considering that the RIPE NCC, anyway, has the task and the goal of keeping the registry current and correct, there is no formal need for mandating that with anything special by a policy. The other end of that comment is, if you want to have the closure process invoked you cannot do that by policy because that is interfering with the SSA. Well okay, kind of, if you want to ‑‑ is the closure document a RIPE or a RIPE NCC document? And it would have to go in there.
NIGEL TITLEY: Very quick answer is, if this becomes a policy, and people are in violation of the policy, then the closure procedure can be invoked automatically through the SSA.
RUEDIGER VOLK: I see.
AUDIENCE SPEAKER: Peter, speaking for myself, just three things. I missed the financial impact slide. Was it there?
GREGORY MOUNIER: No.
AUDIENCE SPEAKER: Why?
GREGORY MOUNIER: Because we thought RIPE NCC would come back with the financial aspect and the impact analysis.
HERVE CLEMENT: It will be in the next phase.
AUDIENCE SPEAKER: Did you guys attend yesterday's General Meeting?
AUDIENCE SPEAKER: Because we have a lot of money and we are starting a lot more projects than we can accommodate, at least I understand it that way. So there are voices from the community ‑‑ from the members of that association ‑‑ that we are going to be fat with projects, and this is another one, and I think that members will be against that, to spend more money, to increase the ‑‑ probably increase the staff members or full‑time, and to increase the fee or whatever. So ‑‑ that was the second thing. I don't expect that you answer the question. But the third thing is that if that is going to pass I am waiting for the first robot that will automate this. People will start sharing the ways the NCC is going to chase LIRs and start exchanging things, and they will produce some kind of automation that looks like a human response, just to fool you guys. Thank you.
ANDREW DE LA HAYE: RIPE NCC. On the impact analysis and the question from Peter, so we will deliver an impact analysis, but many of you know what happened with 2007‑01, the proposal on PI holders, where there was a mix between policy and pricing of what it would cost. What we have requested the Board to do is the following process: the impact analysis will shed some light on the spectrum of options we would have if this policy would go through, that would be step one in the impact analysis. We will not dive into details on how much the cost will be. If the policy would reach consensus in this Working Group we will go into detail and even do some trial runs to see how big the issue is, and then we will bring forward to the RIPE NCC membership the price tags of the different options we see as suitable. Then we can separate the policy discussion from the RIPE NCC membership pricing discussion. That was our proposal to make sure that we can move forward without having all these details in place.
ERIK BAIS: I think that the ARC process that you currently discarded is the right process for this, and I say that because the RIPE NCC already has enough issues chasing the LIRs to do an ARC, and they are doing that typically once every three to four years for each LIR, and some they have to keep chasing for the ARCs as well. I am, adding to that, strongly against the closure part in this whole policy, strongly against it, based on that.
HERVE CLEMENT: Of course, we have spoken about ARC. In the discussion phase we had the same issues and the discussion with the RIPE NCC, so we aired that ‑‑ the process is a pretty light process, and so if we have the opportunity to check the abuse‑c information within one year, it will be more constructive to do it that way.
BRIAN NISBET: For a comment from the NCC.
AUDIENCE SPEAKER: About ARC, what I wanted to add to this: indeed it would be very difficult if we had to do the verification on a yearly basis with ARC. I think we all agree on that point. We wouldn't be able to contact all the LIRs as we are standing currently. But what we will be able to do, and I think that is where the bulk of the work would be in this policy proposal, is to follow up on the LIRs or organisations that do not have a correct and valid abuse‑c e‑mail address, and we can partially use ARC to prioritise those LIRs and put them at the top of the list for ARC. We do not know the numbers yet, so ARC may indeed not be enough to fulfil this, but a part of the work could be done by prioritising those organisations at the top of the ARC list. It's a bit of a mix. ARC can help but it's not the full solution.
RICHARD HESSLER: Random Internet citizen. Extremely against this. I feel it's a waste of resources for the resource holders, for people sending in abuse reports and for the RIPE NCC. Those who care to respond to abuse requests already do and don't need a policy to do this. Those who don't care will either, as said earlier, share ways to get around this, or they will develop their own techniques, because it is cheaper to spend 40 hours of development than to spend customer service time dealing with this sort of thing. I don't see this actually making any valuable improvement on the abuse process, and I am extremely upset at the death penalty clause of this. Taking away our resources is death.
AUDIENCE SPEAKER: Alexander from Russia, which is recognised by somebody as a police State. I am really sad about such policies getting into discussion, because when I first came to RIPE meetings and asked this guy, the RIPE NCC chairman, how RIPE and the RIPE NCC enforce policies, he said no, we are not enforcing anything, we are not a law enforcement agency, the community organises itself. So I am really sad that such an enforcement policy came out of a law enforcement agency, it's really sad. And also, as we are doing work in the RIPE Accountability Task Force, I would like to remind Gregory, who is a task force member, not very active, that RIPE is not about helping law enforcement agencies, RIPE is about IP connectivity. The whole database is a tool for supporting proper IP connectivity and not for finding criminals. Abuse contacts, for me, mostly as a network engineer: I need contacts to improve IP connectivity, to resolve user issues, whatever else, not to help law enforcement. Please don't come to our community with your tasks. Thank you.
BRIAN NISBET: I would like to remind everyone, law enforcement are part of our community. Everybody is part of our community. So ‑‑
GREGORY MOUNIER: Very briefly. If you think that by monitoring an e‑mail address we can investigate and find criminals, then really, we need to talk, because that is really not ‑‑ that wouldn't really help the investigators. This is good for the community, so that there is one point where you can speak to LIRs, that is all we want to do.
AUDIENCE SPEAKER: William Sylvester. Just to clarify, this policy excludes legacy and uncontracted space?
BRIAN NISBET: Yes because we have no ability to impose policy on legacy address space.
WILLIAM SYLVESTER: That said, I don't support any policy that would reclaim any space. I would maybe offer an alternative of locking, which is something that has been used in the past very effectively, so the object is unable to be updated until somebody comes back in and revalidates. That is my two cents.
BRIAN NISBET: That raises an interesting point, actually, because it's been said a couple of times, and this is something for the community to consider, not specifically to this. If people are saying "we don't support a policy which may potentially lead to the revocation of resources", that, to my mind, purely speaking as me, puts the community in a very interesting position in regards to what we may or may not be able to do in the future and the policies which currently exist which can lead to closure of an LIR and revocation of resources. It is a very general comment, but that is an interesting thing because it was referenced during the GM last time as well.
AUDIENCE SPEAKER: Jordi.
BRIAN NISBET: Any response about that comment.
GREGORY MOUNIER: If it were me I would also include the legacy holder but I have been told no, no, no.
AUDIENCE SPEAKER: I don't support including legacy, that was my point.
RUEDIGER VOLK: But the NCC mandate for taking care of corrected accurate data still remains on the legacy.
BRIAN NISBET: Yes. Yes.
AUDIENCE SPEAKER: I am in favour of this proposal, I don't see it as a proposal coming from law enforcement. I think it's good for all the community. Speaking from my personal experience, I need to report about 10 or 12 attempts to abuse my network every day, either by spam or by port scanning or many other attempts. As a small company, this is really bad for us because we don't have the resources, and I will strongly suggest to include that a form is not allowed as a way to report the abuse, because if I need to fill in a form for every abuse attempt, that's going to consume my time. So I think we should mandate that people just have an auto responder telling you "if you have not included the logs or whatever, do that, otherwise we don't follow your ticket", right? That is fine. But if you just send an e‑mail it should be answered in a certain number of days. Otherwise, it's wasting the time. Because if I am an ISP or an operator, for me it's very easy to make an automated system that goes from the e‑mail into my internal form, but asking every possible community member to do that for every other possible community member is not possible, it's just impossible. Thank you.
BRIAN NISBET: Thank you. So, any closing comments? No. As I said, we are now at the point the impact point, so feedback given, mail was sent so a bit more chance for feedback and version 2 of the policy will be sent to the mailing list for further discussion at that point in time. Cool. Thank you very much.
Well, that neatly used up all the remaining spare time we had. So very briefly, to make everyone aware, the Database Working Group has been doing some work on abuse‑c and abuse mailboxes. I am not going into this here because it's a work item which is ‑‑ I think the NCC will be doing the last piece of work on Monday or Tuesday of next week under work item 7 in Database. I would encourage you to look at the conversations on the database mailing list and indeed attend the Database Working Group session if you want to discuss that later. I am not going to go into it here because it's a piece of work on the database, it's worth referencing though because it's about the abuse‑c.
So, we have two presentations which may well lead to discussion, next, and the first one is from Alireza Vaziri on NetFlow based botnet detection.
ALIREZA VAZIRI: Good morning everyone. I am going to talk about ‑‑ can I have the slide? I am going to talk about NetFlow based botnet detection in the next 25 minutes. Background: my name is Alireza Vaziri, I have been working in networking for more than ten years. I started working as a security administrator last year and actually I am trying to mix machine learning and security as a new field for detecting botnets.
So today we are going to start by talking about botnets: what botnets are, what they are used for, their history. After that we are going to talk about modern botnets, botnet detection and the techniques, NetFlow based botnet detection, and we will have a sneak peek at the machine learning techniques to detect botnets.
All right. What is a bot? A bot is a single infected machine or device, a smartphone for example, in the network, which, because of a vulnerability or unpatched software or an unattended or hard‑coded user name and password, has been infected by malware. So the botnet consists of bots, and it is used for sending spam, attacking victims and, of course, malware distribution.
As for history, these names have infected millions of computers and machines over recent years. The map you see is the map of a botnet infection that infected 500,000 devices last year in Europe.
So, for the dictionary: we have the bot, the device that has been infected by malware; we have the CnC server, which is the server that every bot connects to; and we have the bot master, who owns all the bots, so commands the CnC and commands the bots to do an action. This is a simple botnet: the red circles are the bots, they are all connected to the CnC server, which is in the hands of the bot master. This is the modern botnet. In the previous slide, if you cut down or take down the address of the CnC, the whole network of bots goes down, but modern bots use P2P connectivity. They have two roles: basically each of the bots plays the role of both the CnC and the bot, so there is no single point of failure. If you take down one CnC, another pops up.
So modern botnets use P2P communication, so there is no single point of failure; they use encryption; they use randomness to stop you detecting a pattern in the behaviour; and they use obfuscation. So when the CnC commands a bot, it says, for example, "hello", which might mean "attack the victim". So you couldn't guess what the obfuscated message means.
This is the bot lifecycle. From the left, it starts with infection: the malware infects the client and the infected client starts to try to reach the CnC server, so it tells the CnC, "okay, I am here, I am ready to do your commands", and starts to listen. After listening for a command, the bot receives a command from the bot master or the CnC server and it retrieves the payload, which might be a spam message, malware to distribute, or a target to attack with DDoS. Then it starts to execute the payload or the command. After doing that, it reports back to the CnC. This is the whole lifecycle. So what are the current methods? Using IDPS, deep packet inspection, using signature based and anomaly based detection of bots. As long as you have a signature of the bot, you are good, you can find them. But what if it changes its signature every time it distributes itself?
All right. Dealing with bots, we are dealing with two types of bots. One of them is external, attacking our infrastructure. The other type is internal, which resides in our network and attacks others. Today I am going to discuss internal bots, the bots that reside in your network and attack others. All right.
So what about NetFlow? We have NetFlow, sFlow, IPFIX, we have some kind of them on most of our devices, so we could use them to actually record the flows and analyse them later on. In comparison with DPI: DPI is resource hungry, for a gigabit network you need to pay more than 10,000 dollars to buy a DPI appliance, but NetFlow is supported by every switch, every router and most firewalls. All right. After NetFlow, we have blacklists. So we have dozens of different public blacklists that are free to use, like ISC, Cymru, Spamhaus, and you could use them as a blacklist of CnC servers.
So up to now, we have NetFlow, we have a blacklist, so we capture the traffic from the switch as NetFlow and we check the destination IP address against the blacklist to see if an inside client is trying to connect to a CnC server. Easy? But the problem is, the data is big: a network with a gigabit of speed generates millions of records per hour, so how could we handle that? I chose ELK as the search engine for the flows. Basically, it has three key components, Elasticsearch, Logstash and Kibana. Logstash is a log and data collector and Kibana is basically the data visualiser. ELK is open source, can handle millions of records easily and scales pretty well. You can deploy multiple instances as a cluster in your network. So, this is the diagram. We capture the traffic as NetFlow and put it into Logstash, which puts the records into Elasticsearch, and Kibana queries Elasticsearch for the data visualisation.
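The basic check described here, comparing each flow's destination address against a CnC blacklist, can be sketched in a few lines of Python. This is an illustration rather than the speaker's implementation: the flow record layout (`src_ip`, `dst_ip`), the function names and the sample addresses (taken from the documentation ranges) are assumptions.

```python
import ipaddress


def load_blacklist(entries):
    """Parse single addresses or CIDR prefixes into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blacklisted(dst_ip, networks):
    """True if the destination address falls inside any blacklisted network."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in networks)


def suspicious_flows(flows, networks):
    """Keep only flows whose destination is a known CnC address."""
    return [flow for flow in flows if is_blacklisted(flow["dst_ip"], networks)]


if __name__ == "__main__":
    blacklist = load_blacklist(["198.51.100.7", "192.0.2.0/24"])
    flows = [
        {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.7"},
        {"src_ip": "10.0.0.6", "dst_ip": "203.0.113.9"},
    ]
    for flow in suspicious_flows(flows, blacklist):
        print(flow["src_ip"], "->", flow["dst_ip"])
```

A linear scan over the blacklist is fine for a sketch; at millions of records per hour, which is the scale mentioned in the talk, an indexed lookup inside the collection pipeline would be the more realistic choice.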
Let's do some Logstash filtering. We had the blacklisted IPs, so we stored them as a dictionary of keys and values: the key is the address, the value is the type of malware. So we mark malicious traffic and we do some geoIP translation. This is the Logstash diagram. We capture the flows, we add extra information from the geoIP database, and after that we mark malicious traffic, we mark the clients that are trying to connect to the CnC server. This is the result. As you see, this is the malicious traffic as visualised by Kibana. The peak that you see: this is a corporate network, so the personnel start their computers at 8:00 in the morning and the bots start trying to connect to the CnC. So we know the type of traffic. This is the activity and we know the malicious flows. So let's do some stuff with machine learning.
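The dictionary‑based marking and the hourly peak can be mimicked outside Logstash as follows. This is a plain Python stand‑in, not Logstash configuration, and the field names (`dst_ip`, `timestamp`, `malicious`, `malware_type`), the function names and the blacklist entries are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Key: blacklisted CnC address, value: malware type (sample entries only).
BLACKLIST = {"198.51.100.7": "zeus", "192.0.2.44": "conficker"}


def tag_flow(flow, blacklist=BLACKLIST):
    """Return a copy of the flow marked as malicious or not."""
    tagged = dict(flow)
    malware = blacklist.get(flow["dst_ip"])
    tagged["malicious"] = malware is not None
    tagged["malware_type"] = malware
    return tagged


def malicious_per_hour(flows):
    """Count malicious flows per hour of day, as a dashboard histogram would."""
    counts = Counter()
    for flow in map(tag_flow, flows):
        if flow["malicious"]:
            counts[datetime.fromisoformat(flow["timestamp"]).hour] += 1
    return counts


if __name__ == "__main__":
    flows = [
        {"dst_ip": "198.51.100.7", "timestamp": "2017-10-25T08:01:00"},
        {"dst_ip": "192.0.2.44", "timestamp": "2017-10-25T08:03:30"},
        {"dst_ip": "203.0.113.9", "timestamp": "2017-10-25T09:15:00"},
    ]
    print(malicious_per_hour(flows))
```

Storing the malware type as the dictionary value means a single lookup both flags the flow and labels it, which is what makes the per‑type views in the visualisation possible.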
We wanted to find similar flows for which we didn't have a blacklisted IP address. The IP addresses in the blacklists are updated every week or every day, so we don't have some of the CnC IP addresses. I used supervised machine learning, in which you tell the machine: this is the target, these are the features, these are the criteria of malicious flows, and this is the malware. So I tried to inject infected flows into the algorithm as data for training and testing. And after that, I tried to classify flows based on the model that I had trained. Okay. What are the features for machine learning? Source IP address, of course, since an internal client that has tried to connect to the CnCs is infected. Then destination port range, number of packets transferred in the flow, duration, and of course AS number. What are the targets? I told it that flows with these criteria are for this malware or for any other malicious activity.
To reduce the chance of false positives, I added trusted flows: these are the flows of DNS, these are the flows of HTTP, these are the flows of HTTPS, these look trusted. I used scikit‑learn, which is a machine learning library in Python. It's easy to use: you pull the data from Elasticsearch and train the model, and after that you push in the data for which you don't know whether it's infected or not, and it tells you whether the flows are infected or not. This is a case study for Zeus, which uses UDP flows to talk to the CnC servers. As you can see, I trained the model with different data sets; the data set number is the number of flows, so 60,000, 80,000 and 100,000 flows, and you can see that I got accuracy up to 89.3%. This shows that more data beats a better algorithm. I used KNN, if you are familiar with machine learning. And why is it not 100%? First, as you may know, NetFlow flows are unidirectional, so it's better to find a bidirectional view, or the flows related to the one flow. Second, the flows are not classified into the lifecycle of the bot. Basically, I didn't know if this one is retrieving the payload, if this one is reporting to the CnC, if this one is telling the CnC I am done, if this one is telling the CnC I am ready, so you have to classify the bot lifecycle for each of the flows.
Also, we have different timeouts in the network, and issues of speed and bandwidth, so the flows might be different in every kind of situation, and we have different versions of Zeus.
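As a rough illustration of the k-nearest-neighbours classification described above, the sketch below implements KNN in plain Python so that it runs without dependencies; the talk itself used a Python machine learning library, and this is a stand-in rather than the speaker's code. The two features (packet count and duration in seconds), the labels and every training value are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training flows."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], sample))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of held-out flows whose predicted label matches the true one."""
    correct = sum(1 for features, label in test
                  if knn_predict(train, features, k) == label)
    return correct / len(test)

# Invented training data: (packets, duration_seconds) -> label.
train = [
    ((3, 1.0), "infected"),   # short flows, as a bot check-in might look
    ((4, 1.2), "infected"),
    ((2, 0.8), "infected"),
    ((40, 12.0), "trusted"),  # longer flows standing in for DNS/HTTP/HTTPS
    ((55, 20.0), "trusted"),
    ((35, 9.0), "trusted"),
]
held_out = [((3, 0.9), "infected"), ((50, 15.0), "trusted")]

prediction = knn_predict(train, (3, 0.9))
score = accuracy(train, held_out)
```

With real flows the features would need scaling before distances are compared, since packet counts and durations live on very different ranges; that step is left out here to keep the sketch short.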
This is the final diagram. NetFlow pushes the data into Logstash, which does some filtering and marks the malicious traffic. That is pushed to Elasticsearch. The machine learning part queries Elasticsearch for the infected flows to train itself, and checks the same source IP address of the infected machine for similar flows. If it can find new CnC IP addresses, it updates the blacklist.
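The feedback loop in this diagram might look like the following sketch. It assumes flows are plain dictionaries and uses a trivial rule, `looks_like_cnc`, as a placeholder for the trained model; the function names, field names and addresses are all hypothetical.

```python
def update_blacklist(flows, blacklist, is_suspicious):
    """Return (updated_blacklist, newly_found_addresses).

    Only flows from clients already seen talking to a blacklisted address
    are considered, mirroring the 'same source IP address' step.
    """
    infected_sources = {f["src_ip"] for f in flows if f["dst_ip"] in blacklist}
    candidates = set()
    for f in flows:
        if (f["src_ip"] in infected_sources
                and f["dst_ip"] not in blacklist
                and is_suspicious(f)):
            candidates.add(f["dst_ip"])
    return blacklist | candidates, candidates

def looks_like_cnc(flow):
    """Placeholder for the trained classifier: small UDP flows are flagged."""
    return flow["proto"] == "udp" and flow["packets"] <= 5

flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10", "proto": "udp", "packets": 3},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.20", "proto": "udp", "packets": 4},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.80", "proto": "tcp", "packets": 60},
    {"src_ip": "10.0.0.7", "dst_ip": "198.51.100.99", "proto": "udp", "packets": 2},
]
updated, new = update_blacklist(flows, {"192.0.2.10"}, looks_like_cnc)
```

Restricting the search to already infected clients keeps the candidate set small, at the cost of missing bots that only ever contact addresses not yet on any list.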
All right. This is the diagram for the AS traffic. As you know, a network with a gigabit of speed generates millions of records. On my network, you see the green part is the traffic to Google, to the Google AS number. So I started to remove those flows to reduce the chance of false positives and also to increase performance, since we trust that Google doesn't host any CnC server. As for the to‑do list: I am going to try to find related flows and to make bi‑directional flows. Also, I am trying to build an ASN and prefix reputation system, so basically, if an AS has been hosting the IP address of a recent CnC server, the chance of a new blacklisted IP being found in that AS is higher. And then there is the action for the detected bot: do we have to sinkhole or blackhole the traffic from the detected bot? So NetFlow is cheap and handy. Machine learning is amazing. I think machine learning is the tool that will rescue us from eternal threats. This is the repository where I am going to push the code; actually there is no stable release yet, but send me requests. Thank you. Any comment or question?
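A minimal sketch of the two ASN ideas mentioned here, dropping flows to trusted ASes and ranking the rest by how many recent CnC addresses each AS hosted. The reputation measure (a simple count) is an assumption, since the talk describes the reputation system only as planned work; AS15169 is used as Google's AS number, and the other ASNs are private-use numbers chosen for illustration.

```python
from collections import Counter

TRUSTED_ASNS = {15169}  # Google; trusted in the talk not to host CnC servers

def drop_trusted(flows, trusted_asns):
    """Remove flows whose destination AS is trusted."""
    return [f for f in flows if f["dst_asn"] not in trusted_asns]

def asn_reputation(recent_cnc):
    """Count recently blacklisted CnC addresses per AS (higher is worse)."""
    return Counter(asn for _, asn in recent_cnc)

flows = [
    {"dst_ip": "192.0.2.1", "dst_asn": 15169},
    {"dst_ip": "203.0.113.80", "dst_asn": 64513},
    {"dst_ip": "198.51.100.20", "dst_asn": 64512},
]
# Hypothetical (address, ASN) pairs taken from a recent blacklist.
recent_cnc = [
    ("198.51.100.7", 64512),
    ("198.51.100.21", 64512),
    ("203.0.113.5", 64513),
]

remaining = drop_trusted(flows, TRUSTED_ASNS)
reputation = asn_reputation(recent_cnc)
# Look first at flows towards the ASes with the worst recent record.
ranked = sorted(remaining, key=lambda f: reputation[f["dst_asn"]], reverse=True)
```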
BRIAN NISBET: Thank you very much.
I worry that machine learning may be the solution at the moment, until the AIs and the machine learning become the threat. But that is a whole different thing. Are there any comments, questions, etc.? No. Okay. Fair enough, thank you very much.
So our second presentation is from Eric Bais. Obviously, as you may have heard, we have fewer IP addresses than we used to have, or at least fewer free ones, so what happens to address space which has been used badly and then becomes available for better, good people with clean souls to use, and how can it be cleaned and reused?
ERIC BAIS: Thank you very much. I am going to do a presentation about how to clean up IP space. How we got to this is, I got involved at some point in the take‑down of the Grum bot, which was quite a large botnet that did a lot of spam (it turned out to be 18% of global spam), so I will take you back a bit through the history of how we got here, and then how we used the same principles from the Grum botnet to clean up the dirtiest IP space that we could find on the market.
I think it was somewhere around July 2012 when FireEye published a couple of IP addresses showing where the command and control servers were located; there were a couple of IPs in the Netherlands, in Panama, in Ukraine. I contacted the owner of the hosting company that actually had some of the command and control servers in the Netherlands and I said, well, look at the publication there, your company name is listed, the IPs are listed, you might want to shut this down because they are going after the command and control servers here. And it was actually interesting, because he reported back to me and said, well, we null routed the IPs, and if you want to have a look at them, come and see me. I reported back to FireEye at that point, and they went, together with the guys from Spamhaus, after the guys in Panama and Ukraine, so basically within two or three days the whole botnet command and control infrastructure was down. And then I thought, well, let's hold him to his promise and see if we can get hold of the IP space that was used for this particular botnet. And since our cabinets were next to each other in one of the data centres, I suggested: can I have a cable in the data centre and basically have those IPs? He said, I have a better idea, the server is still up. So, let's see how this goes. This is a result from Symantec MessageLabs on the e‑mails, which stopped after three days, and the zombies themselves didn't have anything to do any more. So this is basically 18% of global spam stopping suddenly after the command and control servers were all shut down, because they basically didn't get any action any more. So we had the server down, the IPs were available, and we thought, that is actually interesting, why don't I go to the data centre and make a copy of the command and control server, make another just for safekeeping, because you never know what it might come in handy for, then do a clean install and make a secure sinkhole.
The actual disks themselves were used to upload into a ‑‑ we found 350 gigabytes of uncompressed mail lists on that server, so you can imagine what it takes to actually send 18% of global spam; if you look at the amount of e‑mail addresses that were found on that specific server, I can imagine how they did it.
So, the issue if you have actually shut down a command and control server is the dormant zombies, the infected PCs that you have, which are still infected. And the question then becomes: how do you clean all those PCs? Some malware actually has an option in it, a kill switch or kill button, and if you invoke it, it will uninstall itself. From a researcher's perspective, that actually puts the onus, or the risk if something was not done correctly, on you, because you actually invoked that command, or you installed an update in the malware that basically said uninstall the software or don't do anything again. And it doesn't fix the issue: the owner still does not know what actually happened with his PCs. So we opted to basically report back, through the ISPs, each and every connection made to the command and control server, within 24 hours. So this is what it looks like, compared by country. There were a lot of infections in Mexico, the United States, Argentina, Spain, Brazil and Germany, and all these computers were basically reporting back to those IP addresses. This is a map of basically the same, showing where all the infections came from.
So once we had access to the command and control server IPs, we got together with the guys from the SANS ISC and Shadowserver about building the right infrastructure: do we need to send out all those e‑mails ourselves and set up a complete infrastructure? And it appeared that they were more than happy to actually include the feed in the reports that they were already sending out. So instead of having to build your own infrastructure for reporting abuse, Shadowserver would be more than happy to take your feeds and include them in their reports, because it adds value to their reports, and a lot of ISPs are already parsing the data.
So we thought, you know, we have the feed set up, we know exactly what is going on here, a lot of reports went out and you would expect some clean‑up somewhere. We did see some improvement and numbers dropping, but not a lot. And that was when we realised that Shadowserver requires you to opt in to receive the reports, so you only reach the parties that actually want to receive and parse the reports. So then I had a discussion with Tobias and he suggested the following: their reporting system would take the feed as well, and they report per hour, not per day, so you have a quicker follow‑up, and that means that if somebody has an infected PC online they will get 24 e‑mails per day for the same PC, so that has a bit more impact. And it was not opt‑in: basically they take the abuse addresses, because all resources in the databases have one. Whether it's a valid abuse address is something else, but at least there is an abuse address to be found, and they send out reports every hour about those customers. So this is where we were in September. We started this whole project in July, we were setting up the whole infrastructure in August, and in September we had the first reports. We initially had between 125,000 and 140,000 unique IPs per day; this was by far one of the largest spamming botnets worldwide. You see some decline over the days. You see the unique ASNs dropping from 2,500 to 2,100. And this is what you have in November, when we started to use the system, and you actually see things progress pretty quickly: we are now at 70,000 unique IPs per day. It took about nine months to actually get to an optimum where we had 25,000 connections per day, and this is basically where you hit diminishing returns. Those people, after nine months of getting reports, if they haven't done it by then they will never do it anyway.
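The daily figures quoted here (unique IPs and unique ASNs per day) and the hourly report grouping can both be derived from a sinkhole connection log. The sketch below assumes a hypothetical log of `(timestamp, client_ip, asn)` tuples with ISO-style timestamps; it is not the format used by any of the organisations mentioned.

```python
from collections import defaultdict

def daily_uniques(hits):
    """Return {day: (unique_ip_count, unique_asn_count)} for sinkhole hits."""
    ips, asns = defaultdict(set), defaultdict(set)
    for timestamp, ip, asn in hits:
        day = timestamp[:10]          # "YYYY-MM-DD"
        ips[day].add(ip)
        asns[day].add(asn)
    return {day: (len(ips[day]), len(asns[day])) for day in sorted(ips)}

def hourly_reports(hits):
    """Group client IPs per (ASN, hour); each group is one report to send."""
    groups = defaultdict(set)
    for timestamp, ip, asn in hits:
        groups[(asn, timestamp[:13])].add(ip)   # "YYYY-MM-DDTHH"
    return {key: sorted(addresses) for key, addresses in groups.items()}

# Invented log entries using documentation addresses and private-use ASNs.
hits = [
    ("2012-09-01T08:05", "203.0.113.4", 64512),
    ("2012-09-01T08:40", "203.0.113.4", 64512),
    ("2012-09-01T09:10", "203.0.113.4", 64512),
    ("2012-09-01T09:15", "198.51.100.9", 64513),
    ("2012-09-02T08:00", "198.51.100.9", 64513),
]
per_day = daily_uniques(hits)
reports = hourly_reports(hits)
```

In this sample the client in AS64512 stays online across two different hours, so its network receives two reports that day; a PC that is online around the clock would produce the 24 daily e‑mails mentioned above.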
One of the interesting things that we found was with Level 3 and their abuse desk: they parse those messages. That is not an issue. The only issue was, they parsed the messages incorrectly, because the sinkhole IP was actually in the message itself and they thought that the hoster that was hosting the sinkhole was the cause of the issue. They were receiving a lot of e‑mails for their customers, only they were passing them to the wrong customer. So it took a couple of days to understand the issue and fix the reporting, and the lesson learned was to remove the sinkhole addresses from the reports so they could match them to the correct IP addresses, to the customers that were actually having the issue.
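To see why a sinkhole address inside a report can mislead an automated parser, consider the hypothetical example below. The report layout and the deliberately naive parser are invented for illustration; nothing here reflects how any particular abuse desk actually processes mail.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def format_report(hit, include_sinkhole=False):
    """Build a one-line report; the sinkhole address is optional."""
    parts = [hit["time"], "botnet=" + hit["botnet"]]
    if include_sinkhole:
        parts.append("dst=" + hit["sinkhole_ip"])
    parts.append("src=" + hit["client_ip"])
    return " ".join(parts)

def naive_parse(line):
    """Deliberately naive: treat the first IP address found as the offender."""
    return IP_RE.search(line).group(0)

hit = {
    "time": "2012-11-05T10:00Z",
    "botnet": "grum",
    "sinkhole_ip": "192.0.2.1",     # documentation address for the sinkhole
    "client_ip": "203.0.113.45",    # documentation address for the infected PC
}

blamed_with_sinkhole = naive_parse(format_report(hit, include_sinkhole=True))
blamed_without = naive_parse(format_report(hit))
```

With the sinkhole address present the naive parser blames the sinkhole host; with it removed, the only address left to match is the infected client.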
So, a huge thanks to Shadowserver and the Internet Storm Center and also Abusix for helping on this. And once we had that covered, we came across the IP space of a Dutch bulletproof hoster. He had at that point just been released by the Dutch police, and he was planning to sell his IP space. So it looked like a proper challenge to get the IP space usable again, because the other option was that he might have sold it to his buddies somewhere and they would try to get their command and control servers back. So we thought we might as well buy the IP space, clean it up and put it to some good use. The IP space had been blacklisted for several years for known abuse: it was on the SBL 72 times, it was on DROP, it was the worst of the worst. And that was just on Spamhaus; there were other RBLs as well.
What we did as a new approach is we got full ownership of the LIR; we basically did a merger and acquisition of the LIR at that time into one of our companies. We changed all the references in the database from the previous holder, built a new sinkhole, started routing the IPs to the sinkhole and then waited to see what we would find, because we didn't even know how bad it would be. So, let's see if we can get lucky.
And this is what we found. We found 12 command and control servers. Grum bot zombies, and we had some experience with that so that was easy to recognise. Citadel, BlackEnergy, fake AV: it was a nice combination of everything, and it was by far the ‑‑ some really bad things were happening there. So, we were happy, and we thought, this was expected. And we said, now the question is, is anybody actually null routing this, using the Shadowserver drop or the Spamhaus DROP? Because can we actually use the space, is it actually routable? We used RIPE Atlas to do some tests on that. We found one or two ASes that were actually filtering it, but that probably had more to do with their personal experience with the hoster than with other reasons, because once we explained what we were doing, they were more than happy to remove the null routes for the whole prefix and everything was fine again. And obviously you have the RBL owners that need to delist the prefix and all the entries in there. That was the most challenging part. Because the RBLs actually said, well, can you explain a bit what you are doing here? Because we have this listed under this company and, you know, we basically don't even talk to ‑‑ any more. It is as if you buy a car that was used for some bad activities in the past: you own the car with the same registration number and then you say, I am going to drive here, but the police keep pulling you over, what is going on? And you cannot be held accountable for what happened in the past because you are the new owner, you have nothing to do with that. So you need to explain with a little more logic, show them what we were doing, show the initial results of the sinkhole and kindly explain that we can't be held hostage or accountable for what anybody else did. And still some said, you know, I see the logic, I still hate you for it. It was fine. In the end we fixed all the entries and got it usable again.
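Before buying or reusing address space, one simple check is whether the prefix overlaps any entry on a drop-style list of networks that operators may be filtering. The sketch below uses Python's standard `ipaddress` module; the list contents are placeholder documentation prefixes, and fetching or parsing a real list is left out.

```python
import ipaddress

def listed_overlaps(prefix, drop_entries):
    """Return the drop-list entries that overlap the given prefix."""
    network = ipaddress.ip_network(prefix)
    return [entry for entry in drop_entries
            if network.overlaps(ipaddress.ip_network(entry))]

# Placeholder entries standing in for a downloaded drop list.
drop_entries = ["192.0.2.0/24", "198.51.100.0/25"]

dirty = listed_overlaps("198.51.100.0/24", drop_entries)
clean = listed_overlaps("203.0.113.0/24", drop_entries)
```

A list check only shows what is published; whether the space is actually routable still has to be measured from the outside, which is what the RIPE Atlas tests in the talk were for.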
We sold the IP space actually almost before we bought it, because the new owner, the new company, already knew what we were working on, and they knew that it would take some time and they had time before they actually needed the space. So it allowed them to pay in instalments over a period of time, which was convenient for them. It gave us the time to actually do the clean‑up process and get the experience with it. So it was an interesting option, and it was very nice for Shadowserver and Abusix to get some additional feeds; they were actually quite happy with that. If you ever run into a situation where you have botnets, or you have control over IPs of known command and control servers, put a sinkhole on them. Those guys will love you, they will take your feeds and be very happy with that, because it improves their systems, and it will help you clean up those PCs and in the end clean up your own IPs.
So there were lessons learned. Almost all IP space can be cleaned; I think we have proven that. There are some historic issues, and it can take a huge amount of time. This is not something you can change or clean up overnight. People are definitely more than happy to help you with setting up a sinkhole or using the feeds. And if there is dirty IP space that you can buy from somebody and you have the time, you might as well; you might get a good deal out of it. And even if you have to null route 10 IPs, that is, I think, something you can live with. Any questions?
BRIAN NISBET: Thank you very much for that. Very informative.
I certainly have a question. One of the big problems that people find in this kind of thing, and in general in getting IP addresses off block lists, is how to make contact with some of the larger RBLs and how to really persuade them. So did you just use normal channels, did you have pre‑existing relationships, what is that position like?
ERIC BAIS: We had some experience from the work that we did with the Grum bot, but also the people that helped us with the feeds have extremely good contacts. The guys from Shadowserver and Abusix have all the back channels in there, so they helped on some occasions as well. But typically it was not really hard, at least not for us, to explain what we were doing and find the right people. But those organisations can definitely help. Yes.
AUDIENCE SPEAKER: Gregory from Europol. Do you have any idea how, generally, the bad guys are getting the IP space to run the ‑‑ in the sense, are there different ways: was it hijacked, or stolen, or obtained illegitimately at the beginning when they established the infrastructure? You looked at the step afterwards, how you could clean it; how did they get the IP space to build the infrastructure?
ERIC BAIS: It was from a Dutch hosting company, an LIR, and he got it, like a /19, somewhere in 2003 or something. And he was basically using it for bulletproof hosting. So it was customers in his space that were using it for this type of activity, basically hiding their command and control servers, and we knew he was into that kind of activity, so we had a pretty good indication that we would find this kind of stuff in there. Yeah.
BRIAN NISBET: Okay. Anything more? No. Cool. Thank you very much, Eric.
So, I think we are now moving into AOB. And the one known piece of AOB we have is Jan, as I mentioned, and BCOP stuff.
JAN ZORZ: Thank you, Brian. Known as AOB this time. I will be fairly quick. Where is the pointer? As you probably know if you are following the IPv6 Working Group and the BCOP Task Force, we just finished the IPv6 prefix delegation BCOP document, which got the number 690, and we got more ideas for writing more best operational practice documents. I am travelling quite a lot around the world talking to operators, and they say: we are not enabling IPv6 on our mail servers because we are afraid that we will not survive. There are no IP‑reputation‑based anti‑spam tools and things like this, and we don't know how to protect our mail servers on IPv6. So I said, well, then it's time to write the best current operational practice on how to run a mail server on IPv6, all these things, DMARC, SPF. And my question is, because you are the Anti‑Abuse Working Group and you are dealing with precisely the same stuff that I am talking about: are there people in the audience, or in the Working Group following remotely, who would take this chance and help with the initial draft and then with writing this stuff? Anyone?
PETER KOCH: I have a question about the scope of your suggestion. You mentioned v6, and then, when you talked about solutions like SPF, DMARC and so on, they are not v6 specific. And the question attached to that is: do you think that there is already an established or at least promising operational practice, or would that document be more prescriptive for the future? What exactly are you looking for?
JAN ZORZ: Yes, there is. At least I know about LinkedIn, they are running a huge e‑mail operation on IPv6, and I believe that Frank already wrote something about it, and I was thinking about getting that piece of information and talking to Frank also. But I would like it to become a more bottom‑up, more widely accepted and agreed operational practice, with consensus from the operational community that this is really the best way of doing things. That is why I am asking this question. So, if there is anyone that is interested in this stuff, please find me around. I will also ask the IPv6 Working Group the same thing, and I hope we get some good people to work together.
BRIAN NISBET: I think if you could send a mail to the mailing list, it would be a good thing as well.
JAN ZORZ: Thank you very much.
BRIAN NISBET: Thank you very much, Jan. Right. So, is there any other any other business? Gosh. Sorry, no one tell Rudiger I could have let him ask more questions about 2017‑02. So, there you go. I think that is all of our business. I will remind everyone that we do need an agenda for our Working Group session at RIPE 76. So think about that, lots of cool things. There will obviously be more discussion on the mailing list around 2017‑02, not 201207, which is something else entirely. And yes, again, obviously whatever else is of interest to the Working Group. And so, without further ado, I hope to see you all in Marseille. Thank you all very much.