Automating the Detection of Malicious URLs

Hi All.

I am a machine learning engineer with no background in cybersecurity. I have been tasked with automating the detection of malicious URLs for end users using machine learning techniques. Could you please advise me on how to proceed?

I have gone through some research papers on this. As far as I can tell, they do not use much cybersecurity domain knowledge. They rely only on statistical properties such as the length of the URL, the frequency of special characters and digits, the number of query parameters, the number of external links in the webpage, etc.
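To make that concrete, here is a minimal sketch of the kind of URL-only features I mean. This is my own illustration, not code from any of the papers; the function name `url_features` and the feature names are hypothetical, and "special character" is assumed here to mean any non-alphanumeric character.

```python
from urllib.parse import urlsplit, parse_qsl


def url_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string."""
    parts = urlsplit(url)
    return {
        # Total number of characters in the URL
        "length": len(url),
        # Count of digit characters anywhere in the URL
        "num_digits": sum(c.isdigit() for c in url),
        # Count of non-alphanumeric characters (assumed definition of "special")
        "num_special": sum(not c.isalnum() for c in url),
        # Number of key=value pairs in the query string
        "num_query_params": len(parse_qsl(parts.query, keep_blank_values=True)),
    }


if __name__ == "__main__":
    print(url_features("http://example.com/a1?x=1&y=2"))
```

Features like these can go straight into a classifier, but none of them encode anything a security analyst would recognise as domain knowledge, which is what prompted my question.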

So, I am more interested in the cybersecurity perspective. How do cybersecurity professionals approach this problem? Once I understand that, I can try to incorporate some of those techniques into my automated solution.

Specifically, I have the following questions:

* How do cybersecurity experts detect whether a URL is malicious or not? I also see some open-source databases like PhishTank (https://phishtank.org/) and URLhaus (https://urlhaus.abuse.ch/). How are the URLs classified as benign/malicious by these websites?
* Which parts of the answer to the first question can be automated, either with machine learning or with other techniques?
* Will I need some knowledge of cybersecurity to proceed with my task? (I am sure I will.) If yes, in which areas specifically? I am happy to put in the effort and skill up in the areas required.
* Which tools already exist that detect malicious URLs, which I could take inspiration from or compare my solution against?

Assume that I will have more than just the URL. I will be able to access the HTML content and other metadata (and maybe even network-level data, such as the packets sent and received).
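As an example of what the HTML adds, the "external links" feature mentioned earlier could be computed roughly as below. This is only a sketch under my own assumptions: the names `LinkCollector` and `count_external_links` are hypothetical, only `<a href>` links are counted, and "external" is taken to mean a hostname different from the page's own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_external_links(page_url: str, html: str) -> int:
    """Count anchors whose resolved hostname differs from the page's hostname."""
    page_host = urlsplit(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    external = 0
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts
        host = urlsplit(urljoin(page_url, href)).hostname
        if host and host != page_host:
            external += 1
    return external


if __name__ == "__main__":
    sample = (
        '<a href="/local">a</a>'
        '<a href="https://other.example/x">b</a>'
        '<a href="http://example.com/y">c</a>'
    )
    print(count_external_links("http://example.com/", sample))
```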

Thanks in advance!

submitted by /u/jlteja