Chaos Communication Congress, Season 32, Episode 97

Gibberish Detection 102

Speaker: Ben H.

DGAs (Domain Generation Algorithms) have become a trusty fallback mechanism for malware that's a headache to deal with, but they have one big drawback – they draw a lot of attention to themselves with their many DNS requests for gibberish domains. When basic entropy-based Machine Learning methods rose to the challenge of automatically detecting DGAs, DGAs responded by subtly changing their output to be /just/ plausible enough to fool those methods. In this talk we'll harness the might of the English dictionary, cut corners to achieve sane running times for insane computations, and use fancy Machine Learning® methods – all in order to build a classifier with a higher standard for gibberish plausibility.

In recent years, there has been a rising trend in malware's use of Domain Generation Algorithms (DGAs) as a fallback mechanism in case the campaign is shut down at the DNS level. DGAs are a headache to deal with, but they have one big drawback – they make a lot of noise. To be more precise, they generate a very large number of DNS requests, and the requested domains are often complete gibberish. This situation looks ripe to be exploited with your favorite Cyber™ Machine Learning® Big Data© solution; and indeed, advances were made by basic language processing methods that could detect and stop the outright gibberish. These worked well until DGAs mutated and started producing more reasonable gibberish. A milestone in this regard was the introduction of KWYJIBO, a DGA that generates gibberish where every other letter is a vowel (e.g. "garolimoja"), which stumps the old methods completely.

How do you thwart KWYJIBO and other DGAs of its sophistication? How do you look for meaninglessness in string-space? In this talk we'll harness the might of the English dictionary; cheat mathematics to cut running times from impossible to reasonable; and demonstrate a fancy Cyber™ Machine Learning® Big Data© tool bas
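To make the point about entropy-based detection concrete, here is a minimal Python sketch. It is not the speaker's method: the function names `shannon_entropy` and `alternates_vowels` are hypothetical, the comparison word "watermelon" is our own choice, and "every other letter is a vowel" is read here as strict consonant/vowel alternation, which is an assumption.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy of a domain label, in bits."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def alternates_vowels(label: str) -> bool:
    """True if vowels and consonants strictly alternate (assumed reading
    of 'every other letter is a vowel')."""
    flags = [ch in VOWELS for ch in label]
    return all(a != b for a, b in zip(flags, flags[1:]))


if __name__ == "__main__":
    # "garolimoja" is the example from the abstract; "watermelon" is an
    # ordinary English word of the same length, chosen for comparison.
    for label in ("garolimoja", "watermelon"):
        print(label, round(shannon_entropy(label), 3), alternates_vowels(label))
```

Under these assumptions the generated-looking label scores a lower character entropy than the dictionary word, so a plain entropy threshold has nothing to flag; that gap is what motivates bringing in a dictionary-based notion of plausibility.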

English
  • Originally Aired December 29, 2015
  • Runtime 60 minutes
  • Production Code 7243