Cyber Security and Artificial Intelligence
by Nikos Karapanos, CTO and Co-founder Futurae Technologies AG
Artificial Intelligence (AI) and Machine Learning (ML) are nowadays used in a plethora of use cases across different industries and sectors: from getting new movie recommendations on Netflix, to optimizing corporate processes, to driving autonomous cars. Simply put, machine learning gives computers the ability to learn from past behavior and react to newly encountered input without requiring explicit prior programming.
One of the areas that can benefit from the use of AI and ML is Cyber Security. There is a significant number of articles online that discuss how AI helps detect and block cyber-attacks. While there is certainly truth in this, it is also important to state that AI is not a silver bullet that can magically solve all the challenges in the cyber security landscape.
How can Artificial Intelligence augment Cyber Security?
Broadly speaking, machine learning can be used to detect anomalies. Anomalies are deviations from previously seen behavior, and such deviations may signify malicious activity by attackers, malware, disgruntled employees or other malicious entities. In particular, we distinguish two phases: the training phase and the detection phase. During the training phase, the machine learning algorithm is fed existing data in order to build a model of what constitutes normal behavior.
Once the system has been trained, we can enter the detection phase, during which the system is fed new, previously unseen data. Based on the model built during the training phase, the system can now detect and flag anomalous behavior. Moreover, as time goes by, the system can be continuously retrained with new data in order to maintain or improve its performance.
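The two phases can be illustrated with a deliberately minimal sketch. The example below uses a simple z-score threshold over invented login-count data; a real anomaly detection system would of course use far richer models and features.

```python
import statistics

# Toy anomaly detector illustrating the two phases: train on historical
# data to model "normal", then flag new values that deviate strongly.

def train(samples):
    """Training phase: build a model of normal behavior (mean, stdev)."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(model, value, threshold=3.0):
    """Detection phase: flag values far outside normal behavior."""
    mean, stdev = model
    return abs(value - mean) > threshold * stdev

# Hypothetical historical data, e.g. daily login counts.
model = train([100, 104, 98, 101, 99, 103, 97, 102])

print(is_anomalous(model, 101))  # → False (typical value)
print(is_anomalous(model, 250))  # → True  (large deviation, flagged)
```

The same retraining idea from the text applies here: appending newly observed normal data and recomputing the model keeps the detector up to date.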
Detecting attacks (especially new, previously unseen ones) as they occur is the objective we are after, and an appropriately trained machine learning system can help us achieve this goal. Nevertheless, the key element in the generic machine learning process described above is the training data. In particular, for a machine learning system to be effective, we need a sufficient amount of good, labeled training data that defines what constitutes normal behavior and what constitutes anomalous, attack behavior.
Without good training data, the system has a high chance of missing attacks (false negatives) or, conversely, of flagging normal behavior as malicious (false positives), both of which are equally important. A system with a high false negative rate is obviously not effective in terms of security. At the same time, a system that is very good at catching actual attacks (low false negative rate) but does so at the expense of mistakenly flagging a lot of innocent activity (high false positive rate) is also of rather limited use. In the latter case, extensive manual inspection is required to separate the actual attacks from the false positives, which may be impractical or even impossible to do correctly.
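To make the two error rates concrete, here is a small sketch that computes them from labeled outcomes. The labels below are made up purely for demonstration.

```python
# False negative rate: fraction of actual attacks that were missed.
# False positive rate: fraction of normal activity wrongly flagged.

def error_rates(actual, predicted):
    """actual/predicted: equal-length lists of booleans (True = attack)."""
    fn = sum(a and not p for a, p in zip(actual, predicted))
    fp = sum(p and not a for a, p in zip(actual, predicted))
    attacks = sum(actual)
    normals = len(actual) - attacks
    return fn / attacks, fp / normals

# Invented ground truth and detector output for 8 events (3 attacks).
actual    = [True, True, False, False, False, False, True, False]
predicted = [True, False, True, False, False, True, True, False]

fnr, fpr = error_rates(actual, predicted)
print(fnr, fpr)  # 1 of 3 attacks missed; 2 of 5 normal events flagged
```

A detector tuned to drive one rate toward zero usually pushes the other up, which is exactly the trade-off discussed above.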
At this point, you might be asking: "So, how hard is it to get good training data in order to build a machine learning system that is actually useful?" Well, the answer is: it depends on the use case. For example, malware and spam detection are two use cases where we typically have millions of labeled samples, allowing us to train a machine learning system very well. On the other hand, detecting attacks by inspecting network traffic is a use case that lacks good training data, which means that machine learning-based approaches in this area are unlikely to be effective and robust enough.
There are of course many other cyber security use cases where machine learning can be applied, but in all of them, without exception, the effectiveness of machine learning-based solutions heavily depends on the quality of the training data.
How does Futurae use Artificial Intelligence?
At Futurae, AI is used differently from what has been described above. In particular, AI enhances the security of ‘SoundProof’, Futurae’s innovative two-factor authentication (2FA) solution. The beauty of SoundProof is that it is completely hands-free for the user. When authenticating to a SoundProof-enabled website, you only enter your username and password as usual, yet still benefit from the increased security that 2FA offers. But how is this possible?
SoundProof works by comparing a few seconds of ambient sound captured by your phone and by the computer from which you are logging in. If the two recordings match, the devices are close to each other, so it must be the legitimate user logging in (and not, say, a hacker who stole their password). All this happens automatically and transparently in the background while you log in, in a matter of seconds. All you need is to have your phone somewhere close to you, and that's it!
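The core idea of comparing two recordings can be sketched very roughly. The example below scores two lists of audio samples with a normalized correlation; the sample values are invented, and SoundProof's actual comparison algorithm is far more sophisticated than this illustration.

```python
import math

# Rough sketch of the matching idea: two recordings of the same ambient
# sound should correlate strongly; recordings of different environments
# should not. This is NOT the actual SoundProof algorithm.

def similarity(a, b):
    """Normalized correlation of two equal-length lists of samples."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

phone    = [0.1, 0.5, -0.3, 0.8, -0.2]
computer = [0.1, 0.5, -0.3, 0.8, -0.2]  # same environment
far_away = [-0.4, 0.2, 0.9, -0.1, 0.3]  # different environment

print(similarity(phone, computer) > 0.9)  # → True  (devices are close)
print(similarity(phone, far_away) > 0.9)  # → False (devices are apart)
```

In practice such a comparison must also cope with different microphones, clock offsets and background noise, which is where the machine learning described below comes in.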
Where does AI come into play in SoundProof?
The audio comparison algorithm of SoundProof leverages machine learning to ensure high accuracy regardless of the type of ambient noise present in the environment. To this end, Futurae's researchers collected tens of thousands of audio samples and used them to train the system to identify and categorize the recordings captured during a SoundProof authentication according to their sound content: for example, as music, human speech, animal sounds, urban sounds and so on. Based on this classification, the SoundProof comparison algorithm ensures an accurate comparison result.
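Assigning a recording to a sound category can be illustrated with a toy nearest-centroid classifier. The feature values and category centroids below are invented for demonstration; Futurae's real classifier is trained on tens of thousands of samples and is far more elaborate.

```python
import math

# Toy classifier: assign an audio feature vector to the category with
# the nearest centroid. Features and centroids are made up for this sketch.

centroids = {
    "music":        [0.8, 0.2],
    "human speech": [0.3, 0.7],
    "urban sounds": [0.5, 0.5],
}

def categorize(features):
    """Return the category whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda c: math.dist(features, centroids[c]))

print(categorize([0.75, 0.25]))  # → music
```

The category decided here could then be used, as described above, to pick comparison parameters suited to that kind of ambient sound.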
Of course, it is important that the whole process, besides being secure, happens very fast and transparently during the login process. AI and SoundProof make sure that you stay both secure and happy!