Ehsan Aghaei, Ph.D.

Sr. NLP Scientist and Researcher


SecureGPT: A Domain-Specific Text Generation Model for Cybersecurity

SecureGPT is a text generation model trained on cybersecurity data. It supports cybersecurity tasks by generating, analyzing, and interpreting text. Its applications include analyzing threat intelligence, automating report writing, detecting phishing attacks, creating security awareness content, reviewing code for vulnerabilities, drafting policies, aiding research, and supporting legal compliance.

Git Repo (TBA)

CyberEmbed: Domain-Specific Embedding Generation Model for Cybersecurity Corpora

CyberEmbed combines SecureGPT, SecureBERT+, and a Siamese network to process an extensive collection of cybersecurity text pairs. Its primary objective is to support semantic search in Retrieval-Augmented Generation (RAG) pipelines by improving the quality of embedding representations for cybersecurity text, giving analysts better information retrieval when using Large Language Models (LLMs) in the cybersecurity domain.
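
As a rough illustration of the semantic search step that such embeddings feed into, the sketch below ranks documents by cosine similarity to a query vector. It is not CyberEmbed's implementation: the three-dimensional vectors, the `corpus` contents, and the `semantic_search` helper are hypothetical stand-ins for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, corpus, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; a real system would obtain these from an
# embedding model rather than hard-coding them.
corpus = {
    "Phishing email harvests user credentials": [0.9, 0.1, 0.0],
    "Buffer overflow in a network daemon": [0.1, 0.9, 0.2],
    "Quarterly budget planning notes": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # assumed embedding of a phishing-related question

results = semantic_search(query, corpus, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In a RAG setting, the top-ranked texts would be passed to the LLM as context, so better domain embeddings translate directly into more relevant retrieved passages.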

Git Repo (TBA)

CyberARM: Security Controls Grid for Optimal Cyber Defense Planning

A model and a set of optimization techniques for selecting the critical security controls (CSC) required to achieve optimal risk mitigation while considering factors such as acceptable residual risk, budget limitations, and resiliency requirements.

Predicting Attack Actions from Vulnerabilities Using Cybersecurity-specific Contextual Language Model

We have developed a semi-supervised transfer learning framework on top of SecureBERT that uses semantic role labeling to generate, collect, and annotate threat-related textual data and to classify cybersecurity vulnerabilities into tactics, techniques, and procedures (TTPs).


E. Aghaei, X. Niu, W. Shadid, B. Chu, E. Al-Shaer

We have released the first transformer-based, domain-specific language model for representing cybersecurity text, trained and tested on a large in-domain text corpus.

> Github

> Huggingface

> Paper

> YouTube

SecureBERT predicts course of defense actions

CVE to CWE: Hierarchical Classification

I developed a hierarchical design on top of SecureBERT to classify CVEs into CWEs at different levels of the CWE tree structure.
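
To make the idea of top-down hierarchical classification concrete, here is a minimal sketch that walks a tree one level at a time and picks the best-scoring child at each step. Everything in it is an assumption for illustration: the `TREE` is a small, simplified slice of CWE parent-child relations, and the keyword-overlap `score` function is a toy stand-in for a learned classifier such as one built on SecureBERT.

```python
# Simplified, illustrative slice of a CWE-style hierarchy (not the full tree).
TREE = {
    "ROOT": ["CWE-74", "CWE-119"],
    "CWE-74": ["CWE-79", "CWE-89"],
    "CWE-119": ["CWE-125", "CWE-787"],
}

# Hypothetical keyword sets standing in for per-node classifiers.
KEYWORDS = {
    "CWE-74": {"injection", "sql", "script", "xss"},
    "CWE-119": {"buffer", "memory", "bounds"},
    "CWE-79": {"script", "xss", "html"},
    "CWE-89": {"sql", "query", "database"},
    "CWE-125": {"read"},
    "CWE-787": {"write", "overflow"},
}


def score(node, tokens):
    """Toy relevance score: number of node keywords present in the text."""
    return len(KEYWORDS[node] & tokens)


def classify(description):
    """Descend from the root, choosing the best child at each level.

    Stops early when no child matches, so the returned path may end
    at an intermediate level of the hierarchy.
    """
    tokens = set(description.lower().split())
    path, node = [], "ROOT"
    while node in TREE:
        best = max(TREE[node], key=lambda child: score(child, tokens))
        if score(best, tokens) == 0:
            break
        path.append(best)
        node = best
    return path


print(classify("SQL injection in the login query allows database access"))
print(classify("heap buffer overflow leads to out-of-bounds write"))
```

Classifying level by level keeps each decision small (a handful of siblings rather than hundreds of CWEs) and lets the result stop at a coarser class when the finer levels are ambiguous.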

CVSS Base Metric Prediction

I developed a tool that combines the state-of-the-art SecureBERT language model with a classic TF-IDF approach to predict the values of CVSS base metrics for CVEs on a highly imbalanced dataset.

Research Interests

Machine Learning, Deep Learning, NLP, Language Modeling, Text Mining, Information Retrieval, Cyber Security, Cyber Analytics, Adversarial Machine Learning.