Introduction to OpenAI and Microsoft Sentinel
Published Mar 08 2023

Welcome to our series on OpenAI and Microsoft Sentinel!  Large language models, or LLMs, such as OpenAI's GPT-3 family are capturing the public imagination with innovative use cases: text summarization, human-like conversation, code parsing and debugging, and many more.  We've seen ChatGPT write screenplays and poetry, compose music, write essays, and even translate computer code from one language to another.

 

What if we could harness some of this incredible potential to help incident responders in a Security Operations Center?  Well, we sure can - and it's easy!  Microsoft Sentinel already includes a built-in connector for OpenAI GPT-3 models that we can implement in automated playbooks powered by Azure Logic Apps.  These powerful workflows are easy to write and integrate into SOC operations.  Today we'll take a look at the OpenAI connector and explore some of its configurable parameters using a simple use case: describing the MITRE ATT&CK tactics associated with a Sentinel incident.

 

Before we get started, let's cover a few prerequisites:

  • A Microsoft Sentinel workspace, with permissions to create and run automation playbooks (Azure Logic Apps)
  • An OpenAI account and API key (we'll create the key in a moment)

 

We will start with a basic Incident-triggered playbook (Sentinel > Automation > Create > Playbook with incident trigger).

 

[Screenshot: Creating a playbook with an incident trigger from Sentinel's Automation blade]

 

Select a subscription and resource group, add a playbook name, and move to the Connections tab.  You should see Microsoft Sentinel with one or two authentication options - I'm using Managed Identity in this example - but if you don't have any connections yet, you'll be able to add the Sentinel connection in the Logic App Designer as well.

 

[Screenshot: The Connections tab in the playbook creation wizard]

 

Review and create the playbook, and after a few seconds, the resource will deploy successfully and bring us to the Logic App Designer canvas:

 

[Screenshot: The Logic App Designer canvas]

 

Let's add our OpenAI connector.  Click on "New step" and type "OpenAI" in the search box.  You'll see the connector in the top pane and two actions below it: "Create an Image" and "GPT3 Completes your prompt":

 

[Screenshot: The OpenAI connector with its "Create an Image" and "GPT3 Completes your prompt" actions]

 

Choose "GPT3 Completes your prompt".  You'll then be asked to create a connection to the OpenAI API in the following dialog.  If you don't have one already, create a secret key on https://platform.openai.com/account/api-keys and be sure to save it in a secure location!

 

[Screenshot: The OpenAI API connection dialog]

 

Make sure to follow the instructions exactly when adding your OpenAI API key - it expects the word "Bearer", followed by a space, then the secret key itself:

 

[Screenshot: Entering the API key in the "Bearer <key>" format]
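
As an aside, this is exactly the header a direct HTTP call to the OpenAI API would use.  A minimal sketch in Python (the environment variable name is our own convention, not something the connector requires):

```python
import os

# The connector stores this value as the Authorization header on every request.
api_key = os.environ["OPENAI_API_KEY"]  # assumed env var holding the secret key
headers = {
    "Authorization": f"Bearer {api_key}",  # the word "Bearer", a space, then the key
    "Content-Type": "application/json",
}
```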

 

Success!  We now have our GPT-3 text completion action ready for our prompt.  We want to ask the AI model to explain the MITRE ATT&CK tactics and techniques associated with a Sentinel incident, so let's write a simple prompt using dynamic content to insert the incident tactic(s) from Sentinel.

 

[Screenshot: The GPT3 prompt with dynamic incident tactic content]
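
Conceptually, the dynamic content substitution amounts to filling a template with the incident's tactic before the text is sent to the model.  A sketch of the idea in Python (the template wording here is illustrative, not the exact prompt from the screenshot):

```python
def build_prompt(tactic: str) -> str:
    """Build the text completion prompt for a single incident tactic."""
    # Sentinel's dynamic content performs the same substitution with the
    # incident's "Tactics" field at run time.
    return (
        f"Explain the MITRE ATT&CK tactic '{tactic}' and describe some "
        "techniques an attacker might use to accomplish it."
    )

print(build_prompt("DefenseEvasion"))
```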

 

We're almost done!  Save the logic app and head over to a Microsoft Sentinel incident to give it a test run.  I've got test data from the Microsoft Sentinel Training Lab in my instance, so I will run this playbook against an incident triggered by the Malicious Inbox Rule alert.

 

[Screenshot: Running the playbook against the Malicious Inbox Rule incident]

 

You might be wondering why we didn't configure a second action in our playbook to add a comment or task with the results.  We'll get there - but first we want to make sure that our prompt is returning good content from the AI model.  Head back to the Playbook and open the Overview in a new tab.  You should see an item in the Runs History, hopefully with a green check mark:

 

[Screenshot: The playbook's run history showing a successful run]

 

Click on the item to view detailed information on the logic app run.  We can expand any of the action blocks to see detailed input and output parameters:

 

[Screenshot: Detailed view of the logic app run]

 

Our GPT-3 action took just two seconds to complete successfully.  Let's click on the action block to expand it and view the full details of its inputs and outputs:

 

[Screenshot: The expanded GPT3 action block with inputs and outputs]

 

Let's take a closer look at the "choices" field in the Outputs section.  This is where GPT-3 returns its completed text along with the completion status and any error codes.  I've copied the full text of the Choices output into Visual Studio Code here:

 

[Screenshot: The "choices" output text in Visual Studio Code]
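
For comparison, when you call the completions endpoint directly, the same data arrives as JSON: each entry in the choices array carries the completed text plus a finish_reason ("stop" when the model finished naturally, "length" when the reply was cut off by the max tokens limit).  A short parsing sketch with abbreviated sample data:

```python
# Abbreviated sample of a completions API response body (not actual run output).
response = {
    "object": "text_completion",
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "Defense Evasion is a MITRE ATT&CK tactic in which...",
            "index": 0,
            "finish_reason": "stop",  # "length" would mean the reply was truncated
        }
    ],
}

for choice in response["choices"]:
    print(choice["finish_reason"], "->", choice["text"])
```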

 

It's looking good so far!  GPT-3 correctly expanded on the MITRE definition of "Defense Evasion".  Before we add a logic action to our playbook to create an incident comment with this answer text, let's take another look at the parameters in the GPT-3 action itself.  There are nine parameters in the OpenAI text completion action, not counting the engine selection and the prompt:

 

[Screenshot: The nine configurable parameters of the GPT3 completion action]

 

What do these mean, and how can we adjust them for the best results?  To help us understand the impact of each parameter, let's move over to the OpenAI API Playground.  We can paste in the exact prompt from the input field of our logic app run, but before clicking "Submit", let's make sure the parameters match.  Here is a quick table comparing parameter names between the Azure Logic App OpenAI connector and the OpenAI Playground:

 

| Azure Logic App Connector | OpenAI Playground | Explanation |
| --- | --- | --- |
| Engine | Model | The model that will generate the completion.  The OpenAI connector offers DaVinci (new), DaVinci (old), Curie, Babbage, and Ada, corresponding to 'text-davinci-003', 'text-davinci-002', 'text-curie-001', 'text-babbage-001', and 'text-ada-001' in the Playground. |
| n | n/a | How many completions to generate for each prompt.  Equivalent to re-entering the same prompt multiple times in the Playground. |
| best_of | (Same) | Generates multiple completions server-side and returns the best one.  Use with caution - this will consume a lot of tokens! |
| temperature | (Same) | Controls the randomness (or creativity) of the response.  Set it to 0 for a highly deterministic completion where the model always returns its most confident choice; set it to 1 for a maximally creative reply with more randomness, or anywhere in between. |
| max tokens | Maximum length | The maximum length of the model's response, in tokens.  One token is roughly four characters of English text.  API usage is priced per 1,000 tokens, with rates that vary by model (at the time of this writing, the ChatGPT model costs $0.002 per 1,000 tokens).  An API call is billed for the tokens in both the prompt and the reply, so remember that the prompt's length counts toward your cost even though this parameter only caps the reply. |
| frequency penalty | (Same) | A number between -2.0 and 2.0.  The higher the value, the less likely the model is to repeat lines verbatim (it will try to find synonyms or restate lines instead). |
| presence penalty | (Same) | A number between -2.0 and 2.0.  The higher the value, the less likely the model is to repeat topics already mentioned in the response. |
| top p | (Same) | An alternative way to control the "creativity" of the response if you're not using temperature.  It limits sampling to the smallest set of tokens whose cumulative probability reaches the given value: at 1, all tokens are considered, while smaller values restrict the model to its most probable tokens (nucleus sampling). |
| user | n/a | A unique identifier representing the end user.  We don't need to set this parameter, as our API key already identifies our application. |
| stop | Stop sequences | Up to four sequences that will end the model's response. |

 

Let's use the following OpenAI API Playground settings to match our Logic App action:

  • Model: text-davinci-003
  • Temperature: 1
  • Maximum length: 100
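
In API terms, those three settings map one-to-one onto the request body.  A hedged sketch of the equivalent direct call using the REST completions endpoint (the prompt text is illustrative):

```python
import os

import requests

payload = {
    "model": "text-davinci-003",
    "prompt": "Explain the MITRE ATT&CK tactic 'Defense Evasion'.",  # illustrative
    "temperature": 1,
    "max_tokens": 100,
}

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```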

Here's what we get back from the GPT-3 engine.

 

[Screenshot: Playground completion with temperature 1 and maximum length 100]

 

It looks like the response got cut off in mid-sentence, so we should increase our maximum length parameter.  Otherwise, this response looks pretty good.  We are using the highest possible value for temperature - what would happen if we reduced it for a more deterministic response?  Take this example with temperature at zero:

 

[Screenshot: Playground completion with temperature 0]

 

At temperature=0, no matter how many times we regenerate this prompt, we'll get almost exactly the same result back.  This works well when we are asking GPT-3 to define a technical term; there shouldn't be much variance in the meaning of "Defense Evasion" as a MITRE ATT&CK tactic.  We can improve the readability of the response by adding a frequency penalty to decrease the model's tendency to use the same words repeatedly ("techniques such as").  Let's increase the frequency penalty to its maximum value of 2:

 

[Screenshot: Playground completion with frequency penalty 2]
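
In request terms, both adjustments are single-field changes to the body from the earlier sketch (the raised max_tokens value is our own choice, picked to avoid mid-sentence truncation):

```python
# Tuned settings: deterministic output, no verbatim repetition.
payload = {
    "model": "text-davinci-003",
    "prompt": "Explain the MITRE ATT&CK tactic 'Defense Evasion'.",  # illustrative
    "temperature": 0,        # same prompt -> (almost) the same completion every time
    "max_tokens": 250,       # raised so the reply isn't cut off mid-sentence
    "frequency_penalty": 2,  # discourage repeating the same phrasing verbatim
}
```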

 

So far we've only used the latest DaVinci model for our prompt completion task.  What would happen if we dropped down to one of OpenAI's faster, less expensive models such as Curie, Babbage, or Ada?  Let's change our model to "text-ada-001" and compare results:

 

[Screenshot: Completion from text-ada-001]

 

Hmm… not quite.  Let's try Babbage:

 

[Screenshot: Completion from text-babbage-001]

 

Babbage doesn't seem to return the result we're looking for either.  Perhaps Curie would fare better?

 

[Screenshot: Completion from text-curie-001]

 

Sadly, Curie also doesn't meet the standard set by DaVinci.  They're certainly fast, but our use case of adding context to security incidents doesn't depend on sub-second response times - accuracy in summarization is far more important.  We will stay with the winning combination of the DaVinci model, low temperature, and high frequency penalty.
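
If you'd like to reproduce this comparison yourself, swapping engines is a one-field change per request and easy to script.  A sketch under the same assumptions as the earlier examples (illustrative prompt, API key in an assumed environment variable):

```python
import os

import requests

MODELS = ["text-davinci-003", "text-curie-001", "text-babbage-001", "text-ada-001"]
PROMPT = "Explain the MITRE ATT&CK tactic 'Defense Evasion'."  # illustrative

def complete(model: str, prompt: str) -> str:
    """Request a completion from one model and return its text."""
    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model, "prompt": prompt, "temperature": 0, "max_tokens": 250},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

for model in MODELS:
    print(f"--- {model} ---")
    print(complete(model, PROMPT))
```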

 

Back in our logic app, let's transfer the settings we discovered from the Playground to the OpenAI action block:

 

[Screenshot: The OpenAI action block with the tuned parameters]

 

Our logic app also needs to be able to write a comment to our incident.  Click on "New step" and select "Add comment to incident" from the Microsoft Sentinel connector:

 

[Screenshot: The "Add comment to incident" action in the Microsoft Sentinel connector]

 

We just need to specify the incident ARM identifier and compose our comment message.  First, search for "Incident ARM ID" in the dynamic content flyout menu:

 

[Screenshot: Selecting "Incident ARM ID" from the dynamic content menu]

 

Next, find the "Text" output from our previous step.  You may need to click "See more" to view the outputs.  The logic app designer will automatically wrap our comment action in a "For each" logic block to handle a situation where multiple completions are generated for the same prompt.

 

[Screenshot: The comment action wrapped in a "For each" block]
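
That "For each" wrapper is doing what a small script would do: loop over the choices array and add one comment per completion.  A conceptual sketch (add_comment is a hypothetical stand-in for the Sentinel connector's "Add comment to incident" action, and the ARM ID structure is shown for illustration only):

```python
def add_comment(incident_arm_id: str, message: str) -> None:
    """Hypothetical stand-in for the Sentinel "Add comment to incident" action."""
    print(f"Commenting on {incident_arm_id}:\n{message}\n")

# Supplied by the incident trigger at run time; structure shown for illustration.
incident_arm_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.OperationalInsights/workspaces/<workspace>/providers/"
    "Microsoft.SecurityInsights/Incidents/<incident-guid>"
)
response = {"choices": [{"text": "Defense Evasion is a MITRE ATT&CK tactic..."}]}

# Equivalent of the designer's "For each" block: one comment per completion.
for choice in response["choices"]:
    add_comment(incident_arm_id, choice["text"])
```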

 

Our completed logic app should look similar to this:

 

[Screenshot: The completed logic app]

 

Let's test it out again!  Head back over to that Microsoft Sentinel incident and run the playbook.  We should get another successful completion in our logic app runs history and a new comment in our incident activity log.

 

[Screenshot: The GPT-3 comment in the incident activity log]

 

If you've stayed with us this far, you now have a working OpenAI GPT-3 integration with Microsoft Sentinel that can add value to your security investigations.  Stay tuned for our next installment, where we'll discuss more ways to integrate OpenAI models with Sentinel, unlocking workflows that can help you get the most out of your security platform!
