Unleashing AI with PowerShell, Log Analytics, and Azure OpenAI
Step into the World of Vector Embeddings and Semantic Search utilizing familiar IT tools
AI is not a mystical realm accessible only to seasoned data scientists. With familiar tools like PowerShell and cloud services like Azure, even traditional IT professionals can explore and understand the fascinating world of AI. This article presents a unique approach to experimenting with vector embeddings and semantic search using Azure’s OpenAI GPT service, Azure Log Analytics, and PowerShell.
The Confluence of Tools
My roles as a Principal Security Architect have required me to evolve my skill in, and understanding of, PowerShell scripting, primarily because I have worked at companies focused on the Windows operating system. Couple this with my general curiosity about new technology and my recent multi-year focus on understanding and securing Azure, and you may begin to understand the appeal of pairing Log Analytics, Kusto, and Azure’s recent OpenAI offering with PowerShell to better understand semantic search, vector embeddings, and vector databases.
PowerShell has become a staple tool for IT professionals. I used it within this design for the following reasons:
- Automation and Scripting: As a task automation and configuration management framework, PowerShell allowed me to automate the creation, uploading, and querying of vectors.
- Integration with Microsoft Products: Since PowerShell is a Microsoft product, it integrates seamlessly with my local data and with key core Azure services, including Log Analytics and Azure OpenAI.
Azure Log Analytics and the Kusto Query Language (KQL) have gained immense popularity over the past few years. I chose them here for the following reasons:
- Ease of Use & Implementation: Log Analytics, an integrated platform-as-a-service offering powered by Azure Data Explorer, was a one-touch deployment within this solution. The platform is married to KQL, simplifying the storage and analysis of the dataset.
- Versatility: Log Analytics supports the ingestion of custom semi-structured event data. This enabled the ability to store a vector, along with other key fields, thus easing the analysis of the dataset.
- Powerful Query Language: Kusto Query Language (KQL) is the heart of Azure Log Analytics. It is a simple yet highly expressive language that makes it easy to explore and gain insights from data. KQL’s versatility was the key to unlocking the creation of the custom cosine similarity function used within the solution.
The synergy of these large-scale event ingestion and analytics capabilities is why Microsoft powers its Security Information and Event Management (SIEM) tool, Sentinel, with this same backend stack, and it’s also why I selected it for this task.
Azure’s OpenAI offering introduces several benefits for IT professionals, including:
- Privacy: Azure’s private OpenAI GPT offering keeps my company’s data private.
- Security: The private offering utilizes varying degrees of security integration, including using managed system identities for API calls.
- State-of-the-Art Technology: OpenAI is a leading entity in artificial intelligence research. Leveraging Azure’s OpenAI service provides access to cutting-edge AI models like GPT-4 or text-embedding-ada-002, which was used for vector generation.
Overall, Azure’s OpenAI offering provides an easy and intuitive path toward exploring vector generation.
A Unique Approach: Vector Embeddings and Semantic Search
Vector embeddings and semantic search are essential concepts in artificial intelligence and natural language processing.
- Vector Embeddings: Imagine that you must represent the meaning of words in a way that a computer can understand. This is where vector embeddings come in. In simple terms, vector embedding translates a word or phrase into a series of numbers (a vector) so that words with similar meanings have similar values. It’s like mapping words into a multidimensional space where the ‘distance’ and ‘direction’ between words indicate how closely related they are. For example, in this space, the vectors for “dog” and “puppy” would be closer to each other than the vectors for “dog” and “car.”
- Semantic Search: Semantic search is the ability to understand a query’s intent and meaning instead of focusing on individual keywords. It makes search more intelligent and relevant, as it considers the context and synonyms, delivering more precise results.
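To make the intuition concrete, here is a minimal Python sketch using made-up three-dimensional vectors (real embeddings from a model like text-embedding-ada-002 have 1,536 dimensions; these toy values are purely illustrative):

```python
import math

# Toy 3-dimensional "embeddings" -- the values are invented for illustration.
embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.15],
    "car":   [0.10, 0.20, 0.95],
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_dog_puppy = euclidean_distance(embeddings["dog"], embeddings["puppy"])
d_dog_car = euclidean_distance(embeddings["dog"], embeddings["car"])

# Semantically related words sit closer together in the vector space.
assert d_dog_puppy < d_dog_car
```

The assertion at the end captures the whole idea: "dog" and "puppy" land near each other, while "car" lands far away, and searching becomes a matter of measuring distances.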
Azure’s OpenAI GPT service, specifically the text-embedding-ada-002 deployment option, produces these vector representations for text of virtually any size. Submit your text to your deployed Azure OpenAI service, and it generates a vector that encapsulates the meaning of that text. I used PowerShell scripts to ingest blobs of text, gather a vector for each blob, and store the vectors in a custom table in Log Analytics.
Once these vectors are generated and stored in Log Analytics, how do we use them to find similar concepts or perform a search? This is where Kusto’s cosine similarity function comes into play.
Cosine similarity is a measure that calculates the cosine of the angle between two vectors, which represents how similar the two vectors (and therefore the two pieces of text) are. In this case, the two vectors are the search-text vector and the text vector stored in Log Analytics. The closer the cosine value is to 1, the more similar the texts are in meaning. I used PowerShell to gather the vector for the search text and then query Log Analytics for similarity, returning the top ten results.
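The computation itself is small. Here is a Python sketch of cosine similarity — the dot product of the two vectors divided by the product of their lengths — which is the same math the KQL function in the implementation section performs:

```python
import math

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors: dot(v1, v2) / (|v1| * |v2|)."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

# Vectors pointing the same way score near 1.0; orthogonal vectors score 0.0.
similar = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # ~1.0
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])          # 0.0
```

Because the score depends only on the angle between the vectors, two texts of very different lengths can still score as highly similar if they mean the same thing.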
So, in essence, Azure’s OpenAI GPT service translates our words into a language computers can understand, Kusto compares these translations to find the most similar ones, and PowerShell automates the whole flow.
Conclusion
As technology advances, it is becoming apparent that artificial intelligence is not just for data scientists. IT professionals can use tools like PowerShell and Azure to access the power of AI. I personally find the combination of Azure’s OpenAI GPT service, Log Analytics, and PowerShell very exciting, as it opens up many possibilities for security solutions. This technology makes complex AI operations accessible and familiar, which helps deepen my expertise. I hope it does the same for you in your field.
Step-by-step: Implementation
First, you must have an Azure OpenAI deployment available to proceed.
Part 1 — Uploading The Primary Data Set
Identify Data: The first step is identifying the text you want to search over. This can be scraped from the internet, from your intranet, or you can use the example city-state data I’ve supplied here.
Deploy Log Analytics:
# Select your subscription
Set-AzContext -Subscription "<your sub here>"

# Set the resource group name and location
$resourceGroupName = "<your rsg name here>"
$location = "East US"

# Create a resource group
New-AzResourceGroup -Name $resourceGroupName -Location $location

# Set the workspace name and SKU
$workspaceName = "<yourworkspaceName>"
$sku = "pergb2018"

# Create the Log Analytics workspace
New-AzOperationalInsightsWorkspace -Location $location -Name $workspaceName -Sku $sku -ResourceGroupName $resourceGroupName
Ingest Data
#region - get data for which to get embeddings
$content = Import-Csv -Path "<yourpath>/<yourfile>.csv"
#endregion
Generate Vector
Using your existing Azure OpenAI deployment, use the following PowerShell to call the embeddings API while iterating through the data you ingested above. This produces an object pairing each row of data with its corresponding vector.
#region - vars for gpt
$APIVersion = "2023-05-15"
$DeploymentName = "ada-embeddings" # your embedding deployment name here
$ResourceName = "<yourresourcename>"
$apikey = '<yourapikey>'
$Global:MyHeader = @{ "api-key" = $apikey }
$gptURI = "https://$ResourceName.openai.azure.com/openai/deployments/$DeploymentName/embeddings?api-version=$APIVersion"
#endregion

#region - generate embeddings for file
$report = @() # object to hold values
$i = 1
foreach ($item in $content.sentence) {
    $info = "" | Select-Object item_number, data, vector
    $requestBody = @{
        input = $item
    }
    $jsonBodyPayload = $requestBody | ConvertTo-Json
    $vectorObject = Invoke-RestMethod -Method POST -Uri $gptURI -ContentType "application/json" -Body $jsonBodyPayload -Headers $Global:MyHeader
    $info.item_number = $i
    $info.data = $item
    $info.vector = $vectorObject.data.embedding
    $report += $info
    $i = $i + 1
}
#endregion
Upload to Log Analytics
For each item in the object above, call Log Analytics and load the data into a custom table.
#region - load to LA
# log analytics setup (constant across iterations)
$workspaceId = "<yourLAworkspaceId>"
$sharedKey = "<workspaceSharedKey>"
$apiVersion = "2016-04-01"
$apiEndpoint = "https://$workspaceId.ods.opinsights.azure.com/api/logs?api-version=$apiVersion"
$contentType = "application/json"

foreach ($item in $report) {
    $currentTime = (Get-Date).ToUniversalTime() # the x-ms-date header must be in UTC
    $customFormat = $currentTime.ToString("yyyy-MM-ddTHH:mm:ssZ")
    $rFormat = $currentTime.ToString("r")

    # set to direct variables to allow for nesting into the json object
    $item_num = $item.item_number
    $vector = $item.vector
    $value = $item.data

    # create custom JSON object to upload
    $jsonObject = @{
        "eventType" = "VECTOR_ADDRESS"
        "timestamp" = $customFormat
        "item_num"  = $item_num
        "value"     = $value
        "vector"    = $vector
    }
    $jsonString = $jsonObject | ConvertTo-Json

    # Create the authorization signature
    $date = $rFormat
    $contentLength = [Text.Encoding]::UTF8.GetByteCount($jsonString) # byte length, not character count
    $stringToHash = "POST`n$contentLength`n$contentType`nx-ms-date:$date`n/api/logs"
    $bytesToHash = [Text.Encoding]::UTF8.GetBytes($stringToHash)
    $keyBytes = [Convert]::FromBase64String($sharedKey)
    $hmacSha256 = New-Object Security.Cryptography.HMACSHA256
    $hmacSha256.Key = $keyBytes
    $hashBytes = $hmacSha256.ComputeHash($bytesToHash)
    $signature = [Convert]::ToBase64String($hashBytes)

    # Send to Log Analytics
    $headers = @{
        "Authorization" = "SharedKey ${workspaceId}:${signature}"
        "Log-Type"      = "<yourtablenamehere>" # this equates to the table name in LA
        "x-ms-date"     = $date
        "Content-Type"  = $contentType
    }
    $response = Invoke-RestMethod -Uri $apiEndpoint -Method Post -Headers $headers -Body $jsonString -Verbose
}
#endregion
Part 2 — Search For Semantic Similarity
Request Search String From User & Get Vector
#region - get embedding for user-supplied search input
# Prompt the user for input
$UserInput = Read-Host "Please enter your search text here"
$requestBodyA = @{
    input = $UserInput
}
$jsonBodyPayloadA = $requestBodyA | ConvertTo-Json
$vectorAObject = Invoke-RestMethod -Method POST -Uri $gptURI -ContentType "application/json" -Body $jsonBodyPayloadA -Headers $Global:MyHeader
$vectorAValueRaw = $vectorAObject.data.embedding
$vectorAValueString = "dynamic([$($vectorAValueRaw -join ',')])" # format the vector as a KQL dynamic literal for the query
#endregion
Perform Semantic Search Against Log Analytics
#region - log analytics query
Set-AzContext -Subscription "<your sub here>"
$workspaceId = "<yourworkspaceID>"

# query: define a cosine similarity function in KQL, then rank the stored
# vectors against the search vector and return the ten closest matches
$query = @"
let series_cosine_similarity_fl=(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
    let dp = series_dot_product(vec1, vec2);
    let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
    let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
    dp/(v1l*v2l)
};
let s1 = $vectorAValueString;
FraudAddressVector_CL
| summarize count() by similarity=series_cosine_similarity_fl(s1, vector_s), value_s
| top 10 by similarity
"@

$queryResults = Invoke-AzOperationalInsightsQuery -WorkspaceId $workspaceId -Query $query

write-host ""
write-host "what's being looked for: " $UserInput
write-host "-------------------------------------"
$queryResults.Results | Format-Table
#endregion
Part 3 — The Result
Running this, I decided to see what addresses are semantically similar to ‘alligator.’ The result? Addresses in Florida and Georgia.
Whether you are delving into vector embeddings or experimenting with semantic search, the IT toolkit is expanding and is more exciting than ever. By bridging the gap between traditional IT roles and the dynamic world of AI, we can ask how AI can help us in the roles we currently hold.
For instance, within information security, can AI help reduce risk? Can I identify risky configurations at scale based on their semantic similarity to other configurations? Can I curb the proliferation of toxic data by storing vectors instead?
I hope this article has ignited your enthusiasm to dive into AI, armed with your IT skills and the powerful features of Azure. Feel free to connect with me on LinkedIn.