Logs and events are the backbone of effective security. They provide the details needed to investigate and audit security incidents, and they supply the 'experiences' that power today's machine learning and artificial intelligence constructs. As with life, the more experiences a system has, the more successful it will be in identifying, alerting on, and reacting to anomalous events. Logs and events facilitate system learning. The more, the better.
Capturing 'all' logs and events can be expensive. CISOs know this because a SIEM, the tool that ingests, learns from, and alerts on these items, is one of, if not the, most expensive solutions in the CISO's arsenal. Given this, CISOs want to know whether there is a less expensive way to aggregate, learn from, visualize, and alert on some of their more extensive data sets.
Azure Monitor / Log Analytics is Microsoft's existing cloud-native PaaS log and event aggregation solution. Pricing for log/event ingestion depends on numerous factors (contract discounts, pre-purchased capacity, etc.), but you can estimate roughly $2.10/GB ingested (pricing details here). At that cost, ingestion of large data sets like SQL audit, DNS, or on-premises Syslog may be out of reach for some companies. As many cloud architects know, the cloud amplifies the requirement to architect for expense; enter Azure Data Explorer, or ADX for short.
ADX is the backend persistence tier that powers Azure Monitor and Azure Log Analytics. Microsoft's Azure Monitor and Azure Log Analytics teams are one of, if not the, largest customers of ADX, so it is clearly performant. My question was: can I get ADX to ingest my customer's on-prem Syslog traffic quickly? The answer was yes, sort of! What I am sharing here isn't something I found online directly, but now you have it. Congratulations!
My goal: provide a PaaS-powered, low-cost persistence tier for voluminous Syslog data. It needed to offer straightforward data ingestion and normalization capabilities and a robust, well-documented query language that enables ML/AI constructs, all within a platform that integrates quickly with standard industry visualization tools.
The PowerShell code below:
- Creates a resource group, vnet, subnet, and an Ubuntu VM. This VM is the log source.
- Creates blob storage. This is the initial low-cost event / log aggregation point.
- Creates an ADX cluster and DB. This provides parsing, normalization, storage, and analysis of the data.
- Creates custom Linux Azure Diagnostics (LAD) public and private JSON settings documents.
I am not a programmer/coder, so you can hold any of those comments. :)
Once deployed, check your storage account to see whether your Syslog data is landing in a blob container. I used the desktop version of Azure Storage Explorer. Drilling down, I can see that the blob updates regularly.
To connect ADX to blob storage, you will first need to create a table within the DB, then create a mapping for that table. To keep it simple, make the table a single dynamic column.
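As a sketch of that step, the table and its JSON ingestion mapping can be created with Kusto control commands like the ones below. The names (`SyslogRaw`, `SyslogRawMapping`, a `Records` column) are illustrative, not from the original deployment:

```kusto
// Single-column dynamic table; each raw JSON document lands in one column.
.create table SyslogRaw (Records: dynamic)

// JSON ingestion mapping: map the whole document ($) into the dynamic column.
.create table SyslogRaw ingestion json mapping 'SyslogRawMapping'
    '[{"column":"Records","path":"$","datatype":"dynamic"}]'
```

Keeping everything in one dynamic column defers schema decisions until query time, which is convenient while you are still learning the shape of the LAD output.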
Once the table and mapping are complete, you will need to ingest data from the blob.
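For a one-off pull, a `.ingest` control command pointed at the blob works. The storage URI and account key below are placeholders, and the table and mapping names assume an illustrative single-column table called `SyslogRaw` with a mapping called `SyslogRawMapping`:

```kusto
// One-time ingestion from a blob; replace the URI and key with your own.
// The h'' prefix obfuscates the connection string in logs and diagnostics.
.ingest into table SyslogRaw (
    h'https://<storageaccount>.blob.core.windows.net/<container>/<blob>;<storage-account-key>'
) with (format='multijson', ingestionMappingReference='SyslogRawMapping')
```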
Once ingestion is complete, you should see raw records in ADX. Ingestion into ADX occurs after a container in blob storage is updated. Once ingested, you can parse the raw records using mv-expand.
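A minimal parsing sketch follows, assuming an illustrative `SyslogRaw` table with a single dynamic `Records` column and assuming the LAD blob output wraps entries in a `records` array; the property names (`time`, `HostName`, `Msg`) are assumptions you should verify against your own raw rows:

```kusto
SyslogRaw
| mv-expand record = Records.records          // one output row per syslog record
| project
    Timestamp = todatetime(record.time),                  // event time
    Hostname  = tostring(record.properties.HostName),     // source host
    Message   = tostring(record.properties.Msg)           // syslog message body
```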
ADX functions and policy enable automation of normalization after data lands in the raw table. Detailed instructions on tables, mapping, functions, and auto-run policies within ADX can be found here.
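Putting those pieces together, a stored function plus an update policy can normalize rows automatically as they land in the raw table. Everything here is an illustrative sketch: the table names, columns, and JSON property paths are assumptions, not the original deployment's schema:

```kusto
// Target table for normalized records (columns are illustrative).
.create table SyslogParsed (Timestamp: datetime, Hostname: string, Message: string)

// Function that reshapes raw rows into the normalized schema.
.create-or-alter function ParseSyslog() {
    SyslogRaw
    | mv-expand record = Records.records
    | project
        Timestamp = todatetime(record.time),
        Hostname  = tostring(record.properties.HostName),
        Message   = tostring(record.properties.Msg)
}

// Update policy: run ParseSyslog() automatically whenever SyslogRaw receives data.
.alter table SyslogParsed policy update
    @'[{"IsEnabled": true, "Source": "SyslogRaw", "Query": "ParseSyslog()", "IsTransactional": false}]'
```

With the policy in place, every ingestion into the raw table transparently populates the parsed table, so downstream queries and dashboards only ever touch normalized data.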
There you have it: an ADX POC that lets you send Syslog to storage and pull it into ADX for analysis with Kusto. The Syslog server on the front side can be an aggregation point for your on-prem resources and can be sized and scaled accordingly.
Articles that helped in the success of this endeavor: