By Shomiron Das Gupta, Founder and Chief Executive Officer of DNIF – HyperScale SIEM
All animals are equal but some animals are more equal than others.
George Orwell (Animal Farm)
In the 2020s, one need not be a giant security firm to understand that not all data is equal. This is evident from the explosion of data being generated daily. With the expansion of mobile devices, virtual servers, and desktops, as well as cloud-based services and RFID technologies, IT infrastructures are becoming more complex. Across domains, organizations have started looking at mechanisms to classify and maintain data based on its relevance or importance. The obvious goal is to be able to get what is needed before it’s too late.
Let’s start by setting the context. This discussion refers to machine data and its use specifically in the area of security and operational monitoring. Within the whole data explosion phenomenon, the increase in machine-generated data is perhaps the most significant. A massive challenge today is making machine-generated data actionable in a cost-effective manner. When CISOs are asked whether they consume all the event data across IT systems and applications in their security monitoring solutions, the overwhelming response is that they simply cannot think of doing so because of the scale of the infrastructure it would require.
Event data generated by machines can be broadly categorized as:
- Telemetry: Events to measure key operational parameters – CPU utilization, IOPS, etc.
- Errors and Alerts: Events about errors faced during operations.
- Notifications: General notifications of what just happened on the device.
- User Activity: Notifications that identify what was done by users of the machine.
- Administration Activity: Changes made to the settings or administration parameters of the device.
With improving logging standards, most devices have adopted standardized log formats and event identifiers. This has greatly helped device manufacturers improve the troubleshooting and performance monitoring of their devices. Today most devices provide specific Event IDs and generate logs in formats that log analytics applications can consume.
A simple assessment of the above categories reveals that the data is neither generated at the same volume across categories nor does it hold the same level of importance and relevance over time.
Yet although most log monitoring applications can understand the log formats and Event IDs, none of them treats the different types of data differently. All event data in log monitoring applications continues to be treated with the same level of importance and is retained for the same amount of time.
This is a mistake the industry has overlooked for a long time.
Data analysts take huge pains in creating parsers to identify, parse, annotate and enrich log events from different applications. An obvious extension of this would be to bucket these events so that data can be stored and retained according to its category. This would give users the flexibility to search and retain data based on its type and hence make the most of it. A more appropriate mode of storing this data would be as follows:
Taking this approach at an enterprise level allows users to store exactly the data that needs to be stored, at the level of resiliency and performance that it needs.
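As a rough illustration of category-based storage, the sketch below buckets already-parsed events by the five categories listed earlier and looks up a storage policy for each. It is a minimal sketch, not a description of any product: the `category` field on each event, the tier names, the retention periods and the replica counts are all hypothetical values chosen only to show the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EventCategory(Enum):
    """The five event categories described in this article."""
    TELEMETRY = "telemetry"
    ERROR_ALERT = "error_alert"
    NOTIFICATION = "notification"
    USER_ACTIVITY = "user_activity"
    ADMIN_ACTIVITY = "admin_activity"


@dataclass(frozen=True)
class StoragePolicy:
    retention_days: int  # how long events in this bucket are kept
    tier: str            # hypothetical performance tier: "hot", "warm" or "cold"
    replicas: int        # hypothetical resiliency level (number of copies)


# Hypothetical policy table: every number here is an assumption for illustration.
POLICIES = {
    EventCategory.TELEMETRY: StoragePolicy(retention_days=7, tier="cold", replicas=1),
    EventCategory.ERROR_ALERT: StoragePolicy(retention_days=90, tier="warm", replicas=2),
    EventCategory.NOTIFICATION: StoragePolicy(retention_days=30, tier="cold", replicas=1),
    EventCategory.USER_ACTIVITY: StoragePolicy(retention_days=365, tier="hot", replicas=3),
    EventCategory.ADMIN_ACTIVITY: StoragePolicy(retention_days=365, tier="hot", replicas=3),
}


def bucket_events(events):
    """Group parsed events by category.

    Each event is assumed to be a dict whose "category" key was set by a parser.
    """
    buckets = defaultdict(list)
    for event in events:
        buckets[EventCategory(event["category"])].append(event)
    return dict(buckets)


def policy_for(event):
    """Return the storage policy that applies to a single parsed event."""
    return POLICIES[EventCategory(event["category"])]


if __name__ == "__main__":
    sample = [
        {"category": "telemetry", "message": "cpu=71%"},
        {"category": "user_activity", "message": "login user=alice"},
        {"category": "telemetry", "message": "iops=1200"},
    ]
    for category, items in bucket_events(sample).items():
        policy = POLICIES[category]
        print(category.value, len(items), policy.tier, policy.retention_days)
```

In a layout like this, high-volume telemetry can be kept briefly on cheap storage while lower-volume user and administration activity is retained longer with more copies, which is the trade-off the categories are meant to enable.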
Our core engineers have made architectural optimizations and have included the above concept at the core of our product design. This data-driven approach will help customers reduce blind spots and get the best ROI.
Machine data is generated continuously by every processor-based system (including HVAC controllers, smart electrical meters, GPS devices, and RFID tags), as well as many consumer-oriented systems (mobile devices, automobiles, and medical devices with embedded electronics).
As more and more businesses use big data analytics and machine learning, there are more chances to properly analyze machine data alongside other corporate data types to gain fresh ideas and views that can aid them in making better business decisions.