Fundamental-2
Module-1
Introduction
Fundamentals One Refresher -
Splunk Search Terms:
01. Keywords [Keywords are not case sensitive; multiple keywords are implicitly ANDed together]
02. Booleans [Boolean operators (AND, OR, NOT) must be uppercase]
03. Phrases [Exact phrases can be searched by placing the phrase in quotes]
04. Fields [We can also search on an extracted field by typing a field-value pair into the search. Field names are case sensitive; field values are not]
05. Wildcards
[Wildcards can be used at any point in keyword text and fields]
[Using a wildcard at the beginning of a keyword or field is very inefficient]
06. Comparisons [Comparison operators can be used to filter events]
[Supported operators are: = (equal), != (not equal), < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to)]
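A few hedged example searches for these term types (the keywords and field values are illustrative, loosely based on the web and security data used later in these notes):
[failed password] --> keywords, implicitly ANDed
[failed OR error] --> Boolean
["failed password"] --> exact phrase
[sourcetype=linux_secure failed] --> field-value pair plus a keyword
[status>=400] --> comparison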
Commonly used commands are,
fields command allows you to include or exclude specific fields from search results.
[sourcetype=access_combined | fields clientip, action]
table command returns specified fields in a table format.
[sourcetype=access_combined | table clientip, action]
rename command, can be used to rename fields.
[sourcetype=access_combined | rename clientip as "userip"]
dedup command, removes duplicate events from results that share common field values.
[sourcetype=access_combined | dedup clientip]
sort command allows you to display your results in ascending or descending order.
[sourcetype=access_combined | sort - price]
lookup command adds field values from external sources.
[sourcetype=access_combined | lookup dnslookup clientip]
Transforming commands are used to order search results into a data table that Splunk can use for statistical purposes. They are required to transform search results into visualizations.
top & rare allow you to quickly find the most common and rarest values in a result set.
stats produces statistical information from our search results.
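Hedged examples of these transforming commands (field names assumed from the access_combined data used above):
[sourcetype=access_combined | top limit=5 product_name]
[sourcetype=access_combined | rare action]
[sourcetype=access_combined | stats count by clientip]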
Module-2
Beyond basic search fundamentals
[If a command references a specific value, that value will be case sensitive]
[eg., the replace command]
[sourcetype=access_combined purchase | replace www1 with server1 in host]
[Field values from a lookup are case sensitive by default. A user with an Admin role can choose to make values case insensitive when creating the lookup table, but it is best to assume that this is not the case when searching.]
[Boolean operators are case sensitive. If a Boolean operator is not uppercase, it is treated as a literal keyword]
[When searching using a tag, the tag value is case sensitive]
[When using regular expressions with commands, the regex terms need to follow the case sensitivity of the defined character classes]
[Buckets]
01. When Splunk ingests data, it is stored in buckets.
02. Buckets are directories containing raw data and indexing data.
03. Buckets have a configurable maximum size and maximum time span.
04. There are three kinds of searchable buckets in Splunk: Hot, Warm and Cold.
Hot - As events are indexed, they are placed in Hot buckets. Hot buckets are the only writeable buckets.
Hot bucket rolls to warm bucket when,
- Maximum size reached
- Time span reached
- Indexer is restarted
Warm - Upon rolling, the bucket is closed, renamed and changed to read-only status.
Warm buckets are renamed to display timestamps of the youngest and oldest events in the bucket.
Warm bucket rolls to cold bucket when,
- Maximum size reached
- Time span reached
Cold - Cold buckets are typically stored in a different location than Hot and Warm buckets. This allows them to be kept on slower, more cost-effective infrastructure.
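A hedged way to see the buckets behind an index and their current state is the dbinspect command (the index name is illustrative):
[| dbinspect index=web | table bucketId, state, startEpoch, endEpoch]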
Using Wildcards -
01. Wildcards are tested after all other search terms.
02. Only trailing wildcards make efficient use of index.
[sourcetype=access*]
03. Wildcards at the beginning of a string cause Splunk to search all events.
04. Wildcards in the middle of a string produce inconsistent results.
05. Avoid using wildcards to match punctuation.
06. Be as specific as possible in search terms.
Search Modes -
Knowing when to use the appropriate search mode can make your searches more efficient or allow better access to your data for discovery.
Fast Mode - It emphasizes performance and returns only essential data. When running a non-transforming search in this mode, only the fields required for the search are extracted and displayed in events. As with all non-transforming commands, statistics and visualizations are not available, but patterns are. If we run a transforming command, events and patterns are no longer returned, but we have access to statistics and visualizations.
Verbose Mode - It emphasizes completeness by returning all field and event data. If we run a non-transforming search in this mode, we get events and patterns (as in Fast Mode), but all fields for the events are extracted and displayed in the fields sidebar. If we run a transforming search in this mode, we can see statistics and visualizations, but we can also still see patterns and events.
Smart Mode - It is designed to return the best results for the search being run, using a combination of both Fast and Verbose modes. For a non-transforming search, it acts like Verbose mode, returning all fields for events and access to patterns. For transforming commands, it acts like Fast mode.
General Best Practices,
01. The less data you have to search, the faster Splunk will be.
02. Fields extracted at index time do not need to be extracted for each search. (time, index, source, host and sourcetype)
03. Inclusion is generally better than exclusion. (searching for "access denied" is better than NOT "access granted")
Use the appropriate search mode,
Fast mode for performance
Verbose mode for completeness
Smart mode for the combination of both
Search Job Inspector,
At times you might need to tune a search to make it more efficient. The Search Job Inspector is a tool that can be used to troubleshoot the performance of searches and determine which phase of a search takes the most time. It dissects the behavior of searches to help you understand the cost of knowledge objects, search commands and other components within the search. Any search job that has not expired can be inspected.
Module-3
Splunk allows you to visualize your data in many ways. Any search that returns statistical values can be viewed as a chart. Most visualizations require results structured as tables with at least two columns.
The chart command can take two clause statements (over & by).
Over - It tells Splunk which field you want to be on the X axis.
[Any stats function can be applied to the chart command.]
[index=web sourcetype=access_combined status>299 | chart count over status]
status is the x-axis and count is the y-axis. The y-axis always needs to be numeric so that it can be charted.
By - The "by" clause comes into play when we want to split our data by an additional field.
[index=web sourcetype=access_combined status>299 | chart count over status by host]
Unlike the stats command, only one field can be specified after the "by" modifier when using the "over" clause. If two fields are given to the "by" clause without an "over" clause, the first field is used as the "over" clause.
[index=web sourcetype=access_combined status>299 | chart count by status, host]
[index=web sourcetype=access_combined status>299 | chart count over host by product_name]
[index=web sourcetype=access_combined status>299 | chart count over host by product_name usenull=false] --> to remove null values from our data
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name] --> removing null values in the initial search is more efficient
{The chart command is limited to 10 columns by default; additional values are grouped into an "other" column unless the limit argument is adjusted}
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name useother=false] --> to remove the "other" column
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name limit=5] --> to limit the number of products shown
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name limit=0] --> using "limit=0" to display all of the products
Timechart command - Performs stats aggregations against time. Time is always the X axis.
[index=sales sourcetype=vendor_sales | timechart count]
[index=sales sourcetype=vendor_sales | timechart count by product_name]
As with chart, any stats function can be applied with the timechart command, and only one field can be specified after the "by" modifier. The limit, useother and usenull arguments are also available to timechart. The timechart command intelligently clusters data into time intervals dependent on the selected time range.
To change the span of time for each cluster, you can use the span argument with the time period to group by.
[index=sales sourcetype=vendor_sales | timechart span=12hr sum(price) by product_name limit=0]
We may want to compare data over specific time periods; Splunk provides the "timewrap" command for this.
[index=sales sourcetype=vendor_sales product_name="Dream Crusher" | timechart span=1d sum(price) by product_name | timewrap 7d | rename _time as Day | eval Day = strftime(Day, "%A")]
Line graph -
Chart overlay - will allow you to lay a line chart of one series over another visualization.
[index=main (sourcetype=access_combined action=purchase status=200) OR sourcetype=vendor_sales | timechart sum(price) by sourcetype | rename access_combined as "web_sales"]
Area chart - The difference between line and area formatting is the ability to show the data stacked.
Column chart - It also allows you to stack data.
Bar graph - uses horizontal bars to show comparisons and can be stacked.
Pie chart - It takes the data and visualizes the percentage for each slice.
Scatter chart - It shows the relationship between two discrete data values, plotted on the X & Y axes.
Bubble chart - We can add more versatility with a bubble chart. This provides a visual way to view a third dimension of data. Each bubble plots against two dimensions on the X & Y axes. The size of the bubble represents the value of the third dimension.
Trellis layout - It allows us to split our visualization by a selected field or aggregation. While we get multiple visualizations, the originating search is only run once.
[Additional visualizations can be downloaded from Splunkbase]
Module-4
There are several options for representing data that includes geographical information.
iplocation - It is used to look up and add location information to events. Data such as city, country, region, latitude and longitude can be added to events that include external IP addresses.
[index=security sourcetype=linux_secure action=success src_ip!=10.* | iplocation src_ip]
Depending on the IP, not all location information might be available. This is the nature of geolocation data and should be taken into consideration when searching your data.
If you are collecting Geographical data, you can use the Geostats command to aggregate the data for use on a map visualization. The Geostats command uses the same functions as the stats command.
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count]
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count by product_name]
Unlike the stats command, the Geostats command only accepts one "by" argument. To control the column count, the globallimit argument can be used.
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count by product_name globallimit=4]
[You can lookup Geographical data to use with Geostats using the Iplocation command]
[index=security sourcetype=linux_secure action=success src_ip!=10.* | iplocation src_ip | geostats latfield=lat longfield=lon count]
Choropleth map - It is another way to see your data as a geographical visualization. It lets us use shading to show relative metrics over predefined locations on a map.
[In order to use a Choropleth map, you need a .kmz, or compressed Keyhole Markup Language, file that defines region boundaries]
To prepare our events for a Choropleth map, we use the geom command. It adds a field with geographical data structures matching polygons on the map.
[index=sales sourcetype=vendor_sales VendorID>=5000 AND VendorID<=5055 | stats count as Sales by VendorCountry | geom geo_countries featureIdField=VendorCountry]
::geo_countries - name of the .kmz file, also known as the featureCollection::
::featureIdField is also required::
Single value visualizations - When the result contains a single value, there are two different types of visualizations you can use to display it.
You can pipe the events into the gauge command,
[index=web sourcetype=access_combined action=purchase | stats sum(price) as total | gauge total 0 30000 60000 70000]
[Once the color range format is set, it stays persistent across the radial, filler and marker gauges]
The trendline command computes moving averages of field values, giving you a clear understanding of how your data is trending.
[index=web sourcetype=access_combined action=purchase status=200 | timechart sum(price) as sales | trendline wma2(sales) as trend]
The trendline command requires three arguments - a trendtype, a period and a field. Trendtype:
- simple moving average / sma
- exponential moving average / ema
- weighted moving average / wma
"sma", "ema" and "wma" compute the sum of data points over a period of time; wma and ema assign a heavier weighting to more recent data points.
The number "2" is the period, averaging the data points over every two time buckets (two days in this example).
The field "sales" defines the field to calculate the trend from.
Addtotals command - It computes the sum of all numeric fields for each event and creates a Total column.
[index=web sourcetype=access_combined file=* | chart sum(bytes) over host by file | addtotals col=true label="Total" labelfield="host" fieldname="Total by host" row=false]
col - We can create a column summary by setting the "col" argument to true
label - The summary row is created without a label; we add one by setting the "label" argument to the name we want to use
labelfield - The "labelfield" argument sets the field in which the label is shown
fieldname - We can change the name of the per-event total column using the "fieldname" argument
row - Setting the "row" argument to false removes the per-event total column
Module-5
Eval command - It is used to calculate and manipulate field values. Arithmetic, concatenation and Boolean operators are supported by the command. Results can be written to a new field or replace an existing field. Field values created by the eval command are case sensitive.
[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = Bytes/1024/1024]
[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = round(Bytes/1024/1024,2)]
[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = round(Bytes/1024/1024,2) | sort -bandwidth | rename bandwidth as "Bandwidth (MB)" | fields - Bytes]
Along with converting values, the eval command allows us to perform mathematical functions against fields with numerical values.
[index=web sourcetype=access_c* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval discount = round(((total_sale_price - total_list_price) / total_list_price)*100) | sort - discount | eval discount = discount."%"]
Converting values with the eval command - tostring function - It converts numerical values to strings. The tostring function also allows formatting of the strings, for example as time, hexadecimal numbers, or with commas.
[index=web sourcetype=access* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval total_list_price = "$" + tostring(total_list_price,"commas")]
[After using the tostring function, the field values might not sort numerically because they are now string (ASCII) values]
The fieldformat command - It can be used to format values without changing the characteristics of the underlying values. It uses the same functions as the eval command.
[index=web sourcetype=access* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval total_list_price = "$" + tostring(total_list_price,"commas") | fieldformat total_sale_price = "$" + tostring(total_sale_price,"commas")]
[Fields formatted with fieldformat can still be sorted numerically because fieldformat happens at the display level without changing the underlying data]
[While eval creates new field values, the underlying data in the index does not change]
Multiple eval commands can be used in a search. Since eval creates a new field, subsequent commands can reference the results of the eval commands that come before them.
[index=web sourcetype=access_combined price=* | stats values(price) as list_price, values(sale_price) as sale_price by product_name | eval current_discount=round(((list_price - sale_price)/list_price) * 100) | eval new_discount = (current_discount -5) | eval new_sale_price = list_price - (list_price * (new_discount/100)) | eval price_change_revenue = (new_sale_price - sale_price)]
[The eval command's "if" function allows you to evaluate an expression and assign defined field values depending on whether the expression is true or false]
if(x,y,z)
x - Boolean expression
y - value returned if x is true
z - value returned if x is false
[y & z must be in double quotes if not numerical]
[index=sales sourcetype=vendor_sales | eval SalesTerritory = if(VendorID < 4000,"North America","Rest of the World") | stats sum(price) as TotalRevenue by SalesTerritory]
[The eval "case" function behaves much like the "if" function, but can take multiple Boolean expressions and returns the argument corresponding to the first expression that is true]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success")]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success", status>=300 AND status<400,"Redirect", status>=400 AND status<500,"Client Error", status>=500,"Server Error")]
[If an event doesn't fit any of the cases, no value will be assigned. If you want to make sure a value is always returned from the case function, add a final condition that evaluates to true]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success", status>=300 AND status<400,"Redirect", status>=400 AND status<500,"Client Error", status>=500,"Server Error", true(),"Something Weird Happened")]
Eval commands can be wrapped in transforming commands.
[index=web sourcetype=access_combined | stats count(eval(status<300)) as "Success", count(eval(status>=400 AND status<500)) as "Client Error", count(eval(status>500)) as "Server Error"]
A few things to note about using eval inside transforming commands:
["as" clause is required for transforming commands]
['"' double quotes are required for field values]
[resulting field values are case sensitive]
The search command - can be used to filter results at any point in the search. The command behaves exactly like the search terms before the first pipe, but allows you to filter your results further down the search pipeline.
[index=network sourcetype=cisco_wsa_squid usage=Violation | stats count(usage) as Visits by cs_username | search Visits > 1]
[Remember: If you can filter events before the first pipe, do it there for better searches]
The where command - uses the same expression syntax as eval and many of the same functions, but filters events to keep only the results that evaluate to true.
[index=network sourcetype=cisco_wsa_squid | stats count(eval(usage="Personal")) as Personal, count(eval(usage="Business")) as Business by username | where Personal > Business | sort -Personal | where username!="lsagers" | sort -Personal]
[In the real world, never use the where command when you can filter with search terms]
[Inside an eval or where command, the asterisk (*) can't be used as a wildcard; instead use the like operator with either the "%" (percent) or "_" (underscore) character]
% (percent) - matches multiple characters
_ (underscore) - matches exactly one character
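A hedged sketch of the like function with both wildcard characters (the IP patterns are illustrative):
[index=security sourcetype=linux_secure | where like(src_ip, "10.%")]
[index=security sourcetype=linux_secure | where like(src_ip, "10.0.0._")]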
If you want to evaluate whether a field is null or not, use the "isnull" or "isnotnull" function.
[index=sales sourcetype=vendor_sales | timechart sum(price) as sales | where isnull(sales)]
[index=sales sourcetype=vendor_sales | timechart sum(price) as sales | where isnotnull(sales)]
When using the where command, the values being evaluated are case sensitive.
[index=sales sourcetype=vendor_sales | where product_name="final sequel"] --> does not return results
[index=sales sourcetype=vendor_sales | where product_name="Final Sequel"] --> returns results, as the product_name value is case sensitive when using the where command
If you use single quotes, Splunk will treat the string as a field name.
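A hedged sketch of single quotes referring to a field name - needed here because the field name contains a space (field names assumed from the sales data above):
[index=sales sourcetype=vendor_sales | stats sum(price) as "Total Sales" by product_name | where 'Total Sales' > 1000]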
The Fillnull command - It replaces any null values in your events.
If you run a report that includes nulls for some data, your report is displayed with empty fields.
[index=sales sourcetype=vendor_sales | chart sum(price) over product_name by VendorCountry | fillnull]
By default, the null values are replaced with 0 (zero), but by using the "value" argument any string can be used.
[index=sales sourcetype=vendor_sales | chart sum(price) over product_name by VendorCountry | fillnull value="nothing to see here"]
Module-6
Transaction command - A transaction is any group of related events that span time. These events can come from multiple applications or hosts.
Events related to a purchase from an online store can span across an application server, database and e-commerce engine.
One email message can create multiple events as it travels through various queues.
Each event in network traffic logs represents a single user generating a single HTTP request.
Visiting a website normally generates multiple HTTP requests for HTML, JavaScript, Flash, CSS files, images, etc.
[index=web sourcetype=access_combined | transaction clientip] --> We get a list of events that share the same client IP
[index=web sourcetype=access_combined | transaction clientip | table clientip, action, product_name]
The transaction command can create two fields in raw events: duration and eventcount.
Duration - The duration is the time difference between the first and last event in the transaction.
Eventcount - The eventcount is the number of events in the transaction.
These fields can be used with statistics and reporting commands,
[index=web sourcetype=access_combined | transaction clientip | timechart avg(duration)]
The transaction command includes some definition options, the most common being maxspan, maxpause, startswith & endswith.
Maxspan - Sets the maximum total time between the earliest and latest events.
Maxpause - Sets the maximum total time allowed between events.
Startswith - Allows forming transactions starting with specified terms, field values & evaluations.
Endswith - Allows forming transactions ending with specified terms, field values & evaluations.
[index=web sourcetype=access_combined | transaction clientip startswith="addtocart" endswith="purchase" | table clientip, action, product_name]
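A hedged sketch combining maxspan and maxpause (the time values are illustrative):
[index=web sourcetype=access_combined | transaction clientip maxspan=10m maxpause=1m | table clientip, duration, eventcount]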
The transaction command is incredibly handy when you need to investigate an issue, for example if you want to see which emails were rejected by your email security device.
[index=network sourcetype=cisco_esa REJECT]
[index=network sourcetype=cisco_esa | transaction mid dcid icid | search REJECT]
Since transactions are incredibly powerful, you might want to use them instead of stats, but there are specific reasons to use one or the other.
Transactions -
01. Use transaction to see events correlated together
02. Use when events need to be grouped on start and end values
[By default, there is a limit of 1000 events per transaction]
Stats -
01. Use stats to see the results of a calculation
02. Use when events need to be grouped on a field value
03. Stats is faster and more efficient, so when you have a choice, use stats
[Stats has no such event limitation]
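For comparison, a hedged stats alternative to the clientip transaction above - grouping on a field value without building transactions (range(_time) stands in for the transaction duration):
[index=web sourcetype=access_combined | stats count, values(action) as actions, range(_time) as duration by clientip]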
Module-7
What is a Knowledge Object? - Simply put, they are tools that help you and your users discover and analyze your data. They include,
· Data interpretation
· Classification
· Enrichment
· Normalization and
· Search-time mapping of knowledge, called Data Models
Knowledge objects are useful in Splunk for several reasons: they can be created by one user and shared with other users based on permissions, they can be saved and reused by multiple people or in multiple apps, and they can be used in searches.
[Knowledge objects are powerful tools for your deployments]
Your Role -
· Oversee knowledge object creation and usage
· Implement best practices for naming conventions
· Normalize data
· Create data models
[Keep the tool box (knowledge objects) clean and efficient]
Naming conventions -
· Developing a naming convention will help us and our users know exactly what each knowledge object does and will keep the Splunk tool box uncluttered.
· Create knowledge object names with six segmented keys: Group, Type, Platform, Category, Time and Description.
[OPS_WFA_Network_Security_na_IPwhoisAction]
Permissions -
· Permissions play a major role in creating and sharing knowledge objects in Splunk
· There are 3 pre-defined ways knowledge objects can be displayed to users: Private, Specific App & All Apps
· When a user creates an object, it is set to private by default and is only available to that user
· Power and Admin users are allowed to create knowledge objects that can be shared with all users of an app. They may allow other roles to edit the object by granting that role write permissions
· Admin is the only user role allowed to make knowledge objects available to all apps
· As with shared app objects, these are automatically made readable to all users, but an admin can choose to grant read and write access per role
· Admins can also read and edit private objects created by any role
Manage Knowledge Objects -
· Knowledge objects can be centrally managed under the Knowledge header in the Settings menu
· Users with the Admin role will see a "Reassign Knowledge Objects" button
CIM Intro -
· As mentioned, normalizing indexed data is a major part of your role as a knowledge manager.
· In most Splunk deployments, data comes from multiple sourcetypes; as a result, the same values can occur under many different field names
Eg.,
sourcetype=access_combined - field: "clientip"
sourcetype=cisco_wsa_squid - field: "userIP"
· At search time, we may want to normalize these different occurrences to a common structure and naming convention, allowing us to correlate events from both source types
· Splunk supports the use of the "Common Information Model" or CIM to provide a methodology for normalizing values to a common field name
· CIM uses a schema to define standard fields between sources. We can use knowledge objects to help make these connections
Module-8
The Field Extractor - It is a utility that allows you to use a graphical user interface to extract fields that persist as knowledge objects, making them reusable in searches.
There are 2 different methods the field extractor can use to extract data:
· Regular expression
· Delimiters
Regular expressions work well when you have unstructured data and events that you want to extract fields from. The field extractor will automatically build regular expressions using provided samples.
Delimiters are used when events contain fields separated by a character.
There are 3 ways to access the Field Extractor utility:
01. From the Fields menu in Settings
02. From the fields sidebar
03. From the event actions menu
· The workflow changes depending on how you access the Field Extractor and which method you choose. The easiest way to extract a field is using the event actions menu.
Extracting Fields: RegEx - If you manually edit the regular expression, you will not be returned to the Field Extractor utility after doing so.
Extract with Delimiter -
Extracting Multiple Fields - The Field Extractor also makes it easy to extract multiple fields from overlapping values.
Module-9
Field Alias - It gives you a way to normalize data over multiple sources. You can assign one or more aliases to any extracted field and can apply them to lookups.
Normalizing the sourcetypes below, the correlating field is "Employee":
sourcetype=cisco_firewall field="Username"
sourcetype=winauthentication_security field="User"
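A hedged sketch of searching across both sourcetypes once the "Employee" alias exists (the index and value are illustrative):
[index=security (sourcetype=cisco_firewall OR sourcetype=winauthentication_security) Employee=jsmith | stats count by Employee, sourcetype]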
Calculated Fields - If you find yourself writing repetitive, long or complex eval commands, calculated fields can save you a lot of time and headaches.
[Calculated fields must be based on extracted or discovered fields]
[Output fields from a lookup table or fields generated from within a search string are not supported]
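As a hedged illustration, a calculated field named bandwidth_MB (the name is an assumption) could store the eval expression round(sc_bytes/1024/1024,2), so searches can reference the field directly instead of repeating the eval:
[index=network sourcetype=cisco_wsa_squid | stats avg(bandwidth_MB) by usage]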
Module-10
Tags -
Tags in Splunk are knowledge objects that allow you to designate descriptive names for key-value pairs. They enable you to search for events that contain particular field values.
[index=web host=www*]
www1 & www2 are in San Francisco
www3 is in London
We will use tags to give these hosts function and location labels.
Creating Tags -
We can create tags by clicking on the event information link and clicking the Actions link for the field-value pair we want to tag.
[index=security tag=SF]
[Tag values are case sensitive in a search]
Event Types - They allow you to categorize events based on search terms.
Creating an Event Type from a search -
Event Type Builder - An event type can also be built using the Event Type Builder.
When to use Event Types vs Saved Reports: each option has its own advantages depending on what you need to do with your data.
Event types -
· Allow you to categorize events based on a search string
· Use tags to organize your data
· Provide an "eventtype" field that can be used within a search string (see the sketch after this list)
· Event types don't include a time range
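A hedged example of the eventtype field in a search, assuming an event type named purchase_error was saved for failed web purchases (the name is illustrative):
[index=web eventtype=purchase_error | stats count by clientip]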
Saved reports -
· Used when the search criteria do not change (fixed search criteria)
· Used when you need to include a time range and formatting of the results
· Used when you want to share with other Splunk users
· Used when you want to add a report to dashboards
Module-11
Macros - are search strings or portions of search strings that can be reused in multiple places within Splunk. They are useful when you frequently run searches requiring similar or complicated search syntax.
There are a couple of things that make macros unlike other knowledge objects:
· Macros allow you to store entire search strings, including pipes and eval statements
· They are time range independent, allowing the time range to be selected at search time
· They can pass arguments to the search
Create Macro -
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as total_sales by Vendor | eval total_sales = "$" + tostring(round(total_sales,2),"commas")]
Settings --> Advanced search --> Add new in Search macros
Destination App: (search)
Macro Name: convertUSD
Definition: {This is the search string that will expand when referenced - [eval total_sales = "$" + tostring(round(total_sales,2),"commas")]}
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as total_sales by Vendor | `convertUSD`]
{Backticks tell Splunk that this is a macro and to replace it with the search string in the macro definition}
Macro Arguments - While this macro has saved us some keystrokes, the goal should always be to make our macros as reusable as possible.
The list of macros can be seen under Settings --> Advanced search --> Search macros
Destination App: (search)
Name: convertUSD(1)
Definition: eval $value$ = "$" + tostring(round($value$,2),"commas")
Arguments: value
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Total_Sales by Vendor | `convertUSD(Total_Sales)`]
- The macro can be passed any field containing a number
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Average_price by product_name | `convertUSD("Average_price")`]
Multiple Arguments - Since we are using string functions with the eval command, the results sort alphanumerically, which might not be the desired result. Let's add another argument that allows users to choose whether to convert the currency with the eval or the fieldformat command.
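A hedged sketch of what a two-argument version could look like - since macros expand as plain text, the second argument can select which command runs (the macro name, argument names and definition are assumptions, not the course's exact macro):
Name: convertUSD(2)
Definition: $convert_cmd$ $value$ = "$" + tostring(round($value$,2),"commas")
Arguments: convert_cmd, value
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Total_Sales by Vendor | `convertUSD(fieldformat,Total_Sales)`]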
Expanding search -
Splunk has a built-in search expansion tool that allows you to preview your search without running it.
(Ctrl or Command)+Shift+E opens the search expansion preview window.
Module-12
Workflow Actions - Let us create links within events that interact with external resources or narrow down our search.
They use the HTTP GET or POST method to pass information to external sources or pass information back to Splunk to perform a secondary search.
Workflow Action - GET Method
To create a workflow actions - Settings --> Fields --> Workflow actions (Add new)
Destination app
Name
Label - "Get WhoIs for $src_ip$" (this label will display in the UI when you launch the action)
Apply only to the following fields - src_ip
URI - http://whois.domaintools.com/$src_ip$
Workflow Action - Search
A workflow action can also be used to launch a search.
Settings --> Fields --> Workflow actions
Destination app -
Name -
Label - Find other events for $src_ip$
Apply only to the following fields - src_ip
Apply only to the following event types -
Show action in - Event menu
Action type - search (search will bring-up the search configuration)
Search string - $src_ip$
Run in app - search
Open in view -
Run search in - New window
Module-13
Data Models Intro -
In the Fundamentals 1 course, you learned how to use the Pivot interface to create reports and dashboards.
Pivot - It allows users to work with Splunk without ever having to understand the Splunk search language.
Data Models - are hierarchically structured datasets. They consist of Events, Searches & Transactions.
You can think of the data model as the framework and Pivot as the interface to the data.
Data Model Scenario - Some thought needs to go into creating our data models before we build them.
With data models, users can use Pivot to search, report on and segment the data any way they want.
[Any field can be made available to the data model]
We build dataset hierarchies by adding child datasets to the root dataset.
Creating Root Datasets - Settings --> Data Models --> New Data Model
Title -
ID -
App - "Search & Reporting"
Description -
Add Dataset --> Root Event / Root Search
Root Event - It enables you to create hierarchies based on a set of events, and is the most commonly used type of root data model object.
Root Search - It builds these hierarchies from a transforming search. Root searches don't benefit from data model acceleration.
[Splunk suggests avoiding root searches whenever possible]
Root Transaction - These objects allow you to create datasets from groups of related events that span time. They use an existing object from our dataset hierarchy to group on.
Child Objects - They allow us to constrain or narrow down the events from the objects above them in the hierarchical tree.
If we try to create a pivot with the current model, we can only use inherited fields to split our data, which is not very helpful, so we will need some additional fields.
Add fields -
01. Auto-Extracted - attributes are the fields Splunk extracts from our data
02. Eval Expression - is an attribute created by running an eval expression on a field
03. Lookup - attribute is created using lookup tables
04. Regular Expression - allows us to create an attribute using a regular expression on the data
05. Geo IP - attribute is created from Geo IP data in our events
[We select the fields we wanted to display and rename them for the end user]
Transactions with Datasets - Do not benefit from data model acceleration.
Data Models in search -
[It is recommended to use the Pivot UI over the pivot command]
Manage Data Models - Settings --> Data Models
We can edit our data models or explore them in Pivot. We can also choose to upload or restore a data model from a backup file.
[Accelerating data models can make searches faster and more efficient]
Module-14
CIM - Common information model
01. Demystify CIM
02. Why to make data CIM-compliant
03. How to validate compliance
[The same type of data can occur under different field names]
sourcetype=access_combined field "clientip"
sourcetype=cisco_wsa_squid field "userIP"
Using the CIM, we can normalize the different occurrences (clientip / userIP) to a shared field such as "src", allowing us to correlate the clientip data with the userIP data under a shared field name.
Splunk provides a methodology for normalizing values to common field names by supporting the use of the CIM.
Using the CIM schema, we can make sure all of our data maps to a defined methodology.
The CIM provides a shared, common language for field names and values.
Data can be normalized at index time or at search time using knowledge objects.
CIM schema should be used for,
* Field extractions
* Aliases
* Event types
* Tags
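For the clientip / userIP example above, a search-time field alias is one common way to do the mapping. A minimal props.conf sketch, assuming the Cisco events arrive as sourcetype cisco_wsa_squid and we want userIP to also be available under the CIM-style field name src (the alias class name cim_src is our own example):
# search-time alias: userIP is also exposed as src, the original field is kept
[cisco_wsa_squid]
FIELDALIAS-cim_src = userIP AS src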
Knowledge objects can be shared globally across all apps, allowing us to take advantage of the mappings no matter which app is in use at the time.
Splunk premium solutions like Splunk Enterprise Security rely heavily on CIM-compliant data when searching data, running reports, and creating dashboards.
Splunk provides a CIM add-on on Splunkbase that includes JSON data model files that help you
* validate indexed data compliance
* use normalized data in Pivots
* improve performance through data model acceleration
* The add-on is free and requires no additional indexing, so it will not affect your license in any way.
* The add-on only needs to be installed on the search head, or on a single-instance Splunk deployment.
* A user with the admin role is required to install the add-on.
Using CIM with your data
01. Getting Data in
02. Examine Data
03. Tag Events
04. Verify Tags (see the example search after this list)
05. Normalize Fields
06. Validate Against Model
07. Package as Add-on
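For step 04 above, the quickest verification is simply to search on the tag itself. A minimal sketch, assuming the web access events were tagged web in step 03 (the tag name is only an example):
[sourcetype=access_combined tag=web]
If the expected events come back, the tag is being applied correctly.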
Settings --> Data Models
Normalizing Data to CIM
Field extractions and lookups can also be used to make fields CIM compliant.
We can search our data models using the datamodel command.
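A minimal sketch of the datamodel command, assuming a data model named Buttercup_Games with a root event dataset named http_request (both names are hypothetical):
[| datamodel Buttercup_Games http_request search | search http_request.status=404]
In search mode the dataset's fields come back prefixed with the dataset name (http_request.status here), so they can be filtered like any other field.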