SPLUNK



Splunk:
- Splunk is a log analysis and monitoring tool that reads data from different log files and other sources and stores it as events in local indexes
- Splunk can present data in different kinds of dashboards, which is useful for application users and higher leadership

Components of Splunk:
- Universal Forwarder (UF)
- Heavy Forwarder (HF)
- Indexer (IDX)
- Search Head (SH)
- Deployment Server (DS)
- License Master (LM)

Splunk Features:
- Reporting
- Monitoring
- Log Analysis
- Alerting
- Dashboard

Splunk Enterprise:
- Splunk Enterprise collects, analyzes and acts on the value of the data generated by technology infrastructure, security and business applications
- It gives insights that drive operational performance and business results

Splunk Cloud:
- Splunk Cloud delivers all the features of Splunk Enterprise as a cloud-based service
- The platform provides access to Splunk Enterprise Security and the Splunk App for AWS and it enables centralized visibility across cloud, hybrid and on-premises environments

Splunk Light:
- Splunk Light is a solution for small IT environments that automates log search and analysis.
- It speeds troubleshooting by gathering real-time log data from your distributed applications and infrastructure in one place, enabling powerful searches, dynamic dashboards, alerts and reporting for real-time analysis, all at an attractive price.

The four stages of Splunk include,
– Accepts any text data as input
– Parses the data into events [rows in the database tables]
– Stores events in indexes [table in the relational DB format]
– Searches and reports

{Forwarder}
– Collects data from data source & forwards to indexer

{Indexer}
– Receives data from the data source and indexes it
– It validates the license before indexing

{Search Head}
– Searches the data on the indexers and provides reports

Best practices:-
– Do not run Splunk as the super-user
– Create a user account that is used to run Splunk
+ For inputs, Splunk must be able to access the data sources
+ On *nix, non-root accounts cannot bind to ports < 1024
+ On Windows,
– Use a domain account if Splunk has to connect to other servers
– Otherwise, use a local machine account that can run services
+ Make sure the Splunk account can access scripts used for inputs and alerts
– Splunk searches depend on accurate time
+ Correct event time stamping is essential
– It is imperative that your Splunk indexers and production servers have a standardized time configuration
+ Clock skew between hosts can affect search results

--------------------------------------------------------------------------------------------------------------------

SPLUNK SCALES:

Data processing:-
Input:- → Indexer/HF/UF
- Data from network/file/scripted input
- Data broken into 64k blocks
- Annotation of each block with host/source/source type/character encoding

Parsing:- → Indexer/HF
- Event line breaking
- Aggregation for multiline event
- Regex replacement
- Event wise host/source/source type annotation
- Time stamping events

Indexing:- → Indexer
- Parsed event data written into disk/index

Search:- → Indexer/SH
- Search on indexed data using SPL
- Knowledge object binding

SPLUNK FUNDAMENTALS 2



Fundamental-2
Module-1
Introduction
Fundamentals One Refresher -
Splunk Search Terms:
01. Keywords
02. Booleans [Boolean operators must be uppercase]
03. Phrases [Exact phrases can be searched by placing them in quotes]
04. Fields [We can also search on an extracted field by typing a field value pair into the search. Field names are case sensitive. Field values are not]
05. Wildcards
[Wildcards can be used at any point in keyword text and fields]
[Using a wildcard at the beginning of a keyword or field is very inefficient]
06. Comparisons [Comparison operators can be used to filter events.]
[Supported operators are: = (equal), != (not equal), < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to)]
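For example (a sketch reusing the web access data that appears later in these notes, where status is a numeric field):
[sourcetype=access_combined status>=400]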
 
Commonly used commands: the fields command allows you to include or exclude specific fields from search results.
[sourcetype=access_combined | fields clientip, action]

table command returns specified fields in a table format.
[sourcetype=access_combined | table clientip, action]

rename command, can be used to rename fields.
[sourcetype=access_combined | rename clientip as "userip"]
dedup command, removes duplicate events from results that share common values.
[sourcetype=access_combined | dedup clientip]

sort command allows you to display your results in ascending or descending order.
[sourcetype=access_combined | sort - price]

lookup command adds field values from external sources.
[sourcetype=access_combined | lookup dnslookup clientip]

Transforming commands are used to order search results into a data table that Splunk can use for statistical purposes. They are required to transform search results into visualizations.

top & rare allow you to quickly find the most common and rarest values in a result set.
stats produces statistical information from our search results.
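Minimal sketches of these, assuming the same web access data with its product_name and action fields:
[sourcetype=access_combined | top limit=5 product_name]
[sourcetype=access_combined | rare product_name]
[sourcetype=access_combined | stats count by action]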

Module-2
Beyond basic search fundamentals
[If a command references a specific value, that value will be case sensitive]

[eg., replace command]
{sourcetype=access_combined purchase | replace www1 with server1 in host}

[Field values from a lookup are case sensitive by default. A user with the Admin role can choose to make values case insensitive when creating a lookup table, but it is best to assume that this is not the case when searching.]
[Boolean operators are case sensitive. If a Boolean operator is not supplied in uppercase, it is treated as a literal keyword]
[When searching using tags, tag values are case sensitive]
[When using regular expressions with commands, the regex terms are case sensitive as defined by the character classes used]

[Buckets]
01. When Splunk ingests data, it is stored in buckets.
02. Buckets are directories containing sets of raw data and index data.
03. Buckets are configurable with a maximum size and maximum time span.
04. There are three kinds of searchable buckets in Splunk: Hot, Warm and Cold.

Hot - As events are indexed, they are placed in Hot buckets. Hot buckets are the only writeable buckets.
Hot bucket rolls to warm bucket when,
- Maximum size reached
- Time span reached
- Indexer is restarted

Warm - Upon rolling, the bucket is closed, renamed and changed to "read only" status.
Warm buckets are renamed to display the time stamps of the youngest and oldest events in the bucket.
Warm bucket rolls to cold bucket when,
- Maximum size reached
- Time span reached

Cold - The bucket is typically stored in a different location than Hot and Warm buckets. This allows them to be stored on slower, more cost-effective infrastructure.
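As a hedged illustration, bucket state can be inspected with the dbinspect command (assuming an index named main and sufficient permissions):
[| dbinspect index=main | table bucketId, state, startEpoch, endEpoch]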

Using Wildcards -
01. Wildcards are tested after all other search terms.
02. Only trailing wildcards make efficient use of index.
[sourcetype=access*]
03. Wildcards at the beginning of a string cause Splunk to search all events.
04. Wildcards in the middle of a string produce inconsistent results.
05. Avoid using wildcards to match punctuation.
06. Be as specific as possible in search terms.


Search Modes -
Knowing when to use the appropriate search mode can make your searches more efficient or allow better access to your data for discovery.

Fast Mode - It emphasizes performance and returns only essential data. When running a non-transforming search in this mode, only the fields required for the search are extracted and displayed in events. As with all non-transforming searches, statistics and visualizations are not available, but patterns are. If we run a transforming command, events and patterns are no longer returned, but we have access to statistics and visualizations.

Verbose Mode - It emphasizes completeness by returning all field and event data. If we run a non-transforming search in this mode, we get events and patterns (as in Fast Mode), but all fields for the events are extracted and displayed in the sidebar. If we run a transforming search in this mode, we get access to statistics and visualizations, but we can also see patterns and events.

Smart Mode - It is designed to return the best results for the search being run, using a combination of both fast and verbose modes. If we run a non-transforming search, it acts like verbose mode, returning all fields for events and access to patterns. If we use transforming commands, it acts like fast mode.

General Best Practices,
01. The less data you have to search, the faster Splunk will be.
02. Fields extracted at index time do not need to be extracted for each search. (time, index, source, host and sourcetype)
03. Inclusion is generally better than exclusion. (searching for "access denied" is better than NOT "access granted")

Use the appropriate search mode,
Fast mode for performance
Verbose mode for completeness
Smart mode for the combination of both

Search Job Inspector,
At times you might need to tune a search to make it more efficient. The Search Job Inspector is a tool that can be used to troubleshoot the performance of searches and determine which phase of a search takes the most time. It dissects the behavior of searches to help you understand the costs of knowledge objects, search commands and other components within the search. Any search job that has not expired can be inspected.

Module-3
Splunk will allow you to visualize your data in many ways. Any search that returns statistical values can be viewed as a chart. Most visualizations require results structured as tables with at least two columns.

The chart command can take two clause statements (over & by).
Over - It tells Splunk which field you want to be on the X axis.
[Any stats function can be applied to the chart command.]
[index=web sourcetype=access_combined status>299 | chart count over status]
status is the x-axis and count is the y-axis. The y-axis must always be numeric so that it can be charted.
By - The "by" clause comes into play when we want to split our data by an additional field.
[index=web sourcetype=access_combined status>299 | chart count over status by host]

Unlike the stats command, only one field can be specified after the "by" modifier when using the "over" clause. If two fields are supplied to the "by" clause without an "over" clause, the first field is used as the "over" field.

[index=web sourcetype=access_combined status>299 | chart count by status, host]
[index=web sourcetype=access_combined status>299 | chart count over host by product_name]
[index=web sourcetype=access_combined status>299 | chart count over host by product_name usenull=false] --> removes NULL values from the chart
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name] --> removes null values in the initial search, which is most efficient

{The chart command is limited to 10 columns by default; additional values are grouped into an "other" column, and the limit argument controls how many columns are shown}
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name useother=false] --> removes the "other" column
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name limit=5] --> limits the number of products shown (here, 5)
[index=web sourcetype=access_combined status>299 product_name=* | chart count over host by product_name limit=0] --> "limit=0" displays all products

Timechart command - Performs stats aggregations against time. Time is always the X axis.
[index=sales sourcetype=vendor_sale | timechart count]
[index=sales sourcetype=vendor_sale | timechart count by product_name]

As with chart, any stats function can be applied with the timechart command and only one value can be specified after the "by" modifier. The limit, useother and usenull arguments are also available to timechart. The timechart command intelligently clusters data in time intervals dependent on the time range selector.
To change the span of time of the cluster, you can use the argument of span with the time to group by.
[index=sales sourcetype=vendor_sales | timechart span=12hr sum(price) by product_name limit=0]

We may want to compare data over specific time periods, and Splunk provides the timewrap command.
[index=sales sourcetype=vendor_sales product_name="Dream Crusher" | timechart span=1d sum(price) by product_name | timewrap 7d | rename _time as Day | eval Day = strftime(Day, "%A")]

Line graph -
Chart overlay - will allow you to lay a line chart of one series over another visualization.
[index=main (sourcetype=access_combined action=purchase status=200) OR sourcetype=vendor_sales | timechart sum(price) by sourcetype | rename access_combined as "web_sales"]

Area chart - The difference between Line and Area formatting is the ability to show the stack.
Column chart - It also allows you to stack data.
Bar graph - uses horizontal bars to show comparisons and can be stacked.
Pie chart - It takes the data and visualizes the percentage for each slice.
Scatter chart - It shows the relationship between two discrete data values, plotted on the X and Y axes.
Bubble chart - We can add more versatility with a bubble chart. This provides a visual way to view a third dimension of data. Each bubble plots against two dimensions on the X and Y axes. The size of the bubble represents the value for the third dimension.

Trellis layout - It allows us to split our visualization by a selected field or aggregation. While we get multiple visualizations, the originating search is only run once.
[Additional visualizations can be downloaded from Splunk base]

Module-4
There are several options for representing data that includes geographical information.
iplocation - It is used to look up and add location information to events. Fields such as city, country, region, latitude and longitude can be added to events that include external IP addresses.
[index=security sourcetype=linux_secure action=success src_ip!=10.* | iplocation src_ip]

Depending on the IP, not all location information might be available for it. This is the nature of geolocation and should be taken into consideration when searching your data.

If you are collecting Geographical data, you can use the Geostats command to aggregate the data for use on a map visualization. The Geostats command uses the same functions as the stats command.
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count]
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count by product_name]

Unlike the stats command, the Geostats command only accepts one "by" argument. To control the column count, the globallimit argument can be used.
[index=sales sourcetype=vendor_sales | geostats latfield=VendorLatitude longfield=VendorLongitude count by product_name globallimit=4]
[You can lookup Geographical data to use with Geostats using the Iplocation command]
[index=sales sourcetype=linux_secure action=success src_ip!=10.* | iplocation src_ip | geostats latfield=lat longfield=lon count]
Choropleth map - It is another way to see your data as a geographical visualization. It allows us to use shading to show relative metrics over predefined locations on a map.
[In order to use a choropleth, you need a .kmz or compressed Keyhole Markup Language file that defines region boundaries]

To prepare our events for a choropleth, we use the geom command. It adds a field with geographical data structures matching polygons on the map.
[index=sales sourcetype=vendor_sales VendorID>=5000 AND VendorID<=5055 | stats count as Sales by VendorCountry | geom geo_countries featureIdField=VendorCountry]

::geo_countries - name of the kmz file/also known as the featureCollection::
::featureIdField is also required::

Single value visualization - When the result contains a single value, there are two different types of visualizations you can use to display it.
You can pipe the events into the gauge command,
[index=web sourcetype=access_combined action=purchase | stats sum(price) as total | gauge total 0 30000 60000 70000]
[Once the color range format is set, it stays persistent over the radial, filler or marker gauges]

The trendline command computes moving averages of field values, giving you a clear understanding of how your data is trending.
[index=web sourcetype=access_combined action=purchase status=200 | timechart sum(price) as sales | trendline wma2(sales) as trend]
trendline command requires three arguments,
Trendtype:
- simple moving average / sma
- exponential moving average / ema
- weighted moving average / wma
"sma/ema/wma", computes the sum of data points over a period of time. The wma and ema assign a heavier weighting to more current data points.

number "2", will average the data points on every two days.
field "sales", we need to define a field to calculate the trend from

Addtotals command - It computes the sum of all numeric fields for each event and creates a total column.
[index=web sourcetype=access_combined file=* | chart sum(bytes) over host by file | addtotals col=true label="Total" labelfield="host" fieldname="Total by host" row=false]

col - We can create a column summary (totals row) by setting the "col" argument to true
label - The summary row is created without a label; we add one by setting the "label" argument to the name to use
labelfield - The "labelfield" argument specifies the field in which to show the label
fieldname - We can change the name of the per-event total column using the "fieldname" argument
row - Setting the "row" argument to false removes the per-event total column

Module-5
Eval command - It is used to calculate and manipulate field values. Arithmetic, concatenation and Boolean operators are supported. Results can be written to a new field or replace an existing field. Field values created by the eval command are case sensitive.

[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = Bytes/1024/1024]
[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = round(Bytes/1024/1024,2)]
[index=network sourcetype=cisco_wsa_squid | stats sum(sc_bytes) as Bytes by usage | eval bandwidth = round(Bytes/1024/1024,2) | sort -bandwidth | rename bandwidth as "Bandwidth (MB)" | fields - Bytes]
Along with converting values, the eval command allows you to perform mathematical functions against fields with numerical values.
[index=web sourcetype=access_c* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval discount = round(((total_sale_price - total_list_price) / total_list_price)*100) | sort - discount | eval discount = discount."%"]

Convert values with Eval command -
Tostring function - It converts numerical values to strings. The tostring function also allows formatting of strings; this allows formatting for time, hexadecimal numbers and commas.
[index=web sourcetype=access* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval total_list_price = "$" + tostring(total_list_price,"commas")]
[After using the tostring function, the field values might not sort numerically because they are now string (ASCII) values]
 
The fieldformat command - It can be used to format values without changing the characteristics of the underlying values. It uses the same functions as the eval command.
[index=web sourcetype=access* product_name=* action=purchase | stats sum(price) as total_list_price, sum(sale_price) as total_sale_price by product_name | eval total_list_price = "$" + tostring(total_list_price,"commas") | fieldformat total_sale_price = "$" + tostring(total_sale_price,"commas")]

[fieldformat results can still be sorted numerically; that's because fieldformat happens at the display level without changing the underlying data]
[While eval creates new field values, the underlying data in the index does not change]

Multiple eval commands can be used in a search; since eval creates a new field, subsequent commands can reference the results of the eval commands that come before them.
[index=web sourcetype=access_combined price=* | stats values(price) as list_price, values(sale_price) as sale_price by product_name | eval current_discount=round(((list_price - sale_price)/list_price) * 100) | eval new_discount = (current_discount -5) | eval new_sale_price = list_price - (list_price * (new_discount/100)) | eval price_change_revenue = (new_sale_price - sale_price)]

[The eval command has an "if" function that allows you to evaluate an expression and assign defined field values depending on whether it evaluates to true or false]
if(x,y,z)
x--boolean expression
y--value returned if x is true
z--value returned if x is false

[y & z must be in double quotes if not numerical]
 
[index=sales sourcetype=vendor_sales | eval SalesTerritory = if(VendorID < 4000,"North America","Rest of the World") | stats sum(price) as TotalRevenue by SalesTerritory]
[The eval "case" function behavios much like "if" function but can take multiple boolean expressions and return the corresponding argument that is true]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success")]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success", status>=300 AND status<400,"Redirect", status>=400 AND status<500,"Client Error", status>=500,"Server Error")]

[If an event doesn't fit any of the cases, no value is assigned. If you want to make sure a value is always returned from the case function, add a final condition that always evaluates to true]
[index=web sourcetype=access_combined | eval httpCategory=case(status>=200 AND status<300,"Success", status>=300 AND status<400,"Redirect", status>=400 AND status<500,"Client Error", status>=500,"Server Error", true(),"Something Weird Happened")]

Eval commands can be wrapped in transforming commands.
[index=web sourcetype=access_combined | stats count(eval(status<300)) as "Success", count(eval(status>=400 AND status<500)) as "Client Error", count(eval(status>500)) as "Server Error"]

Few things to note about using eval inside transforming commands.
["as" clause is required for transforming commands]
['"' double quotes are required for field values]
[resulting field values are case sensitive]

The search command - can be used to filter results at any point in the search. The command behaves exactly like the search terms before the first pipe, but allows you to filter your results further down the search pipeline.
[index=network sourcetype=cisco_wsa_squid usage=Violation | stats count(usage) as Visits by cs_username| search Visits > 1]

[Remember: If you can filter events before the first pipe, do it there for better searches]

The Where command - uses the same expression syntax as eval and many of the same functions, but filters events to keep only the results that evaluate to true.

[index=network sourcetype=cisco_wsa_squid | stats count(eval(usage="Personal")) as Personal, count(eval(usage="Business")) as Business by username | where Personal > Business | sort -Personal | where username!="lsagers" | sort -Personal]
[In the real world, Never use a "Where" command when you can filter by search terms]
[inside a "eval" or "where" command asteris(*) cann't be used as wildcard, instead you want to use the like operator with either the "%"(percentage) or "_"(underscore) character]
%(percentage) - character will match multiple characters
_(underscore) - character will match the one
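A minimal sketch, reusing the vendor sales data and product_name field from the examples just below:
[index=sales sourcetype=vendor_sales | where like(product_name, "Final%")]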

If you want to evaluate whether a field is null or not, use the "isnull" (or "isnotnull") function
[index=sales sourcetype=vendor_sales | timechart sum(price) as sales | where isnull(sales)]
[index=sales sourcetype=vendor_sales | timechart sum(price) as sales | where isnotnull(sales)]

while using "where" clause - when evaluating the value, values are case sensitive
[index=sales sourcetype=vendor_sales | where product_name="final sequel"] --> does not get result
[index=sales sourcetype=vendor_sales | where product_name="Final Sequel"] --> this gets the result as value of product name is case sensiive while using "where" clause

If you use single quotes, Splunk treats the string as a field name.
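For instance (a sketch using the price and sale_price fields from earlier examples), 'sale_price' in single quotes is read as a field, whereas "sale_price" in double quotes would be a literal string:
[index=web sourcetype=access_combined action=purchase | where price > 'sale_price']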

The Fillnull command - It replaces any null values in your events.
If you run a report that includes nulls for some data, your report is displayed with empty fields.
[index=sales sourcetype=vendor_sales | chart sum(price) over product_name by VendorCountry | fillnull]

By default, the null values are replaced with 0 (zero), but by using the "value" argument any string can be used.
[index=sales sourcetype=vendor_sales | chart sum(price) over product_name by VendorCountry | fillnull value="nothing to see here"]
 
Module-6
Transaction command – A transaction is any group of related events that span time. These events can come from multiple applications or hosts.
Events related to a purchase from an online store can span the application server, database and e-commerce engine.
One email message can create multiple events as it travels through various queues. Each event in network traffic logs represents a single user generating a single HTTP request.
Visiting a website normally generates multiple HTTP requests for HTML, JavaScript, Flash, CSS files, images, etc.

[index=web sourcetype=access_combined | transaction clientip] → We get a list of events that shared the same client IP
[index=web sourcetype=access_combined | transaction clientip | table clientip, action, product_name]

The Transaction command can create two fields in raw events, duration and eventcount.
Duration – The duration is the time difference between first and last event in the transaction.
Eventcount – The eventcount is the number of events in the transaction.
These fields can be used with statistics and reporting commands,
[index=web sourcetype=access_combined | transaction clientip | timechart avg(duration)]

The transaction command includes some definition options, the most common being maxspan, maxpause, startswith & endswith.
Maxspan – It allows you to set the maximum total time between the earliest and latest events.
Maxpause – It sets the maximum allowed time between individual events.
Startswith – It allows forming transactions starting with specified: terms, field values & evaluations
Endswith – It allows forming transactions ending with specified: terms, field values & evaluations
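A sketch with arbitrary limits, ahead of the startswith/endswith example below:
[index=web sourcetype=access_combined | transaction clientip maxspan=10m maxpause=1m]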

[index=web sourcetype=access_combined | transaction clientip startswith="addtocart" endswith="purchase" | table clientip, action, product_name]
The transaction command is incredibly handy when you need to investigate an item, for example if you want to see which emails were rejected by your email security device.
[index=network sourcetype=cisco_esa REJECT]
[index=network sourcetype=cisco_esa | transaction mid dcid icid | search REJECT]

Since transactions are incredibly powerful, you might want to use them instead of "stats", but there are specific reasons to use one or the other.
Transactions
01.   Use transaction to see events correlated together
02.   Use when events need to be grouped on start and end values
[By default, there is a limit of 1000 events per transaction]
Stats
03.   Use stats to see results of a calculation
04.   Use when events need to be grouped on a field value
05.   Stats command is faster and more efficient, so when you have the choice use "stats"
[Stats does not have such a limitation]
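For comparison, a sketch of the same grouping done both ways; when only counts and field values are needed, the stats version is the better choice:
[index=web sourcetype=access_combined | transaction clientip | table clientip, eventcount, duration]
[index=web sourcetype=access_combined | stats count, values(action) as actions by clientip]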

Module-7
What is a Knowledge Object? – Simply put, they are tools that help you and your users discover and analyze your data. They include,
·         Data interpretation
·         Classification
·         Enrichment
·         Normalization and
·         Search Time Mapping of knowledge called Data Models
Knowledge objects are useful in Splunk for several reasons: they can be created by one user and shared with other users based on permissions, they can be saved and reused by multiple people or across multiple apps, and they can be used in searches.
[Knowledge objects are powerful tools for your deployments]
Your Role –
·         Oversee knowledge object creation and usage
·         Implement best practices for naming conventions
·         Normalize data
·         Create data models
[Keep the tool box (knowledge objects) clean and efficient]

Naming conventions –
·         Developing a naming convention will help us and our users know exactly what each knowledge object does and will help keep the Splunk tool box uncluttered.
·         Create a Knowledge object with six segmented keys, Group, Type, Platform, Category, Time and Description.
[OPS_WFA_Network_Security_na_IPwhoisAction]

Permissions –
·         Permissions play a major role in creating and sharing knowledge objects in Splunk
·         There are 3 pre-defined ways knowledge objects can be displayed to users: Private, Specific App & All Apps
·         When a user creates an object, it is set to private by default and is only available to that user
·         Power and Admin users are allowed to create knowledge objects that can be shared with all users of an app. They may allow other roles to edit the object by granting that role write permissions
·         Admin is the only user role that is allowed to make knowledge objects available to all apps
·         As with objects shared to an app, these are automatically made readable to all users, but an admin can choose to grant read and write access per role
·         Admins can also read and edit private objects created by any role
Manage Knowledge Objects –
·         Knowledge objects can be centrally managed under the knowledge header in the settings menu
·         Users with the Admin role will see a “Reassign knowledge objects” button
CIM Intro
·         As we mentioned, normalizing indexed data is a major part of your role as a knowledge manager.
·         In most Splunk deployments, data comes from multiple sourcetypes; as a result, the same values can occur under many different field names
Eg.,
sourcetype=access_combined – field: “clientip”
sourcetype=cisco_wsa_squid – field: “userIP”
·         At search time, we may want to normalize these different occurrences to a common structure and naming convention, allowing us to correlate events from both sourcetypes
·         Splunk supports the use of a “Common Information Model” or CIM to provide a methodology for normalizing values to a common field name
·         CIM uses a schema to define standard fields between sources; we can use knowledge objects to help make these connections

Module-8
The Field Extractor – It is a utility that allows you to use a graphical user interface to extract fields that persist as knowledge objects, making them reusable in searches.
There are 2 different methods that the field extractor can use to extract data.
·         Regular expression
·         Delimiters
Regular expressions work well when you have unstructured data and events that you want to extract fields from. The field extractor will automatically build regular expressions using the samples you provide.
Delimiters are used when events contain fields separated by a character.
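For reference, a comparable inline extraction can be written with the rex command (a sketch only; the pattern and field name here are hypothetical, not taken from the course data):
[index=security sourcetype=linux_secure | rex "user (?<acct>\S+)"]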
There are 3 ways to access Field extractor utility,
01.   From the fields menu in the settings
02.   The fields side bar
03.   From the events actions menu

·         The workflow changes depending on how you access the Field Extractor and which method you choose. The easiest way to extract a field is from the event actions menu.
Extracting Fields : RegEx – If you do edit the regular expression, you will not be returned to the field extractor utility after doing so.
Extract with Delimiter -
Extracting Multiple Fields – The field extractor also makes it easy to extract multiple fields, even from overlapping values.

Module-9
Field Alias – It gives you a way to normalize data over multiple sources. You can assign one or more aliases to any extracted field, and aliases can be applied to lookups.
Normalizing the sourcetypes below by aliasing their correlated fields to "Employee":
Sourcetype=cisco_firewall field=”Username”
Sourcetype=winauthentication_security field=”User”
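Once both fields are aliased to Employee, a single search term covers both sourcetypes (a sketch; the username value is hypothetical):
[(sourcetype=cisco_firewall OR sourcetype=winauthentication_security) Employee=djohnson]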

Calculated Fields – If you find yourself writing repetitive, long or complex eval commands, calculated fields can save you a lot of time and headaches.
[Calculated Fields must be based on an extracted or discovered fields]
[Output Fields from a Lookup Table or fields generated from within a Search string are not supported]
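As an illustration (a sketch assuming a hypothetical calculated field named bandwidth_mb, defined as round(sc_bytes/1024/1024,2) on the cisco_wsa_squid data used earlier), the first search could then be shortened to the second:
[index=network sourcetype=cisco_wsa_squid | eval bandwidth_mb = round(sc_bytes/1024/1024,2) | stats sum(bandwidth_mb) as total_mb by usage]
[index=network sourcetype=cisco_wsa_squid | stats sum(bandwidth_mb) as total_mb by usage]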

Module-10
Tag
Tags in Splunk are knowledge objects that allow you to designate descriptive names for key-value pairs. They enable you to search for events that contain particular field values.
[index=web host=www*]
www1 & www2 are in San Francisco
www3 is in London
We will use "tags" to give these hosts function and location labels

Creating Tags –
We can create tags by clicking an event's information link and then clicking the Actions link for the field-value pair we want to tag.
[index=security tag=SF]
[tag values are case sensitive in a search]

Event Types – They allow you to categorize events based on search terms.
Creating Event Type from search -
Event type builder – An event type can also be built using the Event type builder.

When to use Event Types vs Saved Reports: each option has its own advantages depending on what you need to do with your data.
Event types
·         Allow you to categorize events based on search string
·         Use tags to organize your data
·         The "eventtype" field can be used within a search string (see the example after these lists)
·         Eventtypes don’t include the time range
Saved reports
·         Used when the search criteria do not change (fixed search criteria)
·         When you need to include a time range and format the results
·         When you want to share with other Splunk users
·         When you want to add a report to dashboards
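For example (a sketch with a hypothetical event type named purchase_success):
[index=web eventtype=purchase_success]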

Module-11
Macros – are search strings or portions of search strings that can be reused in multiple places within Splunk. They are useful when you frequently run searches requiring similar or complicated search syntax.
There are a couple of things that make macros unlike any other knowledge object.
·         Macros allow you to store entire search strings including pipes and eval statements
·         They are time range independent, allowing the time range to be selected at search time
·         They allow you to pass arguments to the search
Create Macro –
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as total_sales by Vendor | eval total_sales = "$" + tostring(round(total_sales,2),"commas")]
Settings → Advanced search → Search macros → Add new
Destination App: (search)
Macro Name: convertUSD
Definition: {This is the search string that will expand when referenced – [eval total_sales = "$" + tostring(round(total_sales,2),"commas")]}
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as total_sales by Vendor | `convertUSD`]
{Backticks tells Splunk that this is the macro and to replace it with the search in the macro definition}

Macro Arguments – While this macro has saved us some keystrokes, the goal should always be to make our macros as reusable as possible.
List of macros can be seen under Settings → Advanced search → Search macros
Destination App: (search)
Name: convertUSD(1)
Definition: eval $value$ = "$" + tostring(round($value$,2),"commas")
Arguments: value
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Total_Sales by Vendor | `convertUSD(Total_Sales)`]
-          The macro can be passed any field
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Average_price by product_name | `convertUSD("Average_price")`]
Multiple Arguments –
Since we are using string functions with the eval command, the results sort alphanumerically, which might not be the desired result. Let's add another argument that allows users to choose whether they want to convert the currency with the eval or fieldformat command (a sketch follows below).
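A sketch of what such a two-argument version could look like (the definition below is an assumption modeled on the earlier convertUSD example, not taken from the course):
Name: convertUSD(2)
Definition: $function$ $value$ = "$" + tostring(round($value$,2),"commas")
Arguments: function, value
[index=sales sourcetype=vendor_sales | stats sum(sale_price) as Total_Sales by Vendor | `convertUSD(eval, Total_Sales)`]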
Expanding search –
Splunk has a built-in search expansion tool that allows you to preview your search without running it.
(Ctrl or Command)+Shift+E opens the search expansion window

Module-12
Workflow Actions
- Let us create links within events that interact with external resources or narrow down a search.
They use the HTTP GET or POST method to pass information to external sources or pass information back to Splunk to perform a secondary search.

Workflow Action - GET Method
To create a workflow action - Settings --> Fields --> Workflow actions (Add new)

Destination app
Name
Label - "Get WhoIs for $src_ip$ (this label will display in UI when you launch the action)
Apply only to the following fields - src_ip
URI - http//whois.domaintools.com/$src_ip$

Workflow Action - Search
A workflow action can also be used to launch a search.
Settings --> Fields --> Workflow actions
Destination app -
Name -
Label - Find other events for $src_ip$
Apply only to the following fields - src_ip
Apply only to the following event types -
Show action in - Event menu
Action type - search (search will bring-up the search configuration)

Search string - $src_ip$
Run in app - search
Open in view -
Run search in - New window

Module-13
Data Models Intro
-
In the Fundamentals 1 course, you learned how to use the Pivot interface to create reports and dashboards.

Pivot - It allows users to work with Splunk without ever having to understand the Splunk search language.
Data Models - are hierarchically structured datasets. They consist of, Events, Searches & Transactions.
You can think of Data Model is the framework, Pivot is the interface to the data.
Data Model Scenario - Some thought needs to go into creating our data models before we build them.
With data models, users can use Pivot to search, report and segment the data any way they want.
[Any field can be made available to the data model]
We build dataset hierarchies by adding child datasets to the root dataset.
Creating Root Datasets - Settings --> Data Models --> New Data Models
Title -
ID -
App - "searching and reporting"
Description -

Add Dataset --> Root Event / Root Search
Root Event - It enables you to create hierarchies based on a set of events, and is the most commonly used type of root data model object.
Root Search - It builds these hierarchies from a transforming search. Root searches don't benefit from data model acceleration.

[Splunk suggests avoiding root searches whenever possible]
Root Transaction - These objects allow you to create datasets from groups of related events that span time. They use an existing object from our data hierarchy to group on.
Child Objects - They allow us to constrain or narrow down the events of the objects above them in the hierarchical tree.

If you try to create a pivot with the current model, we can only use inherited fields to split our data, which is not very helpful, so we will need some additional fields.
Add fields -
01. Auto-Extracted - attributes are the fields Splunk extracts from our data
02. Eval Expression - is an attribute created by running an eval expression on a field
03. Lookup - attribute is created using lookup tables
04. Regular Expression - allows us to create an attribute using a regular expression on the data
05. Geo IP - attribute is created from Geo IP data in our events

[We select the fields we wanted to display and rename them for the end user]
Transactions with Datasets - Do not benefit from data model acceleration.
Data Models in search -
[It is recommended to use the Pivot UI over the pivot command]
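For reference, a minimal sketch of the commands involved (the data model and dataset names below are placeholders, not real objects):
[| datamodel] --> lists the data models available to you
[| datamodel <data_model_name> <dataset_name> search] --> searches the events of one dataset
[| pivot <data_model_name> <dataset_name> count(<dataset_name>) AS Count] --> the pivot command equivalent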

Manage Data Models - Settings --> Data Models
We can edit our data models or explore them in Pivot. We can also choose to upload and restore a data model from a backup file.
[Accelerating data models can make searches faster and more efficient]

Module-14
CIM - Common information model
01. Demystify CIM
02. Why to make data CIM-compliant
03. How to validate compliance

[The same type of data can occur under different field names]
sourcetype=access_combined field "clientip"
sourcetype=cisco_wsa_squid field "userIP"

Using the CIM, we could normalize the different occurrences (clientip / userIP) to a shared field "src", allowing us to correlate the clientip data with the userIP data under a shared field name.
Splunk provides the methodology for normalizing values to common field name by supporting the use of CIM.
Using the CIM schema, we can make sure all our data maps to a defined method. (Maps all data to a defined method)
Sharing a common language for field values. (Normalizes to a common language)
You can normalize the data at index time or at search time using knowledge objects. (Data can be normalized at index time or search time)

CIM schema should be used for,
* Field extractions
* Aliases
* Event types
* Tags

Knowledge objects can be shared globally across all apps, allowing us to take advantage of the mappings no matter which app is in use at the time.
Splunk premium solutions like Splunk Enterprise Security rely heavily on data that is CIM compliant when searching data, running reports and creating dashboards.

Splunk provides a CIM add-on on Splunkbase that includes JSON data model files that help you
* validate indexed data compliance
* use normalized data in pivots
* improve performance through data model acceleration
* The add-on is free and does no additional indexing, so it will not affect your license in any way.
* The add-on only needs to be installed on the search head or on a single-instance deployment of Splunk.
* A user with the admin role is required to install the add-on

Using CIM with your data
01. Getting Data in
02. Examine Data
03. Tag Events
04. Verify Tag
05. Normalize Fields
06. Validate Against Model
07. Package as Add-on

Settings --> Data Models
Normalizing Data to CIM
Field extractions and lookups can also be used to make fields CIM compliant.
We can search our data models using the "datamodel" command.
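For example, with the CIM add-on installed, indexed data can be checked against one of its models, such as Web (a sketch; which CIM data models are available depends on your environment):
[| datamodel Web search]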