SPLUNK FUNDAMENTALS 1



Fundamentals-1:
Module-2
What is Splunk?
01.   Index data
02.   Search & Investigate
03.   Add Knowledge
04.   Monitor & Alert
05.   Report & Analyze
Components?
01.   Indexer – Indexers process incoming machine data, storing the results in an index as events. As the indexer indexes data, it creates a number of files organized into sets of directories by age.
02.   Search Head – The search head allows users to use the Splunk search language to search the indexed data. It handles search requests from users and distributes the requests to the indexers, which perform the actual searches on the data. Search heads then consolidate and enrich the results from the indexers before returning them to the user. Search heads also provide users with tools such as dashboards, reports and visualizations to assist the search experience.
03.   Forwarder – Forwarders are Splunk Enterprise instances that consume data and forward it to indexers for processing. In most Splunk deployments, forwarders serve as the primary way data is supplied for indexing.
Splunk Deployments and Scaling?
In a Single-Instance Deployment, one Splunk instance handles all of the following functions,
01.   Input
02.   Parsing
03.   Indexing
04.   Searching
Perfect environment for,
·         Proof of concept
·         Personal use
·         Learning
·         Might serve the needs of small, department-sized environments

Module-3
Splunk installation on Linux, Windows, OSX & Splunk Cloud
Splunk Apps and Roles –
01.   Apps - The Apps you see are defined by a user with an administrator role.
02.   Roles - Determine what a user can see, do and interact with.
Three main roles in Splunk Enterprise,
01.   Administrator Role – Can install apps, and create knowledge objects for all users.
02.   Power Role – Can create and share knowledge objects for users of an app and do real time searches.
03.   User Role – Will only see their own knowledge objects and those shared with them.

Module-4
Types of Data Input:
Monitor option – Allows us to monitor files, directories, HTTP events, network ports or data-gathering scripts located on Splunk Enterprise instances.
In a Windows environment, we would also see options to monitor Windows-specific data. This includes,
01.   Event logs
02.   File system changes
03.   Active directory
04.   Network information (Both local & remote machine)
Forward option – We can receive data from external forwarders. Forwarders are installed on remote machines to gather data and forward it to an indexer over a receiving port. In most production environments, forwarders will be used as your main source of data input.
[Indexes are the directories where data is stored]
[Having separate indexes can make your searches more efficient]
[Searching a specific index limits the amount of data Splunk searches and returns events only from that index]
[Multiple indexes also allow limiting access by user role; admins can control who sees what data]
[Retention of data – Separate indexes allow custom retention policies per index]

Module-5
Search & Reporting App – The Search & Reporting app provides the default interface for searching and analyzing data. It enables you to create knowledge objects, reports, dashboards and more.
Seven main components make up the Search & Reporting app's main interface,

[Limiting a search by time is key to faster results and is a best practice]
[Commands that create statistics and visualizations are called transforming commands] – These are commands that transform event data into a data table.
[By default, a search job will remain active for 10 minutes after it is run. After 10 minutes, Splunk will need to run the job again to return the results]
[Shared search jobs remain active for 7 days and will be readable by everyone, meaning that anyone you share the job with will see exactly the same results you did when you first ran it]
Modes:
01.   Fast Mode – Returns only information on default fields and fields required to fulfill your search. [Field discovery is disabled in Fast Mode]
02.   Smart Mode – [default] It will toggle behavior based on the type of search you are running.
03.   Verbose Mode – It returns as much field and event data as possible, discovering all fields it can.
[Selecting or zooming into events uses your original search job]
[When you zoom out, Splunk runs a new search job to return newly selected events]
[By default, events are shown in a List view but there are options to display as Raw events or in a Table]
Exploring Search Term Options in Splunk:
Booleans must be uppercase: AND, OR, NOT. They can be used to combine multiple terms.
·         Failed NOT password
·         Failed OR password
·         Failed password – [If no Boolean is used, AND is implied]
[Boolean operations have an order of evaluation: NOT, then OR, then AND] [Parentheses can be used to control the order of evaluation]
·         Failed NOT (success OR accepted)
·         "failed password" – exact phrases can be searched by placing them in quotes
Escaping characters in a search:
01.   Info="user "chrisV4" not in database" – the inner quotes are not escaped, so this does not search for the exact string
02.   Info="user \"chrisV4\" not in database" – the backslash escapes the inner quotes, so the whole value is searched as one string

Module-6
Fields info – The fields sidebar shows all the fields extracted at search time. Fields are broken down into the Selected Fields and Interesting Fields lists.
Selected fields are fields of the utmost importance to you.
Interesting fields are fields that have values in at least 20% of the events.
In the interesting fields list,
-          a denotes a string value
-          # denotes a numerical value
[When you add a field to the Selected Fields list, that field will show in the events in which it occurs and will persist for subsequent searches]
Searching Fields – You can refine and run more efficient searches by using fields in them.
·         sourcetype=linux_secure
[Field names are case sensitive while values are not. Here the field name is sourcetype and the value is linux_secure]
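For example (a quick sketch using the sourcetype above; the outcomes follow from the case rule in the note):
·         sourcetype=linux_secure – matches
·         SOURCETYPE=linux_secure – does not match, because the field name's case is wrong
·         sourcetype=LINUX_SECURE – matches, because values are not case sensitive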
Field Operators:
Operators {= or !=} can be used with numerical or string values. Operators {>, >=, <, <=} can be used with numerical values.
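For example (a sketch using the web access data that appears in later modules):
·         index=web sourcetype=access_combined status!=200
·         index=web sourcetype=access_combined status>=500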

Module-7
Best practices in Splunk:
01.   Time
02.   Index, source, host and sourcetype
[The above fields are extracted at index time, so they do not need to be extracted during each search]
[The more you tell the search engine, the more likely it is that you will get good results]
[Inclusion is generally better than exclusion] – e.g., searching for "access denied" is better than searching for NOT "access granted"
[Time abbreviations are used to tell Splunk what time range to search]
Index – One way we can filter events early in our search is by specifying an index. Indexes store data for searching, and Splunk administrators often use multiple indexes to segregate data.
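A sketch combining these practices (index and sourcetype as used in later modules; earliest=-2h is a time abbreviation meaning the last two hours):
·         index=web sourcetype=access_combined earliest=-2h status=404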

Module-8
The Splunk Search Language: The search language is built from five components,
01.   Search terms
02.   Commands – Commands tell Splunk what we want to do with the search results, such as creating charts, computing statistics and formatting
03.   Functions – Functions explain how we want to chart, compute and evaluate the results
04.   Arguments – Arguments are the variables that we want to apply to the function
05.   Clauses – Clauses explain how we want results grouped or defined
Eg., sourcetype=acc* status=200 | stats list(product_name) as "Game Sold"
sourcetype=acc* status=200 → Search Terms
stats → Command (Blue)
list → Function (Purple)
product_name → Argument (Green)
as → Clause (Orange)
Visual syntax tools for SPL,
CTRL + \ (Windows/Linux) or Command + \ (Mac) moves each pipe in the search to a new line
Fields Command:
The fields command includes or excludes fields from search results. It is useful for limiting the fields displayed and can make searches faster. Field extraction is one of the most costly parts of searching in Splunk. Field exclusion happens after field extraction, so it only affects the displayed results.
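For example (a sketch on the web access data used elsewhere in these notes; the first search keeps only the listed fields, the second removes one):
·         index=web sourcetype=access_combined | fields status, clientip
·         index=web sourcetype=access_combined | fields - clientip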

Table Command: The table command retains the specified fields from the search results and displays them in a tabulated format.
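For example (field names as used in the web access examples elsewhere in these notes):
·         index=web sourcetype=access_combined | table host, status, action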
Rename Command: It is used to rename fields. Once renamed, the original name is not available to subsequent search commands. The new field names will need to be used further down the pipeline.
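For example (building on the table search above; the new names are only illustrative):
·         index=web sourcetype=access_combined | table host, status | rename host as "Web Server", status as "HTTP Status"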
Dedup Command: It is used to remove duplicate events from the results – events that share common values in the specified fields. It can be used with a single field or with multiple fields.
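For example (a sketch; the first keeps one event per host, the second one event per host and status combination):
·         index=web sourcetype=access_combined | dedup host
·         index=web sourcetype=access_combined | dedup host status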
Sort Command: The sort command lets you display results in ascending or descending order. It can also limit the number of results returned with the limit argument.
[String data is sorted alphanumerically]
[Numeric data is sorted numerically]
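For example (a sketch on the vendor sales data from the next module; the minus sign sorts descending, and limit caps the number of results):
·         index=sales sourcetype=vendor_sales | table Vendor, sale_price | sort -sale_price
·         index=sales sourcetype=vendor_sales | table Vendor, sale_price | sort limit=10 sale_price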

Module-9
Transforming Commands: These commands order the search results into a data table that Splunk can use for statistical purposes.
Top Command: It finds the most common values of a given field.
·         index=sales sourcetype=vendor_sales | top Vendor limit=20
·         index=sales sourcetype=vendor_sales | top Vendor product_name limit=0
Top command clauses are,
limit = int
countfield = string
percentfield = string
showcount = True/False
showperc = True/False
useother = True/False
otherstr = string
·         index=sales sourcetype=vendor_sales | top Vendor limit=5 showperc=False
·         index=sales sourcetype=vendor_sales | top Vendor limit=5 showperc=False countfield="Number of Sales" useother=True
Using the “by” clause,
·         index=sales sourcetype=vendor_sales | top product_name by Vendor limit=3 countfield="Number of sales" showperc=False
Rare Command: It shows the least common values of a field set.
·         index=sales sourcetype=vendor_sales | rare Vendor limit=5 showperc=False countfield="Number of Sales" useother=True
Using the “by” clause,
·         index=sales sourcetype=vendor_sales | rare product_name by Vendor limit=3 showperc=False countfield="Number of Sales" useother=True
Stats Command: To produce statistics from our search results, we use the stats command. Some of the common stats functions are,
01.   Count – Returns the number of events matching the search criteria
02.   Distinct count (distinct_count or dc) – Returns the count of unique values for a given field in the search results
03.   Sum – Returns the sum of numerical values
04.   Average (avg) – Returns the average of numerical values
05.   Min – Returns the minimum numerical value
06.   Max – Returns the maximum numerical value
07.   List – Lists all values of a given field
08.   Values – Returns the unique values of a given field; it works like the list function

·         index=sales sourcetype=vendor_sales | stats count as "Total sales by Vendors" by product_name, categoryid, sales_price
·         [| stats count(field)]
·         index=web sourcetype=access_combined | stats count(action) as ActionEvents, count as "Total Events"
·         index=sales sourcetype=vendor_sales | stats distinct_count(product_name) as "Number of games for sale by vendors" by sale_price
·         index=sales sourcetype=vendor_sales | stats sum(price) as "Gross Sales" by product_name
·         index=sales sourcetype=vendor_sales | stats count as "Units Sold" sum(price) as "Gross Sales" by product_name → [when using stats, count and sum should be in the same pipe]
·         index=sales sourcetype=vendor_sales | stats avg(sale_price) as "Average Price" → [Missing or mis-formatted values are not added to the calculation]
·         index=sales sourcetype=vendor_sales | stats avg(sale_price) as "Average Price", min(sale_price) as "Min Price", max(sale_price) as "Max Price" by categoryId
·         index=bcgassets sourcetype=asset_list | stats list(Asset) as "company assets" by Employee
·         index=network sourcetype=cisco_wsa_squid | stats values(s_hostname) by cs_username

Module-11
Pivot - It allows users to design reports in a simple-to-use interface without ever having to craft a search string.
Data Models are knowledge objects that provide the data structure that drives Pivots.

These are created by users with the Admin or Power role who have knowledge of the search language and a solid understanding of the data.
The Data Model is the framework and Pivot is the interface to the data. Each data model is made up of datasets. Datasets are smaller collections of your data, defined for a specific purpose. They are represented as tables, with field names for columns and field values for cells.

If you need to create a report but a data model does not currently exist, the Instant Pivot tool can get you working with your data without having to first create a data model.
By entering a non-transforming command into the search bar, we will see a button in the Statistics and Visualization search results tabs.

The datasets that make up the data models can also be helpful in other ways. Allowing users access to small slices of data can help them gain operational intelligence from the data without having to use the Splunk search language.
Datasets help users find data and get answers faster. Splunk also has a Datasets Add-on that you can download from Splunkbase. The add-on allows you to rapidly build dataset tables without using the Splunk search language.

Module-12
Lookups - Lookups allow you to add other fields and values to your events that are not included in the indexed data. We can combine fields from sources external to the index with searched events, based on paired fields present in the events. These sources might include CSV files, scripts or geospatial data.

[A lookup is categorized as a Dataset]
There are two steps to set up a lookup file,
01. Define a lookup table
02. Define the lookup

(Optionally you can configure your lookup to run automatically. Once defined, Lookup field values are case-sensitive by default)
Create a Lookup Table -
Settings -> Lookups -> Lookup table files ->
Destination App:
Upload a lookup file: (http_status.csv)
Destination filename:

search - | inputlookup http_status.csv

Define a Lookup -
[Now that we have a table with our lookup data, we need to define the lookup]

Settings -> Lookups -> Lookup definitions ->
Destination App:
Name:
Type: (File-based/External/KV Store/Geospatial)
Lookup file:

The Lookup Command -
[index=web sourcetype=access_combined NOT status=200 | lookup http_status code as status OUTPUT code as "HTTP Code", description as "HTTP Description" | table host, "HTTP Code", "HTTP Description"]

[Input fields are not automatically generated with the lookup command]

By default, all fields in the lookup table except the input field are returned as output fields. We can choose which fields the lookup returns by adding an OUTPUT clause.

[If an existing field has the same name as an output field, it will be overwritten. We can use the OUTPUTNEW clause instead, which does not overwrite existing fields]
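A variation of the lookup search above using OUTPUTNEW (same lookup and fields as the example above):
[index=web sourcetype=access_combined NOT status=200 | lookup http_status code as status OUTPUTNEW description as "HTTP Description" | table host, "HTTP Description"]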
Creating an Automatic Lookup -
Settings -> Lookups -> Automatic lookups ->
Destination app:
Name:
Lookup table:
Apply to:
Lookup input fields:
Lookup output fields:

index=web sourcetype=access_combined NOT status=200 | table host, "Code", "Description"

Additional Lookup Options -
In addition to file-based lookups, you can also populate a lookup table with search results (see the sketch after this list).
Define a lookup based on an external script or command.
Use Splunk DB Connect application to create lookups based on external databases.
Use geospatial lookups to create queries that can be used to generate choropleth map visualizations.
Populate events with KV Store fields.
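A sketch of populating a lookup table from search results with the outputlookup command (the filename here is just an example):
index=web sourcetype=access_combined | stats count by status | outputlookup status_counts.csv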

Module-13
Scheduled Reports - A scheduled report is a report that runs on a scheduled interval and can trigger an action each time it runs.

[Running concurrent reports, and the searches behind them, can put a big demand on your system hardware even if everything is configured to the recommended specs]
[Include a Schedule Window only if the report doesn't have to start at a specific time...and you're ok with the delay.]

Embedded Report -
[Embedding a report - Anyone with access to the web page will be able to see the report]
[An embedded report will not show data until its scheduled search has run. Once embedding is enabled, the report can no longer be edited]

Alerts - Alerts are based on searches that run on scheduled intervals or in real time. You can have Splunk alert you when the results of a search meet defined conditions. Alerts are triggered when a search is completed. When triggered, an alert can:
01. List in interface
02. Log events
03. Output to lookup
04. Send to a telemetry endpoint
05. Trigger scripts
06. Send emails
07. Use a webhook
08. Run a custom alert

There are two types of alerts: Scheduled and Real-time.
Scheduled alert type allows you to set a schedule and time range for the search to be run.
Real-time alert type will run the search continuously in the background. [Real-time alerts run continuously, so they can place more overhead on system performance]
