Tuesday, January 13, 2015

Using Splunk as an ETL Tool for Data Residing in a Relational Database

Use Splunk to Extract, Transform, and Load data from an existing OLTP database into another OLTP database. It’s especially great if your source data has XML or JSON (imagine JSON stored in an XML field – Splunk can handle that no problem).


In my case, there was data in a table that stored web service requests and responses.


I used DB Connect (the Splunk app) to query the last 2 days’ worth of data, extracting the XML key/value pairs and cleaning up the data so the Splunk search output a clean table of data. I then filtered that data with a Splunk subsearch that returned the most recent request time already in my destination table (on a different SQL Server instance). Then I piped it all to dboutput pointed at the table. The table was created to ignore any duplicate keys, just in case. Then, I saved the search and scheduled it to run every hour.


Oh yeah, since none of this data is indexed in Splunk, it doesn’t count against your daily licensing limits :)


Here are what some of my Splunk search looked like (I’ll try to build it up in stages). I’m running Splunk Enterprise 4.3.3, btw:


First use dbquery to get the data from your source table.


1. Install DB Connect app (I got the latest version)


2. Set up the database connection to your source DB


3. Set up the database connection to your destination DB (make sure to uncheck the “Read Only” box, otherwise you’ll get messages about the database being read-only”.


4. My dbquery ended up looking something like this (queries from the Splunk external database connection named “DWH”):


| dbquery DWH "select * from LogDB.dbo.SourceServiceLog l (nolock) where l.RequestTime > dateadd(dd,-2,convert(date,getdate()))"


5. Use Splunk Search commands to clean up the data (I’ll leave this to you and the docs… but I’ll have more posts on clever uses)


6. Pipe your search results to dboutput. Mine looked something like this (it naively inserts the fields specified at the end of the command into the table splunktest in the database “MyDatabaseName”):


| dboutput type=insert database=MyDatabaseName table=splunktest key=Character RequestTime UserId RequestType


Note, when specifying the table to insert into with dboutput you cannot fully qualify a table name, e.g. MyDatabaseName.dbo.splunktest.


Note, the select query with dbquery can be any complex query you want. Definitely avoid any double quotes though.


This technique does not seem to count against your daily indexing license limits.





No comments:

Post a Comment