Logging Best Practices
Splunk can index any text data. There are things you can do, however,
to help Splunk extract more information, and provide more value, from
your data.
Getting developers to change the way they do things is like herding
cats. But if you want to save a huge amount of time and get the most
out of your logging, heed these tips.
If you scour the web, you can find many "logging best practices"
documents. Most of them are language specific, discuss performance
implications, and explain event levels like DEBUG, INFO, etc. The
purpose of this document is not to duplicate existing efforts, but to
show you how to log effectively so you use robust analytics in Splunk.
Splunk can reveal a tremendous amount of actionable information from
within your logs, especially if you follow these guidelines.
Log in Text
Don’t log binary information, as it cannot be meaningfully searched
and analyzed. If you must log binary data, place textual meta-data in
the event so that you can still search on this data. For example,
don’t log the binary data of a JPG file, but do log the meta-data,
such as the image size, creation tool, username, camera, gps location,
etc.
Try not to use XML if at all possible. First, it’s very difficult for
humans to read and second, it requires more work for Splunk to parse.
Occasional XML is fine for dumping the value of something that exists
in your code, but don’t make a habit out of it.
Timestamp Every Event
Most logging systems do this for you automatically. The correct time
is critical to understanding the proper sequence of events. Use
whatever time-granularity makes sense to your application. Timestamps
are critical for debugging, analytics and deriving transactions.
Splunk will automatically timestamp events that don’t include them
using a number of techniques, but it’s best that you do it.
Log More than Debugging Events
Put semantic meaning in events so you can get much more out of your
data. Log audit trails, what users are doing, transactions, timing
information – anything that can add value when aggregated, charted or
further analyzed.
Clearly mark key value pairs
One of the most powerful features of Splunk is its ability to extract
fields from events at search time. This is how Splunk creates
structure out of unstructured event data. Events can be created in
such a way to guarantee that the field extractions will work as
intended. Simply use the following string syntax (spaces or commas
are fine):
key=value,key2=value2,key3=value . . .
If your values contain spaces, wrap them in quotes (e.g. username=”bob smith”).
This may be a bit more verbose than you are used to, but the automatic
field extraction will be well worth the size difference.
Take the two following events as an example:
Log.debug(“error user %d”, userId)
Log.debug(“orderstatus=error user=%d”, userId)
Searching for the word “error” will probably bring up many types of
events, but searching for “orderstatus=error” will retrieve the
precise events of interest. Additionally, you can then query Splunk
for reports that use orderstatus, such as asking for the distribution
of orderstatus (e.g., completed=78%, aborted=21%, error=1%) –
something you couldn’t do, if you only had the keyword “error” in your
log event.
Log Unique Identifiers
Things like transaction ids, user ids, etc., are tremendously helpful
when debugging, and even more helpful when gathering analytics.
Unique ids can point you to the exact transaction. Without them, many
times all you have to go on is a time range.
Don’t change the format of these ids between modules if you can help
it. If you keep them the same, then a transaction can be tracked
through the system, or multiple ones if that’s the case.
You can track transactions with multiple ids as well, as long as there
remains a transitive connection between them. For instance:
Event A contains ID 12345
Event B contains ID 12345 and UID ABCDE
Event C contains UID ABCDE
You can associate A with B with C because there is closure between the 2 ids in event B.