how to develop world-changing applications with Splunk
Subscribe to splunkdev group
Email:

Logging Best Practices

Splunk can index any text data. There are things you can do, however, to help Splunk extract more information, and provide more value, from your data.

Getting developers to change the way they do things is like herding cats. But if you want to save a huge amount of time and get the most out of your logging, heed these tips.

If you scour the web, you can find many "logging best practices" documents. Most of them are language specific, discuss performance implications, and explain event levels like DEBUG, INFO, etc. The purpose of this document is not to duplicate existing efforts, but to show you how to log effectively so you use robust analytics in Splunk. Splunk can reveal a tremendous amount of actionable information from within your logs, especially if you follow these guidelines.

Log in Text

Don’t log binary information, as it cannot be meaningfully searched and analyzed. If you must log binary data, place textual meta-data in the event so that you can still search on this data. For example, don’t log the binary data of a JPG file, but do log the meta-data, such as the image size, creation tool, username, camera, gps location, etc. Try not to use XML if at all possible. First, it’s very difficult for humans to read and second, it requires more work for Splunk to parse. Occasional XML is fine for dumping the value of something that exists in your code, but don’t make a habit out of it.

Timestamp Every Event

Most logging systems do this for you automatically. The correct time is critical to understanding the proper sequence of events. Use whatever time-granularity makes sense to your application. Timestamps are critical for debugging, analytics and deriving transactions. Splunk will automatically timestamp events that don’t include them using a number of techniques, but it’s best that you do it.

Log More than Debugging Events

Put semantic meaning in events so you can get much more out of your data. Log audit trails, what users are doing, transactions, timing information – anything that can add value when aggregated, charted or further analyzed.

Clearly mark key value pairs

One of the most powerful features of Splunk is its ability to extract fields from events at search time. This is how Splunk creates structure out of unstructured event data. Events can be created in such a way to guarantee that the field extractions will work as intended. Simply use the following string syntax (spaces or commas are fine):
    key=value,key2=value2,key3=value . . .
If your values contain spaces, wrap them in quotes (e.g. username=”bob smith”).

This may be a bit more verbose than you are used to, but the automatic field extraction will be well worth the size difference.

Take the two following events as an example:

    Log.debug(“error user %d”, userId)
    Log.debug(“orderstatus=error user=%d”, userId)
Searching for the word “error” will probably bring up many types of events, but searching for “orderstatus=error” will retrieve the precise events of interest. Additionally, you can then query Splunk for reports that use orderstatus, such as asking for the distribution of orderstatus (e.g., completed=78%, aborted=21%, error=1%) – something you couldn’t do, if you only had the keyword “error” in your log event.

Log Unique Identifiers

Things like transaction ids, user ids, etc., are tremendously helpful when debugging, and even more helpful when gathering analytics. Unique ids can point you to the exact transaction. Without them, many times all you have to go on is a time range. Don’t change the format of these ids between modules if you can help it. If you keep them the same, then a transaction can be tracked through the system, or multiple ones if that’s the case. You can track transactions with multiple ids as well, as long as there remains a transitive connection between them. For instance:
  Event A contains ID 12345
  Event B contains ID 12345 and UID ABCDE
  Event C contains UID ABCDE
You can associate A with B with C because there is closure between the 2 ids in event B.

© 2005-2010 Splunk Inc.