I wrote about data leaks before . A painful problem like this is an opportunity for some and we now have quite a few startups selling products to monitor data leaving a company's network for sensitive information. Vontu and Reconnex are a couple of them. Port Authority was another that was acquired by WebSense.
It is interesting to see how one can go about solving this problem. [Note: In this writeup I focus only on detecting leaks through the company's network. There is another ways in which information can be leaked: through storage devices like hard disks and USB flash. The techniques I mention here do not work for them.]
First we need to define sensitive data. A few items like social security numbers can be easily defined as regular expressions (ddd-dd-dddd, where d is a digit) and one can scan all network data for anything that looks like a social security number. But what about other information? We can apply the pattern matching approach to other structured information like patient records in a healthcare facility, account information in a bank etc. What about unstructured information like design documents or patent ideas communicated between team members in emails? It does not follow a pre-defined pattern. Is there a way of monitoring it? Fortunately, there is. One method is to use rabin fingerprints . Calculate these fingerprints for all potentially sensitive data and match it with the fingerprints calculated for network traffic. This method works well because even if the data was changed a little (like a section from a document was copy-pasted into an email etc) it is matched by the fingerprints.
An approach that combines pattern matching for known and/or structured data and fingerprinting for unstructured data works well in detecting unintended accidental data leaks in information passing through a company's network. A report says that 60% of the leaks reported so far are of this nature. So it is a useful approach. What about the other 40% intentional data theft? I will write about it another day but the first thing that will come into mind is to apply some kind of "locks" and "alarms". Locks in the digital world are cryptographic techniques and alarms are data access, modification and transmission logs.
[Some readers will note that I did not mention "watermarks". I consider that as a subset of structured data]
Add to Technorati Favorites
Wednesday, March 14, 2007
Tuesday, March 13, 2007
Sandboxes for false positives in IDS
Signatures have been an effective but not exhaustive method for threat prevention for a long time. In the early days there were issues with false postives, then there were shortcomings in dealing with polymorphic threats but these were due to "bad signatures" and sometimes performance tradeoffs. That is no longer true [I wrote about various IDS algorithms before]
However, as I said it is not an exhaustive method so it is often complemented by protocol anomaly detection and behavioral anomaly detection. Protocol anomaly detection(PAD) is a very reliable technique but a mere presence of an anomaly does not always indicate an intrusion. Behavioral anomaly detection(NBAD) is worse because it relies on unproven statistical models of network traffic and user behaviour. [Related reading ]. However, both of these methods are useful in locating "suspicious activity", protocol anomaly detection more so than behavioral. The challenge is to deal with the false positives they inevitably result in. A good solution to this problem is now available in atleast two commercial products: FireEye and CheckPoint's MCP which use "sandbox execution" of code extracted from anomalous traffic. The approach is in its infancy but is very promising because it is to intrusion detection what "blood tests"are to disease diagnosis. It has very low false positives and works against zero day threats. It is "expensive" because it requires a lot of computation if PAD and NBAD are used to narrow the search space of traffic, it can scale well.
I believe most commercial IDS/IPS and even anti-virus/spyware vendors will add this weapon to their arsenal this year. Thoughts?
Add to Technorati Favorites
However, as I said it is not an exhaustive method so it is often complemented by protocol anomaly detection and behavioral anomaly detection. Protocol anomaly detection(PAD) is a very reliable technique but a mere presence of an anomaly does not always indicate an intrusion. Behavioral anomaly detection(NBAD) is worse because it relies on unproven statistical models of network traffic and user behaviour. [Related reading ]. However, both of these methods are useful in locating "suspicious activity", protocol anomaly detection more so than behavioral. The challenge is to deal with the false positives they inevitably result in. A good solution to this problem is now available in atleast two commercial products: FireEye and CheckPoint's MCP which use "sandbox execution" of code extracted from anomalous traffic. The approach is in its infancy but is very promising because it is to intrusion detection what "blood tests"are to disease diagnosis. It has very low false positives and works against zero day threats. It is "expensive" because it requires a lot of computation if PAD and NBAD are used to narrow the search space of traffic, it can scale well.
I believe most commercial IDS/IPS and even anti-virus/spyware vendors will add this weapon to their arsenal this year. Thoughts?
Add to Technorati Favorites
Labels:
false positives,
IDS,
IPS,
malicious code,
nbad,
PAD,
sandbox
Subscribe to:
Posts (Atom)