Human-Assisted Automated Analysis

I think that we’re beginning to see a pattern emerging for solving some of information security’s most pressing, complex problems.

I get a lot of vendor announcements. That comes with the territory; I feel that I have to stay tuned as new developments appear. Another motivation, of course, is the raft of new attack types, combined attacks and carefully orchestrated attacks: it’s a dangerous Internet out there. It should be no surprise to any of my readers that there are individuals and groups who want what you have, irrespective of your values and your politics (and, in some cases, because of your values and your politics; we all get attacked for a variety of motives).

So what’s a poor security guy to do? Well, I do my day job, of course. And, I try to stay informed. Plus, as many of you know, I know a few folks in the industry, many of whom are a lot smarter than yours truly. Lucky me, these very smart people sometimes tell me about what problems they’re working on and how they’re approaching that work.

A couple of days ago, I received a link to a Computerworld article purportedly about “DLP¹ in the Cloud” from BEW Global. OK, everyone wants to jump on the “cloud” bandwagon. It’s true. The hype machine is definitely running². But the cloud claim was not what caught my eye.

Data Loss Prevention (DLP) is one of the trickier problems to solve for unstructured and/or poorly normalized data. We’ve had functioning DLP software that identifies data through regular expressions or signatures for quite a while. This is nothing new. In my experience, and in talking to others running DLP programmes, the existing toolset is accurate enough when pointed at highly predictable data types: government identification numbers, credit card numbers, any field whose form can be normalized. This is what computer programmes are good at, yes?
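To make “highly predictable data types” concrete, here is a minimal sketch, assuming Python, of the kind of check that works well on them: a regular expression finds candidates of 13 to 16 digits (optionally separated by spaces or hyphens), and a Luhn checksum discards digit runs that cannot be valid card numbers. The pattern and the function names are illustrative assumptions, not taken from BEW Global or any other DLP product.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return card-like numbers in `text` (separators stripped) that pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("Paid with 4111 1111 1111 1111; ref 4111-1111-1111-1112"))
# prints ['4111111111111111']
```

The checksum step is what keeps false positives down: a random digit string of the right length has only about a one-in-ten chance of passing.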

But try to write a regular expression to identify code written in any one of several languages, each with its own syntax. That is a problem. When I have broached this problem with vendors, they admit that it’s a tough nut to crack. I fear that one would have to come fairly close to writing a language parser³, which is usually not a trivial problem.

So, what are BEW Global bringing to the problem that’s new? “Firm leverages cloud, human capital to offer data loss prevention services”. Their new element is the addition of a human analyst.

But BEW aren’t the only ones trying to do this on an enterprise scale. Whitehat Security have been doing this for a number of years. Within the last year, I’ve also heard of this approach from a few other companies. Look who else is adding human analysts to supplement automated techniques⁴:

The idea, as I understand it, is that every variation that can be identified programmatically simply gets cataloged by the software. Outliers, exceptions and unique patterns get punted off to humans for analysis as part of the product’s normal operation.
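The routing described above can be sketched in a few lines. This is a hypothetical illustration in Python, not how BEW Global, Whitehat or anyone else actually builds a product: the names `Triage`, `process` and `analyst_decides`, and the two example rules, are assumptions. Items that match a known rule are cataloged automatically; anything else lands in a review queue that an analyst works through as part of normal operation.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Triage:
    """Hypothetical triage loop: the software labels what its rules recognize,
    and everything else is queued for a human analyst."""
    rules: dict[str, re.Pattern] = field(default_factory=dict)    # label -> compiled pattern
    catalog: list[tuple[str, str]] = field(default_factory=list)  # (label, item) pairs
    review_queue: deque = field(default_factory=deque)            # outliers awaiting a human

    def process(self, item: str) -> Optional[str]:
        """Catalog the item under the first matching rule; otherwise queue it for review."""
        for label, pattern in self.rules.items():
            if pattern.search(item):
                self.catalog.append((label, item))
                return label
        self.review_queue.append(item)
        return None

    def analyst_decides(self, label: str) -> str:
        """Record a human analyst's label for the oldest queued outlier and return that item."""
        item = self.review_queue.popleft()
        self.catalog.append((label, item))
        return item


triage = Triage(rules={
    "card_number": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
    "python_source": re.compile(r"^\s*def \w+\(.*\):", re.MULTILINE),
})
triage.process("card 4111 1111 1111 1111")               # -> "card_number"
triage.process("def handler(event):\n    return event")  # -> "python_source"
triage.process("SELECT name FROM users WHERE id = 1")    # -> None: no rule fits, so it is queued
triage.analyst_decides("sql_source")                     # a human labels the outlier
```

Whether an analyst’s verdict should also become a new automated rule is a separate design decision; this sketch deliberately keeps the human label apart from the rule set.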

There is, obviously, nothing new about human-powered analysis⁵. Who else is going to create the algorithms in the first place? That’s really the only approach when tackling entirely new problems. Additionally, most security tools are enhanced cyclically, most often through human programming. However, the norm has been that the running tool functions entirely by its programming⁶.

The emerging architectural pattern includes analysts as part of the running solution. Humans are no longer confined to updating the solution (additions, adjustments, new features). Analysts are employed as part of the running product, augmenting processing power with brain power.

Truly difficult problems apparently require highly trained people. It seems obvious in its way, yet the idea has taken a while to emerge. Whitehat were the first⁷ I’d heard of to send outliers to human staff as part of their standard analysis.

Apparently, we’re starting to see this idea catch on?

Thoughts? Other instances? Have I missed something?

Cheers

/brook
1 Data Loss Prevention
2 I sometimes see that the “cloud” is a couple of servers hosted at one of the big providers. Hmmm… ?!?
3 The usual front end for a compiler.
4 Vendor mention must in no manner be construed as a product or company endorsement by Brook Schoenfield whatsoever, without restriction. My understanding of any one of these products may be faulty. I have listed these as examples only, to the best of my understanding of what has been told to me. I have performed no due diligence on the statements of these organizations. Brook Schoenfield makes no warranty or representation whatsoever on the truthfulness of company claims or on the fitness of any product.
5 Even if the humans are employing automated tools.
6 I’m including updating signatures and similar data upon which the automation will base its decisions.
7 I haven’t done any research on who was first. I remember who told me about this pattern the first time.