Security Research Is Threat Modeling’s Constructive Criticism

(the following is partially excerpted from my next book)

Adam Shostack, author of Threat Modeling: Designing for Security, has succinctly described[1] a process of constructive critique within engineering:

“The back and forth of design and critique is not only a critical part of how an individual design gets better, but fields in which such criticism is the norm advance faster.”

The Spectre/Meltdown issues are the result of just such a design critique as Shostack describes in his pithy quote above. In fact, one of the fundamental functions that I believe security research can play is providing designers with constructive critique.

Using Spectre and Meltdown as an example of engineering critique, let's look at some of the headlines that appeared before the researchers' official announcement of the issues:

“Kernel-memory-leaking Intel processor design flaw forces Linux, Windows redesign”, John Leyden and Chris Williams 2 Jan 2018 at 19:29, The Register, https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/

“A CRITICAL INTEL FLAW BREAKS BASIC SECURITY FOR MOST COMPUTERS”, Andy Greenberg, January 3, 2018, Wired Magazine, https://www.wired.com/story/critical-intel-flaw-breaks-basic-security-for-most-computers/

There were dozens of similar headlines (many merely repeating the first few, especially The Register's), all proclaiming a "flaw" in CPUs. I want to draw the reader's attention to the word "flaw". Are these issues "flaws"? Or are they the result of something else?

The Register headline and that first article were based upon speculation that had been occurring amongst the open source community supporting the Linux kernel about a couple of apparently odd changes that had been made to kernel code. But the issues to which these changes responded were "embargoed"; that is, the reasoning behind the changes was known only to those making the changes.

Unlike typical open source changes, whose reasoning is public and often discussed by members of the community, these kernel changes had been made opaquely, without public comment, which, of course, set concerned kernel community members wondering.

To observers unfamiliar with the reasoning behind the changes, it was clear that something was amiss, and likely in relation to CPU functions; anxious observers were left guessing at the motivation for those code changes.

Within the digital security universe, there exists an important dialog between security research aimed at discovering new attack techniques and the designers of the systems and protocols upon which that research is carried out. As Adam noted so very wryly, achieving solid designs, even great ones, and most importantly, resilient designs in the face of omnipresent attack requires an interchange of constructive critique. That is how Spectre and Meltdown were discovered and presented.

Neither of these (at the time of announcement) new techniques involved exercising a flaw, that is, a design error. In other words, the headlines quoted just above were erroneous and rather misleading[2].

Speculative execution and operating systems' practice of mapping kernel memory into user process address spaces were intentional design choices that had been working as designed for more than 10 years. Taken together, at least some of the increases in CPU performance over that period can be tied directly to speculative execution design.

Furthermore, and quite importantly to this discussion, these design choices were made within the context of a rather different threat landscape. Some of today's very active threat actors didn't exist, or at least were not nearly as active, and certainly not as technically sophisticated, circa 2005 as they are today, May 2018.

If I recall correctly (and I should be able to remember, since I was the technical lead for Cisco's Web infrastructure and application security team at that time), in 2005, network attacks were being eclipsed by application-focused attack methods, especially web attack methods.

Today, web attacks are very "ho hum", very run-of-the-mill, garden variety. But in 2005, when the first CPU speculative execution pipelines were being released, web applications were the targets of choice at the cutting edge of digital security. Endpoint worms and gaining entrance through poor network ingress controls had been security's focus up until the web application attack boom (if I may title it so?). At about that time, targeting web applications was fast displacing network perimeter concerns. Attackers were in the process of shifting to targets that used a standard protocol (HTTP) which was guaranteed to pass through organizational firewalls, and many of the new targets were always available via the public Internet.

Since the web application attack boom, attacks and targets have continued to evolve. The threat landscape has changed dramatically over the years since the initial design of speculative execution CPUs. Alongside the changes in types of attackers as well as their targets, attacker and researcher sophistication has grown, as has the attacker's toolbox. 2018 is a different security world than 2005. I see no end to this curve of technical growth in my crystal ball.

The problem is that when threat modeling, whether in 2005 or 2018, one considers the attacks of the past and those of the moment, and then one must try one's best to project from current understanding to attacks that might arise within the foreseeable future. Ten or twelve years seems an awfully long horizon of prescience, especially when considering the rate at which technical change continues to take place.

As new research begins to chew at the edges of any design, I believe that the wise and diligent practitioner revisits their existing threat models in light of developments.

If I were to fault the CPU and operating system makers whose products are subject to Spectre or Meltdown, it would be for a failure to anticipate where research might lead as that research unfolded. CPU threat modelers could have taken into account advances in research indicating unexpected uses of cache memory.

Speculative execution leaves remnants of a speculated execution branch in cache memory when a branch has not been taken. It is those remnants that lie at the heart of this line of research.
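To make that concrete, the sketch below shows, in C, the kind of code pattern the published Spectre variant-1 ("bounds check bypass") work describes. The names and array sizes here are purely illustrative, not drawn from any real product; the point is only that a mispredicted branch can leave a secret-dependent footprint in the cache even though the speculated results are thrown away.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of the Spectre variant-1 pattern; names and
   sizes are invented for this example.                              */
uint8_t array1[16];
uint8_t array2[256 * 4096];   /* one cache line per possible byte value */

void victim_function(size_t x)
{
    if (x < 16) {
        /* If the branch predictor guesses "taken" for an out-of-bounds x,
           the CPU may speculatively read array1[x] and then load a line of
           array2 indexed by that (secret) byte. The architectural results
           are discarded when the misprediction is detected, but the cache
           line remains warm; a later timing probe can recover which line
           was touched, and therefore the secret byte.                    */
        uint8_t secret = array1[x];
        volatile uint8_t dummy = array2[secret * 4096];
        (void)dummy;
    }
}
```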

A close examination of the unfolding research might very well have led those responsible for updating CPU threat models to consider the potential for something like Spectre and Meltdown. (Or perhaps the threat models were updated, but other challenges prevented updates to CPU designs? CPU threat modelers, please tell us all what the real story is.)

I've found a publication chain over the previous three years that, to me, points towards the new techniques. Spectre and Meltdown are not stand-alone discoveries, but build upon a body of CPU research that had been published regularly for several years.

As I wrote for McAfee’s Security Matters blog in January of 2018 (as a member of McAfee’s Advanced Threat Research Team),

“Meltdown and Spectre are new techniques that build upon previous work, such as “KASLR” and other papers that discuss practical side-channel attacks. The current disclosures build upon such side-channel attacks through the innovative use of speculative execution….An earlier example of side-channel based upon memory caches was posted to Github in 2016 by one of the Spectre/Meltdown researchers, Daniel Gruss.” Daniel Gruss is one of the authors of the Spectre and Meltdown papers.

Reading these earlier papers, it appears to me that some of the parent techniques that would be used for the Spectre and Meltdown breakthroughs were described in work that could have been read (should have been read?) by CPU security architects in order to re-evaluate their CPUs' threat models. That previously published research was most certainly available.

Of course, hindsight is always 20/20; I had the Spectre and Meltdown papers in hand as I reviewed previous research. Going the other way might be more difficult?

Spectre and Meltdown did not just spring miraculously from the head of Zeus, as it were. They are the results of a fairly long and concerted effort to discover problems with and thus, hopefully, improve the designs of modern processors. Indeed, the researchers engaged in responsible disclosure, not wishing to publish until fixes could be made available.

To complete our story, the driver that tipped the researchers into an early, zero-day disclosure (that is, disclosure without available mitigations or repairs) was the speculative (if you'll forgive the pun?) journalism quoted in the headlines above, which gained traction based upon misleading (at best) or outright wrong conclusions. Claiming a major design "flaw" in millions of processors certainly makes for a reader-catching headline. But, unfortunately, these claims were vastly off the mark, since no flaw existed in the CPU or operating system designs.

While it may be more "interesting" to imagine a multi-year conspiracy by evil CPU makers to cover up known design issues, no such conspiracy appears to have taken place.

Rather, in the spirit of responsible disclosure, the researchers were waiting for mitigations to be made available to customers; CPU manufacturers and operating system coders were heads down at work figuring out what appropriate mitigations might be, and just how to implement these with the least amount of disruption. None of these parties was publicly discussing just why changes were being made, especially to the open source Linux kernel.

Which is precisely what one would expect in order to protect millions of CPU users: embargo the technical details to foil attackers. There is actually nothing unusual about such a process; it’s all very normal and typical, and unfortunately for news media, quite banal[3].

What we see through the foregoing example about Spectre and Meltdown is precisely the sort of rich dialog that should occur between designers and critics (researchers, in this case).

Designs are built against the backdrop and within the context of their security "moment". Our designs cannot improve without collective critique amongst the designers; such dialog internal to an organization, or at least to a development team, is essential. I have spoken about this process repeatedly at conferences: "It takes a village to threat model" (to misquote a famous USA politician).

But, there’s another level, if you will, that can reach for a greater constructive critique.

Once a design is made available to independent critics, that is, security researchers, research discoveries can, and I believe should, become part of an ongoing re-evaluation of the threat model, that is, of the security of the design. In this way, we can, as an industry, reach for the constructive critique called for by Adam Shostack.

[1] In my humble experience, Adam is particularly good at expressing complex processes briefly and clearly. One of his many gifts as a technologist and leader in the security architecture space.

[2] Though salacious headlines apparently increase readership and thus advertising revenue. Hence, the misleading but emotion-plucking headlines.

[3] Disclosure: I've been involved in numerous embargoed issues over the years.

Heartbleed Exposure, What Is It Really?


“Heap allocation patterns make private key exposure unlikely” – Neel Mehta, discoverer of Heartbleed

In the media, there's been a lot of discussion about what might be exposed by the heartbleed OpenSSL attack. It is certainly true that very sensitive items can be exposed. And over thousands of test runs, sensitive items such as private keying materials have been returned by the heartbleed buffer over-read.

A very strong case can be made for doing exactly as industry due diligence suggests. Teams should replace private keys on servers that had been vulnerable, once these are patched. But should every person on the Internet change every password? Let's examine that problem by digging into the details of exactly how heartbleed works.

First, heartbleed has been characterized as an “overflow” error: “Heartbleed is basically a buffer-overflow vulnerability”. This unfortunately is a poor descriptor and somewhat inaccurate. It may make better media copy, but calling heartbleed an “overflow” is a poor technical description upon which to base a measured response.

Heartbleed is not a classic buffer overflow. Neither control flow nor executable code can be injected via heartbleed. A read of attacker-chosen memory locations is not possible either, as I will explain below. A better descriptor of heartbleed is a "buffer over-read": unintentionally, some data from memory is returned to the attacker. To be precise, heartbleed is a data leak, not a flow-of-control error.

In order to understand what's possible to disclose, it's key to understand program "heap" memory. The heap is an area of memory that programs use to store data. Generally speaking, well-written programs (like OpenSSL) do not put executable code into heap (that is, data) memory[1]. Because data and execution are separated, the attacker has no way through this vulnerability to execute code. And that is key, as we shall see.

As a program runs, bits of data, large and small, temporary and more or less permanent for the run, are put into the heap[2]. Typically, data are put wherever is convenient at the moment of allocation, depending upon what memory is available.

Memory that's been deallocated gets reused. If an available piece of memory happens to be larger than a requested size, the newly requested piece will be filled with the new data, while adjacent to the new data will remain bits and pieces of whatever was there previously.

In other words, while not entirely random, the heap is filled with bits and pieces of data, a little from here, a little from there, a nice big chunk from this session, with a bit left over from some other session, all helter-skelter amongst each other. The heap is a jumble; taking random bits from the heap may be considered to be like attending a jumble sale.
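A tiny C sketch may help illustrate that jumble. This is purely a thought experiment under stated assumptions: allocator behaviour varies, nothing here is from OpenSSL, and reading memory you haven't written is undefined behaviour in standard C.

```c
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Write a "secret" into a heap chunk, then free it. free() hands the
       chunk back to the allocator but does not erase its contents.       */
    char *a = malloc(64);
    if (a == NULL) return 1;
    strcpy(a, "session-token=abc123");
    free(a);

    /* A later allocation of a similar size may be handed the very same
       chunk. Until the program overwrites it, the old bytes are still
       sitting there -- exactly the kind of leftover data that a heap
       over-read can sweep up and leak.                                  */
    char *b = malloc(64);
    free(b);
    return 0;
}
```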

Now, let's return to heartbleed. The heartbleed bug returns whatever happens to be on the heap just above the 16 bytes that are required for the TLS heartbeat packet. The attacker may request as much as 64K bytes. That's a nice big chunk of stuff from the heap; make no mistake about it. Anything might be in there. At the very least, decrypted data intended for application processing will be returned to the attacker[3]. That's certainly bad! It breaks the confidentiality supposedly gained through the TLS encryption. But getting a random bit is different than requesting an arbitrary memory location at the discretion of the attacker. And that is a very important statement to hold in mind as we respond to this very serious situation.
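For readers who want to see the shape of the bug itself, here is a deliberately simplified sketch in C. It is not the OpenSSL source; the structure and names are invented for illustration. The essence is that the length claimed inside the heartbeat request is trusted instead of the number of bytes actually received.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified model of a received heartbeat request; the field names
   are illustrative, not OpenSSL's.                                    */
struct heartbeat_request {
    unsigned short claimed_len;   /* attacker-controlled, up to 65535 */
    unsigned char *payload;       /* points into the received record  */
    size_t actual_len;            /* bytes actually received          */
};

unsigned char *build_heartbeat_response(const struct heartbeat_request *req)
{
    /* The fix amounted to a check of roughly this form:
       if (req->claimed_len > req->actual_len) return NULL;  (drop it) */
    unsigned char *resp = malloc(3 + (size_t)req->claimed_len);
    if (resp == NULL) return NULL;

    resp[0] = 2;                                  /* "response" type byte */
    resp[1] = (unsigned char)(req->claimed_len >> 8);
    resp[2] = (unsigned char)(req->claimed_len & 0xff);

    /* The bug: claimed_len is trusted, so when it exceeds actual_len the
       copy reads past the real payload into whatever happens to sit next
       to it on the heap, and those bytes are echoed back to the sender.  */
    memcpy(resp + 3, req->payload, req->claimed_len);
    return resp;
}
```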

An analogy to Heartbleed might be going fishing. Sometimes we fish where we can clearly see the fish (mountain streams) or signs of fish (clearer lakes), or with a "fish finder" appliance that identifies fish under the surface when the fish aren't visible.

Heartbleed is a lot more like fishing for fish that are deep in a turbulent lake with no fish finding capability. The fisher is guessing. If she or he guesses correctly, fish for dinner. If not, it’s a long day holding onto the fishing rod.

In the same manner, the attacker, the "fisher" as it were, doesn't know where the "fish", the goodies, are. The bait (the heartbleed request) is cast upon the "lake" (the program heap) in the hope that a big fish will "bite" (secret "bytes" will get returned).

The attacker can heartbleed to her or his heart’s content (pun intended). That is, if left undiscovered, an attacker can continuously pound the other side of the connection with heartbleeds, perhaps thousands of times. Which means multiple chunks of memory will be returned to the attacker, as the heap allocates, deallocates, and moves data around.

Lots of different heap chunks will get returned. There will likely also be overlap between the chunks that are returned to the attacker. Somewhere within those memory chunks are likely to be some sensitive data. If the private key for a session happens to be in one of those chunks, it will be exposed to the attacker. If any particular session open through the OpenSSL library happens to contain a password that had been transmitted, it's been exposed. It won't take an engineering genius to do an ASCII dump of returned chunks of memory in order to go poking about to find interesting bits.

Still, this is hunting for goodies in a bit of a haystack. Some people are quite good at that; let's acknowledge that outright. But it's very different from a directed attack.

And should a wise and prepared security team, making good use of appropriate security tools, notice a heartbleed attack, they will most likely kill the connection before thousands of buffers can be read. Heartbleed over any particular connection is a linear process, one packet retrieved at a time. Retrieving lots of data takes some time. Time to respond. Of course, an unprotected and unaware site could allow many sessions to get opened by an attacker, each linearly heartbled, thus revealing far more of what’s on the heap than a single session might. Wouldn’t you notice such anomalous behaviour?

It's important to note that the returns in the heartbleed packets are not necessarily tied to the attacker's session. Again, it's whatever happens to be on the heap, which will contain parts of other sessions. And any particular heartbleed packet is not necessarily connected to the data in a previous or subsequent packet. Which means that there's no continuity of session nor any linearity between heartbleed retrievals. All session continuity must be pieced together by the attacker. That's not rocket science. But it's also work, perhaps significant work.

I’ll reiterate in closing, that this is a dangerous bug to which we must respond in an orderly fashion.

On the other hand, this bug does not give attackers free rein to go after all the juicy targets that may be available on any host, server, or endpoint that happens to have OpenSSL installed. Whatever happens to be on the heap of the process using the OpenSSL library and that is adjacent to the heartbeat buffer will be returned. And that attack may only occur during a TLS session. Simply including the vulnerable library poses no risk at all; many programs make use of OpenSSL for other functionality beyond TLS sessions.

This bug is not the unfettered keys to the kingdom, unless a “key to the kingdom” just happens to be on the heap and happens to get returned in the over-read. What gets returned is entirely due to the distribution of the heap at the moment of that particular heartbeat.

Cheers,

/brook

These assertions have been demonstrated in the lab through numerous runs of the heartbleed attack by a team who cannot be named here. My thanks to them for confirming this assessment. Sorry for not disclosing.

[1] There are plenty of specialized cases that break this rule. But typically, code doesn’t run from the heap; data goes onto the heap. And generally speaking, programs refrain from executing on the heap because it’s a poor security practice. Let’s make that assumption about OpenSSL (and there’s nothing to indicate that this is NOT true in this case), in order to make clear what’s going on with heartbleed.

[2] The libraries that support programs developed with the major development tools and running on the major operating systems have sophisticated heap management services that are consumed by the running application as it allocates and deallocates memory. While care must be exercised in languages like C/C++, where data end up on the heap is controlled by these low-level services.

[3] That is, intended for the application that is using OpenSSL for TLS services.

The “Real World” of Developer-centric Security

My friend and colleague, Dr. James Ransome, invited me late last Winter to write a chapter for his 10th book on computer security, Core Software Security (with co-author Anmol Misra), published by CRC Press. My chapter is "The SDL In The Real World" (SDL = "Secure Development Lifecycle"). The book was released December 9, 2013. You can get copies from the usual sources (no adverts here, as always).

It was an exciting process. James and I spent hours white boarding possible SDL approaches, which was very fun, indeed*. We collectively challenged ourselves to uncover current SDL assumptions, poke at the validity of these, and find better approaches, if possible.

Many of you already know that I’ve been working towards a different approach to the very difficult, multi-dimensional and multi-variate problem of designing and implementing secure software for a rather long time. Some of my earlier work has been presented to the industry on a regular basis.

Specifically, during the period of 2007-9, I talked about a new (then) approach to security verification that would be easy for developers to integrate into their workflow and which wouldn’t require a deep understanding of security vulnerabilities nor of security testing. At the time, this approach was a radical departure.

The proving ground for these ideas was my program at Cisco, Baseline Application Vulnerability Assessment, or BAVA, for short (“my” here does not exclude the many people who contributed greatly to BAVA’s structure and success. But it was more or less my idea and I was the technical leader for the program).

But, is ease and simplicity all that’s necessary? By now, many vendors have jumped on the bandwagon; BAVA’s tenets are hardly even newsworthy at this point**. Still, the dream has not been realized, as far as I can see. Vulnerability scanning still suffers from a slew of impediments from a developer’s view:

  • Results count vulnerabilities, not software errors
  • Results are noisy; many variations of a single error are often each reported uniquely
  • Tools are hard to set up
  • Tools require considerable tool knowledge and experience, too much for developers' highly over-subscribed days
  • Qualification of results requires more in-depth security knowledge than even senior developers generally have (much less an average developer)

And that’s just the tool side of the problem. What about architecture and design? What about building security in during iterative, fast paced, and fast changing agile development practices? How about continuous integration?

As I was writing my chapter, something crystalized. I named it "developer-centric security", which then managed to get wrapped into the press release and marketing materials of the book. Think about this: how does the security picture change if we re-shape what we do by taking the developer's perspective rather than a security person's?

Developer-centric software security then reduced to a single, pointed question:

What am I doing to enable developers to innovate securely while they are designing and writing software?

Software development remains a creative and innovative activity. But so often, we on the security side try to put the brakes on innovation in favour of security. Policies, standards, etc., all try to set out the rules by which software should be produced. From an innovator's view, at least some of the time developers are iterating through solutions to a new problem while searching for the best way to solve it. How might security folk enable that process? That's the question I started to ask myself.

Enabling creativity, thinking like a developer, while integrating into her or his workflow is the essence of developer-centric security. Trust and verify. (I think we have to get rid of that old “but”)

Like all published works, the book represents a point-in-time. My thinking has accelerated since the chapter was completed. Write me if you’re intrigued, if you’d like more about developer-centric security.

Have a great day wherever you happen to be on this spinning orb we call home, Earth.

cheers,

/brook

*Several of the intermediate diagrams boggle the mind with their complexity and busy quality. Like much software development, we had to work iteratively. Intermediate ideas grew and shifted as we worked. A creative process?

**At the time, after hearing BAVA's requirements, one vendor told me, "I'll call you back next year." Six months later on a vendor webcast, that same vendor was extolling the very tenets that I'd given them earlier. Sea change?

Agile & Security, Enemies For Life?

Are Agile software development and security permanent enemies?

I think not. In fact, I have participated in SCRUM development where building security into the results was more effective than under classic waterfall. I'm a Secure Agile believer!

Dwayne Melancon, CTO of Tripwire, opined for BrightTALK on successful Agile for security.

Dwayne, I wish it was that simple! “Engage security early”. How often have I said that? “Prioritize vulnerabilities based on business impact”. Didn’t I say that at RSA? I hope I did?

Yes, these are important points. But they’re hardly news to security practitioners in the trenches building programmes.

Producing secure software when using an Agile method is not quite as simple as "architect and design early". Yes, that's important, of course. We've been saying "build security in, don't bolt it on" for years now. That has not changed.

I believe that the key is to integrate security into the Agile process from architecture through testing. Recognize that we have to trust, work with, and make use of the Agile process rather than fighting it. After all, one of the key values that makes SCRUM work is trust. Trust and collaboration are key to Agile success. I argue that trust and collaboration are also keys to Agile that produces secure software.

In SCRUM, what is going to be built is considered during user story creation. That's the "early" part. A close relationship with the Product Owner is critical to getting security user stories onto the backlog, and critical during user story prioritization. And the security person should be familiar with any existing architecture during user story creation. That way, security can be considered in context and not from an ivory tower.

I’ve seen security develop a basic set of user stories that fit a particular class or type of project. These can be templated and simply added in at the beginning, perhaps tweaked for local variation.

At the beginning of each Sprint, stories are chosen for development out of the backlog. During this process, considerable design takes place. Make security an integral part of that process, either through direct participation or by proxy.

Really, in my experience, all the key voices should be a part of this selection and refinement process. Quality people can offer why a particular approach is easier to test, architects can offer whether a story has been accounted for in the architecture, and so on. Security is one of the important voices, but certainly not the only one.

Security experts need to make themselves available throughout a Sprint to answer questions about implementation details, the correct way to securely build each user story under development. Partnership. Help SCRUM members be security "eyes and ears" on the ground.

Finally, since writing secure code is very much a discipline and practice, appropriate testing and vulnerability assurance steps need to be a part of every sprint. I think that these need to be part of the Definition of Done.

Everyone is involved in security in Agile. Security folk can’t toss security “over the wall” and expect secure results. We have to get our hands dirty, get some implementation grease under the proverbial fingernails in order to earn the trust of the SCRUM teams.

Trust and collaboration are success factors, but these are also security factors. If the entire team are considering security throughout the process, they can at the very least call for help when there is any question. In my experience, over time, many teams will develop their team security expertise, allowing them to be far more self-sufficient – which is part of the essence of Agile.

We security folk are going to have to give up control and instead become collaborators, partners. I don't always get security built the way that I might think about it, but it gets built in. And I learn lots of interesting, creative, innovative approaches from my colleagues along the way.

cheers

/brook