Tuesday, May 28, 2013

Week 3: Back to Research Papers

I started this week attempting to write a script to download random bytes from randomserver.dyndns.org. The issue I ran into was that the site, which streams true random numbers from a hardware RNG connected to the server, only offers a web page with a GUI for downloading a given number of bytes and provides no additional resources for automating the download. Adam was able to help me out by pointing me to a script he wrote for a website with a similar problem. He used the twill extension for Python, which automates HTML form filling and submission, exactly what I needed. I modified his script for the web site I was working on, but when I tried to test it, I found that the server was down. Since completing the script, the web site has only been active intermittently, so it has been difficult for me to debug it properly. I will keep checking the site daily to see if it is back up and stable so that the script can be finished.
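For reference, here is a rough sketch of the twill-based approach (not the finished script; the URL, form index, and field names are placeholders until the server is back up and I can inspect the real form, and the exact twill calls may differ slightly by version):

import twill
from twill.commands import go, fv, submit

def fetch_random_bytes(num_bytes, out_path="randomserver.bin"):
    go("http://randomserver.dyndns.org/")      # placeholder: page with the download form
    fv("1", "numbytes", str(num_bytes))        # form 1, hypothetical field name
    submit()                                   # press the download button
    data = twill.get_browser().get_html()      # body of the response returned by the form
    with open(out_path, "ab") as f:            # append so the script can be looped
        f.write(data)

if __name__ == "__main__":
    fetch_random_bytes(1024)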

Meanwhile, Amir sent me two more recently published papers on sources of entropy. The paper on boot-time entropy in particular is extremely relevant to our project and offered a lot of insight into how the Linux kernel can supply sufficient randomness at all times.

Welcome to the Entropics: Boot-time Entropy in Embedded Devices
This paper starts by making the claim that a device cannot be considered successfully booted until it is able to provide high-entropy randomness to applications. It then introduces three methods for providing this high entropy before the traditional blocking pool behind /dev/random is filled. In the first method, the number of cycles required to run each function called from start_kernel is recorded. The advantage of this method is that it provides high entropy with little overhead (about 0.00016 seconds added to boot time) and can run with a single kernel thread and interrupts disabled. While not every source of the non-determinism in execution time could be identified, the paper notes that clock domain crossing and device-to-device variation in DRAM access latencies are major factors. The second method creates entropy by measuring the decay of DRAM storage in a live system when refresh is disabled. The decay rate of a given bit is affected by manufacturing variations, temperature, the value of the bit, and other factors. This method is tricky to deploy since it relies on memory controller operations that differ from one device under test to another. The last technique measures the amount of time required for a PLL to lock onto a new output frequency. This time varies due to power supply stability, the accuracy and jitter of the source oscillator, and manufacturing process differences. Even though it provides the highest bitrate of the three methods, it relies on a strong understanding of the given SoC.
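The first technique obviously has to live inside the kernel, but the underlying idea, harvesting the low-order bits of non-deterministic execution times and conditioning them into a seed, can be illustrated with a toy user-space Python sketch (my own illustration, not code from the paper):

import time
import hashlib

def timing_entropy(funcs, rounds=64):
    # Collect the low byte of each call's execution time; these bits vary
    # non-deterministically from run to run, which is the effect the paper
    # measures (with cycle counters) for every function called in start_kernel.
    samples = []
    for _ in range(rounds):
        for f in funcs:
            start = time.perf_counter_ns()
            f()
            elapsed = time.perf_counter_ns() - start
            samples.append(elapsed & 0xFF)
    # Condition the raw timings into a fixed-size seed.
    return hashlib.sha256(bytes(samples)).digest()

if __name__ == "__main__":
    work = [lambda: sum(range(1000)), lambda: sorted(range(500, 0, -1))]
    print(timing_entropy(work).hex())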

Recommendation for the Entropy Sources Used for Random Bit Generation

This document specifies the design and requirements for a NIST-approved entropy source. The entropy originates from a noise source, whose output may need to be digitized to produce random binary output. An optional conditioning function can then be applied to reduce bias arising from a number of different factors; when present, it produces the final output of the entropy source. Health testing functionality is required to detect malfunctions in the noise source or bias beyond what the conditioning function can remedy. Health tests include startup tests on all components, continuous tests on the noise source that run as long as the system is live, and on-demand tests that require more time and resources than the continuous tests. On the simpler end, the Repetition Count Test and the Adaptive Proportion Test do basic checks for abnormally long runs of a single value and abnormal proportions of a given sample within a fixed-length window. Extensive statistical analysis is performed to verify that the estimated min-entropy provided by the source is sufficient.
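As an example of the health tests, here is a small sketch of the Repetition Count Test: given a claimed min-entropy of H bits per sample, a run of identical samples reaching the cutoff 1 + ceil(20/H) has probability at most 2^-20 and is treated as a failure:

import math

def repetition_count_test(samples, H, alpha_exp=20):
    # Cutoff C = 1 + ceil(alpha_exp / H): a run of C identical samples has
    # probability at most 2**-alpha_exp if each sample really carries H bits
    # of min-entropy.
    cutoff = 1 + int(math.ceil(alpha_exp / float(H)))
    run_value, run_length = None, 0
    for s in samples:
        if s == run_value:
            run_length += 1
        else:
            run_value, run_length = s, 1
        if run_length >= cutoff:
            return False   # health-test failure: the noise source may be stuck
    return True

if __name__ == "__main__":
    data = [3, 7, 7, 7, 7, 1]
    print(repetition_count_test(data, H=1.0))  # cutoff 21 -> passes
    print(repetition_count_test(data, H=8.0))  # cutoff 4  -> the run of four 7s fails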

Monday, May 20, 2013

Week 2: Gathering Entropy from Online Sources


This week, my assignment was to collect and record output from true random number generators that are available online. These online resources typically exploit quantum interactions to generate entropy and provide random bits for personal use free of charge. Unfortunately, one of the websites I tried to extract bits from, LavaRnd, has been decommissioned indefinitely. I am currently working on obtaining random bits from randomserver.dyndns.org, which utilizes a hardware RNG. While the site has a page with a GUI for downloading a given number of random bytes, it does not provide any additional resources for streaming directly from the server, so I will need to write a script that automates the downloading process in order to obtain a sufficient number of random bytes. This week I was able to successfully write scripts for two of the web sites, and I uploaded the code to the randomness/onlineResources github repository.

Australian National University Quantum Random Numbers Server
A lab at ANU measures real-time quantum fluctuations of phase and amplitude within a vacuum to yield an ultra-high-bandwidth random number stream. The hardware generates this stream at 5.7 Gbit/sec with no daily limit on downloading, so my script could theoretically be looped without interruption, with internet bandwidth as the only bottleneck. The site provides a link to the anurandom github project, which offers tools and sample code for downloading the random bits from the server in several different languages. I used Python because the site has a web page that displays 1024 live random bits, which makes the extraction and storage code trivial. My script pulls those 1024 bits (128 random bytes) and stores them as binary data in a newly created anu_random.bin file within the same directory.
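A rough sketch of the extraction step (not the actual script): fetch the page that displays the 1024 live bits, pull the bit string out of the HTML, pack it into 128 bytes, and append it to anu_random.bin. The URL and the regular expression below are placeholders, since they depend on the page's actual layout:

import re
import urllib.request

PAGE_URL = "https://qrng.anu.edu.au/"   # placeholder: the page showing the 1024 live bits

def fetch_anu_block(out_path="anu_random.bin"):
    html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", "ignore")
    match = re.search(r"[01]{1024}", html)     # assumes the bits appear as one run of 0s and 1s
    if match is None:
        raise RuntimeError("could not find the 1024-bit block in the page")
    bits = match.group(0)
    data = int(bits, 2).to_bytes(len(bits) // 8, "big")   # 1024 bits -> 128 bytes
    with open(out_path, "ab") as f:            # append so the script can be looped
        f.write(data)

if __name__ == "__main__":
    fetch_anu_block()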

Humboldt University of Berlin QRNG Service
At the Department of Physics of the Humboldt University, the Nano-Optics lab uses state-of-the-art photon timing instrumentation and data processing to generate random numbers based on photon arrival times. The device constantly delivers 150 Mbit/sec of data, so once again the script could run without interruption, limited only by internet bandwidth. The site offers the libQRNG C library for interfacing with the server as well as demo code. I removed SSL support from the demo program because the library relies on outdated Linux SSL libraries rather than OpenSSL, rendering that portion of the code obsolete. I also modified the code so that it does not block waiting for user input, allowing the program to be looped indefinitely. I had to register as a user of the service, and my user name and password are required to start a connection with the server. As it stands, this C program downloads 25 MB of random binary data from the server and stores it in qrng_output.bin within the same directory.

Tuesday, May 14, 2013

Week 1: Getting up to Speed


This summer, I will be working with Adam Norton and Amir Rahmati on the cryptographic randomness project in the SPQR lab. Our primary goal is to identify sources of entropy that provide sufficient randomness for a truly secure cryptographic system, particularly on devices with limited computing power. For the first week, I read five different research papers on cryptographic randomness to get up to speed on the scope of the project. Included below is a brief summary of each paper as well as the major lessons and conclusions that will be helpful to our team.

Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices
This paper details a large-scale experiment examining the security of random number generation in important cryptographic protocols. The study specifically targets TLS and SSH, performing a scan of the entire IPv4 address space to obtain TLS certificates and SSH host keys. The data shows that headless and embedded devices do not generate enough entropy for the PRNGs used by these protocols to create distinct keys: 5.57% of TLS hosts and 9.60% of SSH hosts shared keys with other hosts because of this problem. Furthermore, by exploiting weaknesses in RSA and DSA, the private keys of 0.5% of TLS hosts and 1.06% of SSH hosts could be computed from the public keys. One reason embedded devices do not generate sufficient entropy is that they lack the resources of traditional PCs, which draw on keyboard input, mouse movement, disk accesses, and other events for entropy. The final component of the experiment exposes another major reason: the most common PRNG on these devices is the Linux kernel's, consumed through /dev/urandom, and the study finds that it exhibits a boot-time entropy hole that causes it to produce deterministic output. In one case, the non-blocking entropy pool that supplies urandom did not receive data from the input pool until 66 seconds after boot, even though OpenSSH had seeded its PRNG from urandom well before that. The paper provides helpful advice for anyone trying to create a secure embedded device: avoid factory default keys or certificates, ensure sources of entropy are effective, test cryptographic randomness on the device, and use hardware random number generators whenever possible.
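The factorable-key result comes down to a simple observation: if two RSA moduli happen to share a prime factor (because both devices drew it from the same weak PRNG state), a single GCD computation exposes it and, with it, both private keys. The study uses a fast batch-GCD algorithm to do this across millions of keys, but pairwise the idea looks like this toy Python example with made-up tiny primes:

from math import gcd

def shared_factor(n1, n2):
    # If two moduli were generated from the same PRNG state and share a prime,
    # gcd recovers that prime immediately.
    g = gcd(n1, n2)
    return g if 1 < g < n1 else None

if __name__ == "__main__":
    p, q1, q2 = 1000003, 1000033, 1000037   # toy primes, far too small for real RSA
    n1, n2 = p * q1, p * q2                 # both "devices" reused the prime p
    print(shared_factor(n1, n2))            # prints 1000003, exposing p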

Analysis of the Linux Random Number Generator
The goal of this paper is to decipher the complex open-source code of the Linux RNG that supplies /dev/random and /dev/urandom. The operating system collects entropy from mouse movement, keyboard input, certain interrupts, and disk accesses. Each such event is fed into a 512-byte primary pool as two 32-bit words: the first word records the timing of the event in milliseconds and the second word encodes the event type. The primary pool provides entropy bits to the secondary and urandom pools, both of which store up to 128 bytes. The secondary pool feeds /dev/random, which blocks until the entropy counter of the secondary pool reaches a sufficient value; the urandom pool feeds /dev/urandom, which never blocks. A fairly involved calculation determines the amount of entropy credited to each pool and, in turn, how the entropy counter is incremented. The study also finds a vulnerability in the forward security of the system due to the fact that output from a pool is calculated after the state of the pool is updated. An algorithm is presented that can compute the five previous states of a pool with a high entropy counter from the current state with an overhead of 2^64 operations and O(1) memory. This issue can be avoided by computing the output before updating the state of the pool. Additionally, the study notes that OpenWRT, a Linux distribution for wireless routers, is much less secure because it lacks strong sources of entropy such as mouse and keyboard input or disk accesses; the only source it uses is network interrupts, which are easily visible to an external adversary. Finally, the paper suggests a defense against a denial-of-service attack based on continuous reads from the blocking /dev/random: impose a per-user or per-group quota on the consumption of random bits.
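To make the blocking/non-blocking distinction concrete, here is a tiny Linux-only snippet (my own illustration, not code from the paper) that reads a few bytes from both interfaces; on the kernels the paper analyzes, the second read can stall until the secondary pool's entropy counter is high enough:

import os

def read_device(path, n=16):
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, n)
    finally:
        os.close(fd)

if __name__ == "__main__":
    print("/dev/urandom:", read_device("/dev/urandom").hex())  # returns immediately
    print("/dev/random: ", read_device("/dev/random").hex())   # may block until enough entropy is credited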

How to Generate Cryptographically Strong Sequences of Pseudo Random Bits
With the motivation of devising a robust, secure random number generator, this paper opens with an excellent coin-flipping analogy to make the key point that randomness is relative to a specific model of computation with a specific amount of computational resources. An observer with no resources cannot determine the outcome of a coin flip with better than 50% success, but an observer equipped with a computer that can photograph the coin, track its motion, and efficiently calculate its speed and momentum may be able to predict the outcome with better than 50% success. With this in mind, a random number generator is proposed that takes as input a random seed value s and outputs a sequence of pseudo-random bits. The sequence is polynomially long in the length of s, each bit can be calculated in time polynomial in the length of s, and, given the first k pseudo-random bits but not s, it is computationally infeasible to predict the (k+1)st bit in the sequence with better than 50% success.
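As a concrete (and deliberately insecure) illustration, here is a toy version of the paper's discrete-exponentiation generator with tiny made-up parameters; real security depends on p being a large prime, g a generator modulo p, and the seed staying secret:

def blum_micali(seed, p=1000003, g=5, nbits=32):
    # x_{i+1} = g^{x_i} mod p; output one bit per step depending on whether
    # x falls in the lower half of the range. With tiny p this is trivially
    # breakable -- the point is only to show the structure of the generator.
    x = seed
    bits = []
    for _ in range(nbits):
        x = pow(g, x, p)
        bits.append(1 if x < (p - 1) // 2 else 0)
    return bits

if __name__ == "__main__":
    print("".join(str(b) for b in blum_micali(seed=123456)))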

When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography
There are two components to this paper. The first is a case study on the snapshots used by virtual machines to reset the state of a system and the security vulnerabilities that arise from this technique. This study has significant implications because VM managers such as VMware and VirtualBox are used at the consumer and business level to manage computing infrastructure efficiently, and increasingly popular cloud computing relies on these management services as well. The study found that if a client runs multiple times from the same full reset snapshot, it may end up sending the same premaster secret value used in the TLS RSA key exchange to different servers. For TLS servers, a vulnerability arises if multiple servers are run from the same snapshot: if the snapshot was taken after startup, it preserves the state of the RNG derived from the boot-time seed, and reuse of that state can lead to multiple servers using the same randomness in DSA. Fixing this vulnerability requires ensuring that the RNG gathers sufficient entropy after the snapshot is resumed and that applications draw fresh randomness from OS sources such as /dev/random and /dev/urandom immediately before each cryptographic operation. The second part of the paper explores the idea of hedging currently deployed cryptography to safeguard against bad randomness and thereby mitigate the risk of RNG failures. The proposed hedging framework preprocesses RNG-derived randomness together with other available inputs, such as mouse and keyboard events, to create the pseudorandomness used by cryptographic operations, with very little overhead. The framework can be applied to the OpenSSL code base, a widely used cryptographic library that implements the TLS protocol.
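The hedging idea itself is easy to sketch. The snippet below is my own illustration of the general principle, not the paper's framework or its API: fold whatever operation-specific inputs are available into the RNG output with a hash, so that repeated RNG output does not translate into repeated per-operation randomness:

import hashlib
import os
import time

def hedged_randomness(nbytes, *context):
    raw = os.urandom(nbytes)              # possibly weak or repeated RNG output
    h = hashlib.sha256(raw)
    for item in context:                  # fold in inputs specific to this operation
        h.update(repr(item).encode())
    h.update(str(time.time()).encode())   # a cheap auxiliary input
    return h.digest()[:nbytes]

if __name__ == "__main__":
    r = hedged_randomness(16, "TLS handshake", "peer=example.org")
    print(r.hex())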

A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications
In the introduction, this paper explains the important difference between random number generators and pseudorandom number generators. RNGs use an entropy source to create randomness, while PRNGs expand a random, unpredictable seed into many “pseudorandom” numbers; a PRNG therefore ultimately requires an RNG to supply its seed. The paper then presents a suite of 15 statistical tests to determine whether RNGs and PRNGs produce sufficient randomness. The first test examines a sequence of random bits and assesses how close the fraction of ones is to ½, i.e., whether the numbers of ones and zeros are approximately what would be expected in a truly random sequence. The remaining 14 tests are only meaningful if this first one passes. The next test checks the proportion of ones within M-bit blocks; for block size M=1, it reduces to the first test. Another test counts the number of runs in the sequence, where a run is an uninterrupted sequence of identical bits, and determines whether the oscillation between zeros and ones is too fast or too slow. Other tests use mathematical techniques such as discrete Fourier transforms, template matching, and Maurer's universal statistical test.
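Here is a small sketch of that first (frequency, or monobit) test, following the formula in the NIST document: map the bits to ±1, sum them, and derive a p-value from the complementary error function; a p-value below 0.01 causes the sequence to be rejected as non-random:

import math

def monobit_test(bits):
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)        # map 1 -> +1, 0 -> -1 and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))       # p-value; reject randomness if < 0.01

if __name__ == "__main__":
    sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]      # 6 ones, 4 zeros
    print(round(monobit_test(sample), 6))        # ~0.53, well above the 0.01 threshold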