Monday, June 24, 2013

Week 7: Keylogging

Now that work on gathering random bytes from online sources is finished, I have begun tracking keystrokes and trying to build a record of the timestamp and the number of CPU cycles since boot for each key pressed. This has been more challenging than expected because each major operating system handles raw keyboard input differently. The first platform I worked on was Mac OS X. I expected the UNIX-based OS to record keystrokes in device event files, but Apple changed the architecture so that the same information lives in a much more obscure location within Application Services. Amir gave me a MacBook Air to work on, but we did not have the passwords for any of its accounts. Instead of installing a new OS on that machine, I borrowed the MacBook Pro that Adam had been working with. In the end, the only solution I could find was logKextClient, a program that captures all keyboard input. While it records the time range over which the data was collected, it does not assign a timestamp to each individual key press, let alone record the number of CPU cycles that have passed.
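As a rough sketch of the record I am aiming for, each key press could be stored together with a wall-clock timestamp and a high-resolution counter. This assumes Python 3.7+ and uses time.monotonic_ns() only as a stand-in for a cycle count: Python has no portable way to read the CPU cycle counter, and the monotonic clock's reference point is unspecified (on Linux it typically counts from boot, excluding time spent suspended). The KeyEvent and record_key names are hypothetical, and actually capturing the key is left to whichever platform hook ends up working.

```python
import time
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    wall_time_ns: int  # nanoseconds since the Unix epoch
    counter_ns: int    # monotonic clock reading; a stand-in for "cycles since boot"


def record_key(key, log):
    """Append a timestamped record for one key press to log and return it."""
    event = KeyEvent(key=key, wall_time_ns=time.time_ns(), counter_ns=time.monotonic_ns())
    log.append(event)
    return event


if __name__ == "__main__":
    log = []
    for key in "abc":  # simulated key presses; real capture is platform-specific
        record_key(key, log)
    for event in log:
        print(event)
```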

Finding a keylogger solution for Windows and Linux has proven similarly difficult. I wanted to test a simple script that reads the Linux keyboard device event file, but the script failed when Ubuntu was running as a virtual machine on the MBP. In fact, the keyboard's device event file was empty. This is probably because keyboard input is routed through the VM interface and never gets written to the event file. I will need to read more about device event files before I can make progress on the Linux platform. I also found PyKeylogger, a Python keylogger that is supposed to work on both Windows and Linux. I downloaded it on my Windows 8 machine, but the installation failed every time. I will have to look more deeply into my options for Windows as well.
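For reference, a Linux event device such as /dev/input/eventN produces fixed-size struct input_event records (a struct timeval, then 16-bit type and code fields and a 32-bit value), and every record carries its own kernel timestamp, which is exactly the per-key timing that logKextClient lacks. The sketch below assumes the native layout matches the struct format "llHHi" (true on typical 64-bit Linux, but an assumption elsewhere); /dev/input/event0 is only a placeholder path, reading it usually requires root, and the helper names are hypothetical.

```python
import struct

# struct input_event: struct timeval (two native longs), __u16 type,
# __u16 code, __s32 value. Assumes the native layout of typical Linux.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)
EV_KEY = 0x01  # event type for keys and buttons
KEY_STATES = {0: "release", 1: "press", 2: "repeat"}


def parse_key_events(data):
    """Yield (seconds, microseconds, keycode, state) for each EV_KEY record."""
    usable = len(data) - len(data) % EVENT_SIZE  # ignore a trailing partial record
    for sec, usec, etype, code, value in struct.iter_unpack(EVENT_FORMAT, data[:usable]):
        if etype == EV_KEY:
            yield sec, usec, code, KEY_STATES.get(value, str(value))


def tail_keyboard(path="/dev/input/event0"):
    """Print key events from a device file (placeholder path; usually needs root)."""
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(EVENT_SIZE)
            if len(chunk) < EVENT_SIZE:
                break
            for event in parse_key_events(chunk):
                print(event)
```

Because parse_key_events works on raw bytes, it can be checked against records built with struct.pack without touching a real device, which helps when the event file inside a VM turns out to be empty.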


Meanwhile, I have started my Stanford online courses in cryptography and startup engineering. Both courses should give me knowledge and experience that will help with my project on cryptographic randomness.

Monday, June 17, 2013

Week 6: More Studying

At the beginning of this week, I put some finishing touches on our online entropy collection project. I made my script for the NIST Beacon source pick up where it left off: since the source publishes new random bytes every minute, the script now finds the last timestamp it collected and resumes collecting from the minute after it. Later, Amir told me that after the cron job collected data from the online sources, there was a problem with the file naming. I had to rethink the logic I had written for run_all.py. By swapping the order of the steps that move the files from the tmp directory to daily_results and rename them, I was able to fix the problem.
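The resume logic can be sketched as plain arithmetic on Unix timestamps, assuming the records are aligned to whole minutes and that the last collected timestamp has already been recovered (for example, from the newest saved file). pending_timestamps is a hypothetical helper, and turning each timestamp into an actual request to the Beacon is deliberately left out.

```python
import time

BEACON_INTERVAL = 60  # the source publishes a new record every minute


def pending_timestamps(last_collected, now=None, interval=BEACON_INTERVAL):
    """Return the minute-aligned timestamps after last_collected, up to now.

    Assumes last_collected is itself aligned to a multiple of interval.
    """
    if now is None:
        now = int(time.time())
    start = last_collected + interval  # first record not yet collected
    end = now - now % interval         # most recent record that should exist
    return list(range(start, end + 1, interval))
```

For example, pending_timestamps(1371000000, now=1371000185) returns the three records at 1371000060, 1371000120, and 1371000180, and it returns an empty list if less than a minute has passed.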

I have been continuing my study of Python with the Lutz book. I have found it to be an extremely powerful language because of its cross-platform system programming capabilities. The os and glob modules make it easy to automate command-line tasks. Having Python in my skill set will make it much easier for me to write scripts going forward.
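As a small example of the kind of automation os and glob make easy, here is a sketch of a rename-and-move step like the one run_all.py performs: append a timestamp to every file in tmp and move it into daily_results. The timestamp format, the name_STAMP.ext naming scheme, the sample file names, and archive_results itself are hypothetical; shutil.move is used so the move also works across file systems.

```python
import glob
import os
import shutil
import tempfile
import time


def archive_results(tmp_dir, dest_dir, stamp=None):
    """Append a timestamp to every file in tmp_dir and move it into dest_dir.

    Returns the new file names. The name_STAMP.ext scheme is a hypothetical example.
    """
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(tmp_dir, "*"))):
        if not os.path.isfile(path):
            continue  # skip subdirectories
        base, ext = os.path.splitext(os.path.basename(path))
        new_name = f"{base}_{stamp}{ext}"
        # Build the final name first and move under it in one step,
        # so no file lands in dest_dir without its timestamp.
        shutil.move(path, os.path.join(dest_dir, new_name))
        moved.append(new_name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tmp = os.path.join(root, "tmp")
        os.makedirs(tmp)
        for name in ("beacon.bin", "randomserver.bin"):  # hypothetical outputs
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(os.urandom(16))
        print(archive_results(tmp, os.path.join(root, "daily_results")))
```

Choosing the final name before moving is one way to avoid ordering problems between the rename and move steps.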

After hearing about it from a friend, I enrolled in the Stanford online cryptography course, which begins on June 17th. I expect this course to expand my knowledge of cryptography beyond the basic concepts I learned in EECS 482. The syllabus includes the DES and AES block ciphers, collision-resistant hashing, key derivation functions, Diffie-Hellman, RSA, and Merkle puzzles, among several other topics.


On Friday, our team met with Professor Fu to discuss future work. We talked with Ari Juels about using subtle frequency variations in RFID tags to obtain randomness, as well as about thermal chamber testing. The first step is to order RFID tags and readers so that we can begin testing. They should take around a week to arrive at the lab. Until then, we will continue to focus on the entropy available in desktop computers.

Monday, June 10, 2013

Week 5: Finishing up Online Sources

This week, I finished our work on gathering entropy from online sources. I was able to avoid using twill to download random bytes from randomserver.dyndns.org by simply appending the form data to the URL as a query string. This way, I could use Python's URL library, and the script to download 4096 random bytes became trivial. After that, I brought all of the online sources together. I standardized the output files of every script and created a tmp folder to store them in. Then I wrote run_all.py, which uses Python's os module to run each script from the command line. Once all the temporary files are stored in tmp, run_all.py determines the timestamp, appends it to the file names, and moves the files to ./daily_results in the randomness GitHub repository. Amir has set up a cron job on his BICUSPID account to run run_all.py a couple of times each day, so the daily_results directory will keep growing as we collect random bytes every day.
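A minimal sketch of the query-string approach, assuming Python 3's urllib (the original scripts may well have used Python 2's urllib2) and assuming the server answers with raw bytes. FORM_URL and the field names nbytes and format are placeholders, not the actual form fields of randomserver.dyndns.org.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: the real form's action path and field names are not recorded here.
FORM_URL = "http://randomserver.dyndns.org/FORM-ACTION-PATH"


def build_query_url(base_url, fields):
    """Encode form fields as a query string and append it to base_url (a GET request)."""
    return base_url + "?" + urlencode(fields)


def download_random_bytes(url, expected=4096, timeout=30):
    """Fetch url and check that the expected number of bytes came back."""
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    url = build_query_url(FORM_URL, {"nbytes": 4096, "format": "raw"})
    print(url)  # not fetched here, since FORM_URL is only a placeholder
```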

Throughout the week, I have been reading two books to gain a general understanding of the Linux kernel and of Python programming, which will help once I start writing scripts to isolate the entropy sources that feed the Linux RNG and study how their entropy is collected.

Understanding the Linux Kernel (Bovet & Cesati): One important thing I learned from this book is that the open-source nature of the kernel has both advantages and disadvantages. On one hand, there is a strong base of very talented Linux developers who understand how to make the operating system compact yet powerful. On the other hand, as I saw when trying to understand the RNG by reading /drivers/char/random.c, the source code can get extremely messy. Another part of the book's overview of the kernel gave me an idea for a new source of entropy for the RNG. Linux dynamically links modules containing code for file system management, device drivers, and other features, so that main memory is not burdened with kernel code that may go unused most of the time. If the time between module linking and unlinking events could be measured, it might provide a strong source of entropy, especially since these deltas would be relatively large compared to those of other sources such as keystrokes or mouse clicks.
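The intuition that large deltas matter lines up with how random.c credits timer-based events. As I read the 3.x-era add_timer_randomness(), it computes the first-, second-, and third-order differences between successive event times, takes the smallest absolute value, and credits roughly log2 of half that value, capped at 11 bits. The class below is a simplified Python port of that estimator for experimenting with hypothetical module link/unlink timestamps; it is not kernel code, and the details differ between kernel versions.

```python
class TimerEntropyEstimator:
    """Simplified port of the credit logic in add_timer_randomness() (3.x-era random.c)."""

    MAX_CREDIT = 11  # per-event cap on credited bits

    def __init__(self):
        self.last_time = 0
        self.last_delta = 0
        self.last_delta2 = 0

    def credit(self, event_time):
        """Return the entropy bits credited for an event observed at event_time."""
        delta = event_time - self.last_time
        self.last_time = event_time
        delta2 = delta - self.last_delta
        self.last_delta = delta
        delta3 = delta2 - self.last_delta2
        self.last_delta2 = delta2
        # Use the smallest of the first-, second- and third-order differences,
        # so events arriving at a steady rate earn little or no credit.
        smallest = min(abs(delta), abs(delta2), abs(delta3))
        # The kernel's fls(x) (1-based index of the highest set bit) equals int.bit_length().
        return min((smallest >> 1).bit_length(), self.MAX_CREDIT)
```

With this estimator, evenly spaced events quickly stop earning credit because their second- and third-order differences vanish, so what counts is how irregular the module events are, not only how far apart they are.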


Programming Python (Lutz): I decided to take a thorough, in-depth approach to learning Python, since I found it to be a very powerful scripting language while writing my scripts for online sources. I have been getting used to the syntax, writing simple Python programs, and experimenting with the interpreter. So far in the book, I have covered keeping records, using dictionaries of dictionaries as a database, pickle files, and shelves. I left off at the advantages of object-oriented programming in Python (structure, encapsulation, customization, etc.) and the syntax for defining classes.
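The dictionary-of-dictionaries database and the two storage options can be illustrated in a few lines. This sketch assumes Python 3.4+ (for using shelve.open as a context manager); the bob and sue records are illustrative, in the spirit of the book's examples rather than copied from it, and the helper names are hypothetical.

```python
import os
import pickle
import shelve
import tempfile

# A tiny "database": a dictionary of dictionaries keyed by person (illustrative data).
people = {
    "bob": {"name": "Bob Smith", "age": 42, "pay": 30000, "job": "dev"},
    "sue": {"name": "Sue Jones", "age": 45, "pay": 40000, "job": "hdw"},
}


def pickle_roundtrip(db, path):
    """Save the whole dictionary to one pickle file, then load it back."""
    with open(path, "wb") as f:
        pickle.dump(db, f)
    with open(path, "rb") as f:
        return pickle.load(f)


def shelve_roundtrip(db, path):
    """Store each record under its own key in a shelf, then read them all back."""
    with shelve.open(path) as shelf:
        for key, record in db.items():
            shelf[key] = record
    with shelve.open(path) as shelf:
        return {key: shelf[key] for key in shelf}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(pickle_roundtrip(people, os.path.join(d, "people.pkl"))["bob"]["name"])
        print(sorted(shelve_roundtrip(people, os.path.join(d, "people-shelve"))))
```

The difference is granularity: pickle writes the whole dictionary as a single object, while a shelf stores each record under its own key, so one record can be fetched or updated without loading the rest.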

Monday, June 3, 2013

Week 4: Scripting Difficulties

I went down to Tennessee to attend my cousin’s high school graduation and spend my Memorial Day vacation with the family. When I came back to the lab on Wednesday, I started by digging deeper into the paper on boot-time entropy in embedded devices. I tried to decipher the confusing diagrams in the Results and Analysis section, which range from histograms to per-test distributions to graphs of entropy versus correlation threshold. While I still do not fully understand some of the statistical analysis, I believe I have a better grasp of the underlying conclusion. Even with the proposed techniques for gathering boot-time entropy, embedded devices are still at a disadvantage: they lack the randomness from radios that wireless devices can use and the previously saved entropy that desktops can use. These headless devices need to gather strong entropy immediately after booting, before a network connection is even established.

Next, I watched Dr. Avi Wigderson’s talk on Randomness and Pseudorandomness, which Amir had sent me. The talk introduced several interesting applications of randomness and its connections to deep questions such as the P vs. NP problem, which asks whether every problem whose solutions can be verified efficiently can also be solved efficiently. Perhaps the most intriguing part of the lecture, however, was Dr. Wigderson’s observation that randomness is relative to the computing power of the observer. He uses the example of a coin toss. An observer equipped only with his or her own eyesight will probably not be able to predict the outcome of the toss with better than 50% accuracy. On the other hand, an observer equipped with several high-speed cameras, state-of-the-art sensors, and sophisticated momentum-measuring software will be able to predict the outcome with near certainty.

Finally, I have been facing many obstacles in writing a script to automatically download random bytes from randomserver.dyndns.org. I am used to programming on Linux, so it has taken me some time to adjust to my programming environment on Windows 8. I installed Python 3.3 and tried to install twill, a Python extension that automates HTML form filling and submission. The installation kept failing, and I spent considerable time trying to figure out what was going wrong. Eventually, I found out that twill's open-source code was written for Python 2.5, in which print is a statement rather than a function, so its arguments are not wrapped in parentheses. I briefly attempted to write a script that would fix the print statements in all of twill's Python files. I quickly realized, however, that not all of the print statements wrote to the terminal (Python 2 also allows print >>f, ... to write to a file object), which would make fixing the scripts much more difficult. There might also be other differences between versions 2.5 and 3.3 that would make the installation fail anyway. My next step will be to install Python 2.5 so that I can install twill and get the random byte download script working.
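The print-fixing script I attempted can be sketched line by line with regular expressions. This is a hypothetical, deliberately limited converter: it handles print x, print x, (trailing comma), bare print, and print >>f, x, but not print statements split across lines, one-line compound statements such as if x: print y, or text inside strings. The 2to3 tool that shipped with Python 3 at the time does this conversion far more reliably.

```python
import re

# Python 2 "print >>f, args": output redirected to a file object such as sys.stderr.
_REDIRECT = re.compile(r"^(\s*)print\s*>>\s*([\w.]+)\s*,\s*(.+?)\s*$")
# Bare "print", which just writes a newline.
_BARE = re.compile(r"^(\s*)print\s*$")
# Ordinary Python 2 print statement: "print args".
_PLAIN = re.compile(r"^(\s*)print\s+(.+?)\s*$")


def _to_call(indent, args, target=None):
    """Build a Python 3 print() call from a Python 2 argument list."""
    extra = []
    if args.endswith(","):  # a trailing comma suppressed the newline in Python 2
        args = args[:-1].rstrip()
        extra.append('end=" "')
    if target:
        extra.append("file=" + target)
    return indent + "print(" + ", ".join([args] + extra) + ")"


def convert_line(line):
    """Rewrite one Python 2 print statement as a Python 3 print() call."""
    m = _REDIRECT.match(line)
    if m:
        return _to_call(m.group(1), m.group(3), target=m.group(2))
    m = _BARE.match(line)
    if m:
        return m.group(1) + "print()"
    m = _PLAIN.match(line)
    if m:
        return _to_call(m.group(1), m.group(2))
    return line  # not a print statement (or already a print() call)
```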