Most apps on your phone talk to the Internet, and analysing the details of those interactions can reveal very interesting information. For example, is the app phoning home with your personal data? Or is it leaking sensitive information like passwords and API keys? In this post I'll explain my favorite method for capturing and analysing data sent from Android apps. In short, the app Packet Capture is used to capture the data, and then a homemade script is used to export the data from the phone to the computer for further analysis. Note that no root access is needed.
The normal approach for capturing and analysing traffic on Android is to set up a Wi-Fi hotspot on your computer and tunnel all your phone's traffic through it. The problem with this method is that it requires an extra computer. This can be quite inconvenient if the app needs to interact with the physical world, e.g. scanning barcodes or connecting to a specific network. Packet Capture solves this problem since it can both capture and decrypt the traffic on the fly, with no extra computer needed. It does, however, lack the ability to efficiently export all the data to a computer for deeper analysis.
Thus, the goal was to somehow extract all the data from the app. Android can back up apps using the adb backup command. The good, or in our case bad, thing is that the data is encrypted. Even though you get to choose the password for the encryption, there is no official guide on how to decrypt the data. As it turns out, someone else had already solved this problem and posted the code, android-backup-extractor. If you're interested in how the algorithm works, I suggest you read the author's article Unpacking Android backups. The following three commands can be used to extract and decrypt all the data.
adb backup app.greyshirts.sslcapture
java -jar android-backup-extractor-20160710-bin/abe.jar unpack backup.ab sslcapture_backup.tar password
tar -xvf sslcapture_backup.tar
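If the unpack step fails, it can help to check what kind of backup adb actually produced. The sketch below is not part of android-backup-extractor; it assumes the commonly documented .ab layout, where the file starts with four plain-text lines (the magic string "ANDROID BACKUP", a format version, a compression flag, and the encryption algorithm, either "none" or "AES-256"). The function name read_backup_header is made up for this example.

```python
def read_backup_header(path):
    """Return the plain-text header fields of an Android backup (.ab) file.

    Assumes the first four lines are: magic, version, compression flag,
    encryption algorithm. Everything after them is the (possibly encrypted)
    payload and is not touched here.
    """
    with open(path, "rb") as f:
        magic, version, compressed, encryption = (
            f.readline().decode("ascii", "replace").strip() for _ in range(4)
        )
    if magic != "ANDROID BACKUP":
        raise ValueError(f"not an Android backup file: {path}")
    return {
        "version": int(version),
        "compressed": compressed == "1",
        "encryption": encryption,  # "none" means no password was set
    }
```

Calling read_backup_header("backup.ab") before running abe.jar shows whether a password is needed at all.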
With full access to the decrypted data it was time to decode and interpret it. Each capture is saved in a binary file called captureN.dat, where N is an id number. The content of the files had the following structure.
00000000 94 b5 9f c0 59 01 00 00 00 00 00 00 bb 03 00 00 |....Y...........|
00000010 47 45 54 20 2f 70 61 67 65 61 64 2f 6a 73 2f 72 |GET /pagead/js/r|
The "GET /..." should look familiar. The first 16 bytes, on the other hand, might not. After some trial and error it became clear that these 16 bytes are divided into three little-endian fields. First, an 8-byte timestamp measured in milliseconds. Second, a 4-byte integer giving the direction of the packet. Finally, a 4-byte integer containing the length of the packet. In Python this can be expressed as:
(timestamp, direction, length) = struct.unpack('<Qii', data[:16])
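To read a whole capture file rather than just its first header, the same unpacking can be applied in a loop. This is a minimal sketch, not the author's script: it assumes that each 16-byte header is immediately followed by a payload of the declared length and that records are stored back to back, which the hexdump suggests but does not prove. The names Record and iter_records are invented for the example, and the meaning of each direction value is left uninterpreted.

```python
import struct
from typing import Iterator, NamedTuple

# 8-byte timestamp (ms), 4-byte direction, 4-byte payload length, little-endian.
HEADER = struct.Struct("<Qii")


class Record(NamedTuple):
    timestamp_ms: int
    direction: int
    payload: bytes


def iter_records(data: bytes) -> Iterator[Record]:
    """Yield records, assuming header+payload pairs are stored back to back."""
    offset = 0
    while offset + HEADER.size <= len(data):
        timestamp_ms, direction, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if length < 0 or offset + length > len(data):
            break  # truncated file or unexpected layout: stop rather than guess
        yield Record(timestamp_ms, direction, data[offset:offset + length])
        offset += length


# The 16 header bytes from the hexdump, plus a stand-in payload of the declared length.
sample = bytes.fromhex("94b59fc059010000" "00000000" "bb030000") + b"G" * 955
for rec in iter_records(sample):
    print(rec.timestamp_ms, rec.direction, len(rec.payload))
```

For a real file, pass the result of open("capture1.dat", "rb").read() instead of the sample bytes.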
Information about IP addresses, which apps were captured, etc., can be found in the database file called schedule.db.
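The schema of schedule.db is not described here, so a reasonable first step is to list its tables and columns. The sketch below assumes the file is an ordinary SQLite database, as is typical for Android apps; list_tables is a hypothetical helper name, and no table or column names are assumed.

```python
import sqlite3
from typing import Dict, List


def list_tables(db_path: str) -> Dict[str, List[str]]:
    """Return {table name: [column names]} for a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
            for t in tables
        }
    finally:
        conn.close()
```

Running list_tables("schedule.db") on the extracted file shows which tables are worth querying for addresses and app names.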
Since Packet Capture doesn't save the full Ethernet packet, it is not possible to analyse the traffic in Wireshark. However, to make the data simpler to handle I've written a script, rev.py, that converts all the captured data into a single JSON file. This file should be easy to parse and convert to your favorite format.
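To give an idea of what such a conversion involves, here is a small sketch of turning parsed records into JSON. It is not rev.py, and its output layout is not claimed to match that script. The function name records_to_json and the field names are assumptions, and payloads are base64-encoded because JSON cannot hold raw bytes.

```python
import base64
import json


def records_to_json(records):
    """Serialise (timestamp_ms, direction, payload_bytes) tuples as a JSON string."""
    out = []
    for timestamp_ms, direction, payload in records:
        out.append({
            "timestamp_ms": timestamp_ms,
            "direction": direction,
            "length": len(payload),
            # JSON has no bytes type, so the raw payload is base64-encoded.
            "payload_b64": base64.b64encode(payload).decode("ascii"),
        })
    return json.dumps(out, indent=2)


print(records_to_json([(1484995409300, 0, b"GET /pagead/js/r")]))
```

A consumer gets the original bytes back with base64.b64decode on the payload_b64 field.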