========================[ Readings ]========================

Android chose Java as its language for a variety of reasons, such as the
ready availability of many Java programmers who would not need to learn a
new language to start contributing to the ecosystem. Such bets have paid
off for Microsoft (compare Steve Ballmer's famous speech,
https://www.youtube.com/watch?v=1VgVJpVx9bc).

However, Java's virtual machine, the JVM, was never optimized for mobile
devices, either for speed or for bytecode size. Thus Android opted for a
new virtual machine, Dalvik, which used a completely different bytecode and
architecture: register-based, vs the JVM's stack-based.

The Dalvik VM has been superseded by ART, the new VM. Differences between
them largely boil down to ART's ahead-of-time compilation of bytecode to
native code vs Dalvik's Just-in-Time compilation of bytecode that is
repeatedly executed. (See https://source.android.com/devices/tech/dalvik/
and https://www.slideshare.net/limaniBhavik/artaot-vs-dalvikjit for more
details on ART.)

========================[ JVM bytecode ]========================

The Java compiler builds .class files that contain class metadata and JVM
bytecode. That bytecode will be translated to Dalvik bytecode, and all
.class files will be merged into a DEX file later, but it pays to
understand how Java bytecode works. Here's a short tutorial explaining the
stack machine model of the JVM:
https://www.beyondjava.net/blog/java-programmers-guide-java-byte-code/

You can disassemble class files into readable bytecode with "javap", which
comes in Java's JDK alongside the compiler javac. E.g., here's the
disassembly of my PlumbBob's MainActivity class file.
Remember, this class file does not make it into your APK; it just gets
saved as a byproduct of the pipeline .java -> .class -> DEX -> ODEX

    cd ~/AndroidStudioProjects/PlumbBob
    javap -l -s -c ./app/build/intermediates/classes/debug/com/netfluke/sergey/plumbbob/MainActivity.class | less

This is the Java code, with line numbers:

    34:    @Override
    35:    protected void onPause() {
    36:        super.onPause();
    37:        mSensorManager.unregisterListener(this, mGravity);
    38:    }

This is what onPause looks like in disassembly. Note that fields and
methods are stored in tables and referred to by their index; the actual
strings appear in //-comments, generated by the disassembler itself.
My comments are marked with <-- .

  protected void onPause();
    descriptor: ()V           <-- "V" stands for void return value
    Code:
       0: aload_0             <-- "this" pointer is in variable 0, gets pushed to the stack
       1: invokespecial #16   // Method android/support/v7/app/AppCompatActivity.onPause:()V
       4: aload_0             <-- and again, to get the mSensorManager field next:
       5: getfield      #10   // Field mSensorManager:Landroid/hardware/SensorManager;
       8: aload_0             <-- we need two copies of "this", the literal 1st argument,
       9: aload_0             <-- and its mGravity member field, obtained next:
      10: getfield      #13   // Field mGravity:Landroid/hardware/Sensor;
      13: invokevirtual #17   // Method android/hardware/SensorManager.unregisterListener:(Landroid/hardware/SensorEventListener;Landroid/hardware/Sensor;)V
                              <-- #17 is the unregisterListener method.
      16: return
    LineNumberTable:          <-- Java line to offset of the bytecode it's compiled into
      line 35: 0
      line 36: 4
      line 37: 16
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      17     0  this   Lcom/netfluke/sergey/plumbbob/MainActivity;

Note that Java class names are represented by starting the token that
expresses them with "L", and ending it with ";". Dots turn into "/"s.
So "Lcom/netfluke/sergey/plumbbob/MainActivity;" stands for
"com.netfluke.sergey.plumbbob.MainActivity". I stands for the primitive
int (not the Integer class), F for float, V for void.
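The descriptor notation can be decoded mechanically. The following is a
minimal sketch, not part of any Android or JDK tool: the class name
DescriptorDemo and the method toJavaName are made up for this example.
It handles single type descriptors only (not whole method signatures
such as "()V"), and it also accepts a leading "[", which the JVM uses
to mark one array dimension.

```java
class DescriptorDemo {
    // Single-letter primitive descriptors, as listed in the JVM specification.
    static String primitive(char c) {
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            default:  return null;
        }
    }

    // Hypothetical helper: turns one type descriptor into Java source notation.
    static String toJavaName(String descriptor) {
        // Count leading '[' characters: one per array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
            // Class type: strip "L" and ";", turn "/" back into ".".
            name = element.substring(1, element.length() - 1).replace('/', '.');
        } else if (element.length() == 1 && primitive(element.charAt(0)) != null) {
            name = primitive(element.charAt(0));
        } else {
            throw new IllegalArgumentException("Not a type descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("Lcom/netfluke/sergey/plumbbob/MainActivity;"));
        System.out.println(toJavaName("I"));
        System.out.println(toJavaName("[F"));
    }
}
```

Running main prints com.netfluke.sergey.plumbbob.MainActivity, int, and
float[], one per line.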
[I stands for an array of ints, [F for an array of floats,
[Ljava/lang/Object; for an array of Object, and so on.

This is the JVM bytecode specification:
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html

Note that the JVM is aware of arrays and objects. Its stack slots can hold
references to objects or arrays, and bytecodes include "anewarray"
(https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.anewarray)
for creating a new array of object references of the given size and pushing
the reference to it on top of the stack, hiding all the details of
allocating the array in the heap. (Arrays of primitive types are created by
the analogous "newarray".) Another related bytecode is arraylength, which
consumes (pops off the stack) a reference to an array and pushes its length
on top of the stack:
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.arraylength

========================[ Dalvik/ART bytecode ]========================

JVM bytecode is excessively large for mobile platforms. It gets translated
to Dalvik bytecode, which is very different in concept and implementation.
It then gets further optimized by translating the DEX file to an ODEX
file, which is only guaranteed to run correctly on a specific virtual
machine.

This talk compares Dalvik bytecode with the JVM's:
https://www.slideshare.net/paller/understanding-the-dalvik-bytecode-with-the-dedexer-tool

The dedexer tool mentioned in this talk is here:
http://dedexer.sourceforge.net/ (download ddx1.26.jar and place it in the
current directory).

Here is me using this tool to disassemble Varun's old APK.
My comments start with "//":

    $ wget http://www.cs.dartmouth.edu/~campbell/cs65/myruns/apk/MyRuns-Android-chk2.apk
    $ unzip -d MyRuns2 MyRuns-Android-chk2.apk
    $ java -jar ddx1.26.jar -d dex MyRuns2/classes.dex
    $ less dex/com/varunmishra/myruns2/MainActivity.ddx

(output follows)

// Current class, its parent, and file:
.class public com/varunmishra/myruns2/MainActivity
.super android/app/Activity
.source MainActivity.java

// Member fields of the class:
.field private mSlidingTabLayout Lcom/varunmishra/myruns2/view/SlidingTabLayout;
.field private mViewPageAdapter Lcom/varunmishra/myruns2/ViewPageAdapter;
.field private mViewPager Landroid/support/v4/view/ViewPager;

// Constructor:
.method public <init>()V
.limit registers 1
; this: v0 (Lcom/varunmishra/myruns2/MainActivity;)
.line 12
    invoke-direct {v0},android/app/Activity/<init> ; <init>()V
    return-void
.end method

// onCreate(..), returns void "V", takes android.os.Bundle as argument:
.method protected onCreate(Landroid/os/Bundle;)V
// only 5 registers needed for this function:
.limit registers 5
// v3 is "this", v4 holds the Bundle argument, v0..v2 are scratch space
; this: v3 (Lcom/varunmishra/myruns2/MainActivity;)
; parameter[0] : v4 (Landroid/os/Bundle;)

// Likely code line: super.onCreate(bundle);
.line 21
    invoke-super {v3,v4},android/app/Activity/onCreate ; onCreate(Landroid/os/Bundle;)V

// Likely code line: this.setContentView(R.layout.main_activity);
.line 22
    const v0,2130968599 ; 0x7f040017
    invoke-virtual {v3,v0},com/varunmishra/myruns2/MainActivity/setContentView ; setContentView(I)V

// this.mSlidingTabLayout = (SlidingTabLayout) findViewById(R.id.sliding_tab_layout);
.line 24
    const v0,2131689556 ; 0x7f0f0054
    invoke-virtual {v3,v0},com/varunmishra/myruns2/MainActivity/findViewById ; findViewById(I)Landroid/view/View;
    move-result-object v0
    check-cast v0,com/varunmishra/myruns2/view/SlidingTabLayout
    iput-object v0,v3,com/varunmishra/myruns2/MainActivity.mSlidingTabLayout Lcom/varunmishra/myruns2/view/SlidingTabLayout;
<...>

Actual bytecode bytes are specified here:
https://source.android.com/devices/tech/dalvik/dalvik-bytecode

The syntax of the disassembled bytecode is based on the framework called
Jasmin:
http://jasmin.sourceforge.net/guide.html
http://jasmin.sourceforge.net/instructions.html
(see, e.g., "Method invokation" for an explanation of the method call
syntax).

More about decompiling DEX bytecode:
http://mariokmk.github.io/programming/2015/03/06/learning-android-bytecode.html

=================[ Patching an APK using the above ]=================

In 2015, SoundCloud changed its app from being able to cache hours of
playable music to dropping the cache feature and forcing the user to
"stream" the music over and over, wasting cellular bandwidth at best and
being unavailable at worst (e.g., on the subway). This very poor move may
have been necessitated by the legal climate at the time, but it left the
users high and dry (or paying exorbitant fees). This blog post analyzes
just what the change was:
http://androidcracking.blogspot.com/2015/07/increase-soundcloud-cache.html

=================[ DEX format & its manipulations ]=================

Slides 5--13 explain the DEX format:
http://archive.hack.lu/2013/AbusingDalvikBeyondRecognition.pdf
The rest of the talk is devoted to exploiting a difference between parsers
of the DEX format to bypass verification of the format, in order to cause
unexpected execution on the Dalvik VM.

========================[ Obfuscation ]========================

Note that in all of the above examples, string names are available for
types and most variables. This is very helpful for understanding the Java
code that resulted in the particular bytecode, but these names might not
always be there in a compiled app. Obfuscators such as ProGuard, packaged
with Android Studio, can rename classes, methods, and fields to short
meaningless character sequences, to make reverse engineers' lives harder.
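To get a feel for what renaming does to a disassembly, here is a toy
sketch of identifier renaming. It is an illustration only, not ProGuard's
actual algorithm: the class RenameSketch, the methods shortName and
buildMapping, and the a, b, ..., z, aa, ab, ... naming scheme are all
assumptions made for this example. The input names are the member fields
from the MainActivity disassembly above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RenameSketch {
    // Returns the index-th name in the sequence a, b, ..., z, aa, ab, ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.append((char) ('a' + n % 26)); // least significant letter first
            n = n / 26 - 1;                   // move on to the next letter, if any
        } while (n >= 0);
        return sb.reverse().toString();
    }

    // Maps each original identifier to a short name, in order of first appearance.
    static Map<String, String> buildMapping(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                "mSlidingTabLayout", "mViewPageAdapter", "mViewPager");
        for (Map.Entry<String, String> e : buildMapping(fields).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With this mapping, mSlidingTabLayout, mViewPageAdapter, and mViewPager
become a, b, and c, so a field access in the disassembly no longer hints
at its purpose. A real obfuscator cannot rename everything: methods that
override framework methods, such as onCreate, have to keep their names.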
Here is some discussion of obfuscation:
https://d3gpjj9d20n0p3.cloudfront.net/fortiguard/research/obfusk_caro.pdf
https://rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/