Hardcore Java/JVM tasks

Performance tasks from Contour have already been published, so now it's our turn: we present hardcore tasks from the Java conference JBreak 201?, aka "hell from Excelsior".
 
The tasks are given in their original wording. Each problem may have several correct answers, and each one comes with a solution under a spoiler.
 
Problem 1
 
Your colleague read the Java Language Specification and wrote the following:
 
void playWithRef() {
    Object obj = new Object();
    WeakReference ref = new WeakReference<>(obj);
    System.out.println(ref.get() != null);
    System.gc();
    System.out.println(ref.get() != null);
}

 
And now he asks you: what results are possible?
The analysis of this problem is in a post in our technical blog.
 
Problem 2
 
An evil hacker deleted the original Java file and shuffled the pieces of your class file:
 
A: ??? ??? ???r3r31211. B: ???
C: 6a???f6c 616e 672f 4f62 6a???r3r31211. D: cafe babe ??? ???

 

Rearrange them in such a way that a verifiable class file is produced.


 
Answer and solution

The correct answer is D, C, A, B.


 
Solution

This task is more about quick wits than knowledge, but it still teaches something new.


 

It is widely known that a class file begins with the 4-byte magic number 0xCAFEBABE, so D certainly goes first. Common sense suggests that the short piece B goes last: it is the tail.


 

Next, you might recall that a class file contains a constant pool, which holds string constants stored as a two-byte length followed by the string itself encoded in UTF-8. The only piece that looks like UTF-8 is C: it is the UTF-8 representation of the string java/lang/Object (a reference to the superclass of our class). It must therefore be preceded by the bytes 0x0010 (the string is 16 bytes long), and the only suitable piece is D, so C goes second.
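The length prefix is easy to check in code. Below is a minimal sketch using only the standard library: DataOutputStream.writeUTF emits a two-byte big-endian length followed by the string bytes, which matches the body of a constant pool string entry (the leading tag byte is not written here). The class and method names are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8EntryDemo {
    // Returns a hex dump of the two-byte length followed by the string bytes.
    static String encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buf.toByteArray()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The first four hex digits are the length prefix: 0010 means 16 bytes.
        System.out.println(encode("java/lang/Object"));
    }
}
```

The dump starts with 0010 and continues with 6a61 7661 2f6c 616e 672f 4f62 6a65 6374, the same byte pattern that is recognizable in piece C.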


 

Alternatively, you could notice that the entire last piece, B, consists of zeros, so the penultimate piece must also end in zeros, which means it is A.


 
The output of javap is

    class C
      minor version: 0
      major version: 49
      flags: ACC_SUPER
    Constant pool:
      #1 = Class  #2    // java/lang/Object
      #2 = Utf8   java/lang/Object
      #3 = Class  #4    // C
      #4 = Utf8   C
    {
    }
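To make the layout concrete, here is a hedged sketch that assembles a class file matching the javap output above by hand and asks the JVM to load it. The field order follows the class file format; the class name MinimalClassFile and its helper methods are invented for this illustration, and the bytes are not claimed to be identical to the file from the task.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MinimalClassFile {
    // Writes the class file fields in order, big-endian.
    static byte[] build() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);            // magic
            out.writeShort(0);                   // minor_version
            out.writeShort(49);                  // major_version
            out.writeShort(5);                   // constant_pool_count (entries + 1)
            out.writeByte(7);                    // #1 = Class
            out.writeShort(2);                   //      name at #2
            out.writeByte(1);                    // #2 = Utf8
            out.writeUTF("java/lang/Object");    //      u2 length + bytes
            out.writeByte(7);                    // #3 = Class
            out.writeShort(4);                   //      name at #4
            out.writeByte(1);                    // #4 = Utf8
            out.writeUTF("C");
            out.writeShort(0x0020);              // access_flags: ACC_SUPER
            out.writeShort(3);                   // this_class
            out.writeShort(1);                   // super_class
            out.writeShort(0);                   // interfaces_count
            out.writeShort(0);                   // fields_count
            out.writeShort(0);                   // methods_count
            out.writeShort(0);                   // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Defines the class in a throwaway class loader.
    static Class<?> load(byte[] bytes) {
        return new ClassLoader() {
            Class<?> define() {
                return defineClass("C", bytes, 0, bytes.length);
            }
        }.define();
    }

    public static void main(String[] args) {
        byte[] bytes = build();
        System.out.println(bytes.length + " bytes, loaded as " + load(bytes));
    }
}
```

In this sketch the last eight bytes (the four zero counters) are all zeros, which is consistent with the observation that the tail piece consists of zeros.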

 

Problem 3


 

After listening to yet another talk about Graal and getting inspired by the JVM Compiler Interface, you decided to write your own compiler for Java! You decided to start by generating x86_64 code for this method:


 
    static boolean invert(boolean x) {
        return !x;
    }

 

Which generated code will be correct for this method?


 

Legend: Intel syntax is used; by the calling convention, the argument is passed in register rcx and the result is returned in rax.


 
    A:
          test ecx, ecx
          jnz  True
          mov  eax, 1
          ret
    True: mov  eax, 0
          ret
    B:
          xor  eax, eax
          test ecx, ecx
          jnz  End
          add  eax, 1
    End:  ret
    C:
          mov  eax, 1
          sub  eax, ecx
          ret
    D:
          mov  eax, ecx
          xor  eax, 1
          ret

 
Answer and solution

The correct answer is A, B.


 
Solution

Assembler listings show up at Java conferences more and more often, but in case you are not yet familiar with the Intel x86 instruction set, here is the equivalent C code:


 
    A: res = (arg == 0) ? 1 : 0;
    B: res = 0; if (arg == 0) res += 1;
    C: res = 1; res -= arg;
    D: res = arg; res ^= 1;

 

In fact, all these inversion algorithms work correctly as long as the input argument takes only the usual boolean values 0 and 1.


 

This is where it gets interesting. From the verifier's point of view, all short integer types (boolean, byte, char, short) are equivalent to int. Moreover, boolean-specific bytecode instructions do not exist at all. For example, the bytecode of the method in question looks like this:


 
    public static boolean invert(boolean);
       0: iload_0
       1: ifne 8
       4: iconst_1
       5: goto 9
       8: iconst_0
       9: ireturn

 

Thus, a method that takes a boolean must be ready to work with any int, and any nonzero value is treated as true. In that case the "optimized" variants C and D start to misbehave: C(2) = -1 and D(2) = 3, while the more straightforward A and B keep working: A(2) = B(2) = 0.
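The arithmetic is easy to reproduce in plain Java by modelling the argument as an int. This is only a sketch of the four variants' logic, not the way to pass a non-0/1 boolean into real compiled code; the class and method names are made up for illustration.

```java
class InvertVariants {
    // A: branch on zero, return 1 or 0
    static int a(int x) { return (x == 0) ? 1 : 0; }

    // B: start from 0, add 1 only when the argument is zero
    static int b(int x) {
        int res = 0;
        if (x == 0) res += 1;
        return res;
    }

    // C: 1 - x, correct only for x in {0, 1}
    static int c(int x) { return 1 - x; }

    // D: x ^ 1, correct only for x in {0, 1}
    static int d(int x) { return x ^ 1; }

    public static void main(String[] args) {
        for (int x : new int[] {0, 1, 2}) {
            System.out.printf("x=%d A=%d B=%d C=%d D=%d%n", x, a(x), b(x), c(x), d(x));
        }
    }
}
```

For 0 and 1 all four methods agree; for 2 only a and b still return 0.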


 

To illustrate these subtleties you have to manipulate the bytecode. An example is available on GitHub: the numbers 0, 1, 2, 3 and -1 are passed to the method invert, and the result is printed using calls to println(boolean) and println(int).


 

Curious fact: in JDK 8 the HotSpot C2 compiler generated variant D, and in JDK 9 the code generation pattern changed to a more correct one.


 
The code generated by HotSpot C2 on Intel x86_64

In JDK 8, pattern D is clearly visible and the output is incorrect:


 
    $ jdk8/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
    Compiled method (c2) ??? Inverter::invert (10 bytes)
    # {method} {0x0000000012600d08} 'invert' '(Z)Z' in 'Inverter'
    # parm0: rdx = boolean
    # [sp+0x20] (sp of caller)
    0x00000000057d7ac0: sub rsp, 0x18
    0x00000000057d7ac7: mov QWORD PTR [rsp+0x10], rbp  ;*synchronization entry
                                                       ; - Inverter::invert@-1 (line 3)
    0x00000000057d7acc: mov eax, edx
    0x00000000057d7ace: xor eax, 0x1                   ;*ireturn
                                                       ; - Inverter::invert@9 (line 3)
    0x00000000057d7ad1: add rsp, 0x10
    0x00000000057d7ad5: pop rbp
    0x00000000057d7ad6: test DWORD PTR [rip+0xfffffffffdf58524], eax  # 0x0000000003730000
                                                       ; {poll_return}
    0x00000000057d7adc: ret
    false (0) -> true (1)
    true (1) -> false (0)
    true (2) -> true (3)
    true (3) -> true (2)
    true (-1) -> true (-2)

 

In JDK 9 the normalization of boolean values was improved: the input argument is now reduced to the range {0, 1} (instructions test and setne), and the result became correct:


 
    $ jdk9/bin/java -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileCommand=print,Inverter.invert -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel BooleanHell
    Compiled method (c2) ??? Inverter::invert (10 bytes)
    # {method} {0x000001fa974d2dc0} 'invert' '(Z)Z' in 'Inverter'
    # parm0: rdx = boolean
    # [sp+0x20] (sp of caller)
    0x000001fafcb57720: sub rsp, 0x18
    0x000001fafcb57727: mov QWORD PTR [rsp+0x10], rbp  ;*synchronization entry
                                                       ; - Inverter::invert@-1 (line 3)
    0x000001fafcb5772c: test edx, edx
    0x000001fafcb5772e: setne al
    0x000001fafcb57731: movzx eax, al
    0x000001fafcb57734: xor eax, 0x1                   ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                                       ; - Inverter::invert@9 (line 3)
    0x000001fafcb57737: add rsp, 0x10
    0x000001fafcb5773b: pop rbp
    0x000001fafcb5773c: test DWORD PTR [rip+0xfffffffffdf688be], eax  # 0x000001fafaac0000
                                                       ; {poll_return}
    0x000001fafcb57742: ret
    false (0) -> true (1)
    true (1) -> false (0)
    true (2) -> false (0)
    true (3) -> false (0)
    true (-1) -> false (0)
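The normalizing sequence from the listing above can be modelled in plain Java as follows. This is a sketch of the logic of the four instructions only (test, setne, movzx, xor), with the argument modelled as an int; the names are invented for illustration.

```java
class NormalizedInvert {
    // test edx, edx / setne al / movzx eax, al: any nonzero input becomes 1
    static int normalize(int x) {
        return (x != 0) ? 1 : 0;
    }

    // xor eax, 0x1 applied to the already normalized value
    static int invert(int x) {
        return normalize(x) ^ 1;
    }

    public static void main(String[] args) {
        for (int x : new int[] {0, 1, 2, 3, -1}) {
            System.out.println(x + " -> " + invert(x));
        }
    }
}
```

With the normalization step in front, the xor-based inversion returns 0 for every nonzero input, which matches the output shown in the listing.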

 

Problem 4


 

Suddenly you realized that you are very curious about what a call to this method can print:


 
    void guessWhat(Iterable<?> x) {
        System.out.println(x.getClass());
    }

 
  • A: class java.util.ArrayList
  • B: null
  • C: interface java.lang.Iterable
  • D: class java.lang.Integer

 
Answer and solution

The correct answer is A, D.


 
Solution

Options B and C are impossible, since Object.getClass() always returns a non-null class, and there are no instances of an interface type itself. Option A is easy to achieve: guessWhat(new ArrayList()).


 

However, option D is attainable too: Integer does not implement the interface Iterable, yet an instance of it can still arrive in this method. The clue is that the strictness of the Java language type system once again gives way to the laxity of the JVM verifier's type system: any reference type is assignment-compatible with any interface type. That is, almost everywhere an interface type is expected (parameters, return values, fields), you can pass any reference value (that is, instances of arbitrary classes and arrays).


 

This effect can be demonstrated either by manipulating the bytecode or by partially recompiling class files.


 

Problem 5


 

Once again, believing in the infallibility of javac, you decided to experiment:


 
    class C {
        private boolean getBoolean() {
            return false;
        }
    }
    interface I {
        default boolean getBoolean() {
            return true;
        }
    }
    class D extends C implements I {}
    public class Test {
        public static void main(String[] a) {
            foo(new D());
        }
        public static void foo(I i) {
            System.out.println(i.getBoolean());
        }
    }

 

What happens when you try to compile and run class Test ?


 
  • A: it will not compile.
  • B: java.lang.IllegalAccessError will be thrown.
  • C: "true" is printed.
  • D: "false" is printed.

 
Answer and solution

The correct answer is B.


 
Solution

Many people believe that IllegalAccessError is the lot of those who get too clever with partial recompilation or obfuscation. That is how it happened to us, when ProGuard gave two different methods (one private, the other default) the same name during obfuscation, and the resulting application started throwing IllegalAccessError.


 

However, it turned out that if two such methods have the same name in the source code, javac compiles them without any warnings, and IllegalAccessError is thrown at run time just the same.


 

This JVM behavior is explained by how the target method is looked up for the invokeinterface instruction. According to the specification, the instance methods of the class and of all its superclasses are examined first, and only then is a suitable default method looked up among the superinterfaces; the accessibility of the method found is checked only after the whole search has finished.


 

Thus, the search stops at the private method getBoolean from superclass C, which stands in the way of finding the default method getBoolean from superinterface I. After that, IllegalAccessError is quite logically thrown.


 

Interestingly, there are plans to change this in Java 11 so that private methods are skipped during the search.
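The lookup order described above can be imitated with reflection. The following is a deliberately simplified model, not the JVM's actual selection algorithm: it matches methods by name only, looks only at directly implemented interfaces, and reports the outcome as a string instead of throwing. The class SelectionSketch, its nested types and the skipPrivate flag (which models the planned change) are all invented for this illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class SelectionSketch {
    static class C { private boolean getBoolean() { return false; } }
    interface I { default boolean getBoolean() { return true; } }
    static class D extends C implements I {}

    // Superclasses first (private methods included unless skipPrivate is set),
    // then default methods of the directly implemented interfaces.
    static Method select(Class<?> receiver, String name, boolean skipPrivate) {
        for (Class<?> c = receiver; c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                boolean isPrivate = Modifier.isPrivate(m.getModifiers());
                if (m.getName().equals(name) && m.getParameterCount() == 0
                        && !(skipPrivate && isPrivate)) {
                    return m;
                }
            }
        }
        for (Class<?> c = receiver; c != null; c = c.getSuperclass()) {
            for (Class<?> i : c.getInterfaces()) {
                for (Method m : i.getDeclaredMethods()) {
                    if (m.getName().equals(name) && m.isDefault()) {
                        return m;
                    }
                }
            }
        }
        return null;
    }

    // The access check happens only after the search has finished.
    static String outcome(Method m) {
        if (m == null) return "AbstractMethodError";
        if (Modifier.isPrivate(m.getModifiers())) return "IllegalAccessError";
        return "invoke " + m.getDeclaringClass().getSimpleName() + "." + m.getName();
    }

    public static void main(String[] args) {
        System.out.println(outcome(select(D.class, "getBoolean", false)));
        System.out.println(outcome(select(D.class, "getBoolean", true)));
    }
}
```

In this model the private C.getBoolean shadows the default method unless private methods are skipped, in which case the search falls through to I.getBoolean.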


 

Problem 6


 

Suddenly, you find yourself debugging the native code of a compiled Java application. You do not have the source code, but you have already found the problematic method. Here it is:


 
    1: lea  rax, [rel _Test_foo]
    2: push rax
    3: mov  eax, dword [rcx+0FH]
    4: idiv dword [rdx+0FH]
    5: mov  rbx, qword [rel _Test_array]
    6: mov  ebx, dword [rbx+3BH]
    7: add  eax, ebx
    8: ret  8

 

You suspect that this method can throw Java exceptions of various types. It remains to work out which instructions can be to blame (give their numbers).


 
  • StackOverflowError: _________
  • NullPointerException: _________
  • ArithmeticException: _________
  • IndexOutOfBoundsException: _________

 
Answer and solution

The correct answer is

StackOverflowError: 2
NullPointerException: 3, 4, 6
ArithmeticException: 4
IndexOutOfBoundsException: none

 
Solution

A compiler can generate exception checks in several ways. For example, before accessing an object's field it can generate an explicit null check that throws an exception on failure. However, such explicit checks hurt both performance and code size. So the compiler tries to use implicit checks instead: only the pointer dereference itself is generated, and a null pointer then causes a hardware exception that the JVM intercepts, recognizes and rethrows as the corresponding Java exception.


 

The task was simply to find the instructions that can trigger such implicit exceptions.


 

StackOverflowError occurs on an attempt to write or read a stack slot outside the allowed range. Here this can happen in the instruction push rax.


 

There is also a single candidate for an implicit ArithmeticException: the integer division instruction idiv dword [rdx+0FH]. If the dereferenced value is zero, a hardware division-by-zero fault occurs, which is then rethrown as ArithmeticException.


 

Implicit checks that can throw NullPointerException are very common in Java code. To find them, it is enough to look at every place where something is dereferenced. The instruction mov rbx, qword [rel _Test_array] dereferences static data at a relative address, so it can never fail. But the instructions mov eax, dword [rcx+0FH], idiv dword [rdx+0FH] and mov ebx, dword [rbx+3BH] dereference the method parameters and the value read from static data, so they can throw NullPointerException.


 

Interestingly, the instruction idiv dword [rdx+0FH] contains two implicit checks at once, which can sometimes cause the JVM a lot of trouble.


 

An implicit check for IndexOutOfBoundsException would have to sit in the instruction that accesses an array element. The hint is the load of _Test_array into a register and its dereference in instructions 5 and 6. Note, however, that with this access pattern an index outside the valid range would simply touch heap memory adjacent to the array, which triggers no hardware exception. That is why, on most processor architectures, checks for IndexOutOfBoundsException are generated explicitly. In rare cases, though, the compiler can prove that such a check is entirely unnecessary, which is what happened in this task. So IndexOutOfBoundsException cannot be thrown here at all.
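The source of the compiled method is not given, so the following is only a guess at its general shape: two field reads, a division, and a read of a static array at a constant index. The Box class, the array length and the index 11 are assumptions made up for this sketch. It shows at the Java level which exceptions such a method can produce.

```java
class ImplicitChecks {
    // Hypothetical holder with one int field (stands in for the method parameters).
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // Hypothetical static array; a null value here would lead to an NPE on the element read.
    static int[] array = new int[16];

    // Assumed shape of the method: field read, field read plus division, array read.
    static int foo(Box a, Box b) {
        return a.value / b.value + array[11];
    }

    // Runs foo and reports either the result or the exception type.
    static String outcome(Box a, Box b) {
        try {
            return String.valueOf(foo(a, b));
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(null, new Box(1)));        // null first operand
        System.out.println(outcome(new Box(1), null));        // null second operand
        System.out.println(outcome(new Box(1), new Box(0)));  // zero divisor
        System.out.println(outcome(new Box(6), new Box(3)));  // normal case
    }
}
```

Note how the second operand can fail in two ways, as a null reference or as a zero divisor, mirroring the two implicit checks attached to the single idiv instruction.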


 

Problem 7


 

The evil hacker broke into your computer again and edited Helper.class in a hex editor in such a way that the end of the method sayC became unverifiable:


 
    public class Main {
        public static void main(String[] args) {
            System.out.print("A");
            Helper.sayB();
            Helper.sayC();
        }
    }
    public class Helper {
        public static void sayB() {
            System.out.print("B");
        }
        public static void sayC() {
            System.out.print("C");
            // bad bytecode goes here
        }
    }

 

What happens when you run class Main ?


 
  • A: VerifyError will be thrown.
  • B: "A" is printed and VerifyError is thrown.
  • C: "AB" is printed and VerifyError is thrown.
  • D: "ABC" is printed and VerifyError is thrown.

 
Answer and solution

The correct answer is B.


 
Solution

The bytecode of a class is verified before any method of that class is executed. Since class Helper contains the unverifiable method sayC, the whole class is unverifiable. So options C and D are definitely wrong: execution will never get as far as the method sayB.


 

Next, you need to work out at what point VerifyError is thrown. According to the specification, linkage errors must be thrown at the point where the reference is actually needed for execution, even if the JVM resolves references eagerly (all references are resolved as soon as the class is loaded). In this task, the reference to Helper is first needed only after "A" has been printed, so the correct answer is B.
 
An example demonstrates the described behavior. The unverifiable bytecode was obtained by manual manipulation.
 
Nikita Lipsky, aka pjBooms, spoke in more detail about Java bytecode verification at JBreak 2018 (only slides so far) and at JPoint 2017 (a video is available).
 
Conclusion
 
Although some people at the conference were scared off by the assembler, quite a few decided to dive into the subtleties of how the JVM works: for everyone who handed in their answers at our booth, we ran a crash course on the subtleties of bytecode, the verifier and implicit exceptions. I hope that you, too, learned something new after reading the solutions. If so, our goal is achieved!