Generating trait objects on the fly (or madness with Rust)

In this article we will have a little fun with the Rust programming language, and in particular with trait objects.

When I first got acquainted with Rust, one implementation detail of trait objects struck me as interesting: the virtual function table lives not in the data itself but in the "fat" pointer to it. Each pointer to a trait object contains a pointer to the data itself, as well as a pointer to a virtual table holding the addresses of the functions that implement the trait for the given structure (but since this is an implementation detail, the behavior may change).

[...] leaking:

    fn annotate<'a>(input: &'a String, type_name: &str) -> &'a dyn Object {
        let b = Box::new(Wrapper {
            value: input,
            type_name: type_name.into(),
        });
        Box::leak(b)
    }

And the test passes!

But this is a rather dubious solution. Not only do we still allocate memory for each "annotation", the memory also leaks (Box::leak returns a reference to the data stored on the heap but "forgets" the box itself, so the memory is never freed automatically).
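
To make the leak concrete, here is a minimal std-only sketch of what Box::leak does and how a leaked allocation could be reclaimed by hand. The function names `leak_string` and `leak_then_reclaim` are hypothetical and do not come from the article's code.

```rust
/// Leaks a heap allocation and hands back a reference valid for `'static`.
/// Nothing will ever free this `String` unless it is reclaimed by hand.
fn leak_string(text: &str) -> &'static mut String {
    Box::leak(Box::new(text.to_string()))
}

/// Leaks a `String`, uses the leaked reference, then rebuilds the `Box`
/// so that the allocation is owned (and eventually freed) again.
fn leak_then_reclaim(text: &str) -> String {
    let leaked: &'static mut String = leak_string(text);
    leaked.push('!');
    // SAFETY: the pointer comes from `Box::leak` above and the leaked
    // reference is not used after this point.
    let boxed = unsafe { Box::from_raw(leaked as *mut String) };
    *boxed
}
```

Reclaiming only works if someone still remembers the pointer, which is exactly what the leaking `annotate` does not do.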

Approach number 2: arena!

To begin with, let's try to store these wrappers somewhere so that they are eventually freed, while keeping the signature of annotate as it is. That is, returning a reference-counted pointer (for example, Rc) is not an option.

The simplest option is to create an auxiliary structure, a "type system", which will be responsible for storing these wrappers. When we are done, we free this structure and all the wrappers along with it.

Something like this. The typed-arena library is used for storing the wrappers, but a Vec would also do. The most important thing is to ensure that Wrapper never moves (in nightly Rust this could be done using the pin API):


    struct TypeSystem {
        wrappers: typed_arena::Arena<Wrapper>,
    }

    impl TypeSystem {
        pub fn new() -> Self {
            Self {
                wrappers: typed_arena::Arena::new(),
            }
        }

        /// The result borrows from the `input` parameter and at the same time must not
        /// outlive the type system (otherwise all the wrappers could be freed
        /// while references to them still exist)!
        pub fn annotate<'a: 'b, 'b>(
            &'a self,
            input: &'b String,
            type_name: &str,
        ) -> &'b dyn Object {
            self.wrappers.alloc(Wrapper {
                value: input,
                type_name: type_name.into(),
            })
        }
    }

But what happened to the lifetime parameter of the reference inside Wrapper? We had to get rid of it, since we cannot assign one fixed lifetime to the typed_arena::Arena type. Each wrapper has its own unique parameter, which depends on input!

Instead, we sprinkle in a little unsafe Rust to get rid of the lifetime parameter:

    struct Wrapper {
        value: *const String,
        type_name: String,
    }

    impl Object for Wrapper {
        fn type_name(&self) -> &str {
            &self.type_name
        }

        /// This conversion is safe because we guarantee (via the signature of
        /// `annotate`) that the reference to the wrapper (as part of the trait-object
        /// reference `&Object`) lives no longer than the reference to the data itself (`String`).
        fn as_string(&self) -> &String {
            unsafe { &*self.value }
        }
    }

And the tests pass again, which gives us some confidence that the solution is correct. Apart from a slight feeling of unease because of unsafe (as there should be: unsafe Rust is no joking matter!).

But still, what about the promised version that requires no additional memory allocations for wrappers?
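
The remark above that a Vec could replace typed-arena can be sketched with the standard library alone. This is only an illustration under assumptions: `BoxArena` is a hypothetical name, and it stores raw pointers obtained from Box::into_raw so that growing the Vec moves only the pointers and never the values.

```rust
use std::cell::RefCell;

/// A tiny arena: every value gets its own heap allocation, so a stored
/// value keeps its address even when the `Vec` of pointers reallocates.
struct BoxArena<T> {
    items: RefCell<Vec<*mut T>>,
}

impl<T> BoxArena<T> {
    fn new() -> Self {
        BoxArena { items: RefCell::new(Vec::new()) }
    }

    /// Stores `value` and returns a reference tied to the borrow of the arena.
    fn alloc(&self, value: T) -> &T {
        let ptr = Box::into_raw(Box::new(value));
        self.items.borrow_mut().push(ptr);
        // SAFETY: `ptr` points to a live allocation that is freed only in
        // `Drop`, which cannot run while `&self` is still borrowed.
        unsafe { &*ptr }
    }

    /// Number of values stored so far.
    fn len(&self) -> usize {
        self.items.borrow().len()
    }
}

impl<T> Drop for BoxArena<T> {
    fn drop(&mut self) {
        for ptr in self.items.get_mut().drain(..) {
            // SAFETY: each pointer came from `Box::into_raw` in `alloc`
            // and is turned back into a `Box` exactly once.
            unsafe { drop(Box::from_raw(ptr)) };
        }
    }
}
```

All values are released together when the arena is dropped, which is the behavior the "type system" structure is meant to provide.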

Approach number 3: let the gates of hell open

The idea: for each unique "type" ("Widget", "Gadget") we will create a virtual table. By hand, while the program is running. And we attach it to the reference to the data itself that was given to us (which, as we remember, is just a String).

First, a short description of what we need to obtain. How is a reference to a trait object laid out? In fact, it is just two pointers: one to the data itself and the other to the virtual table. So we write:

    #[repr(C)]
    struct TraitObject {
        pub data: *const (),
        pub vtable: *const (),
    }

(#[repr(C)] is needed to guarantee the correct layout in memory.)

It all seems simple: we generate a new table for the given parameters and "assemble" a reference to the trait object! But what does this table consist of?

The correct answer to this question would be "that is an implementation detail". But we will do the following: create a rust-toolchain file at the root of our project and write nightly-2018-12-01 in it. After all, a pinned build can be considered stable, right?

Now we have pinned the Rust version (in fact, we will need the nightly build anyway for one of the libraries used below).

After some searching on the Internet, we find out that the table format is simple: first comes a pointer to the destructor, then two fields related to memory allocation (the size and alignment of the type), and then the functions, one after another (their order is up to the compiler, but we have only two functions, so the probability of guessing right is quite high: 50%).

So we write:

    #[repr(C)]
    #[derive(Clone, Copy)]
    struct VirtualTableHeader {
        destructor_fn: fn(*mut ()),
        size: usize,
        align: usize,
    }

    #[repr(C)]
    struct ObjectVirtualTable {
        header: VirtualTableHeader,
        type_name_fn: fn(*const String) -> *const str,
        as_string_fn: fn(*const String) -> *const String,
    }

Likewise, #[repr(C)] is needed to guarantee the correct layout in memory. I split the table into two structures; this will come in handy a little later.

Now let's try to write our type system, which will provide the annotate function. We will need to cache the generated tables, so let's set up a cache:

    struct TypeInfo {
        vtable: ObjectVirtualTable,
    }

    #[derive(Default)]
    struct TypeSystem {
        infos: RefCell<HashMap<String, TypeInfo>>,
    }

We use the interior mutability of RefCell so that our TypeSystem::annotate function can take &self as a shared reference. This is important because we "borrow" from TypeSystem to ensure that the virtual tables we generated outlive the trait-object reference we return from annotate.

Since we want to annotate many instances, we cannot borrow &mut self as a mutable reference.

And we sketch out this code:

    impl TypeSystem {
        pub fn annotate<'a: 'b, 'b>(
            &'a self,
            input: &'b String,
            type_name: &str,
        ) -> &'b dyn Object {
            let type_name = type_name.to_string();
            let mut infos = self.infos.borrow_mut();
            let imp = infos.entry(type_name).or_insert_with(|| unsafe {
                // Where do we get this table from?
                let vtable = unimplemented!();
                TypeInfo { vtable }
            });

            let object_obj = TraitObject {
                data: input as *const String as *const (),
                vtable: &imp.vtable as *const ObjectVirtualTable as *const (),
            };

            // Convert the constructed structure into a reference to the trait object
            unsafe { std::mem::transmute::<TraitObject, &dyn Object>(object_obj) }
        }
    }
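
The caching pattern in this skeleton, a RefCell-wrapped HashMap filled through &self, can be tried in isolation with the standard library only. The names `Cache`, `Info` and `builds` below are hypothetical stand-ins. One point that may be worth checking in the skeleton: values stored inline in a HashMap are moved when the map grows, so a raw pointer to an entry stays valid only if the value sits behind its own allocation. The sketch therefore boxes the values, which is an assumption of this example and not something the article does.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for the per-type data that would hold the vtable.
struct Info {
    name_len: usize,
}

#[derive(Default)]
struct Cache {
    // Boxed so that each `Info` keeps its address when the map rehashes.
    infos: RefCell<HashMap<String, Box<Info>>>,
    // Counts how many times an entry actually had to be built.
    builds: Cell<usize>,
}

impl Cache {
    /// Returns the address of the cached entry, building it on first use.
    fn get(&self, type_name: &str) -> *const Info {
        let mut infos = self.infos.borrow_mut();
        let info = infos.entry(type_name.to_string()).or_insert_with(|| {
            self.builds.set(self.builds.get() + 1);
            Box::new(Info { name_len: type_name.len() })
        });
        &**info as *const Info
    }
}
```

Asking for the same type name twice yields the same address and runs the builder closure only once, even after many other entries have been inserted in between.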

Where do we get this table from? Its first three entries are the same as the entries of any other virtual table for the given type. So we simply take and copy them. First, let's define this trait:

    trait Whatever {}
    impl<T> Whatever for T {}

It will help us obtain that "any other virtual table". Then we copy those three entries from it:

    let whatever = input as &dyn Whatever;
    let whatever_obj = std::mem::transmute::<&dyn Whatever, TraitObject>(whatever);
    let whatever_vtable_header = whatever_obj.vtable as *const VirtualTableHeader;
    let vtable = ObjectVirtualTable {
        // Copy the entries!
        header: *whatever_vtable_header,
        type_name_fn: unimplemented!(),
        as_string_fn: unimplemented!(),
    };

    TypeInfo { vtable }
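
As a sanity check of the copied header, the same trick can be run with the standard library alone. The sketch below leans on exactly the unspecified details the article leans on (the data pointer first in the fat pointer; destructor, size and alignment at the start of the table), so it may stop working with a different compiler. `RawHeader` and `inspect` are hypothetical names, and the header is read as plain words so that nothing is assumed about the destructor slot.

```rust
use std::mem::transmute;

trait Whatever {}
impl<T> Whatever for T {}

/// Mirrors the article's `TraitObject`: data pointer, then vtable pointer.
#[repr(C)]
struct TraitObject {
    data: *const (),
    vtable: *const (),
}

/// The first three vtable entries, read as plain machine words.
#[repr(C)]
#[derive(Clone, Copy)]
struct RawHeader {
    destructor: usize,
    size: usize,
    align: usize,
}

/// Splits `&dyn Whatever` into its two words and copies the table header.
fn inspect(input: &String) -> (*const (), RawHeader) {
    let whatever = input as &dyn Whatever;
    // Assumption: the fat pointer is laid out as (data, vtable).
    let obj = unsafe { transmute::<&dyn Whatever, TraitObject>(whatever) };
    // Assumption: the table starts with destructor, size, align.
    let header = unsafe { *(obj.vtable as *const RawHeader) };
    (obj.data, header)
}
```

If the assumptions hold, the data pointer equals the address of the input, and the copied size and alignment match those of String.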

In principle, we could get the size and alignment via std::mem::size_of::<String>() and std::mem::align_of::<String>(). But where else the destructor could be "stolen" from, I do not know.

Fine, but where do we get the addresses of these functions, type_name_fn and as_string_fn? You may notice that as_string_fn does not really need to be generated: the pointer to the data is always the first entry in the trait-object representation. That is, this function is always the same:

    impl Object for String {
        // ...

        fn as_string(&self) -> &String {
            self
        }
    }
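
That as_string is a pure identity can be checked directly. In the sketch below the `Object` trait is an assumed reconstruction based on the impls shown in the article, the "String" returned by `type_name` is only a placeholder, and `as_string_is_identity` is a hypothetical helper.

```rust
/// Assumed shape of the article's trait, reconstructed from its impls.
trait Object {
    fn type_name(&self) -> &str;
    fn as_string(&self) -> &String;
}

impl Object for String {
    fn type_name(&self) -> &str {
        "String" // placeholder only; the article supplies this per annotation
    }

    fn as_string(&self) -> &String {
        self
    }
}

/// Returns true if `as_string` hands back exactly the pointer it received.
fn as_string_is_identity(input: &String) -> bool {
    // Take the method as an ordinary function pointer and call it.
    let f: fn(&String) -> &String = <String as Object>::as_string;
    std::ptr::eq(f(input), input)
}
```

Because the result never depends on the type name, one copy of this function could serve every generated table.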

But the other function is not so simple! It depends on our type name, type_name.

No matter, we can simply generate this function at run time. For this we take the dynasm library (which currently requires a nightly build of Rust). Let's also read up on function calling conventions.

For simplicity, let's assume we are only interested in Mac OS and Linux (after all these fun transformations, we are not particularly worried about compatibility, right?). And yes, x86-64 only, of course.

The second function, as_string, is easy to implement. We are promised that the first parameter will be in the RDI register, and that the value is returned in RAX. That is, the function code will be something like:

    dynasm!(ops
        ; mov rax, rdi
        ; ret
    );

But the first function is a bit trickier. First, we need to return &str, and that is a fat pointer. Its first part is a pointer to the string data, and its second part is the length of the string slice. Fortunately, the convention above allows returning 128-bit results by using the RDX register for the second part.

It remains to obtain a pointer to the string slice that contains our type_name string. We do not want to rely on type_name itself (although lifetime annotations could guarantee that type_name outlives the returned value).

But we have a copy of this string, which we put into the hash table. Crossing our fingers, we will assume that the location of the string slice returned by String::as_str does not change when the String itself is moved (and the String will be moved when the HashMap, in which it is stored as a key, is resized). I don't know whether the standard library guarantees this behavior, but we are just playing around, aren't we?

We obtain the necessary components:

    let type_name_ptr = type_name.as_str().as_ptr();
    let type_name_len = type_name.as_str().len();

And we write this function:

    dynasm!(ops
        ; mov rax, QWORD type_name_ptr as i64
        ; mov rdx, QWORD type_name_len as i64
        ; ret
    );
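
The assumption that the string data stays in place while the String itself moves can be probed with the standard library alone. The sketch below records the (pointer, length) pair, moves the String into a HashMap that is then forced to grow, and rebuilds a &str from the raw parts. `survives_move` is a hypothetical helper, and the check shows what happens in practice; it is not a statement about what is formally guaranteed.

```rust
use std::collections::HashMap;

/// Returns true if the string data is still at the recorded address after
/// the owning `String` was moved into a map that rehashed several times.
fn survives_move(name: &str) -> bool {
    let owned = name.to_string();
    // The two halves of the fat `&str` pointer.
    let ptr = owned.as_str().as_ptr();
    let len = owned.as_str().len();

    let mut map: HashMap<String, u32> = HashMap::new();
    map.insert(owned, 0); // the `String` value itself moves into the map
    for i in 0..1000u32 {
        map.insert(format!("filler-{}", i), i); // forces the map to grow
    }

    // SAFETY: the key is still alive inside `map`; whether its heap buffer
    // is still at `ptr` is verified by the comparison below.
    let rebuilt = unsafe {
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(ptr, len))
    };
    let (key, _) = map.get_key_value(name).unwrap();
    key.as_ptr() == ptr && rebuilt == name
}
```

The two values loaded into RAX and RDX by the generated function are exactly this pointer and length pair.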

And finally, the final code of annotate:

    pub fn annotate<'a: 'b, 'b>(&'a self, input: &'b String, type_name: &str) -> &'b dyn Object {
        let type_name = type_name.to_string();

        // Remember the location and length of the string slice
        let type_name_ptr = type_name.as_str().as_ptr();
        let type_name_len = type_name.as_str().len();

        let mut infos = self.infos.borrow_mut();
        let imp = infos.entry(type_name).or_insert_with(|| unsafe {
            let mut ops = dynasmrt::x64::Assembler::new().unwrap();

            // Create the code for the function `type_name`
            let type_name_offset = ops.offset();
            dynasm!(ops
                ; mov rax, QWORD type_name_ptr as i64
                ; mov rdx, QWORD type_name_len as i64
                ; ret
            );

            // Create the code for the function `as_string`
            let as_string_offset = ops.offset();
            dynasm!(ops
                ; mov rax, rdi
                ; ret
            );

            let buffer = ops.finalize().unwrap();

            // Copy parts from the similar table
            let whatever = input as &dyn Whatever;
            let whatever_obj =
                std::mem::transmute::<&dyn Whatever, TraitObject>(whatever);
            let whatever_vtable_header =
                whatever_obj.vtable as *const VirtualTableHeader;
            let vtable = ObjectVirtualTable {
                header: *whatever_vtable_header,
                type_name_fn: std::mem::transmute(buffer.ptr(type_name_offset)),
                as_string_fn: std::mem::transmute(buffer.ptr(as_string_offset)),
            };

            TypeInfo { vtable, buffer }
        });

        assert_eq!(imp.vtable.header.size, std::mem::size_of::<String>());
        assert_eq!(imp.vtable.header.align, std::mem::align_of::<String>());

        let object_obj = TraitObject {
            data: input as *const String as *const (),
            vtable: &imp.vtable as *const ObjectVirtualTable as *const (),
        };
        unsafe { std::mem::transmute::<TraitObject, &dyn Object>(object_obj) }
    }

For the purposes of dynasm, we need to add a buffer field to our TypeInfo structure. This field owns the memory that stores the code of our generated functions:

    #[allow(unused)]
    buffer: dynasmrt::ExecutableBuffer,

And all the tests pass!

Done, master!

That is how easily and naturally you can generate your own trait-object implementations in Rust code!

The last solution relies heavily on implementation details and is therefore not recommended for use. But in reality you have to do what is necessary. Desperate times call for desperate measures!

There is, however, one (more) feature that I rely on here. Namely, that it is safe to free the memory occupied by the virtual table once there are no more references to the trait object that uses it. On the one hand, it is logical that a virtual table can be used only through trait-object references. On the other hand, the tables provided by Rust have the 'static lifetime. It is quite possible to imagine some code that detaches the table from the reference for purposes of its own (for example, for some dirty work).

The source code can be found here.