~ chicken-core (master) /manual/Data representation
Trap1[[toc:]]2[[tags: manual]]34== Data representation56There exist two different kinds of data objects in the CHICKEN system:7immediate and non-immediate objects.89=== Immediate objects1011Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture. They come in four different flavors:1213'''fixnums''', that is, small exact integers, where the lowest order bit is14set to 1. This gives fixnums a range of 31 bits for the actual15numeric value (63 bits on 64-bit architectures).1617'''characters''', where the four lowest-order bits are equal to18{{C_CHARACTER_BITS}}, currently 1010. The Unicode code point19of the character is encoded in the next 24 bits, from 0x00 to 0x1FFFFF.20Code points from 0xDC80 to 0xDCFF are not allowed and used to21mark invalid UTF-8 bytes in strings (see below).2223'''booleans''', where the four lowest-order bits are equal to {{C_BOOLEAN_BITS}},24currently 0110. The next bit is one for #t and zero for #f.2526'''other values''': the empty list, the value of unbound identifiers,27the undefined value (void), and end-of-file. The four lowest-order bits are equal to28{{C_SPECIAL_BITS}}, currently 1110. The next four bits contain an identifying29number for this type of object, one of:30{{C_SCHEME_END_OF_LIST}}, currently 0000;31{{C_SCHEME_UNDEFINED}}, currently 0001;32{{C_SCHEME_UNBOUND}}, currently 0010; or33{{C_SCHEME_END_OF_FILE}}, currently 0011.3435=== Non-immediate objects3637Collectively, the two lowest-order bits are known as the ''immediate mark bits''. When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value. When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum. If neither bit is set, the object is non-immediate, as described below.3839Non-immediate objects are blocks of data represented by a pointer into40the heap. The pointer's immediate mark bits must be zero to indicate the object is non-immediate;41this guarantees the data block is aligned on a 4-byte boundary, at minimum. Alignment of data words42is required on modern architectures anyway, so we get the ability to distinguish between immediate and non-immediate objects for free.4344The first word of the data block contains a header, which gives45information about the type of the object. The header is a46single machine word.4748The 24 (56 on 64-bit systems) lowest-order bits contain the length of49the data object, which is either the number of bytes in a string or50byte-vector, or the number of elements for a vector or record type. This51allows a maximum size for string or byte-vectors of 2^24 bytes, or52approximately 16 MB, on 32-bit systems, and 2^56 bytes, or approximately5372 PB, on 64-bit systems.5455The remaining bits are placed in the high-order end of the header.56The four highest-order bits are used for garbage57collection or internal data type dispatching.5859; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.6061; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a bytevector) or pointers to other data objects.6263; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. An example for this kind of objects are closures, which are a vector-type object with the code-pointer as the first item. This is also used to turn a pair's car into a weak reference in the symbol table, to allow its symbol to be collected.6465; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).6667After these four bits comes a 4-bit type code representing one of the following types:6869'''vectors''': vector objects with type bits {{C_VECTOR_TYPE}}, currently 0000.7071'''symbols''': vector objects with type bits {{C_SYMBOL_TYPE}},72currently 0001. The three slots contain the toplevel variable value,73the print-name (a string), and the property list of the symbol. When74manipulating these slots, the symbol table's container needs to be75manipulated as well, to control garbage collection of the symbol: if76the symbol is undefined and has no property list, the symbol table's77container should be a weak reference ({{C_WEAK_PAIR_TYPE}}), otherwise78it should be a strong reference ({{C_PAIR_TYPE}}).7980'''strings''': objects with type bits {{C_STRING_TYPE}}, currently 0010,81holding a bytevector with the string contents represented as an UTF-8 encoded82byte sequence. Note that invalid UTF-8 sequences are allowed inside strings,83extracted characters at positions that contain invalid UTF-884sequences are represented as the trailing surrogate pair half (U+DC00|xx).8586'''pairs''': vector-like object with type bits {{C_PAIR_TYPE}}, currently 0011.87The car and the cdr are contained in the first and second slots,88respectively.8990'''closures''': special vector objects with type bits91{{C_CLOSURE_TYPE}}, currently 0100. The first slot contains a pointer to a92compiled C function. Any extra slots contain the free variables (since93a flat closure representation is used).9495'''flonums''': byte-vector objects with type bits96{{C_FLONUM_BITS}}, currently 0101. Slots one and two (or a single slot on9764 bit architectures) contain a 64-bit floating-point number, in the98representation used by the host system's C compiler.99100'''bignums''': special vector objects with type bits101{{C_BIGNUM_TYPE}}, currently 0110. This contains only one slot, which102points to a bytevector that contains the number's limbs in a special103format: The first word contains a 1 if the number is negative, 0 if it104is positive. The remaining words form the bignum's limbs. A105bytevector is used because the limbs are stored in the raw machine106format, which would be invalid Scheme objects when viewed as slots in107a vector.108109'''ports''': special vector objects with type bits110{{C_PORT_TYPE}}, currently 0111. The first slot contains a pointer to a file-111stream, if this is a file-pointer, or NULL if not. The other slots112contain housekeeping data used for this port.113114'''structures''': vector objects with type bits115{{C_STRUCTURE_TYPE}}, currently 1000. The first slot contains a symbol that116specifies the kind of structure this record is an instance of. The other117slots contain the actual record items.118119'''bytevector''': a raw sequence of bytes with type bits {{C_BYTEVECTOR_TYPE}}.120121'''pointer-vectors''': vector objects of native pointers - these are122actually structures where the first slot holds a bytevector containing the 32- or 64-bit123pointer values.124125'''locatives''': special vector objects with type bits126{{C_LOCATIVE_TYPE}}, currently 1010. A locative object holds 4 slots:127a raw pointer to the location inside the object referred to by the locative,128the offset in bytes from the start of the object referred to, the type of129the location (whether it refers to an unboxed numeric value or a normal130object slot that holds a pointer to Scheme data) and a flag indicating131whether this locative is "weak". If the locative is non-weak, slot #4 holds132a pointer to the object referred to.133134'''pointers''': special vector objects with type bits135{{C_POINTER_TYPE}}, currently 1001. The single slot contains a machine pointer.136137'''tagged pointers''': special vector objects with type bits138{{C_TAGGED_POINTER_TYPE}}, currently 1011, Tagged pointers are similar to pointers,139but the object contains an additional140slot with a tag (an arbitrary data object) that identifies the type141of the pointer.142143'''ratnums''': vector-like objects with type-bits {{C_RATNUM_TYPE}},144currently 1100. The first slot contains the numerator (which can be145positive or negative), the second slot contains the denominator, which146is always positive. These numbers are always simplified, so their gcd147will always be 1.148149'''lambda infos''': byte-vector objects with type-bits {{C_LAMBDA_INFO_TYPE}}, currently 1101.150151'''cplxnums''': vector-like objects with type-bits {{C_CPLXNUM_TYPE}},152currently 1110. The first slot contains the real part, the second slot153contains the imaginary part of the complex number. These two numbers154are of matching exactness: Either both are flonums or none are.155156The actual data follows immediately after the header. Note that157block addresses are always aligned to the native machine-word158boundary.159160Data objects may be allocated outside of the garbage collected heap, as161long as their layout follows the above mentioned scheme. But care has to162be taken not to mutate these objects with heap-data (i.e. non-immediate163objects), because this will confuse the garbage collector.164165For more information see the header file {{chicken.h}}.166167168---169Previous: [[C interface]]170171Next: [[Modules]]