From f535940bd55f8d659a5a26370665c5567c6ba55e Mon Sep 17 00:00:00 2001 From: Graydon Hoare Date: Fri, 13 Jan 2012 15:42:59 -0800 Subject: [PATCH] Import task and memory-model docs. --- doc/rust.md | 222 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 213 insertions(+), 9 deletions(-) diff --git a/doc/rust.md b/doc/rust.md index b498751e8806..534ba9c6c5b8 100644 --- a/doc/rust.md +++ b/doc/rust.md @@ -736,13 +736,24 @@ macro-generated and user-written code can cause unintentional capture. Future versions of Rust will address these issues. -# Memory and concurrency model + +# Memory and concurrency models + +Rust has a memory model centered around concurrently-executing _tasks_. Thus +its memory model and its concurrency model are best discussed simultaneously, +as parts of each only make sense when considered from the perspective of the +other. + +When reading about the memory model, keep in mind that it is partitioned in +order to support tasks; and when reading about tasks, keep in mind that their +isolation and communication mechanisms are only possible due to the ownership +and lifetime semantics of the memory model. ## Memory model -A Rust task's memory consists of a static set of *items*, a set of tasks -each with its own *stack*, and a *heap*. Immutable portions of the -heap may be shared between tasks, mutable portions may not. +A Rust [task](#tasks)'s memory consists of a static set of *items*, a set of +tasks each with its own *stack*, and a *heap*. Immutable portions of the heap +may be shared between tasks, mutable portions may not. Allocations in the stack consist of *slots*, and allocations in the heap consist of *boxes*. @@ -774,10 +785,12 @@ shared or unique boxes, and/or references. Sharing memory between tasks can only be accomplished using *unsafe* constructs, such as raw pointer operations or calling C code. -When a task sends a value of *unique* kind over a channel, it loses -ownership of the value sent and can no longer refer to it. This is statically -guaranteed by the combined use of "move semantics" and unique kinds, within -the communication system. +When a task sends a value satisfying the `send` interface over a channel, it +loses ownership of the value sent and can no longer refer to it. This is +statically guaranteed by the combined use of "move semantics" and the +compiler-checked _meaning_ of the `send` interface: it is only instantiated +for (transitively) unique kinds of data constructor and pointers, never shared +pointers. When a stack frame is exited, its local allocations are all released, and its references to boxes (both shared and owned) are dropped. @@ -831,7 +844,6 @@ fn incr(&i: int) { } ~~~~~~~~ - ### Memory boxes A _box_ is a reference to a heap allocation holding another value. There @@ -899,6 +911,198 @@ fn main() { } ~~~~~~~~ +## Tasks + +An executing Rust program consists of a tree of tasks. A Rust _task_ +consists of an entry function, a stack, a set of outgoing communication +channels and incoming communication ports, and ownership of some portion of +the heap of a single operating-system process. + +Multiple Rust tasks may coexist in a single operating-system process. The +runtime scheduler maps tasks to a certain number of operating-system threads; +by default a number of threads is used based on the number of concurrent +physical CPUs detected at startup, but this can be changed dynamically at +runtime. When the number of tasks exceeds the number of threads -- which is +quite possible -- the tasks are multiplexed onto the threads ^[This is an M:N +scheduler, which is known to give suboptimal results for CPU-bound concurrency +problems. In such cases, running with the same number of threads as tasks can +give better results. The M:N scheduling in Rust exists to support very large +numbers of tasks in contexts where threads are too resource-intensive to use +in a similar volume. The cost of threads varies substantially per operating +system, and is sometimes quite low, so this flexibility is not always worth +exploiting.] + + +### Communication between tasks + +With the exception of *unsafe* blocks, Rust tasks are isolated from +interfering with one another's memory directly. Instead of manipulating shared +storage, Rust tasks communicate with one another using a typed, asynchronous, +simplex message-passing system. + +A _port_ is a communication endpoint that can *receive* messages. Ports +receive messages from channels. + +A _channel_ is a communication endpoint that can *send* messages. Channels +send messages to ports. + +Each port is implicitly boxed and mutable; as such a port has a unique +per-task identity and cannot be replicated or transmitted. If a port value is +copied, both copies refer to the *same* port. New ports can be +constructed dynamically and stored in data structures. + +Each channel is bound to a port when the channel is constructed, so the +destination port for a channel must exist before the channel itself. A channel +cannot be rebound to a different port from the one it was constructed with. + +Channels are weak: a channel does not keep the port it is bound to +alive. Ports are owned by their allocating task and cannot be sent over +channels; if a task dies its ports die with it, and all channels bound to +those ports no longer function. Messages sent to a channel connected to a dead +port will be dropped. + +Channels are immutable types with meaning known to the runtime; channels can +be sent over channels. + +Many channels can be bound to the same port, but each channel is bound to a +single port. In other words, channels and ports exist in an N:1 relationship, +N channels to 1 port. ^[It may help to remember nautical terminology +when differentiating channels from ports. Many different waterways -- +channels -- may lead to the same port.} + +Each port and channel can carry only one type of message. The message type is +encoded as a parameter of the channel or port type. The message type of a +channel is equal to the message type of the port it is bound to. The types of +messages must satisfy the `send` built-in interface. + +Messages are generally sent asynchronously, with optional rate-limiting on the +transmit side. A channel contains a message queue and asynchronously sending a +message merely inserts it into the sending channel's queue; message receipt is +the responsibility of the receiving task. + +Messages are sent on channels and received on ports using standard library +functions. + + +### Task lifecycle + +The _lifecycle_ of a task consists of a finite set of states and events +that cause transitions between the states. The lifecycle states of a task are: + +* running +* blocked +* failing +* dead + +A task begins its lifecycle -- once it has been spawned -- in the *running* +state. In this state it executes the statements of its entry function, and any +functions called by the entry function. + +A task may transition from the *running* state to the *blocked* state any time +it makes a blocking recieve call on a port, or attempts a rate-limited +blocking send on a channel. When the communication expression can be completed +-- when a message arrives at a sender, or a queue drains sufficiently to +complete a rate-limited send -- then the blocked task will unblock and +transition back to *running*. + +A task may transition to the *failing* state at any time, due being +killed by some external event or internally, from the evaluation of a +`fail` expression. Once *failing*, a task unwinds its stack and +transitions to the *dead* state. Unwinding the stack of a task is done by +the task itself, on its own control stack. If a value with a destructor is +freed during unwinding, the code for the destructor is run, also on the task's +control stack. Running the destructor code causes a temporary transition to a +*running* state, and allows the destructor code to cause any subsequent +state transitions. The original task of unwinding and failing thereby may +suspend temporarily, and may involve (recursive) unwinding of the stack of a +failed destructor. Nonetheless, the outermost unwinding activity will continue +until the stack is unwound and the task transitions to the *dead* +state. There is no way to "recover" from task failure. Once a task has +temporarily suspended its unwinding in the *failing* state, failure +occurring from within this destructor results in *hard* failure. The +unwinding procedure of hard failure frees resources but does not execute +destructors. The original (soft) failure is still resumed at the point where +it was temporarily suspended. + +A task in the *dead* state cannot transition to other states; it exists +only to have its termination status inspected by other tasks, and/or to await +reclamation when the last reference to it drops. + + +### Task scheduling + +The currently scheduled task is given a finite *time slice* in which to +execute, after which it is *descheduled* at a loop-edge or similar +preemption point, and another task within is scheduled, pseudo-randomly. + +An executing task can yield control at any time, by making a library call to +`std::task::yield`, which deschedules it immediately. Entering any other +non-executing state (blocked, dead) similarly deschedules the task. + + +### Spawning tasks + +A call to `std::task::spawn`, passing a 0-argument function as its single +argument, causes the runtime to construct a new task executing the passed +function. The passed function is referred to as the _entry function_ for +the spawned task, and any captured environment is carries is moved from the +spawning task to the spawned task before the spawned task begins execution. + +The result of a `spawn` call is a `std::task::task` value. + +An example of a `spawn` call: + +~~~~ +import std::task::*; +import std::comm::*; + +fn helper(c: chan) { + // do some work. + let result = ...; + send(c, result); +} + +let p: port; + +spawn(bind helper(chan(p))); +// let task run, do other things. +// ... +let result = recv(p); +~~~~ + + +### Sending values into channels + +Sending a value into a channel is done by a library call to `std::comm::send`, +which takes a channel and a value to send, and moves the value into the +channel's outgoing buffer. + +An example of a send: + +~~~~ +import std::comm::*; +let c: chan = ...; +send(c, "hello, world"); +~~~~ + + +### Receiving values from ports + +Receiving a value is done by a call to the `recv` method, on a value of type +`std::comm::port`. This call causes the receiving task to enter the *blocked +reading* state until a task is sending a value to the port, at which point the +runtime pseudo-randomly selects a sending task and moves a value from the head +of one of the task queues to the call's return value, and un-blocks the +receiving task. See [communication system](#communication-system). + +An example of a *receive*: + +~~~~~~~~ +import std::comm::*; +let p: port = ...; +let s: str = recv(p); +~~~~~~~~ + # Runtime services, linkage and debugging