add documentation for Memory

andrewrk · andrewrk · commit 567175f833b1 · 2019-03-18T21:40:24.000-04:00
closes #1904
diff --git a/doc/langref.html.in b/doc/langref.html.in
@@ -7928,13 +7928,261 @@ pub fn main() void {
 
       {#header_close#}
       {#header_open|Memory#}
-      <p>TODO: explain no default allocator in zig</p>
-      <p>TODO: show how to use the allocator interface</p>
-      <p>TODO: mention debug allocator</p>
-      <p>TODO: importance of checking for allocation failure</p>
-      <p>TODO: mention overcommit and the OOM Killer</p>
-      <p>TODO: mention recursion</p>
-      {#see_also|Pointers#}
+      <p>
+      The Zig language performs no memory management on behalf of the programmer. This is
+      why Zig has no runtime, and why Zig code works seamlessly in so many environments,
+      including real-time software, operating system kernels, embedded devices, and
+      low latency servers. As a consequence, Zig programmers must always be able to answer
+      the question:
+      </p>
+      <p>{#link|Where are the bytes?#}</p>
+      <p>
+      Like Zig, the C programming language has manual memory management. However, unlike Zig,
+      C has a default allocator - <code>malloc</code>, <code>realloc</code>, and <code>free</code>.
+      When linking against libc, Zig exposes this allocator with {#syntax#}std.heap.c_allocator{#endsyntax#}.
+      However, by convention, there is no default allocator in Zig. Instead, functions which need to
+      allocate accept an {#syntax#}*Allocator{#endsyntax#} parameter. Likewise, data structures such as
+      {#syntax#}std.ArrayList{#endsyntax#} accept an {#syntax#}*Allocator{#endsyntax#} parameter in
+      their initialization functions:
+      </p>
+      {#code_begin|test|allocator#}
+const std = @import("std");
+const Allocator = std.mem.Allocator;
+const assert = std.debug.assert;
+
+test "using an allocator" {
+    var buffer: [100]u8 = undefined;
+    const allocator = &std.heap.FixedBufferAllocator.init(&buffer).allocator;
+    const result = try concat(allocator, "foo", "bar");
+    assert(std.mem.eql(u8, "foobar", result));
+}
+
+fn concat(allocator: *Allocator, a: []const u8, b: []const u8) ![]u8 {
+    const result = try allocator.alloc(u8, a.len + b.len);
+    std.mem.copy(u8, result, a);
+    std.mem.copy(u8, result[a.len..], b);
+    return result;
+}
+      {#code_end#}
+      <p>
+      In the above example, 100 bytes of stack memory are used to initialize a
+      {#syntax#}FixedBufferAllocator{#endsyntax#}, which is then passed to a function.
+      As a convenience, there is a global {#syntax#}FixedBufferAllocator{#endsyntax#}
+      available for quick tests at {#syntax#}std.debug.global_allocator{#endsyntax#}.
+      However, it is deprecated and should be avoided in favor of directly using a
+      {#syntax#}FixedBufferAllocator{#endsyntax#} as in the example above.
+      </p>
+      <p>
+      Currently Zig has no general purpose allocator, but there is
+      <a href="https://github.com/andrewrk/zig-general-purpose-allocator/">one under active development</a>.
+      Once it is merged into the Zig standard library it will become available to import
+      with {#syntax#}std.heap.default_allocator{#endsyntax#}. However, it will still be recommended to
+      follow the {#link|Choosing an Allocator#} guide.
+      </p>
+
+      {#header_open|Choosing an Allocator#}
+      <p>Which allocator to use depends on a number of factors. Here is a flow chart to help you decide:
+      </p>
+      <ol>
+          <li>
+              Are you making a library? In this case, it is best to accept an {#syntax#}*Allocator{#endsyntax#}
+              as a parameter and allow your library's users to decide what allocator to use.
+          </li>
+          <li>Are you linking libc? In this case, {#syntax#}std.heap.c_allocator{#endsyntax#} is likely
+              the right choice, at least for your main allocator.</li>
+          <li>
+              Is the maximum number of bytes that you will need bounded by a number known at
+              {#link|comptime#}? In this case, use {#syntax#}std.heap.FixedBufferAllocator{#endsyntax#} or
+              {#syntax#}std.heap.ThreadSafeFixedBufferAllocator{#endsyntax#} depending on whether you need
+              thread-safety or not.
+          </li>
+          <li>
+              Is your program a command line application which runs from start to end without any fundamental
+              cyclical pattern (such as a video game main loop, or a web server request handler),
+              such that it would make sense to free everything at once at the end?
+              In this case, it is recommended to follow this pattern:
+              {#code_begin|exe|cli_allocation#}
+const std = @import("std");
+
+pub fn main() !void {
+    var direct_allocator = std.heap.DirectAllocator.init();
+    defer direct_allocator.deinit();
+
+    var arena = std.heap.ArenaAllocator.init(&direct_allocator.allocator);
+    defer arena.deinit();
+
+    const allocator = &arena.allocator;
+
+    const ptr = try allocator.create(i32);
+    std.debug.warn("ptr={*}\n", ptr);
+}
+              {#code_end#}
+              When using this kind of allocator, there is no need to free anything manually. Everything
+              gets freed at once with the call to {#syntax#}arena.deinit(){#endsyntax#}.
+          </li>
+          <li>
+              Are the allocations part of a cyclical pattern such as a video game main loop, or a web
+              server request handler? If the allocations can all be freed at once, at the end of the cycle,
+              for example once the video game frame has been fully rendered, or the web server request has
+              been served, then {#syntax#}std.heap.ArenaAllocator{#endsyntax#} is a great candidate. As
+              demonstrated in the previous bullet point, this allows you to free entire arenas at once.
+              Note also that if an upper bound of memory can be established, then
+              {#syntax#}std.heap.FixedBufferAllocator{#endsyntax#} can be used as a further optimization.
+          </li>
+          <li>
+              Are you writing a test, and you want to make sure {#syntax#}error.OutOfMemory{#endsyntax#}
+              is handled correctly? In this case, use {#syntax#}std.debug.FailingAllocator{#endsyntax#}.
+          </li>
+          <li>
+              Finally, if none of the above apply, you need a general purpose allocator. Zig does not
+              yet have a general purpose allocator in the standard library,
+              <a href="https://github.com/andrewrk/zig-general-purpose-allocator/">but one is being actively developed</a>.
+              You can also consider {#link|Implementing an Allocator#}.
+          </li>
+      </ol>
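+      <p>
+      The cyclical case from the list above can be sketched as follows. This is only an illustration:
+      the three-iteration loop is a hypothetical stand-in for a video game main loop or a request
+      handler, and the 64-byte scratch allocation is arbitrary. Each cycle gets its own arena, and
+      {#syntax#}defer arena.deinit(){#endsyntax#} frees everything allocated during that cycle.
+      </p>
+      {#code_begin|exe|arena_per_cycle#}
+const std = @import("std");
+
+pub fn main() !void {
+    var direct_allocator = std.heap.DirectAllocator.init();
+    defer direct_allocator.deinit();
+
+    // Hypothetical stand-in for a game loop or a request loop.
+    var cycle: usize = 0;
+    while (cycle < 3) : (cycle += 1) {
+        var arena = std.heap.ArenaAllocator.init(&direct_allocator.allocator);
+        // Runs at the end of every iteration, freeing that cycle's allocations at once.
+        defer arena.deinit();
+
+        const allocator = &arena.allocator;
+        const scratch = try allocator.alloc(u8, 64);
+        std.debug.warn("cycle {}: {} scratch bytes\n", cycle, scratch.len);
+    }
+}
+      {#code_end#}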
+      {#header_close#}
+
+      {#header_open|Where are the bytes?#}
+      <p>String literals such as {#syntax#}"foo"{#endsyntax#} are in the global constant data section.
+      This is why it is an error to pass a string literal to a mutable slice, like this:
+      </p>
+      {#code_begin|test_err|expected type '[]u8'#}
+fn foo(s: []u8) void {}
+
+test "string literal to mutable slice" {
+    foo("hello");
+}
+      {#code_end#}
+      <p>However, if you make the slice constant, then it works:</p>
+      {#code_begin|test|strlit#}
+fn foo(s: []const u8) void {}
+
+test "string literal to constant slice" {
+    foo("hello");
+}
+      {#code_end#}
+      <p>
+      Just like string literals, {#syntax#}const{#endsyntax#} declarations, when the value is known at {#link|comptime#},
+      are stored in the global constant data section. {#link|Compile Time Variables#} are also stored
+      in the global constant data section.
+      </p>
+      <p>
+      {#syntax#}var{#endsyntax#} declarations inside functions are stored in the function's stack frame. Once a function returns,
+      any {#link|Pointers#} to variables in the function's stack frame become invalid references, and
+      dereferencing them becomes unchecked {#link|Undefined Behavior#}.
+      </p>
+      <p>
+      {#syntax#}var{#endsyntax#} declarations at the top level or in {#link|struct#} declarations are stored in the global
+      data section.
+      </p>
+      <p>
+      The location of memory allocated with {#syntax#}allocator.alloc{#endsyntax#} or
+      {#syntax#}allocator.create{#endsyntax#} is determined by the allocator's implementation.
+      </p>
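+      <p>
+      As a rough illustration of the storage locations described above, the following sketch
+      annotates where each declaration lives. The names {#syntax#}call_count{#endsyntax#},
+      {#syntax#}greeting{#endsyntax#}, and {#syntax#}countCall{#endsyntax#} are hypothetical and exist
+      only for this example.
+      </p>
+      {#code_begin|test|where_bytes#}
+const assert = @import("std").debug.assert;
+
+// Top-level var: stored in the global data section.
+var call_count: usize = 0;
+
+// const with a comptime-known value: stored in the global constant data section.
+const greeting = "hello";
+
+fn countCall() usize {
+    // Local var: stored in countCall's stack frame, which is gone after return.
+    var increment: usize = 1;
+    call_count += increment;
+    // Returning the value, rather than a pointer to `increment`, is safe.
+    return call_count;
+}
+
+test "where the bytes live" {
+    assert(countCall() == 1);
+    assert(countCall() == 2);
+    assert(greeting.len == 5);
+}
+      {#code_end#}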
+      <p>TODO: thread local variables</p>
+      {#header_close#}
+
+      {#header_open|Implementing an Allocator#}
+      <p>Zig programmers can implement their own allocators by fulfilling the Allocator interface.
+      To do this, read the documentation comments in std/mem.zig carefully and
+      then supply a {#syntax#}reallocFn{#endsyntax#} and a {#syntax#}shrinkFn{#endsyntax#}.
+      </p>
+      <p>
+      There are many example allocators to look at for inspiration. Look at std/heap.zig and
+      at this
+      <a href="https://github.com/andrewrk/zig-general-purpose-allocator/">work-in-progress general purpose allocator</a>.
+      TODO: once <a href="https://github.com/ziglang/zig/issues/21">#21</a> is done, link to the docs
+      here.
+      </p>
+      {#header_close#}
+
+      {#header_open|Heap Allocation Failure#}
+      <p>
+      Many programming languages choose to handle the possibility of heap allocation failure by
+      unconditionally crashing. By convention, Zig programmers do not consider this to be a
+      satisfactory solution. Instead, {#syntax#}error.OutOfMemory{#endsyntax#} represents
+      heap allocation failure, and Zig libraries return this error code whenever heap allocation
+      failure prevented an operation from completing successfully.
+      </p>
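+      <p>
+      A minimal sketch of what this looks like in practice: the 4-byte buffer below is deliberately
+      too small (an arbitrary choice for illustration), so the request for 100 bytes fails and the
+      caller receives {#syntax#}error.OutOfMemory{#endsyntax#} as an ordinary error value that it can
+      handle.
+      </p>
+      {#code_begin|test|allocation_failure#}
+const std = @import("std");
+const assert = std.debug.assert;
+
+test "allocation failure is an ordinary error" {
+    // Deliberately undersized backing buffer.
+    var buffer: [4]u8 = undefined;
+    const allocator = &std.heap.FixedBufferAllocator.init(&buffer).allocator;
+
+    if (allocator.alloc(u8, 100)) |_| {
+        // 100 bytes cannot fit in a 4-byte buffer.
+        unreachable;
+    } else |err| {
+        // The caller decides how to recover; here we only check the error code.
+        assert(err == error.OutOfMemory);
+    }
+}
+      {#code_end#}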
+      <p>
+      Some have argued that because some operating systems such as Linux have memory overcommit enabled by
+      default, it is pointless to handle heap allocation failure. There are many problems with this reasoning:
+      </p>
+      <ul>
+          <li>Only some operating systems have an overcommit feature.
+              <ul>
+                  <li>Linux has it enabled by default, but it is configurable.</li>
+                  <li>Windows does not overcommit.</li>
+                  <li>Embedded systems do not have overcommit.</li>
+                  <li>Hobby operating systems may or may not have overcommit.</li>
+              </ul>
+          </li>
+          <li>
+              For real-time systems, not only is there no overcommit, but typically the maximum amount
+              of memory per application is determined ahead of time.
+          </li>
+          <li>
+              When writing a library, one of the main goals is code reuse. By making code handle
+              allocation failure correctly, a library becomes eligible to be reused in
+              more contexts.
+          </li>
+          <li>
+              Although some software has grown to depend on overcommit being enabled, its existence
+              is the source of countless user experience disasters. When a system with overcommit enabled,
+              such as Linux on default settings, comes close to memory exhaustion, the system locks up
+              and becomes unusable. At this point, the OOM Killer selects an application to kill
+              based on heuristics. This non-deterministic decision often results in an important process
+              being killed, and often fails to return the system back to working order.
+          </li>
+      </ul>
+      {#header_close#}
+
+      {#header_open|Recursion#}
+      <p>
+      Recursion is a fundamental tool in modeling software. However, it has an often-overlooked problem:
+      unbounded memory allocation.
+      </p>
+      <p>
+      Recursion is an area of active experimentation in Zig and so the documentation here is not final.
+      You can read a
+      <a href="https://ziglang.org/download/0.3.0/release-notes.html#recursion">summary of recursion status in the 0.3.0 release notes</a>.
+      </p>
+      <p>
+      The short summary is that currently recursion works normally as you would expect. Although Zig code
+      is not yet protected from stack overflow, it is planned that a future version of Zig will provide
+      such protection, with some degree of cooperation from Zig code required.
+      </p>
+      {#header_close#}
+
+      {#header_open|Lifetime and Ownership#}
+      <p>
+      It is the Zig programmer's responsibility to ensure that a {#link|pointer|Pointers#} is not
+      accessed when the memory pointed to is no longer available. Note that a {#link|slice|Slices#}
+      is a form of pointer, in that it references other memory.
+      </p>
+      <p>
+      In order to prevent bugs, there are some helpful conventions to follow when dealing with pointers.
+      In general, when a function returns a pointer, the documentation for the function should explain
+      who "owns" the pointer. This concept helps the programmer decide when it is appropriate, if ever,
+      to free the pointer.
+      </p>
+      <p>
+      For example, the function's documentation may say "caller owns the returned memory", in which case
+      the code that calls the function must have a plan for when to free that memory. In this situation,
+      the function will typically accept an {#syntax#}*Allocator{#endsyntax#} parameter.
+      </p>
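+      <p>
+      The convention might look like the following sketch. The {#syntax#}dupe{#endsyntax#} function
+      is a hypothetical helper written for this example, not a reference to a standard library API.
+      Its documentation comment states the ownership, and the caller pairs the call with
+      {#syntax#}defer allocator.free(){#endsyntax#}.
+      </p>
+      {#code_begin|test|caller_owns#}
+const std = @import("std");
+const Allocator = std.mem.Allocator;
+const assert = std.debug.assert;
+
+/// Returns a newly allocated copy of `s`.
+/// Caller owns the returned memory.
+fn dupe(allocator: *Allocator, s: []const u8) ![]u8 {
+    const result = try allocator.alloc(u8, s.len);
+    std.mem.copy(u8, result, s);
+    return result;
+}
+
+test "caller owns the returned memory" {
+    var buffer: [100]u8 = undefined;
+    const allocator = &std.heap.FixedBufferAllocator.init(&buffer).allocator;
+
+    const copied = try dupe(allocator, "hello");
+    // The caller's plan for the memory: free it when this scope ends.
+    defer allocator.free(copied);
+
+    assert(std.mem.eql(u8, copied, "hello"));
+}
+      {#code_end#}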
+      <p>
+      Sometimes the lifetime of a pointer may be more complicated. For example, when using
+      {#syntax#}std.ArrayList(T).toSlice(){#endsyntax#}, the returned slice has a lifetime that remains
+      valid until the next time the list is resized, such as by appending new elements.
+      </p>
+      <p>
+      The API documentation for functions and data structures should take great care to explain
+      the ownership and lifetime semantics of pointers. Ownership determines whose responsibility it
+      is to free the memory referenced by the pointer, and lifetime determines the point at which
+      the memory becomes inaccessible (lest {#link|Undefined Behavior#} occur).
+      </p>
+      {#header_close#}
 
       {#header_close#}
       {#header_open|Compile Variables#}