Core containers


Axom’s containers are currently the target of a refactoring effort. The following information is subject to change, as are the interfaces of the containers themselves; changes to the Axom containers will be reflected here.

Axom Core contains the Array, ArrayView, and StackArray classes. Among other things, these data containers facilitate porting code that uses std::vector to GPUs.


Array is a multidimensional contiguous container template. In the one-dimensional case, it behaves much like std::vector. In higher dimensions, some vector-like functionality, such as push_back, is not available, and multidimensional-specific operations mirror numpy's ndarray where possible.

The Array object manages all memory. Typically, the Array object will allocate extra space to facilitate the insertion of new elements and minimize the number of reallocations. The actual capacity of the array (i.e., total number of elements that the Array can hold) can be queried via the capacity() method. When allocated memory is used up, inserting a new element triggers a reallocation. At each reallocation, extra space is allocated according to the resize_ratio parameter, which is set to 2.0 by default. To return all extra memory, an application can call shrink().


Reallocations tend to be costly operations in terms of performance. Use reserve() when the number of elements is known a priori, or use a constructor that takes an actual size and capacity when possible.


The Array destructor deallocates and returns all memory associated with it to the system.

Here’s an example showing how to use Array instead of std::vector.

  // Here is an Array of ints with length three.
  axom::Array<int> a(3);
  std::cout << "Length of a = " << a.size() << std::endl;
  a[0] = 2;
  a[1] = 5;
  a[2] = 11;

  // An Array increases in size if a value is pushed back.
  a.push_back(4);
  std::cout << "After appending a value, a's length = " << a.size()
            << std::endl;

  // You can also insert a value in the middle of the Array.
  // Here we insert value 6 at position 2 and value 1 at position 4.
  showArray(a, "a");
  a.insert(2, 6);
  a.insert(4, 1);
  std::cout << "After inserting two values, ";
  showArray(a, "a");

The output of this example is:

Length of a = 3
After appending a value, a's length = 4
Array a = [2, 5, 11, 4]
After inserting two values, Array a = [2, 5, 6, 11, 1, 4]

Applications commonly store tuples of data in a flat array or a std::vector. Such a flat array of tuples can be thought of as a two-dimensional array. Array supports arbitrary dimensionalities, but an alias, MCArray (Multi-Component Array), is provided for this two-dimensional case.

The MCArray (i.e., Array<T, 2>) class formalizes tuple storage, as shown in the next example.

  // Here is an MCArray of ints, containing two triples.
  const int numTuples = 2;
  const int numComponents = 3;
  axom::MCArray<int> b(numTuples, numComponents);
  // Set tuple 0 to (1, 4, 2).
  b(0, 0) = 1;
  b(0, 1) = 4;
  b(0, 2) = 2;
  // Set tuple 1 to (8, 0, -1) using set().
  // The first argument to set() is the buffer to copy into the MCArray, the
  // second is the number of values in the buffer, and the third argument
  // is the destination index in the MCArray's flat internal buffer.
  int ival[3] = {8, 0, -1};
  b.set(ival, 3, 3);

  showTupleArray(b, "b");

  // Now, insert two tuples, (0, -1, 1), (1, -1, 0), into the MCArray, directly
  // after tuple 0.
  int jval[6] = {0, -1, 1, 1, -1, 0};
  b.insert(1, numTuples * numComponents, jval);

  showTupleArray(b, "b");

The output of this example is:

MCArray b with 2 3-tuples = [
  [1, 4, 2]
  [8, 0, -1]
]
MCArray b with 4 3-tuples = [
  [1, 4, 2]
  [0, -1, 1]
  [1, -1, 0]
  [8, 0, -1]
]


It is also often useful to wrap an external, user-supplied buffer without taking ownership of the data. For this purpose Axom provides the ArrayView class, which is a lightweight wrapper over a buffer that provides one- or multi-dimensional indexing/reshaping semantics. For example, it might be useful to reinterpret a flat (one-dimensional) array as a two-dimensional array. This is accomplished via MCArrayView which, similar to the MCArray alias, is an alias for ArrayView<T, 2>.

  // The internal buffer maintained by an Array is accessible via data().
  int* pa = a.data();
  // An MCArray can be constructed with a pointer to an external buffer.
  // Here's an Array interpreting the memory pointed to by pa as three 2-tuples.
  axom::MCArrayView<int> c(pa, 3, 2);

  showArray(a, "a");
  showTupleArrayView(c, "c");

  // Since c is an alias to a's internal memory, changes affect both Arrays.
  a[0] = 1;
  c(1, 1) = 9;

  std::cout
    << "Array a and MCArrayView c use the same memory, a's internal buffer."
    << std::endl;
  showArray(a, "a");
  showTupleArrayView(c, "c");

The output of this example is:

Array a = [2, 5, 6, 11, 1, 4]
MCArrayView c with 3 2-tuples = [
  [2, 5]
  [6, 11]
  [1, 4]
]
Array a and MCArrayView c use the same memory, a's internal buffer.
Array a = [1, 5, 6, 9, 1, 4]
MCArrayView c with 3 2-tuples = [
  [1, 5]
  [6, 9]
  [1, 4]
]


The set of permissible operations on an ArrayView is somewhat limited, as operations that would cause the buffer to resize are not permitted.

ArrayView can also view memory at regular intervals, skipping the elements in between. By default, the spacing is one, meaning elements are adjacent in memory. A spacing of 2 views every other element; a spacing of N views every N-th element. Spacing is set in the ArrayView constructor.

The following example creates a 2D array of 4-tuples and uses an ArrayView to access the third component of each 4-tuple.

  // An array of tuples can be viewed as a tuple of arrays, by specifying
  // the spacing between elements to include.  The spacing skips over the
  // elements to exclude from the view.
  const axom::StackArray<axom::IndexType, 2> shape {2, 3};  // Shape of 2x3 array.
  constexpr axom::IndexType TUPSIZE = 4;
  // 2D array of tuples (stored as a 3D array).
  axom::Array<std::string, 3> arrayOf4tuples(shape[0], shape[1], TUPSIZE);
  for(int i = 0; i < shape[0]; ++i)
    for(int j = 0; j < shape[1]; ++j)
      for(int t = 0; t < TUPSIZE; ++t)
        arrayOf4tuples(i, j, t) = axom::fmt::format("({},{}).{}", i, j, t);
  // View the third component of each 4-tuple, as if those components
  // were in their own array.
  axom::ArrayView<std::string, 2> viewOfThird(arrayOf4tuples.data() + 2,
                                              shape,
                                              TUPSIZE);  // spacing of TUPSIZE
  std::cout << "Third components of 2D array of 4-tuples:" << std::endl;
  for(int i = 0; i < shape[0]; ++i)
    for(int j = 0; j < shape[1]; ++j)
      std::cout << "a(" << i << "," << j << ") = " << viewOfThird(i, j)
                << std::endl;

The output of this example is:

Third components of 2D array of 4-tuples:
a(0,0) = (0,0).2
a(0,1) = (0,1).2
a(0,2) = (0,2).2
a(1,0) = (1,0).2
a(1,1) = (1,1).2
a(1,2) = (1,2).2

In the future, it will also be possible to restride an ArrayView.

Iteration is also identical between the Array and ArrayView classes. In particular:

  • operator() indexes into multidimensional data. Currently, the number of indices passed must match the dimensionality of the array.
  • operator[] indexes into the full buffer, i.e., arr[i] is equivalent to arr.data()[i].
  • begin() and end() refer to the full buffer.

Consider the following example:

  // Iteration over multidimensional arrays uses the shape() method
  // to retrieve the extents in each dimension.
  for(int i = 0; i < c.shape()[0]; i++)
    for(int j = 0; j < c.shape()[1]; j++)
      // Note that c's operator() accepts two arguments because it is two-dimensional
      std::cout << "In ArrayView c, index (" << i << ", " << j << ") yields "
                << c(i, j) << std::endl;

  // To iterate over the "flat" data in an Array, regardless of dimension,
  // use a range-based for loop.
  std::cout << "Range-based for loop over ArrayView c yields: ";
  for(const int value : c)
    std::cout << value << " ";
  std::cout << std::endl;

  // Alternatively, the "flat" data can be iterated over with flatIndex()
  // from 0 -> size().
  std::cout << "Standard for loop over ArrayView c yields: ";
  for(int i = 0; i < c.size(); i++)
    std::cout << c.flatIndex(i) << " ";
  std::cout << std::endl;

The output of this example is:

In ArrayView c, index (0, 0) yields 1
In ArrayView c, index (0, 1) yields 5
In ArrayView c, index (1, 0) yields 6
In ArrayView c, index (1, 1) yields 9
In ArrayView c, index (2, 0) yields 1
In ArrayView c, index (2, 1) yields 4
Range-based for loop over ArrayView c yields: 1 5 6 9 1 4
Standard for loop over ArrayView c yields: 1 5 6 9 1 4

Using Arrays in GPU Code

Instead of writing kernels and device functions that operate on raw pointers, we can use ArrayView in device code. The basic “workflow” for this process is as follows:

  1. Create an Array allocated in device-accessible memory via either specifying an allocator ID or using a class template parameter for the desired memory space.
  2. Write a kernel that accepts an ArrayView parameter by value, not by reference or pointer.
  3. Create an ArrayView from the Array to call the function. For non-templated kernels an implicit conversion is provided.

The full template signature for Array (ArrayView has an analogous signature) is Array<typename T, int DIM = 1, MemorySpace SPACE = MemorySpace::Dynamic>. Of particular interest is the last parameter, which specifies the memory space in which the array’s data are allocated. The default, Dynamic, means that the memory space is set via an allocator ID at runtime.


Allocating Arrays in different memory spaces is only possible when Umpire is available. To learn more about Umpire, see the Umpire documentation.

Setting the MemorySpace to an option other than Dynamic (for example, MemorySpace::Device) provides a compile-time guarantee that data can always be accessed from a GPU. “Locking down” the memory space at compile time can help to prevent illegal memory accesses and segmentation faults when pointers are dereferenced from the wrong execution space.

To summarize, there are a couple different options for creating an ArrayView. Consider a function that takes as an argument an ArrayView on the device:

void takesDeviceArrayView(axom::ArrayView<int, 1, axom::MemorySpace::Device>) { }

To create an argument to this function we can select the space either at runtime or at compile-time as follows:

  constexpr int N = 10;
  // A device array can be constructed by either specifying the corresponding
  // allocator ID...
  const int device_allocator_id = axom::getUmpireResourceAllocatorID(
    umpire::resource::MemoryResourceType::Device);
  axom::Array<int> device_array_dynamic(N, N, device_allocator_id);
  // ...or by providing the memory space via template parameter:
  axom::Array<int, 1, axom::MemorySpace::Device> device_array_template(N);

The first way we can create the required ArrayView is by implicit conversion, which also simplifies the process of “locking down” a MemorySpace::Dynamic array to an explicit memory space, MemorySpace::Device in this case:

  takesDeviceArrayView(device_array_dynamic);
  takesDeviceArrayView(device_array_template);

If we had attempted to convert from a MemorySpace::Dynamic array that had been allocated in host memory, for example, an error would be produced at runtime.

We can also explicitly construct the ArrayView before calling the function.

  using DeviceArrayView = axom::ArrayView<int, 1, axom::MemorySpace::Device>;

  DeviceArrayView view_of_dynamic_array(device_array_dynamic);

  DeviceArrayView view_of_template_array(device_array_template);

  // Create an explicit ArrayView using Array::view()
  auto view_of_array_using_view_method = device_array_dynamic.view();

  DeviceArrayView view_of_array_using_operator_equals = device_array_dynamic;

  DeviceArrayView view_of_array_from_pointer(device_array_dynamic.data(), N);

A more realistic example of this functionality involves a GPU kernel requiring that its argument arrays be allocated in a specific memory space. To illustrate how different memory spaces can be required, the following kernel requires that its input arrays A and B are in unified memory and its output array C is in device memory.

// Aliases used for convenience
using UnifiedIntArrayView = axom::ArrayView<int, 1, axom::MemorySpace::Unified>;
using DeviceIntArrayView = axom::ArrayView<int, 1, axom::MemorySpace::Device>;

__global__ void add(const UnifiedIntArrayView A,
                    const UnifiedIntArrayView B,
                    DeviceIntArrayView C)
{
  for(int i = 0; i < A.size(); i++)
  {
    C[i] = A[i] + B[i];
  }
}

The following snippet illustrates how one would create and initialize the inputs/outputs to this kernel.

  const int unified_alloc_id = axom::getUmpireResourceAllocatorID(
    umpire::resource::MemoryResourceType::Unified);

  // The last template parameter specifies a memory space.
  // Its default value is Dynamic, which lets the user specify the
  // memory space at runtime with a memory allocator ID.  The
  // third constructor parameter specifies the allocator.
  // If this argument is not provided host memory will be allocated.
  axom::Array<int> A_dynamic(N, N, unified_alloc_id);

  // We also have the option to "lock down" the memory space to allow for
  // compile-time guarantees against dereferencing pointers in the wrong memory space.
  axom::Array<int, 1, axom::MemorySpace::Unified> B_unified(N);

  // Despite having different types, both of these arrays are in unified memory.
  for(int i = 0; i < N; i++)
  {
    A_dynamic[i] = i * 5;
    B_unified[i] = i * 2;
  }

  // The result array is allocated in device memory
  axom::Array<int, 1, axom::MemorySpace::Device> C_device(N);


Unless the Dynamic memory space is in use, the Array constructor will ignore an allocator ID that doesn’t match its memory space, and in debug builds will print a warning at runtime.

We can now launch the kernel and display the results via a transfer back to host-accessible memory:

  // Passing by reference is not possible for CUDA kernels, so the three arrays
  // are converted to corresponding ArrayViews that are "shallow copies" of the
  // original Array.
  // Note that even though A's memory space has not been locked down at compile time,
  // we are able to pass it as an argument - it will be implicitly converted to an ArrayView
  // of the correct type. Also note that if we had not constructed A with the UM allocator ID,
  // this conversion would fail and produce an error at runtime.
  add<<<1, 1>>>(A_dynamic, B_unified, C_device);

  // Since our result array is in device memory, we copy it to host memory so we can view it.
  axom::Array<int, 1, axom::MemorySpace::Host> C_host = C_device;
  std::cout << "Array C_host = " << C_host << std::endl;

  // We can also use a dynamic array, if we specify an allocator ID for host memory in the copy constructor.
  const int host_alloc_id = axom::getDefaultAllocatorID();
  axom::Array<int> C_dynamic(C_device, host_alloc_id);
  std::cout << "Array C_dynamic = " << C_dynamic << std::endl;

If RAJA is available, we can also use Axom’s acceleration utilities to perform an operation on the GPU via a lambda:

  // To use a lambda as a kernel, we create the ArrayViews explicitly.
  const UnifiedIntArrayView A_view = A_dynamic;
  const UnifiedIntArrayView B_view = B_unified;
  // Create a new array for our RAJA result
  axom::Array<int, 1, axom::MemorySpace::Device> C_device_raja(N);
  DeviceIntArrayView C_view = C_device_raja;

  #if defined(__CUDACC__)
  using ExecSpace = axom::CUDA_EXEC<1>;
  #elif defined(__HIPCC__)
  using ExecSpace = axom::HIP_EXEC<1>;
  #else
  using ExecSpace = axom::SEQ_EXEC;
  #endif

  // Write to the underlying array through C_view, which is captured by value
  axom::for_all<ExecSpace>(0, N, [=] AXOM_HOST_DEVICE(axom::IndexType i) {
    C_view[i] = A_view[i] + B_view[i] + 1;
  });
  // Finally, copy things over to host memory so we can display the data
  axom::Array<int, 1, axom::MemorySpace::Host> C_host_raja = C_view;
  std::cout << "Array C_host_raja = " << C_host_raja << std::endl;

By default, Array copies and moves will propagate the allocator ID; this ensures that objects with Array members do not accidentally move their data to the host when copied or moved:

  // For a Dynamic space memory array, copies and moves will use the allocator
  // from the other array, unless otherwise specified.
  axom::Array<int> C_device_copy(C_device);
  axom::Array<int> C_device_copy_assign = C_device;
  axom::Array<int> C_device_move(std::move(C_device_copy));
  axom::Array<int> C_device_move_assign = std::move(C_device_copy_assign);

  // An allocator ID may be passed into the copy constructor, which creates an
  // array in that memory space.
  const int host_alloc_id = axom::getDefaultAllocatorID();
  axom::Array<int> C_host_copy(C_device, host_alloc_id);

  // The semantics for Arrays with compile-time specified memory spaces are similar:
  // when possible, an allocator ID from the other array is used.
  axom::Array<int, 1, axom::MemorySpace::Device> C_explicit_device_copy(C_device);

  // Just as before, an allocator ID may be specified explicitly.
  axom::Array<int, 1, axom::MemorySpace::Unified> C_explicit_unified_copy(
    C_device,
    unified_alloc_id);

  // Note that if an allocator ID is incompatible with a memory space, the default
  // allocator ID for that memory space is used. Both of these examples will copy
  // memory to the host:
  axom::Array<int, 1, axom::MemorySpace::Host> C_use_host_alloc(C_device);
  // The below will also print a warning in debug mode:
  axom::Array<int, 1, axom::MemorySpace::Host> C_use_host_alloc_2(
    C_device,
    device_allocator_id);


The StackArray class is a work-around for a limitation in older versions of the nvcc compiler, which do not capture arrays on the stack in device lambdas. More details are in the API documentation and in the tests.