bw_processing.array_creation

Functions

chunked(iterable, chunk_size)

create_array(iterable[, nrows, dtype])

Create a numpy array from a data iterable. Returns the filepath of the created file (if filepath is provided), or the array.

create_chunked_array(iterable, ncols[, dtype, bucket_size])

Create a numpy array from an iterable of indeterminate length.

create_chunked_structured_array(iterable, dtype[, ...])

Create a numpy structured array from an iterable of indeterminate length.

create_structured_array(iterable, dtype[, nrows, ...])

Create a numpy structured array from a data iterable. Returns the filepath of the created file (if filepath is provided), or the array.

get_ncols(iterator)

peek(iterator)

Module Contents

bw_processing.array_creation.chunked(iterable, chunk_size)[source]

bw_processing.array_creation.create_array(iterable, nrows=None, dtype=np.float32)[source]

Create a numpy array from a data iterable. Returns the filepath of the created file (if filepath is provided), or the array.

iterable can be data already in memory, or a generator.

nrows can be supplied if known. If iterable has a length, nrows is determined automatically. If nrows is not known, this function generates chunked arrays until iterable is exhausted, then concatenates them.

Either nrows or ncols must be specified.
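The choice between the known-length and unknown-length paths described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `array_from_iterable` is a hypothetical name, rows are assumed to be equal-length sequences, and the column count is inferred from the first row.

```python
import itertools

import numpy as np


def array_from_iterable(iterable, nrows=None, dtype=np.float32, bucket_size=500):
    """Hypothetical sketch: build a 2-D array from rows of equal length."""
    # If the iterable has a length, use it instead of requiring nrows.
    if nrows is None and hasattr(iterable, "__len__"):
        nrows = len(iterable)

    iterator = iter(iterable)

    if nrows is not None:
        # Known length: allocate the full array once and fill it row by row.
        first = next(iterator, None)
        if first is None:
            return np.zeros((0, 0), dtype=dtype)
        array = np.zeros((nrows, len(first)), dtype=dtype)
        rows = itertools.islice(itertools.chain([first], iterator), nrows)
        for i, row in enumerate(rows):
            array[i] = row
        return array

    # Unknown length: build fixed-size buckets until the iterator is
    # exhausted, then concatenate them.
    buckets = []
    while True:
        bucket = list(itertools.islice(iterator, bucket_size))
        if not bucket:
            break
        buckets.append(np.array(bucket, dtype=dtype))
    if not buckets:
        return np.zeros((0, 0), dtype=dtype)
    return np.concatenate(buckets)


in_memory = array_from_iterable([(1, 2), (3, 4)])
from_generator = array_from_iterable((i, i + 1) for i in range(1200))
```

A list takes the preallocation path because it has a length; the generator takes the bucket path because it does not.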

bw_processing.array_creation.create_chunked_array(iterable, ncols, dtype=np.float32, bucket_size=500)[source]

Create a numpy array from an iterable of indeterminate length.

Needed when we can’t determine the length of the iterable ahead of time (e.g. for a generator or a database cursor), so we can’t create the complete array in memory in one step.

Creates a list of arrays with bucket_size rows until iterable is exhausted, then concatenates them.

Parameters:
  • iterable – Iterable of data used to populate the array.

  • ncols – Number of columns in the created array.

  • dtype – Numpy dtype of the created array.

  • bucket_size – Number of rows in each intermediate array.

Returns:

The created array, or a zero-length array if iterable has no data.
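The bucket-and-concatenate strategy described for this function can be sketched in plain numpy. This is an illustrative sketch, not the library's source: `chunked_rows_to_array` is a hypothetical name, and the shape of the empty result, `(0, ncols)`, is an assumption.

```python
import itertools

import numpy as np


def chunked_rows_to_array(iterable, ncols, dtype=np.float32, bucket_size=500):
    """Hypothetical sketch of the bucket-and-concatenate approach."""
    iterator = iter(iterable)
    buckets = []
    while True:
        # Pull at most bucket_size rows; the final bucket may be shorter.
        rows = list(itertools.islice(iterator, bucket_size))
        if not rows:
            break
        bucket = np.zeros((len(rows), ncols), dtype=dtype)
        for i, row in enumerate(rows):
            bucket[i, :] = row
        buckets.append(bucket)
    if not buckets:
        # Assumed shape for the zero-length result.
        return np.zeros((0, ncols), dtype=dtype)
    return np.concatenate(buckets, axis=0)


def row_generator():
    # A generator has no length, so the total row count is unknown up front.
    for i in range(7):
        yield (i, i * 2, i * 3)


array = chunked_rows_to_array(row_generator(), ncols=3, bucket_size=3)
```

With seven rows and a bucket size of three, the sketch builds buckets of three, three, and one row before concatenating.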

bw_processing.array_creation.create_chunked_structured_array(iterable, dtype, bucket_size=20000)[source]

Create a numpy structured array from an iterable of indeterminate length.

Needed when we can’t determine the length of the iterable ahead of time (e.g. for a generator or a database cursor), so we can’t create the complete array in memory in one step.

Creates a list of arrays with bucket_size rows until iterable is exhausted, then concatenates them.

Parameters:
  • iterable – Iterable of data used to populate the array.

  • dtype – Numpy dtype of the created array.

  • format_function – If provided, this function will be called on each row of iterable before insertion in the array.

  • bucket_size – Number of rows in each intermediate array.

Returns:

The created array, or a zero-length array if iterable has no data.
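The same strategy applies to structured arrays, where each bucket is a one-dimensional array of records. The sketch below is an illustration only: `chunked_rows_to_structured_array` and the field names `row`, `col`, and `amount` are hypothetical, and the handling of `format_function` reflects the parameter description above rather than the library's source.

```python
import itertools

import numpy as np


def chunked_rows_to_structured_array(
    iterable, dtype, format_function=None, bucket_size=20000
):
    """Hypothetical sketch: structured-array variant of bucket-and-concatenate."""
    iterator = iter(iterable)
    buckets = []
    while True:
        rows = list(itertools.islice(iterator, bucket_size))
        if not rows:
            break
        if format_function is not None:
            # Convert each raw row before it is inserted into the array.
            rows = [format_function(row) for row in rows]
        # numpy expects each structured record to be a tuple.
        buckets.append(np.array([tuple(row) for row in rows], dtype=dtype))
    if not buckets:
        return np.zeros(0, dtype=dtype)
    return np.concatenate(buckets)


# Hypothetical dtype and input data, for illustration only.
example_dtype = [("row", np.int32), ("col", np.int32), ("amount", np.float32)]
records = ({"row": i, "col": i + 1, "amount": 0.5 * i} for i in range(5))

structured = chunked_rows_to_structured_array(
    records,
    example_dtype,
    format_function=lambda d: (d["row"], d["col"], d["amount"]),
    bucket_size=2,
)
```

Here `format_function` turns each dictionary into a tuple in field order, which is the form numpy needs to populate a record.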

bw_processing.array_creation.create_structured_array(iterable, dtype, nrows=None, sort=False, sort_fields=None)[source]

Create a numpy structured array from a data iterable. Returns the filepath of the created file (if filepath is provided), or the array.

iterable can be data already in memory, or a generator.

nrows can be supplied if known. If iterable has a length, nrows is determined automatically. If nrows is not known, this function generates chunked arrays until iterable is exhausted, then concatenates them.
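When the number of rows is known, a structured array can be allocated once and filled in place. The sketch below shows that path in plain numpy, followed by a sort on named fields. The signature's sort and sort_fields arguments are not described on this page; treating them as a sort of the finished array by field name is an assumption, and the dtype and field names are hypothetical.

```python
import numpy as np

# Hypothetical dtype and data, for illustration only.
example_dtype = [("row", np.int32), ("col", np.int32), ("amount", np.float32)]
rows = [(2, 1, 0.5), (1, 3, 1.5), (1, 2, 2.5)]

# The list has a length, so the array can be preallocated in one step.
nrows = len(rows)
array = np.zeros(nrows, dtype=example_dtype)
for i, row in enumerate(rows):
    array[i] = row

# Assumed meaning of sort/sort_fields: order records by the listed fields.
sorted_array = np.sort(array, order=["row", "col"])
```

`np.sort` with `order` compares records by the first listed field and breaks ties with the next one.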

bw_processing.array_creation.get_ncols(iterator)[source]

bw_processing.array_creation.peek(iterator)[source]