Module ojalgo

Class BatchNode<T>

java.lang.Object
org.ojalgo.data.batch.BatchNode<T>

public final class BatchNode<T> extends Object
A batch processing data node for when there's no way to fit the data in memory.

Data is stored in sharded files, and data is written/consumed and processed concurrently.

The data is processed in batches. Each batch is processed in a single thread. The number of threads is controlled by

invalid reference
#parallelism(IntSupplier)
.
  • Method Details

    • newBuilder

      public static <T> BatchNode.Builder<T> newBuilder(File directory, DataInterpreter<T> interpreter)
    • newInstance

      public static <T> BatchNode<T> newInstance(File directory, DataInterpreter<T> interpreter)
    • dispose

      public void dispose()
      Dispose of this node and explicitly delete all files.
    • newWriter

      public ToFileWriter<T> newWriter()
    • processAll

      public void processAll(Consumer<T> processor)
      Process each and every item individually
      Parameters:
      processor - Must be able to consume concurrently
    • processAll

      public void processAll(Supplier<Consumer<T>> processorFactory)
      Similar to processAll(Consumer) but you provide a consumer constructor/factory rather than a specific consumer. Internally there will be 1 consumer per worker thread instantiated. This variant is for when the consumer(s) are stateful.
    • processCombineable

      public <R, A extends TwoStepMapper<T, R>> void processCombineable(Supplier<A> aggregatorFactory, Consumer<A> processor)
      Similar to processMergeable(Supplier, Consumer) but the processor is called with the aggregator instance itself rather than its extracted results. This corresponds to
      invalid reference
      TwoStepMapper#Combineable
      rather than
      invalid reference
      TwoStepMapper#Mergeable
      .
      See Also:
    • processMapped

      @Deprecated public <R> void processMapped(Supplier<? extends TwoStepMapper<T,R>> aggregatorFactory, Consumer<R> processor)
      Deprecated.
      v54 Use
      invalid reference
      #processMergeable(Supplier<? extends TwoStepMapper<T, H>>,Consumer<H>)
      instead
      Process mapped/derived data in batches.

      There will be one TwoStepMapper instance per underlying file/shard – that's a batch. Those instances are likely to contain some sort of Collection or Map that hold mapped/derived data.

      You must make sure that all data items that need to be in the same TwoStepMapper instance (in the same batch) are in the same file/shard. You control the number of shards via BatchNode.Builder.fragmentation(int) and which item goes in which shard via BatchNode.Builder.distributor(ToIntFunction).

      Type Parameters:
      R - The mapped/derived data holding type
      Parameters:
      aggregatorFactory - Produces the TwoStepMapper mapping instances
      processor - Consumes the mapped/derived data - the results of one whole TwoStepMapper instance at the time
    • processMergeable

      public <R, A extends TwoStepMapper<T, R>> void processMergeable(Supplier<A> aggregatorFactory, Consumer<R> processor)
      Each shard is processed/aggregated separately by a TwoStepMapper instance. The results are then processed/merged by the provided processor.

      There is one TwoStepMapper instance per underlying worker thread. Those instances are reset and reused for each shard.

      The processor is called concurrently from multiple threads.

      You must make sure that all data items that need to be in the same aggregator instance are in the same file/shard. You control the number of shards via BatchNode.Builder.fragmentation(int) and which item goes in which shard via BatchNode.Builder.distributor(ToIntFunction).

      Parameters:
      aggregatorFactory - Produces the TwoStepMapper aggregator instances
      processor - Consumes the aggregated/derived data - the results of one whole TwoStepMapper instance at the time
    • reduceByCombining

      public <R, A extends TwoStepMapper.Combineable<T, R, A>> R reduceByCombining(Supplier<A> aggregatorFactory)
      Calls processCombineable(Supplier, Consumer) with the
      invalid reference
      TwoStepMapper.Combineable#merge(Object)
      method of a global TwoStepMapper.Combineable instance as the consumer.
    • reduceByMerging

      public <R, A extends TwoStepMapper.Mergeable<T, R>> R reduceByMerging(Supplier<A> aggregatorFactory)
    • reduceMapped

      @Deprecated public <R, A extends TwoStepMapper.Mergeable<T, R>> R reduceMapped(Supplier<A> aggregatorFactory)
      Deprecated.
      v54 Use
      invalid reference
      #reduceByMerging(Supplier<A>)
      instead
      Same as processMergeable(Supplier, Consumer), but then also reduce/merge the total results using
      invalid reference
      TwoStepMapper#merge(Object)
      .

      Create a class that implements TwoStepMapper and make sure to also implement

      invalid reference
      TwoStepMapper#merge(Object)
      - you can only use this if merging partial (sub)results is possible. Use a constructor or factory method that produce instances of that type as the argument to this method.