This is a submodule of std.algorithm
. It contains generic algorithms that implement set operations.
The functions multiwayMerge
, multiwayUnion
, setDifference
, setIntersection
, setSymmetricDifference
expect a range of sorted ranges as input.
All algorithms are generalized to accept as input not only sets but also multisets. Each algorithm documents behaviour in the presence of duplicated inputs.
Function Name | Description |
---|---|
cartesianProduct | Computes Cartesian product of two ranges. |
largestPartialIntersection | Copies out the values that occur most frequently in a range of ranges. |
largestPartialIntersectionWeighted | Copies out the values that occur most frequently (multiplied by per-value weights) in a range of ranges. |
multiwayMerge | Merges a range of sorted ranges. |
multiwayUnion | Computes the union of a range of sorted ranges. |
setDifference | Lazily computes the set difference of two or more sorted ranges. |
setIntersection | Lazily computes the intersection of two or more sorted ranges. |
setSymmetricDifference | Lazily computes the symmetric set difference of two or more sorted ranges. |
Lazily computes the Cartesian product of two or more ranges. The product is a range of tuples of elements from each respective range.
The conditions for the two-range case are as follows:
If both ranges are finite, then one must be (at least) a forward range and the other an input range.
If one range is infinite and the other finite, then the finite range must be a forward range, and the infinite range can be an input range.
If both ranges are infinite, then both must be forward ranges.
When there are more than two ranges, the above conditions apply to each adjacent pair of ranges.
R1 range1
| The first range |
R2 range2
| The second range |
RR ranges
| Two or more non-infinite forward ranges |
RR otherRanges
| Zero or more non-infinite forward ranges |
std.typecons.Tuple
representing elements of the cartesian product of the given ranges.import std.algorithm.searching : canFind; import std.range; import std.typecons : tuple; auto N = sequence!"n"(0); // the range of natural numbers auto N2 = cartesianProduct(N, N); // the range of all pairs of natural numbers // Various arbitrary number pairs can be found in the range in finite time. assert(canFind(N2, tuple(0, 0))); assert(canFind(N2, tuple(123, 321))); assert(canFind(N2, tuple(11, 35))); assert(canFind(N2, tuple(279, 172)));
import std.algorithm.searching : canFind; import std.typecons : tuple; auto B = [ 1, 2, 3 ]; auto C = [ 4, 5, 6 ]; auto BC = cartesianProduct(B, C); foreach (n; [[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5], [1, 6], [2, 6], [3, 6]]) { assert(canFind(BC, tuple(n[0], n[1]))); }
import std.algorithm.comparison : equal; import std.typecons : tuple; auto A = [ 1, 2, 3 ]; auto B = [ 'a', 'b', 'c' ]; auto C = [ "x", "y", "z" ]; auto ABC = cartesianProduct(A, B, C); assert(ABC.equal([ tuple(1, 'a', "x"), tuple(1, 'a', "y"), tuple(1, 'a', "z"), tuple(1, 'b', "x"), tuple(1, 'b', "y"), tuple(1, 'b', "z"), tuple(1, 'c', "x"), tuple(1, 'c', "y"), tuple(1, 'c', "z"), tuple(2, 'a', "x"), tuple(2, 'a', "y"), tuple(2, 'a', "z"), tuple(2, 'b', "x"), tuple(2, 'b', "y"), tuple(2, 'b', "z"), tuple(2, 'c', "x"), tuple(2, 'c', "y"), tuple(2, 'c', "z"), tuple(3, 'a', "x"), tuple(3, 'a', "y"), tuple(3, 'a', "z"), tuple(3, 'b', "x"), tuple(3, 'b', "y"), tuple(3, 'b', "z"), tuple(3, 'c', "x"), tuple(3, 'c', "y"), tuple(3, 'c', "z") ]));
Given a range of sorted forward ranges ror
, copies to tgt
the elements that are common to most ranges, along with their number of occurrences. All ranges in ror
are assumed to be sorted by less
. Only the most frequent tgt.length
elements are returned.
less | The predicate the ranges are sorted by. |
RangeOfRanges ror
| A range of forward ranges sorted by less . |
Range tgt
| The target range to copy common elements to. |
SortOutput sorted
| Whether the elements copied should be in sorted order. The function largestPartialIntersection is useful for e.g. searching an inverted index for the documents most likely to contain some terms of interest. The complexity of the search is Ο(n * log(tgt.length) ), where n is the sum of lengths of all input ranges. This approach is faster than keeping an associative array of the occurrences and then selecting its top items, and also requires less memory (largestPartialIntersection builds its result directly in tgt and requires no extra memory). If at least one of the ranges is a multiset, then all occurences of a duplicate element are taken into account. The result is equivalent to merging all ranges and picking the most frequent tgt.length elements. |
largestPartialIntersection
does not allocate extra memory, it will leave ror
modified. Namely, largestPartialIntersection
assumes ownership of ror
and discretionarily swaps and advances elements of it. If you want ror
to preserve its contents after the call, you may want to pass a duplicate to largestPartialIntersection
(and perhaps cache the duplicate in between calls).import std.typecons : tuple, Tuple; // Figure which number can be found in most arrays of the set of // arrays below. double[][] a = [ [ 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; auto b = new Tuple!(double, uint)[1]; // it will modify the input range, hence we need to create a duplicate largestPartialIntersection(a.dup, b); // First member is the item, second is the occurrence count writeln(b[0]); // tuple(7.0, 4u) // 7.0 occurs in 4 out of 5 inputs, more than any other number // If more of the top-frequent numbers are needed, just create a larger // tgt range auto c = new Tuple!(double, uint)[2]; largestPartialIntersection(a, c); writeln(c[0]); // tuple(1.0, 3u) // 1.0 occurs in 3 inputs // multiset double[][] x = [ [1, 1, 1, 1, 4, 7, 8], [1, 7], [1, 7, 8], [4, 7], [7] ]; auto y = new Tuple!(double, uint)[2]; largestPartialIntersection(x.dup, y); // 7.0 occurs 5 times writeln(y[0]); // tuple(7.0, 5u) // 1.0 occurs 6 times writeln(y[1]); // tuple(1.0, 6u)
Similar to largestPartialIntersection
, but associates a weight with each distinct element in the intersection.
If at least one of the ranges is a multiset, then all occurences of a duplicate element are taken into account. The result is equivalent to merging all input ranges and picking the highest tgt.length
, weight-based ranking elements.
less | The predicate the ranges are sorted by. |
RangeOfRanges ror
| A range of forward ranges sorted by less . |
Range tgt
| The target range to copy common elements to. |
WeightsAA weights
| An associative array mapping elements to weights. |
SortOutput sorted
| Whether the elements copied should be in sorted order. |
import std.typecons : tuple, Tuple; // Figure which number can be found in most arrays of the set of // arrays below, with specific per-element weights double[][] a = [ [ 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; auto b = new Tuple!(double, uint)[1]; double[double] weights = [ 1:1.2, 4:2.3, 7:1.1, 8:1.1 ]; largestPartialIntersectionWeighted(a, b, weights); // First member is the item, second is the occurrence count writeln(b[0]); // tuple(4.0, 2u) // 4.0 occurs 2 times -> 4.6 (2 * 2.3) // 7.0 occurs 3 times -> 4.4 (3 * 1.1) // multiset double[][] x = [ [ 1, 1, 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; auto y = new Tuple!(double, uint)[1]; largestPartialIntersectionWeighted(x, y, weights); writeln(y[0]); // tuple(1.0, 5u) // 1.0 occurs 5 times -> 1.2 * 5 = 6
Merges multiple sets. The input sets are passed as a range of ranges and each is assumed to be sorted by less
. Computation is done lazily, one union element at a time. The complexity of one popFront
operation is Ο(log(ror.length)
). However, the length of ror
decreases as ranges in it are exhausted, so the complexity of a full pass through MultiwayMerge
is dependent on the distribution of the lengths of ranges contained within ror
. If all ranges have the same length n
(worst case scenario), the complexity of a full pass through MultiwayMerge
is Ο(n * ror.length * log(ror.length)
), i.e., log(ror.length)
times worse than just spanning all ranges in turn. The output comes sorted (unstably) by less
.
The length of the resulting range is the sum of all lengths of the ranges passed as input. This means that all elements (duplicates included) are transferred to the resulting range.
For backward compatibility, multiwayMerge
is available under the name nWayUnion
and MultiwayMerge
under the name of NWayUnion
. Future code should use multiwayMerge
and MultiwayMerge
as nWayUnion
and NWayUnion
will be deprecated.
less | Predicate the given ranges are sorted by. |
RangeOfRanges ror
| A range of ranges sorted by less to compute the union for. |
ror
. MultiwayMerge
does not allocate extra memory, it will leave ror
modified. Namely, MultiwayMerge
assumes ownership of ror
and discretionarily swaps and advances elements of it. If you want ror
to preserve its contents after the call, you may want to pass a duplicate to MultiwayMerge
(and perhaps cache the duplicate in between calls).import std.algorithm.comparison : equal; double[][] a = [ [ 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; auto witness = [ 1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8 ]; assert(equal(multiwayMerge(a), witness)); double[][] b = [ // range with duplicates [ 1, 1, 4, 7, 8 ], [ 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; // duplicates are propagated to the resulting range assert(equal(multiwayMerge(b), witness));
Computes the union of multiple ranges. The input ranges are passed as a range of ranges and each is assumed to be sorted by less
. Computation is done lazily, one union element at a time. multiwayUnion(ror)
is functionally equivalent to multiwayMerge(ror).uniq
.
"The output of multiwayUnion has no duplicates even when its inputs contain duplicates."
less | Predicate the given ranges are sorted by. |
RangeOfRanges ror
| A range of ranges sorted by less to compute the intersection for. |
ror
. See also: multiwayMerge
import std.algorithm.comparison : equal; // sets double[][] a = [ [ 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 8], [ 4 ], [ 7 ], ]; auto witness = [1, 4, 7, 8]; assert(equal(multiwayUnion(a), witness)); // multisets double[][] b = [ [ 1, 1, 1, 4, 7, 8 ], [ 1, 7 ], [ 1, 7, 7, 8], [ 4 ], [ 7 ], ]; assert(equal(multiwayUnion(b), witness)); double[][] c = [ [9, 8, 8, 8, 7, 6], [9, 8, 6], [9, 8, 5] ]; auto witness2 = [9, 8, 7, 6, 5]; assert(equal(multiwayUnion!"a > b"(c), witness2));
Lazily computes the difference of r1
and r2
. The two ranges are assumed to be sorted by less
. The element types of the two ranges must have a common type.
In the case of multisets, considering that element a
appears x
times in r1
and y
times and r2
, the number of occurences of a
in the resulting range is going to be x-y
if x > y or 0 othwerise.
less | Predicate the given ranges are sorted by. |
R1 r1
| The first range. |
R2 r2
| The range to subtract from r1 . |
r1
and r2
. setSymmetricDifference
import std.algorithm.comparison : equal; import std.range.primitives : isForwardRange; //sets int[] a = [ 1, 2, 4, 5, 7, 9 ]; int[] b = [ 0, 1, 2, 4, 7, 8 ]; assert(equal(setDifference(a, b), [5, 9])); static assert(isForwardRange!(typeof(setDifference(a, b)))); // multisets int[] x = [1, 1, 1, 2, 3]; int[] y = [1, 1, 2, 4, 5]; auto r = setDifference(x, y); assert(equal(r, [1, 3])); assert(setDifference(r, x).empty);
Lazily computes the intersection of two or more input ranges ranges
. The ranges are assumed to be sorted by less
. The element types of the ranges must have a common type.
In the case of multisets, the range with the minimum number of occurences of a given element, propagates the number of occurences of this element to the resulting range.
less | Predicate the given ranges are sorted by. |
Rs ranges
| The ranges to compute the intersection for. |
import std.algorithm.comparison : equal; // sets int[] a = [ 1, 2, 4, 5, 7, 9 ]; int[] b = [ 0, 1, 2, 4, 7, 8 ]; int[] c = [ 0, 1, 4, 5, 7, 8 ]; assert(equal(setIntersection(a, a), a)); assert(equal(setIntersection(a, b), [1, 2, 4, 7])); assert(equal(setIntersection(a, b, c), [1, 4, 7])); // multisets int[] d = [ 1, 1, 2, 2, 7, 7 ]; int[] e = [ 1, 1, 1, 7]; assert(equal(setIntersection(a, d), [1, 2, 7])); assert(equal(setIntersection(d, e), [1, 1, 7]));
Lazily computes the symmetric difference of r1
and r2
, i.e. the elements that are present in exactly one of r1
and r2
. The two ranges are assumed to be sorted by less
, and the output is also sorted by less
. The element types of the two ranges must have a common type.
If both ranges are sets (without duplicated elements), the resulting range is going to be a set. If at least one of the ranges is a multiset, the number of occurences of an element x
in the resulting range is abs(a-b)
where a
is the number of occurences of x
in r1
, b
is the number of occurences of x
in r2
, and abs
is the absolute value.
If both arguments are ranges of L-values of the same type then SetSymmetricDifference
will also be a range of L-values of that type.
less | Predicate the given ranges are sorted by. |
R1 r1
| The first range. |
R2 r2
| The second range. |
r1
and r2
. setDifference
import std.algorithm.comparison : equal; import std.range.primitives : isForwardRange; // sets int[] a = [ 1, 2, 4, 5, 7, 9 ]; int[] b = [ 0, 1, 2, 4, 7, 8 ]; assert(equal(setSymmetricDifference(a, b), [0, 5, 8, 9][])); static assert(isForwardRange!(typeof(setSymmetricDifference(a, b)))); //mutisets int[] c = [1, 1, 1, 1, 2, 2, 2, 4, 5, 6]; int[] d = [1, 1, 2, 2, 2, 2, 4, 7, 9]; assert(equal(setSymmetricDifference(c, d), setSymmetricDifference(d, c))); assert(equal(setSymmetricDifference(c, d), [1, 1, 2, 5, 6, 7, 9]));
© 1999–2018 The D Language Foundation
Licensed under the Boost License 1.0.
https://dlang.org/phobos/std_algorithm_setops.html