Primitive Set Operations in Eclipse Collections — Part 1
Eclipse Collections has a rich assortment of data structures, and one of them is a Set. Recently, I worked on an issue to implement union
, intersect
and difference
operations in sets for primitive types.
The next few sections will cover each operation’s objective, design considerations, and code implementation. The last section will cover the takeaways.
Union — What does this operation do?
Method signature: setA.union(setB)
union
as the name indicates, it takes elements from two sets and combines them into one.
Set A — 1, 2, 3, 4.
Set B — 4, 5.
Union — 1, 2, 3, 4, 5.
Union — Design considerations
- Size — My first approach was to take elements from
Set A
and add all the elements fromSet B
to the resulting set. But, what ifSet B
is larger thanSet A
. Is it then optimal to create a set fromSet A
first and then mergeSet B
into the resulting set? In the case of union, we know that the elements from both sets are part of the final set; hence it is reasonable to start with the larger set and iterate over the smaller set to add them to the larger set. - Empty set(s) — Does it matter that one of the two sets is empty?
If one of the two sets is empty, the expected result is the non-empty set. If both sets are empty, then the expected result is an empty set. This can be confirmed by writing useful unit tests.
Union — Code implementation
🛈 Eclipse Collections has an existing API withAll
which allows you to add elements from one set to the other.
default MutableIntSet union(IntSet set)
{
if (this.size() > set.size())
{
return this.toSet().withAll(set);
}
else
{
return set.toSet().withAll(this);
}
}
Union — Usage
@Test
public void union()
{
MutableIntSet setA = IntSets.mutable.with(1, 2, 3, 5);
MutableIntSet setB = IntSets.mutable.with(2, 3, 4);
MutableIntSet expected = IntSets.mutable.with(1, 2, 3, 4, 5);
MutableIntSet actual = setA.union(setB);
Assert.assertEquals(expected, actual);
}
Unit tests for union covering scenarios for equal-sized, unequal-sized, and empty sets.
private void assertUnion(MutableIntSet set1, MutableIntSet set2, MutableIntSet expected)
{
MutableIntSet actual = set1.union(set2);
Assert.assertEquals(expected, actual);
}@Test
public void union()
{
this.assertUnion(
this.newWith(1, 2, 3),
this.newWith(3, 4, 5),
this.newWith(1, 2, 3, 4, 5));
this.assertUnion(
this.newWith(1, 2, 3, 6),
this.newWith(3, 4, 5),
this.newWith(1, 2, 3, 4, 5, 6));
this.assertUnion(
this.newWith(1, 2, 3),
this.newWith(3, 4, 5, 6),
this.newWith(1, 2, 3, 4, 5, 6));
this.assertUnion(
this.newWith(),
this.newWith(),
this.newWith());
this.assertUnion(
this.newWith(),
this.newWith(3, 4, 5),
this.newWith(3, 4, 5));
this.assertUnion(
this.newWith(1, 2, 3),
this.newWith(),
this.newWith(1, 2, 3));
}
Intersect — What does this operation do?
Method signature: setA.intersect(setB)
Intersect takes two elements from two sets and only retains the common elements in the resulting set.
Set A — 1, 2, 3, 4.
Set B — 3, 4, 5, 6.
Intersect — 3, 4.
Intersect — Design considerations
- Size — This matters in the case of
intersect
too, but it works differently than the case above. Since we are only interested in the common elements between the two sets, the smaller set between them will be our starting point. For the sake of this argument, consider a case of million items inSet A
and three items inSet B
and we are interested in the common elements. It would make sense to start with the smaller set as we know the result set will contain only three elements at most. - Empty set (s)— In this case, if either or both sets are empty, the resulting set will be empty. This can be confirmed quickly by adding unit tests.
Intersect — Code Implementation
🛈 Eclipse Collections has an existing API that allows us to select
all elements that evaluate true for the given Predicate.
default MutableIntSet intersect(IntSet set)
{
if (this.size() < set.size())
{
return this.select(set::contains);
}
else
{
return set.select(this::contains, this.newEmpty());
}
}
Intersect — Usage
@Test
public void intersect()
{
MutableIntSet setA = IntSets.mutable.with(1, 2, 3, 5);
MutableIntSet setB = IntSets.mutable.with(2, 3, 4);
MutableIntSet expected = IntSets.mutable.with(2, 3);
MutableIntSet actual = setA.intersect(setB);
Assert.assertEquals(expected, actual);
}
Unit tests for intersect covering scenarios for equal-sized, unequal-sized, and empty sets.
Difference — What does this operation do?
Method signature: setA.difference(setB)
Difference takes elements that are unique to Set A
only and not Set B
.
Set A — 1, 2, 3, 4.
Set B — 3, 4, 5, 6.
Difference— 1, 2.
Difference — Design considerations
As you may have guessed, the size of these two sets doesn't matter, as our starting point will always be Set A
. If Set A
is empty, then the resulting set will be empty, and if Set B
is empty, then the resulting set will be Set A
.
Difference — Code implementation
🛈 Eclipse Collections has an existing API that allows us to reject
which returns all elements that evaluate false for the Predicate.
In other words, it is the opposite of select
.
default MutableIntSet difference(IntSet set)
{
return this.reject(set::contains);
}
Difference — Usage
@Test
public void difference()
{
MutableIntSet setA = IntSets.mutable.with(1, 2, 3, 5);
MutableIntSet setB = IntSets.mutable.with(2, 3, 4);
MutableIntSet expected = IntSets.mutable.with(1, 5);
MutableIntSet actual = setA.difference(setB);
Assert.assertEquals(expected, actual);
}
Unit tests for difference covering scenarios for equal-sized, unequal-sized, and empty sets.
Takeaways
- Naming convention — Naming is of utmost significance when it comes to code readability. In Eclipse Collections, we have chosen to name these operations as
union
,intersect
anddifference
to effectively communicate the mathematical set operations. - Write Unit Tests — This point is worth reiterating. I started writing unit tests to confirm my understanding of how these new APIs should behave. Once I had the first implementation ready, the next step was to verify that the unit tests were still passing after adding the API’s optimization logic.
- Think Optimization — In my first implementation, the new APIs worked as expected. To further improve the code, I added optimization improvements to the algorithm, such as a simple size check, and other minor code refactoring.
I am a committer for the Eclipse Collections OSS project at Eclipse Foundation. Eclipse Collections is open for contributions.