Why I wrote my own symmetric difference?
I worked on an Apache Maven Plugin (more accurately on the maven-dependency-plugin) while I realized that it has a
dependency commons-collections4
(version 4.2) which is used only once in the whole code base. That means that only
a single import statement of those packages (import org.apache.commons.collections4.*
) in the code exists. The code
looks like this (see AnalyzeDuplicateMojo):
1private Set<String> findDuplicateDependencies( List<Dependency> modelDependencies )
2{
3 List<String> modelDependencies2 = new ArrayList<>();
4 for ( Dependency dep : modelDependencies )
5 {
6 modelDependencies2.add( dep.getManagementKey() );
7 }
8
9 // @formatter:off
10 return new LinkedHashSet<>(
11 CollectionUtils.disjunction( modelDependencies2, new LinkedHashSet<>( modelDependencies2 ) ) );
12 // @formatter:on
13}
That made me take a closer look on that. I analyzed the dependency a bit more and the commons-collections4
is a jar
file of ca. 734 KiB in size and contains ca. 300 classes.
So I thought, does it really make sense to depend on a jar file for a few classes being used? So what exactly is
the CollectionUtils.disjunction
doing? Based on it's javadoc it is calculating the symmetric difference
of two sets. An excerpt from Wikipedia:
...also known as the disjunctive union, is the set of elements which are in either of the sets, but not in their intersection. For example, the symmetric difference of the sets {1,2,3} and {3,4} is {1,2,4}.
So that brought me to the conclusion that it might be worth to implement that on my own to prevent depending on a library only for a few classes.
So after a bit of tests and some retries I got the following result:
1class Util
2{
3
4 public static <O> Collection<O> symmetricDifference( Iterable<? extends O> elementsOne,
5 Iterable<? extends O> elementsTwo )
6 {
7 Collection<O> resultOne = StreamSupport.stream( elementsOne.spliterator(), false ).collect( toList() );
8 Collection<O> resultTwo = StreamSupport.stream( elementsTwo.spliterator(), false ).collect( toList() );
9
10 HashSet<O> hashSet = new HashSet<>(resultOne);
11 hashSet.addAll( resultTwo );
12
13 Map<O, Long> cardinalityOne =
14 resultOne.stream().collect( groupingBy( identity(), counting() ) );
15 Map<O, Long> cardinalityTwo =
16 resultTwo.stream().collect( groupingBy( identity(), counting() ) );
17
18 List<O> newList = new ArrayList<>();
19
20 for ( O item : hashSet )
21 {
22 Long cardOne = cardinalityOne.getOrDefault( item, 0L );
23 Long cardTwo = cardinalityTwo.getOrDefault( item, 0L );
24 long max = Math.max( cardOne, cardTwo );
25 long min = Math.min( cardOne, cardTwo );
26 addItem( newList, item, max - min );
27 }
28
29 return newList;
30 }
31
32 private static <O> void addItem( List<O> items, O item, long count )
33 {
34 for ( int i = 0; i < count; i++ )
35 {
36 items.add( item );
37 }
38 }
39
40}
Of course, I've written unit tests to verify the code. The tests I've written at the beginning where to
compare the results of my own code with the results of CollectionUtils.disjunction
. Afterwards I have
removed the calling to CollectionUtils.disjunction
as well as the dependency.
Apart from that we can discuss the above code that it is not the optimal solution or could be simplified which is true.
You can argue that using Iterable<? extends O>
as parameters might be change by using
Collection<O>
instead which is correct. Still there are other options to simplify the code in different
ways like using Collections.nCopies()
and using summingInt(..)
instead of counting()
which results
in type change from Long
to Integer
. In the end makes the code smaller and easier. So the final
result looks like this:
1class Util
2{
3
4 static <O> Collection<O> symmetricDifference( Collection<O> elementsOne,
5 Collection<O> elementsTwo )
6 {
7 Set<O> hashSet = new HashSet<>( elementsOne );
8 hashSet.addAll( elementsTwo );
9
10 Map<O, Integer> cardinalityOne =
11 elementsOne.stream().collect( groupingBy( identity(), summingInt( e -> 1 ) ) );
12 Map<O, Integer> cardinalityTwo =
13 elementsTwo.stream().collect( groupingBy( identity(), summingInt( e -> 1 ) ) );
14
15 return hashSet.stream().flatMap( item -> {
16 int cardOne = cardinalityOne.getOrDefault( item, 0 );
17 int cardTwo = cardinalityTwo.getOrDefault( item, 0 );
18 int max = Math.max( cardOne, cardTwo );
19 int min = Math.min( cardOne, cardTwo );
20 return Collections.nCopies( max - min, item ).stream();
21 } ).collect( toList() );
22 }
23
24}
So in the end it is worth to reconsider the usage of dependencies because in this case I've written a class with less than 20 lines of code to replace a dependency with 300+ classes of which only a few (3 If I analyzed that correctly) are really being used.