Set

Set #

A Set is a fundamental data structure available in many programming languages, which stores unique elements. The primary characteristic of a Set is that it contains no duplicates; each element can appear only once. Sets are particularly useful when you want to keep track of a collection of elements without worrying about their order or occurrence count. Here’s an overview of the key aspects and operations associated with Sets:

Basic Definition and Properties #

  • Uniqueness: A Set is defined by its unique members, ensuring no duplicates are present within it. This makes checking for membership (whether a particular element exists in the set) efficient compared to data structures that allow duplicates, like lists or arrays.
  • Ordering: The order of elements in a Set can vary based on the implementation and whether you’re working with an unordered collection (like Python’s set or Java’s HashSet) versus an ordered one (e.g., Python’s frozenset, which is actually just a frozen set, but lacks methods that modify its content).
  • Dynamic Size: Sets can grow and shrink dynamically as elements are added and removed, though their performance characteristics depend on the underlying implementation.

Disjoint Set #

The Disjoint Set data structure, also known as Union-Find or Merge-Find Set, is a powerful abstract data type that allows you to efficiently manage and query the connected components of a graph. It supports two primary operations: Union (combining sets) and Find (determining which set an element belongs to), both having efficient implementations.

Key Properties and Operations: #

  1. Disjoint Sets: Each disjoint set consists of elements partitioned into non-overlapping subsets, ensuring that no two elements in the same subset are connected by a path.
  2. Union Operation: This operation merges two distinct sets into one. It’s typically implemented with careful consideration to maintain optimal time complexity (usually O(log n) for both insertions and unions).
  3. Find Operation: Determines the representative element of a set in which an item belongs, usually through path compression techniques that flatten the structure of the tree representing sets, achieving nearly constant-time operations.

Implementation Details: #

The Disjoint Set Data Structure can be implemented using two main approaches: Weighted Quick Union (WQU) and Quick Find for union operation, along with path compression optimization in both. For the find operation, there are also variations like Lazy Union and Path Compression that further optimize performance.

Weighted Quick Union (WQU): #

In WQU, each set is represented by a tree where elements point to their parents, with trees of different sizes linked together in a specific order during the union operation to keep the depth of the trees as balanced as possible. The find operation traverses up the parent pointers until it finds the root of an element’s set (the representative).

Quick Find: #

Quick Find is simpler but less efficient for larger datasets due to its O(n) time complexity for both union and find operations, where n is the number of elements. Each element points directly to a set representative. However, this method provides constant-time performance for the find operation but not for unions or dynamic insertion.

Path Compression: #

Path compression optimizes the efficiency of both operations by making every visited node in the find operation point directly to the root when found. This significantly reduces the height of trees over time, leading to nearly constant-time performance even for subsequent operations.

Applications: #

Disjoint Set Data Structures are widely used in computer science applications requiring efficient management of disconnected components, including network connectivity problems, Kruskal’s algorithm for finding a minimum spanning tree (MST) of a graph, and cycle detection in graphs.

Algorithm #

Here’s an algorithm for implementing disjoint set.

Algorithm for Disjoint Set with Path Compression #

  1. Initialization: Start by representing each element as a node, where the parent of each node is itself initially (indicating that they are their own sets). This can be implemented using an array parent[] where parent[i] = i for all elements from 0 to N-1.

  2. Find: To find which set a particular element belongs to, follow these steps:

    • Start at the node corresponding to the given element’s index (element).
    • If this node is its own parent, it’s the representative of its set, and you can return this value directly.
    • Otherwise, recursively or iteratively traverse up through the parents until you reach an element that points to itself. This path represents a sequence from the given element back to the root of its set (the set’s representative). - To optimize future Find operations, apply Path Compression: after finding the representative, make every node on this path point directly to the representative by updating each node’s parent pointer to the representative. This step significantly speeds up subsequent Find operations for these nodes and any others connected through them.
  3. Union: To merge two disjoint sets into a single set, execute the following steps:

    • Perform Find on both elements (A and B) to find their respective representatives (roots). Let’s say rootA is the root of A’s set and rootB is the root of B’s set.
    • If they are already in the same set, no action is needed. However, if they are different sets (i.e., their roots are not equal), make one representative point to the other by setting the parent of rootA or rootB to be the other root. This unites the two sets into a single set.
    • Optionally, apply Path Compression again during this operation for all nodes found in either path (including those from previous Union operations) as they may need to update their pointers directly to the new representative.

Pseudocode #

SimpleUnion(i, j) {
    p[i] = j;
}

SimpleFind(i) {
    while (p[i] >= 0) do
        i = p[i];
    return i;
}

WeightedUnion(i, j) {
    // Union sets with roots i and j. i != j, using the weighting rule
    // p[i] = -count[i] and p[j] = -count[j]
    temp = p[i] + p[j];
    if (p[i] > p[j]) then
        // i has fewer nodes
        p[i] j;
        p[j] = temp;
    else
        // j has fewer or equal nodes
        p[j] = i;
        p[i] = temp;
}

Code #

import <algorithm>;
import <numeric>;
import <print>;
import <ranges>;
import <vector>;

struct Set {
  std::vector<ssize_t> parent{};
  std::vector<ssize_t> rank{};

  constexpr Set(const ssize_t &size) {
    parent.resize(size);
    rank.resize(size);

    std::iota(parent.begin(), parent.end(), 0);
    std::ranges::fill(rank, 0);
  }

  constexpr auto find(const ssize_t &node) -> ssize_t {
    if (node == parent.at(node))
      return node;

    return parent.at(node) = find(parent.at(node));
  }

  constexpr auto union_set(ssize_t u, ssize_t v) -> void {
    u = find(u);
    v = find(v);

    if (u != v) {
      if (rank.at(u) < rank.at(v))
        std::swap(u, v);
      parent.at(v) = u;
      if (rank.at(u) == rank.at(v))
        ++rank[u];
    }
  }
};
int main() {
  const ssize_t size{5};
  Set disjoint_set(size);

  disjoint_set.union_set(0, 1);
  disjoint_set.union_set(1, 2);
  disjoint_set.union_set(3, 4);

  for (ssize_t i : std::ranges::iota_view{0, size})
    std::println("Find({}):{}", i, disjoint_set.find(i));

  std::print("Parent array: ");
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::print("{} ", disjoint_set.parent[i]);
  std::print("\n");
  return 0;
}

Explanation #

1. Struct Definition #

struct Set {
  std::vector<ssize_t> parent{};
  std::vector<ssize_t> rank{};
  • struct Set defines a new struct type named Set.
  • Inside the struct, two member variables are declared:
    • std::vector<ssize_t> parent{}: This is a vector that will hold the parent of each element. It is used to keep track of the representatives (or roots) of each subset.
    • std::vector<ssize_t> rank{}: This is a vector that will hold the rank (or depth) of each element. It is used to keep the tree flat by attaching smaller trees under the root of larger trees.

2. Constructor #

  constexpr Set(const ssize_t &size) {
    parent.resize(size);
    rank.resize(size);

    std::iota(parent.begin(), parent.end(), 0);
    std::ranges::fill(rank, 0);
  }
  • constexpr Set(const ssize_t &size) is a constructor that initializes a Set instance with a given size.
  • parent.resize(size); and rank.resize(size); resize the parent and rank vectors to the given size, initializing them to hold size elements.
  • std::iota(parent.begin(), parent.end(), 0); initializes the parent vector such that each element is its own parent. std::iota is a standard algorithm that fills the range with sequentially increasing values starting from 0. After this, parent[i] == i for all i in the range [0, size).
  • std::ranges::fill(rank, 0); sets all elements in the rank vector to 0. std::ranges::fill is a standard algorithm that assigns the value 0 to each element in the rank vector.

3. find() Method #

constexpr auto find(const ssize_t &node) -> ssize_t {
    // If the node is its own parent, it is the root of its set
    if (node == parent.at(node))
        return node;

    // Path compression: recursively find the root and update the parent
    return parent.at(node) = find(parent.at(node));
}
Method Definition #
constexpr auto find(const ssize_t &node) -> ssize_t {
  • constexpr auto find(const ssize_t &node) -> ssize_t: This defines a constexpr method named find that takes a single parameter node of type ssize_t and returns a value of type ssize_t.
Method Body #
    if (node == parent.at(node))
      return node;
  • The method checks if node is its own parent, i.e., if node is the root of its set. This is done using parent.at(node), which accesses the element at index node in the parent vector with bounds checking (thanks to the .at() method).
  • If node is its own parent, it means node is the representative of its set, and the method returns node.
Path Compression #
    return parent.at(node) = find(parent.at(node));
  • If node is not its own parent, the method recursively calls find on parent.at(node), which finds the root of node’s set.
  • The result of the recursive find call is then assigned back to parent.at(node). This step is the path compression optimization: it makes each node on the path from node to the root point directly to the root. This flattens the structure of the tree, reducing the time complexity of future find operations.
  • Finally, the method returns the root of the set containing node.

4. union_set() Method #

constexpr auto union_set(ssize_t u, ssize_t v) -> void {
    // Find the roots of the sets containing u and v
    u = find(u);
    v = find(v);

    // If u and v are in different sets, merge them
    if (u != v) {
        // Union by rank: ensure the higher rank tree remains the root
        if (rank.at(u) < rank.at(v))
            std::swap(u, v);

        // Make u the parent of v
        parent.at(v) = u;

        // If ranks were equal, increment the rank of the new root
        if (rank.at(u) == rank.at(v))
            ++rank[u];
    }
}
Method Definition #
constexpr auto union_set(ssize_t u, ssize_t v) -> void {
  • constexpr auto union_set(ssize_t u, ssize_t v) -> void: This defines a constexpr method named union_set that takes two parameters u and v of type ssize_t and returns void.
Finding the Roots #
    u = find(u);
    v = find(v);
  • The method starts by finding the roots of the sets containing u and v. This is done using the find method previously defined. After this step, u and v are the representatives (roots) of their respective sets.
Checking if Already Unified #
    if (u != v) {
  • The condition checks if the roots u and v are different. If they are the same, u and v are already in the same set, and no union operation is needed.
Union by Rank #
      if (rank.at(u) < rank.at(v))
        std::swap(u, v);
  • If u and v are different, the method performs union by rank. It compares the ranks of the roots u and v.
    • If rank[u] < rank[v], it swaps u and v to ensure that u has the higher rank. This keeps the tree shallower by attaching the smaller tree under the root of the larger tree.
Merging the Sets #
      parent.at(v) = u;
  • The method sets parent[v] to u, effectively making u the parent of v. This merges the set containing v into the set containing u.
Updating the rank #
      if (rank.at(u) == rank.at(v))
        ++rank[u];
    }
  • If the ranks of u and v were equal, the rank of the new root u is incremented by 1. This is because the depth of the tree increases when two trees of the same rank are merged.

5. main() Method #

int main() {
  // Define the size of the disjoint set
  const ssize_t size{5};
  // Create an instance of Set with the specified size
  Set disjoint_set(size);

  // Perform union operations
  disjoint_set.union_set(0, 1);
  disjoint_set.union_set(1, 2);
  disjoint_set.union_set(3, 4);

  // Print the results of find operations for each element
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::println("Find({}):{}", i, disjoint_set.find(i));

  // Print the parent array
  std::print("Parent array: ");
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::print("{} ", disjoint_set.parent[i]);
  std::print("\n");
  return 0;
}
Performing Union Operations #
  disjoint_set.union_set(0, 1);
  disjoint_set.union_set(1, 2);
  disjoint_set.union_set(3, 4);
  • disjoint_set.union_set(0, 1); merges the sets containing elements 0 and 1.
  • disjoint_set.union_set(1, 2); merges the sets containing elements 1 and 2. Since 1 is already united with 0, this effectively unites elements 0, 1, and 2 into a single set.
  • disjoint_set.union_set(3, 4); merges the sets containing elements 3 and 4.
Printing the Results of Find Operations #
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::println("Find({}):{}", i, disjoint_set.find(i));
  • This loop iterates over the range [0, size) using std::ranges::iota_view{0, size}.
  • For each element i, it calls disjoint_set.find(i) to find the representative (root) of the set containing i.
  • std::println("Find({}):{}", i, disjoint_set.find(i)); prints the result in the format Find(i):root, where root is the representative of the set containing i.
Printing the Parent Array #
  std::print("Parent array: ");
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::print("{} ", disjoint_set.parent[i]);
  std::print("\n");
  • std::print("Parent array: "); prints a label for the parent array. This loop iterates over the range [0, size) using std::ranges::iota_view{0, size}.
  • For each element i, it prints the value of disjoint_set.parent[i] followed by a space.
  • std::print("\n"); prints a newline character to end the line.

Output #

❯ ./main
Find(0):0
Find(1):0
Find(2):0
Find(3):3
Find(4):3
Parent array: 0 0 0 3 3

Explanation #

  1. Initialization:
  • parent array: [0, 1, 2, 3, 4]
  • rank array: [0, 0, 0, 0, 0]
  1. Union Operations:
  • union_set(0, 1):

    • find(0) returns 0.
    • find(1) returns 1.
    • rank[0] == rank[1], so parent[1] is set to 0 and rank[0] is incremented.
    • parent array: [0, 0, 2, 3, 4]
    • rank array: [1, 0, 0, 0, 0]
  • union_set(1, 2):

    • find(1) returns 0 (since parent[1] is 0).
    • find(2) returns 2.
    • rank[0] > rank[2], so parent[2] is set to 0.
    • parent array: [0, 0, 0, 3, 4]
    • rank array: [1, 0, 0, 0, 0]
  • union_set(3, 4):

    • find(3) returns 3.
    • find(4) returns 4.
    • rank[3] == rank[4], so parent[4] is set to 3 and rank[3] is incremented.
    • parent array: [0, 0, 0, 3, 3]
    • rank array: [1, 0, 0, 1, 0]
  1. Find Operations:
  • find(0) returns 0.
  • find(1) returns 0 (since parent[1] is 0).
  • find(2) returns 0 (since parent[2] is 0).
  • find(3) returns 3.
  • find(4) returns 3 (since parent[4] is 3).