Similar String Groups - DFS + BFS + Union-Find solutions
Problem Statement:
Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y. Also two strings X and Y are similar if they are equal.
For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".
Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Notice that "tars" and "arts" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.
We are given a list strs of strings where every string in strs is an anagram of every other string in strs. How many groups are there?
Example 1:
Input: strs = ["tars","rats","arts","star"] Output: 2
Example 2:
Input: strs = ["omv","ovm"] Output: 1
Constraints:
1 <= strs.length <= 3001 <= strs[i].length <= 300strs[i]consists of lowercase letters only.- All words in
strshave the same length and are anagrams of each other.
Solution:
Notion of similarity: Two strings are similar if they differ at exactly 0 or 2 positions. This is because we are given that all strings are anagrams of each other, so 0 swaps means differ at 0 positions and 1 swap means differ at exactly 2 positions.
We can now create a graph where each string in strs is a node in the graph and two nodes are connected if they are similar strings as defined above.
DFS solution: We run DFS for each unvisited node. Number of times we have to start DFS is the answer.
class Solution {
public:
void dfs(vector<vector<int>>&G, vector<bool>&visited, int cur)
{
visited[cur] = true;
for (int nbd: G[cur]) if (!visited[nbd]) dfs(G, visited, nbd);
}
bool isSimilar(string s1, string s2)
{
int diff=0;
for (int i=0; i<s1.length(); i++) if (s1[i]!=s2[i]) if(++diff>2) return false;
return true;
}
int numSimilarGroups(vector<string>& strs)
{
int n = strs.size(), res=0;
vector<vector<int>> G(n);
for(int i=0; i<n; i++) for(int j=i+1; j<n; j++) if(isSimilar(strs[i],strs[j]))
{
G[i].push_back(j);
G[j].push_back(i);
}
vector<bool>visited(n,false);
for (int i=0; i<n; i++) if(!visited[i])
{
dfs(G,visited,i);
res++;
}
return res;
}
};
BFS solution: Same thing can be done using a queue in a BFS.
class Solution {
public:
void bfs(vector<vector<int>>&G, vector<bool>&visited, int cur)
{
queue<int>Q;
Q.push(cur);
visited[cur] = true;
while (!Q.empty())
{
int node = Q.front();
Q.pop();
for (int nbd: G[node]) if(!visited[nbd])
{
Q.push(nbd);
visited[nbd] = true;
}
}
}
bool isSimilar(string s1, string s2)
{
int diff=0;
for (int i=0; i<s1.length(); i++) if (s1[i]!=s2[i]) if(++diff>2) return false;
return true;
}
int numSimilarGroups(vector<string>& strs)
{
int n = strs.size(), res=0;
vector<vector<int>> G(n);
for(int i=0; i<n; i++) for(int j=i+1; j<n; j++) if(isSimilar(strs[i],strs[j]))
{
G[i].push_back(j);
G[j].push_back(i);
}
vector<bool>visited(n,false);
for (int i=0; i<n; i++) if(!visited[i])
{
bfs(G,visited,i);
res++;
}
return res;
}
};
Union-Find solution or Disjoint-Set-Union (DSU) solution: We can also create a Union-Find data structure where we create union for each pair of similar words. Read more on Union-Find in this LC explore card. Here we will use an optimized implementation of Union-Find with path compression and union by rank. Finally the number of unique root nodes in the UF data structure is the answer.
class UnionFind {
public:
UnionFind(int sz) : root(sz), ranks(sz) {
for (int i = 0; i < sz; i++) {
root[i] = i;
ranks[i] = 1;
}
}
int find(int x) {
if (x==root[x]) return x;
return root[x] = find(root[x]);
}
void unionSet(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return;
if(ranks[rootX]==ranks[rootY]) ranks[rootX]++;
else if (ranks[rootX]<ranks[rootY]) swap(rootY,rootX);
root[rootY] = rootX;
}
private:
vector<int> root;
vector<int> ranks;
};
class Solution {
public:
bool isSimilar(string s1, string s2)
{
int diff=0;
for (int i=0; i<s1.length(); i++) if (s1[i]!=s2[i]) if(++diff>2) return false;
return true;
}
int numSimilarGroups(vector<string>& strs)
{
int n = strs.size();
UnionFind uf = UnionFind(n);
for(int i=0; i<n; i++) for(int j=i+1; j<n; j++)
if(isSimilar(strs[i],strs[j])) uf.unionSet(i, j);
unordered_set<int> myset;
for(int i=0; i<n; i++) myset.insert(uf.find(i));
return myset.size();
}
};
All three solutions have TC of $O(n^2 \vert s \vert)$ and SC of DFS and BFS solutions is $O(n^2)$ (to store the graph) whereas SC of Union-Find solution is $O(n)$.