
Problem Statement:

You are given an array of strings ideas that represents a list of names to be used in the process of naming a company. The process of naming a company is as follows:

  1. Choose 2 distinct names from ideas, call them ideaA and ideaB.
  2. Swap the first letters of ideaA and ideaB with each other.
  3. If both of the new names are not found in the original ideas, then the name ideaA ideaB (the concatenation of ideaA and ideaB, separated by a space) is a valid company name.
  4. Otherwise, it is not a valid name.

Return the number of distinct valid names for the company.

 

Example 1:

Input: ideas = ["coffee","donuts","time","toffee"]
Output: 6
Explanation: The following selections are valid:
- ("coffee", "donuts"): The company name created is "doffee conuts".
- ("donuts", "coffee"): The company name created is "conuts doffee".
- ("donuts", "time"): The company name created is "tonuts dime".
- ("donuts", "toffee"): The company name created is "tonuts doffee".
- ("time", "donuts"): The company name created is "dime tonuts".
- ("toffee", "donuts"): The company name created is "doffee tonuts".
Therefore, there are a total of 6 distinct company names.

The following are some examples of invalid selections:
- ("coffee", "time"): The name "toffee" formed after swapping already exists in the original array.
- ("time", "toffee"): Both names are still the same after swapping and exist in the original array.
- ("coffee", "toffee"): Both names formed after swapping already exist in the original array.

Example 2:

Input: ideas = ["lack","back"]
Output: 0
Explanation: There are no valid selections. Therefore, 0 is returned.

 

Constraints:

  • 2 <= ideas.length <= 5 * 10^4
  • 1 <= ideas[i].length <= 10
  • ideas[i] consists of lowercase English letters.
  • All the strings in ideas are unique.

Solution:

Intuition

Consider an example in which every word starts with either 'a' or 'b':

 
ideas = ["apple","and","alpha","amaze","aye","aid","bye","bid","bat","beef","brown"]
 

Grouping the words by starting letter and keeping only the suffix (the word without its first letter) gives:

 
'a': ["pple","nd","lpha","maze","ye","id"]
'b': ["ye","id","at","eef","rown"]
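
The grouping above can be sketched in C++ as follows. This is a minimal illustration, assuming every idea is a non-empty lowercase string; the helper name `groupSuffixes` is hypothetical and not part of the original solution.

```cpp
#include <array>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: bucket each idea's suffix (everything after the first
// character) by the idea's first letter. Assumes non-empty lowercase ideas.
std::array<std::unordered_set<std::string>, 26>
groupSuffixes(const std::vector<std::string>& ideas) {
    std::array<std::unordered_set<std::string>, 26> groups;
    for (const std::string& idea : ideas) {
        groups[idea[0] - 'a'].insert(idea.substr(1));
    }
    return groups;
}
```

For the example list, `groups[0]` (letter 'a') would hold six suffixes and `groups[1]` (letter 'b') five, with "ye" and "id" present in both.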
 

Notice that "ye" and "id" appear in both lists. Words with these suffixes cannot be paired across the two letters. For example, choosing "aye" and "bat" produces the company name "bye aat", but "bye" is already in the original list of ideas, so the name is not valid. We therefore keep only the suffixes that appear in exactly one of the two lists:

 
'a': ["pple","lpha","nd","maze"]
'b': ["at","eef","rown"]
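
Filtering out the shared suffixes only requires counting them. The sketch below assumes the two suffix sets have already been built; `SuffixSet` and `exclusiveCounts` are names introduced here for illustration.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

using SuffixSet = std::unordered_set<std::string>;

// Hypothetical helper: returns {n1, n2}, the number of suffixes that occur
// only in `a` and only in `b`, respectively.
std::pair<long long, long long> exclusiveCounts(const SuffixSet& a,
                                                const SuffixSet& b) {
    long long common = 0;
    for (const std::string& s : a) {
        if (b.count(s)) ++common;  // suffix shared by both letters
    }
    return {static_cast<long long>(a.size()) - common,
            static_cast<long long>(b.size()) - common};
}
```

With the 'a' and 'b' sets from the example, two suffixes are shared, so the counts come out as 4 and 3.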
 

We can form 4 × 3 = 12 pairs from these two lists, and each pair yields 2 company names (one per ordering), for 24 names in total:

 
company_names = ["aat bpple", "bpple aat", "aeef bpple", "bpple aeef", "arown bpple", "bpple arown", "aat blpha", "blpha aat", "aeef blpha", "blpha aeef", "arown blpha", "blpha arown", "aat bnd", "bnd aat", "aeef bnd", "bnd aeef", "arown bnd", "bnd arown", "aat bmaze", "bmaze aat", "aeef bmaze", "bmaze aeef", "arown bmaze", "bmaze arown"]
 

Hence the answer for this example is 24.
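
One way to sanity-check a count like this is a brute-force reference that applies the naming rules directly to every ordered pair. The sketch below is quadratic in the number of ideas, so it is only suitable for small inputs; the function name `countValidNamesBruteForce` is hypothetical. It relies on the observation that each valid ordered pair yields a distinct name, because the original pair can be recovered from the name by swapping the first letters back.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Brute-force reference (O(n^2) pairs): try every ordered pair of distinct
// ideas, swap their first letters, and check both results against the input.
long long countValidNamesBruteForce(const std::vector<std::string>& ideas) {
    const std::unordered_set<std::string> original(ideas.begin(), ideas.end());
    long long count = 0;
    for (std::size_t i = 0; i < ideas.size(); ++i) {
        for (std::size_t j = 0; j < ideas.size(); ++j) {
            if (i == j) continue;
            std::string a = ideas[i];
            std::string b = ideas[j];
            std::swap(a[0], b[0]);
            // Valid only if neither swapped name is an original idea.
            if (!original.count(a) && !original.count(b)) ++count;
        }
    }
    return count;
}
```

On the 'a'/'b' list above this reference should agree with the hand count, and on the two examples from the problem statement it should return 6 and 0.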

Approach

We will use 26 hash sets, one per starting letter, to store the suffixes of the words that begin with that letter. For each pair of letters, we add a contribution of $2 n_1 n_2$, where $n_1$ and $n_2$ are the numbers of suffixes under each letter that are not present under the other.
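
Written out as a formula, the total can be sketched as below. The symbols $S_i$ (the suffix set of letter $i$) and $c_{ij}$ (the number of suffixes shared by letters $i$ and $j$) are notation introduced here for illustration, not taken from the original.

$$
\text{answer} = \sum_{0 \le i < j < 26} 2\,\bigl(|S_i| - c_{ij}\bigr)\bigl(|S_j| - c_{ij}\bigr), \qquad c_{ij} = |S_i \cap S_j|
$$

For the 'a'/'b' example, $|S_a| = 6$, $|S_b| = 5$, and $c_{ab} = 2$, which gives $2 \cdot (6-2) \cdot (5-2) = 24$.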

Code

 
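#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    long long distinctNames(vector<string>& ideas)
    {
        long long res = 0;
        // ideaSet[c] holds the suffixes of all ideas that start with letter 'a' + c.
        vector<unordered_set<string>> ideaSet(26);
        for (const string& idea : ideas) ideaSet[idea[0]-'a'].insert(idea.substr(1));
        for (int i=0; i<26; i++)
        {
            for (int j=i+1; j<26; j++)
            {
                // Suffixes shared by letters i and j cannot take part in a swap.
                long long common = 0;
                for (const string& suffix : ideaSet[j]) common += ideaSet[i].count(suffix);
                long long n1 = (long long)ideaSet[i].size()-common;
                long long n2 = (long long)ideaSet[j].size()-common;
                // Each of the n1*n2 pairs yields two names, one per ordering.
                res += 2*n1*n2;
            }
        }
        return res;
    }
};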
 

Complexity

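Worst-case time complexity is $O(nm)$, where $n$ is the number of words in ideas and $m$ is the maximum word length. Each suffix is looked up in at most 25 other sets and hashing a suffix costs $O(m)$; the alphabet size of 26 is treated as a constant. Space complexity is $O(nm)$.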

Alternative solution

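A slightly more efficient solution (with the same time complexity) avoids storing suffixes per letter. Instead, it keeps all ideas in one hash set and maintains a 26x26 frequency table, where entry $[i][j]$ counts the ideas that start with letter $i$ and whose first letter can be replaced by letter $j$ without producing an existing idea.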

 
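#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    long long distinctNames(vector<string>& ideas)
    {
        long long res = 0;
        unordered_set<string> ideaSet(ideas.begin(), ideas.end());
        // freq_table[i][j]: number of ideas starting with letter i whose suffix,
        // prefixed with letter j, is not already an idea.
        vector<vector<int>> freq_table(26, vector<int>(26,0));
        for (const string& idea : ideas)
        {
            for (char ch='a'; ch<='z'; ch++)
            {
                string namePart = string(1,ch)+idea.substr(1);
                if (!ideaSet.count(namePart))
                    freq_table[idea[0]-'a'][ch-'a']++;
            }
        }
        // Ideas counted in [i][j] pair with ideas counted in [j][i]; each pair gives two names.
        for (int i=0; i<26; i++)
            for (int j=i+1; j<26; j++)
                res += 2LL*freq_table[i][j]*freq_table[j][i];
        return res;
    }
};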
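
To see what the frequency table contains, the first loop can be isolated into a standalone helper and inspected on Example 1. This is a sketch for illustration only; `FreqTable` and `buildFreqTable` are hypothetical names, and the helper uses `std::array` instead of nested vectors purely for brevity.

```cpp
#include <array>
#include <string>
#include <unordered_set>
#include <vector>

using FreqTable = std::array<std::array<int, 26>, 26>;

// Hypothetical helper mirroring the table-building loop: table[i][j] counts
// ideas that start with letter i and whose suffix, prefixed with letter j,
// is not an existing idea.
FreqTable buildFreqTable(const std::vector<std::string>& ideas) {
    const std::unordered_set<std::string> ideaSet(ideas.begin(), ideas.end());
    FreqTable table{};  // value-initialized, so every entry starts at zero
    for (const std::string& idea : ideas) {
        std::string candidate = idea;
        for (char ch = 'a'; ch <= 'z'; ++ch) {
            candidate[0] = ch;  // replace only the first letter
            if (!ideaSet.count(candidate)) {
                ++table[idea[0] - 'a'][ch - 'a'];
            }
        }
    }
    return table;
}
```

For ideas = ["coffee","donuts","time","toffee"], the entries for the letter pairs (c, d), (c, t), and (d, t) are [c][d] = 1, [d][c] = 1, [c][t] = 0, [t][c] = 1, [d][t] = 1, and [t][d] = 2. Summing $2 \cdot 1 \cdot 1 + 2 \cdot 0 \cdot 1 + 2 \cdot 1 \cdot 2$ gives 6, matching the expected output.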