Regular Expression Matching - Recursion and memoization
Problem Statement:
Given an input string s
and a pattern p
, implement regular expression matching with support for '.'
and '*'
where:
'.'
Matches any single character.'*'
Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
Example 1:
Input: s = "aa", p = "a" Output: false Explanation: "a" does not match the entire string "aa".
Example 2:
Input: s = "aa", p = "a*" Output: true Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
Example 3:
Input: s = "ab", p = ".*" Output: true Explanation: ".*" means "zero or more (*) of any character (.)".
Constraints:
1 <= s.length <= 20
1 <= p.length <= 20
s
contains only lowercase English letters.p
contains only lowercase English letters,'.'
, and'*'
.- It is guaranteed for each appearance of the character
'*'
, there will be a previous valid character to match.
Solution:
Without *
character in pattern
, we can have following logic for regex:
bool isMatch(string s, string p)
{
int slen = s.length(), plen = p.length();
if ((plen==0) && (slen==0)) return true;
if ((plen==0) || (slen==0)) return false;
bool fm = (s[0]==p[0] || p[0]=='.');
return fm && isMatch(s.substr(1), p.substr(1));
}
Here fm
denotes first character matching. Notice how we have enforced edge cases. We are saying that both s
and p
must end together for true
. With the addition of *
character in pattern
, things change significantly. Now, p
can end after s
to account for zero matches in case of *
, notice however that s
cannot end after p
has ended. Also, we will need to add check in s
before getting fm
.
bool isMatch(string s, string p)
{
int slen = str.length(), plen = p.length();
if (plen==0) return (slen==0);
bool fm = (slen>0) && ((p[0]==s[0]) || (p[0]=='.'));
if (plen>=2 && p[1]=='*')
return isMatch(s,p.substr(2))||(fm && isMatch(s.substr(1), p));
else
return fm && isMatch(s.substr(1), p.substr(1));
}
Here, the non-asterisk logic is same. However for asterisk, we have two cases: 0 match and 1 match. For zero match, we just increment two positions in p
and for one match, we increment 1 position in s
and again run with the same p
(because *
signifies any number of matches).
This passes for all tests but gives TLE at the end.
Now, Let us add memoization to get AC.
class Solution {
vector<vector<int>> dp;
public:
bool isMatch(string s, string p)
{
int slen=s.length(), plen=p.length();
if (plen==0) return (slen==0);
if (dp.size()==0) dp = vector<vector<int>>(slen+1, vector<int>(plen+1,2));
if (dp[slen][plen]!=2) return dp[slen][plen];
bool fm = (slen>0) && (s[0]==p[0] || p[0]=='.');
if (plen>=2 && p[1]=='*')
return dp[slen][plen] = (fm && isMatch(s.substr(1), p)) || isMatch(s, p.substr(2));
else
return dp[slen][plen] = fm && isMatch(s.substr(1), p.substr(1));
}
};
We get AC for this.
Same solution in java:
class Solution {
int[][] dp;
public boolean isMatch(String s, String p)
{
int slen=s.length(), plen=p.length();
if (plen==0) return (slen==0);
if (dp==null)
{
dp = new int[slen+1][plen+1];
for (int[] row: dp) Arrays.fill(row, 2);
}
if (dp[slen][plen]!=2) return (dp[slen][plen]==1);
boolean fm = (slen>0) && (s.charAt(0)==p.charAt(0) || p.charAt(0)=='.');
boolean ret;
if (plen>=2 && p.charAt(1)=='*')
ret = (fm && isMatch(s.substring(1), p)) || isMatch(s, p.substring(2));
else
ret = fm && isMatch(s.substring(1), p.substring(1));
dp[slen][plen] = (ret)?1:0;
return ret;
}
};