Microsoft Interview Question
Dev Leads Dev LeadsCountry: India
Interview Type: In-Person
This is called levenshtein distance.
int LevenshteinDistance(string s, string t){
if(s.size()==0)
return t.size();
if(t.size()==0)
return s.size();
// check last char
int cost;
if (s[s.size()-1] == t[t.size()-1])
cost = 0;
else
cost = 1;
// minimum deleting char from s, from t and from both
return minimum(LevenshteinDistance(s.substr(0,s.size()-1),t) + 1,
LevenshteinDistance(s,t.substr(0,t.size()-1)) + 1,
LevenshteinDistance(s.substr(0,s.size()-1),t.substr(t.size()-1)) + cost);
}
I see that you have used dynamic programming which is O(m*n). I think we can solve the problem in O(m).
m - length of longer string. n - length of shorter string.
Instead of finding the distance, just return boolean if distance is 1. something like following
boolean AreSeperatedbyOne(String s1, String s2)
{
for (int i = 0; i < s1.Length && i < s2.Length; i++)
{
if (s1[i] != s2[i])
{
return s1.Substring(i + 1) == s2.Substring(i + 1) //case of change
|| s1.Substring(i + 1) == s2.Substring(i) //case of s1 has extra
|| s1.Substring(i) == s2.Substring(i + 1); //case of s2 has extra
}
}
return Math.Abs(s1.Length - s2.Length) == 1;
int diffcount(string& s1, string& s2, int begin1, int begin2, int n)
{
int ret = 0;
int i = 0;
for(i=0; i<n; i++)
{
if( s1.at(begin1+i) != s2.at(begin2+i) )
{
ret++;
}
}
cout<<" diffcount: "<<ret<<"\n";
return ret;
}
bool IsEditDistanceOne(string& s1, string& s2)
{
bool ret = false;
int n1 = s1.length(), n2 = s2.length();
if ( n1 == n2 )
{
ret = ( diffcount(s1,s2,0,0,s1.length()) == 1 );
}
else
{
int count = 0;
string *one = &s1, *two = &s2;
if( n2>n1 )
{
one = &s2; two = &s1;
}
int i = 0;
while( (i<two->length() ) && (count<2) )
{
if( one->at(i+count) == two->at(i) )
{
i++;
}
else
{
count++;
}
}
ret = (count==1);
}
return ret;
}
This should solve the problem in O(n). Please let me know if you find any problems with this code. :)
Here you're just assuming substitution.
However, insertion and deletion also render a string to be 1-edit distance away.
There's really no better than O(n*m) if we want to calculate the exact distance. And mine is actually very inefficient cause I'm just using plain recusion rather than DP. But it works, it's simple enough to deliver the idea.
@imps I do consider the of insertion and deletion. Can you probably give me an example where my code fails. For example, I tested on strings like "sport" and "sort" and my code gave me an edit distance of one.
@imps Thanks for pointing this out to me. However, i think its a minor implementation error. I feel that the following should be correct.
bool IsEditDistanceOne(string& s1, string& s2)
{
bool ret = false;
int n1 = s1.length(), n2 = s2.length();
if ( n1 == n2 )
{
ret = ( diffcount(s1,s2,0,0,s1.length()) == 1 );
}
else
{
int count = 0;
string *one = &s1, *two = &s2;
if( n2>n1 )
{
one = &s2; two = &s1;
}
int i = 0;
while( (i<two->length() ) && (count<2) )
{
if( one->at(i+count) == two->at(i) )
{
i++;
}
else
{
count++;
}
}
// Account for number of characters left in longer string
count += ( one->length() - i - count );
ret = (count==1);
}
return ret;
}
Since we only need to tell if edit distance is 1 or not, we can leave out the computation of edit distance and do a comparison to identify the distance. break immediately on the first sign of distance being > 1. Calculating edit distance is a recurrance problem and atleast O(nm). telling if distance is 1 or not can be done in O(m), m being the length of shorter string.
1. Compare String a and b. if size(a) = size(b) or size(a)+1=size(b) or size(b)+1=size(a)
then we can go ahead else edit distance is not 1.
2. have a counter set to 0
have i pointing to first char of a and j pointing to first char of b
if ( (i==size(a) or j==size(b)))
then stop prcessing and check counter
if(counter==1)
then edit distance is 1 else not
if(charAt(i) == charAt(j))
increment i and j
else
increment counter. if counter==2 break , else increment i and j
Once you determined the distance is not 1 in first pass, do the same operation as above backwards. This time if we find the edit to be greater than 1 then edit distance is not 1.
this is O(m) where m is length of shorter string. n=m if both string are of same length.
Edit distance is interesting topic but to cater only to this question, I believe this is more efficient.
bool IsDistanceOne(string& a, string& b) {
int i, j, count;
int diff = abs(a.length() - b.length());
if (diff > 1) {
return false;
}
count = 0;
if (diff == 0) {
for(i = 0; i < a.length(); i++) {
if (a[i] != b[i]) {
count++;
if (count > 1) return false;
}
}
// Strings are same
if (count == 0) {
return false;
}
}
if (diff == 1) {
i = 0;
j = 0;
while(i < a.length() && j < b.length()) {
if (a[i] != b[j]) {
count++;
if (count > 1) return false;
if (a.length() - b.length() == 1) {
//deletion case in a
i++;
} else {
//addition case in a
j++;
}
} else {
i++;
j++;
}
}
}
return true;
}
This should work...
- Edit distance one means, there is one character which is different.
- We have to address following 4 cases.
Lets call two strings as s1 and s2.
Case 1 : - if(s1.firstchar!= s2.firstchar && s1.lastchar!= s2.lastchar)
which means there are two differences thus. return false.
case 2 :- if(s1.firstchar==s2.firstchar && s1.lastchar!= s2.lastchar)
which means there is differnce at end already(keep track of end indices), start from first char then keep on comparing until you find a difference. If you find and its not tracked indices then return false
case 3:- if(s1.firstchar!=s2.firstchar && s1.lastchar== s2.lastchar)
which means there is differnce at start already(keep track of start indices), start from last char then keep on comparing until you find a difference. If you find and its not tracked indices then return false
case 4 :- if(s1.firstchar==s2.firstchar && s1.lastchar== s2.lastchar)
which means there is difference in between somewhere, start comparing from first character keep moving forward until you find a difference. once you found a difference (keep note of current indices) start comparing char from end. if you again found difference at same different indices return false , but if any of the previously tracked indices are same as current difference indices then edit distance is one.
For eg. s1 ="abcde" s2="acde"
step1 :- compare "a" then there is conflict at "b" and "c"
step2 :- so we went to end again start from there we found conflict with "b" and "a"
step3 :- both "b" do have same index so it has edit distance of one.
Note:- we can add more bound checking like comparing length of string. Please let me know, if it won't work any test case.
import java.util.*;
import java.lang.Math;
public class EditDistance
{
// This problem is also called the Levenshtein distance problem
// To edit the strings, basic operations needed are < 1. Insert; 2. Delete 3. Replace >
// Lets define E(x[1...i], y[1...j]) as the matrix which keeps track of the
// number of opertation that needs to be performed to make the change
// Base cases:
// 1. E(0,0) = 0 ( nothing changed)
// 2. E(1,0) = 1 ( replace or delete a character in x, nothing changes in y)
// So, if the same operating is done for 'k' indexes, edit distance would be 'k'. E(k,0) = k or E(0,k) = k
// E(i,j) : defined as edit distance to make all values from x[1...i] to look like y[1...j] or vice-versa
// E(i,j) = mininum {
// (1+ E(i-1,j)),
// (1+ E(i, j-1)),
// ( s + E(i-1,j-1) ) if x(i) == y(j) s =0: s=1
// }
public static int computeEditDistance(String s1, String s2)
{
int n = s1.length() + 1;
int m = s2.length() + 1;
int [][] e = new int[n][m];
for(int i=0; i<n; i++)
{
e[i][0] = i;
}
for(int j=0; j<m; j++)
{
e[0][j]=j;
}
for( int i=1; i<n; i++){
for(int j=1; j<m; j++)
{
e[i][j] = computeMin( 1 + e[i-1][j] , 1 + e[i][j-1],
e[i-1][j-1] + (s1.charAt(i-1) == s2.charAt(j-1)? 0:1) );
}
}
return e[n-1][m-1];
}
public static int computeMin(int a, int b, int c)
{
return Math.min(Math.min(a,b),c);
}
public static void main(String[] args)
{
// Test cases1
String s1 = "cats";
String s2 = "cat";
int result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
// Test Case2
s1 = "cats";
s2 = "cats";
result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
//Test Case3
s1 = "cat";
s2 = "networks";
result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
}
}
import java.util.*;
import java.lang.Math;
public class EditDistance
{
// This problem is also called the Levenshtein distance problem
// To edit the strings, basic operations needed are < 1. Insert; 2. Delete 3. Replace >
// Lets define E(x[1...i], y[1...j]) as the matrix which keeps track of the
// number of opertation that needs to be performed to make the change
// Base cases:
// 1. E(0,0) = 0 ( nothing changed)
// 2. E(1,0) = 1 ( replace or delete a character in x, nothing changes in y)
// So, if the same operating is done for 'k' indexes, edit distance would be 'k'. E(k,0) = k or E(0,k) = k
// E(i,j) : defined as edit distance to make all values from x[1...i] to look like y[1...j] or vice-versa
// E(i,j) = mininum {
// (1+ E(i-1,j)),
// (1+ E(i, j-1)),
// ( s + E(i-1,j-1) ) if x(i) == y(j) s =0: s=1
// }
public static int computeEditDistance(String s1, String s2)
{
int n = s1.length() + 1;
int m = s2.length() + 1;
int [][] e = new int[n][m];
for(int i=0; i<n; i++)
{
e[i][0] = i;
}
for(int j=0; j<m; j++)
{
e[0][j]=j;
}
for( int i=1; i<n; i++){
for(int j=1; j<m; j++)
{
e[i][j] = computeMin( 1 + e[i-1][j] , 1 + e[i][j-1],
e[i-1][j-1] + (s1.charAt(i-1) == s2.charAt(j-1)? 0:1) );
}
}
return e[n-1][m-1];
}
public static int computeMin(int a, int b, int c)
{
return Math.min(Math.min(a,b),c);
}
public static void main(String[] args)
{
// Test cases1
String s1 = "cats";
String s2 = "cat";
int result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
// Test Case2
s1 = "cats";
s2 = "cats";
result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
//Test Case3
s1 = "cat";
s2 = "networks";
result = computeEditDistance(s1,s2);
System.out.println("Edit Distance:" + result);
}
}
Might not handle all cases basic ones are fine.
public int GetEditDistance(string s1, string s2)
{
int l1 = s1.Length;
int l2 = s2.Length;
int l = l1 > l2 ? l2 : l1;
int ed = 0;
for (int i = 0, i1=0, i2=0; i < l; i++)
{
if (s1[i1] == s2[i2])
{
i1++; i2++;
}
else
{
ed++;
if (l - i1 > 1 && l - i2 > 1)
{
if (s1[i1] == s2[i2 + 1]) i2++;
else if (s1[i1 + 1] == s2[i2]) i1++;
else { i1++; i2++; }
}
}
}
return ed;
}
I'd like to solve the problem like this. O(N) algorithm. This algorithm is using the fact that if one edit is made, then the remaining string should be same.
#include <cstring>
#include <cassert>
using namespace std;
bool one_edit_apart(const char* s, const char* t) {
auto n = strlen(s), m = strlen(t);
m > n ? swap(n, m), swap(s, t), 0 : 0;
while (*t)
if (*s++ != *t++)
return n == m ? strcmp(s, t) == 0 : strcmp(s, t-1) == 0;
return n - m == 1; // to check a case such as "xyz" and "xy"
};
#include <iostream>
#include <vector>
using namespace std;
int EditDistance(const string &s1, const string &s2){
vector<vector<int> > dp(s1.size() + 1, vector<int>(s2.size() + 1));
for(int i = 0; i < dp[i].size(); ++i) {
dp[0][i] = i;
}
for(int i = 0; i < dp.size(); ++i) {
dp[i][0] = i;
}
for(int i = 1; i < dp.size(); ++i){
for(int j = 1; j < dp[i].size(); ++j){
int del = dp[i][j - 1] + 1;
int ins = dp[i - 1][j] + 1;
int rpl = dp[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]);
dp[i][j] = min(del, ins);
dp[i][j] = min(dp[i][j], rpl);
}
}
return dp[dp.size() - 1][dp[0].size() - 1];
}
int main() {
cout << boolalpha << (EditDistance("abc", "afgr") == 1);
return 0;
}
Please rply if it failing in any scenario:
#include <iostream>
#include <vector>
#include <math.h>
using namespace std;
bool EditDistance(const string &s1, const string &s2){
int diff = s1.size() - s2.size();
if(abs(diff) > 1)
return false;
int i = 0, j = 0, count = 0;
while((i < s1.size()) && (j < s2.size())) {
if(s1[i] == s2[j]) {
i++, j++;
}
else if(s1.size() == s2.size()) {
i++, j++, count++;
}
else if(s1.size() > s2.size()) {
i++, count++;
}
else {
j++, count++;
}
if(count > 1)
return false;
}
count += s1.size() - i + s2.size() - j;
return (count == 1);
}
int main() {
cout << boolalpha << EditDistance("d", "a");
return 0;
}
Here is the method
A, B are two input strings
mismatchCount = 0
if A.Length = B.Length
for i = 1 to A.Length
if A[i] <> B[i]
mismatchCount ++;
end
if mismatchCount = 1 return true else false;
else if |A.Length - B.Length| = 1
if A.Length is bigger
CheckSame = true
for i = 1 to A.Length-1
if CheckSame and A[i] <> B[i] and A[i+1] = B[i]
CheckSame = false;
else if (Not of CheckSame and A[i+1] = B[i] ) or (CheckSame and A[i] = B[i])
do nothing
else
return false;
end
end
else
do the similar thing with B
else return false
Complexity :
Time : O(n)
Space : O(1)
public static bool IsDistanceOne(string a, string b)
{
if (Math.Abs(a.Length - b.Length) != 1)
return false;
int i = -1;
int j = -1;
while (true)
{
i++;
j++;
if (i == a.Length)
return true;
if (a[i] == b[j])
continue;
if (i != j)
return false;
if (a.Length < b.Length)
i--;
else
j--;
}
}
- maverick November 28, 2014