Interra System Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: In-Person
Actually I think the complexity is O(N * log log N) since the average number of divisors of number up to N is log log N (proved by Ramanujan)
If this is the case, that means that average time complexity is O(N*log log N). I think worst case is still o(N^(1+epsilon)).
In fact, average time complexity for the algorithm is even better than O(N* log log N), since inner loop terminates earlier in most of the cases for average input.
This can be done in O(n).
First, create a suffix tree out of the string. Now the problem is reduced to examining the longest branch of the suffix tree. Specifically, we only need to look at the last and second-to-last nodes of the longest branch, because the substring which represents the edge between those two nodes is the only possible option for the repeating substring. This is because if the whole string truly is just a repeated substring, then the longest branch in the suffix tree will branch off (not necessarily only) at equidistant intervals: one branch to continue the longest branch and one branch for the terminating character, eg. $.
Building a suffix tree is O(n).
Now we can simply take that substring we just found and check if the string is really composed of only that string. This is obviously O(n).
The final complexity is this O(2*n) = O(n).
I agree that building a Suffix tree is O(n). But I've never been able to wrap my head around Ukkonen's algorithm to do that. Can you write code to do so within a 45 minute interview?
I could probably get some skeleton code done, but I seriously doubt I'd be able to do a properly working O(n) suffix tree construction in 45mins, at least while talking through it. But then again, what kind of an interviewer would pose that kind of a question unless hiring specifically for some string algorithm role.
Your solution is really nice in that it could be realistically done in an interview, though.
Nice idea. +1
So far this is the only O(N) scheme which works.
Let's try to brainstorm so we can simplify it so it can be done on a whiteboard in less than 30 minutes.
Here's my 5 cents:
In fact we don't need full implementation of Ukkonen's algo for this problem. First of all it is not necessary to execute the last step of appending end-of-string char at the end. Also the final answer can be provided while processing of the last letter of the input string, without need of completion of the suffix tree construction.
Found an interesting property of application of Ukkonen's algorithm on this problem. Once all characters from the repeating pattern are added to the tree, the algorithm enters steady state and doesn't change the state until it hits first character which doesn't match the pattern. Such first character causes aggressive splitting. If there a repetition of a (larger) pattern algorithm goes into steady state again. The answer is 'true' if last letter added to the tree doesn't change the state.
abababxabababxabababy
^ - adding a
^ - adding b
^^^^ - nothing happens (state isn't changed)
^ - adding x causes a lot of splitting in the tree
^^^^^.... nothing happens until y is hit
this is classical problem on prefix function
private boolean isPattern(char []a){
int n = a.length;
if (n == 0) return true;
int []p = new int[n];
for(int i = 1; i < n; ++i){
int j = p[i-1];
while(j > 0 && a[j] != a[i]) j = p[j-1];
if (a[j] == a[i]) ++j;
p[i] = j;
}
// System.out.println(Arrays.toString(p));
int k = n - p[n-1];
return p[n-1] > 0 && n % k == 0;
}
This is yet another incorrect implementation...
It returns bunch of false positives. Counter-examples:
a
ab
abc
abcd
I don't think K can be <=0 at the end, so your additional check is no-op.
So it still fails on the same set of counter-examples. Before posting your next attempt, could you please verify whether it actually works?
sorry, 0xF4 i meant
return p[n-1] > 0 && n % k == 0;
here is full working code
private boolean isPattern(char []a){
int n = a.length;
int []p = new int[n];
for(int i = 1; i < n; ++i){
int j = p[i-1];
while(j > 0 && a[j] != a[i]) j = p[j-1];
if (a[j] == a[i]) ++j;
p[i] = j;
}
// System.out.println(Arrays.toString(p));
int k = n - p[n-1];
return p[n-1] > 0 && n % k == 0;
}
Thanks! This completely changes the picture. Without this condition I wasn't even able to understand your intent. Could you please edit your original post with the solution updating return expression and add "if (n==0) return true" to make it 100% perfect.
Let's vote this up, this O(N) solution deserves to be on the top.
I am going to make the java Regex do the work for me:
public static void main(String[] args) {
String pattern = "xyz";
String text = "xyzxyzxyz";
String [] delimitingStrs = text.split("("+pattern+")+");
if(delimitingStrs.length == 0)
System.out.println("Contains the pattern");
else
System.out.println("Does not contain the pattern");
}
/* package whatever; // don't place package name! */
import java.util.*;
import java.lang.*;
import java.io.*;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
public static boolean patternMatch(char[] s,char[] p)
{
boolean isMatched = false;
for(int i=0;i<s.length;)
{
System.out.print(" "+i);
for(int j=0;j<p.length;j++)
{
//System.out.print(" "+s[i] );
if(i< s.length && s[i] == p[j])
{
isMatched = true;
}
else
{
isMatched = false;
}
i++;
}
//i = i+1;
}
return isMatched;
}
public static void main (String[] args) throws java.lang.Exception
{
// your code goes here
System.out.println(" isMatched : "+patternMatch("xyzxyzxyz".toCharArray(),"xyz".toCharArray()));
System.out.println(" isMatched : "+patternMatch("xyzxyzxyzA".toCharArray(),"xyz".toCharArray()));
}
}
How about this?
bool checkPattern (const string& str) {
bool is_find = false;
string pattern1 = ""
string pattern2 = "";
int i = 0;
int j = str.size() - 1;
while (i >= j) {
pattern1 += str[i];
pattern2 = str[j] + pattern2;
if (pattern1 == pattern2) {
is_find = true;
break;
}
}
if (is_find) {
if (str.size() % pattern1.size() != 0) {
return false;
} else {
for (int i=0; i<str.size(); i+=pattern1.size()) {
if (pattern1 != str.substr(i,pattern1.size())) {
return false;
}
}
return true;
}
}
}
O(n) algorithm:
the idea is to increment the possible pattern length by 1 and check if you can arrive to the end of the string using that pattern, otherwise you increment patternlength by 1 until patternlength<=s.length/2
- Use two pointers i and j
- Initialize i=0, PatternLength=1, j=i+PatternLength
while PatternLength<=s.length/2 && j<s.length
if(s.charAt(i)==s.charAt(j))
increment i to i=(i+1)%s.length and j to j++
else
increment patternlength
start checking from patternlength+1
then count the number of repetitions of the pattern in the string by doing j/patternlength,
if patternrepetitions*patternlength==s.length the string is following the pattern.
here is a step by step description of the algorithm for the string "xyzxyzxyz"
Strng s = "xyzxyzxyz";
Step 1:
i=0 j=1
s[i]=x,s[j]=y
PatternLength: 1 PatternRepetitions: 0
Step 2:
i=0 j=2
s[i]=x,s[j]=z
PatternLength: 2 PatternRepetitions: 0
Step 3:
i=0 j=3
s[i]=x,s[j]=x
PatternLength: 3 PatternRepetitions: 0
Step 4:
i=1 j=4
s[i]=y,s[j]=y
PatternLength: 3 PatternRepetitions: 1
Step 5:
i=2 j=5
s[i]=z,s[j]=z
PatternLength: 3 PatternRepetitions: 1
Step 6:
i=0 j=6
s[i]=x,s[j]=x
PatternLength: 3 PatternRepetitions: 2
Step 7:
i=1 j=7
s[i]=y,s[j]=y
PatternLength: 3 PatternRepetitions: 2
Step 8:
i=2 j=8
s[i]=z,s[j]=z
PatternLength: 3 PatternRepetitions: 2
Final Step:
String: xyzxyzxyz
Pattern: xyz
PatternLength: 3 PatternRepetitions: 3
9 steps to check for a string of length 9
This is the code in java to implement this algorithm:
public static boolean followPattern(String s) {
if(s==null || s.length()==0) return false;
int i = 0;
int j = 1;
int patternlength = 1;
int repetitions = 0;
int steps = 1;
while(patternlength<=s.length()/2 && j<s.length()) {
if(s.charAt(i)==s.charAt(j)) {
j++;
i=(i+1)%patternlength;
repetitions=j/patternlength;
}
else {
i=0;
patternlength++;
j=i+patternlength;
repetitions=0;
}
steps++;
}
if(patternlength*repetitions==s.length()) {
//System.out.println("String: "+s+"\nPattern: "+s.substring(0,patternlength)+"\nPatternLength: "+patternlength+"\nRepetitions: "+repetitions);
return true;
}
else {
return false;
}
}
Good try, good description, clean code - but it is O(N^2).
Essentially, you are having 2 nested loops carefully hidden inside one while. It took me a while to discover this :)
Think what happens on inputs like "aaaaaaaaa....aaaaaaaaaaz"
#include<stdio.h>
int check_pattern(char *str)
{
int i=0, j=0;
while(*str)
{
if(str[i] != str[j+1])
j++;
else
{
i++;
j++;
if(str[i] == '\0')
return 1;
}
str++;
}
return 0;
}
int main()
{
char str[20];
int res;
printf("Enter the pattern\n");
scanf("%s",str);
res=check_pattern(str);
if(res == 1 )
printf("pattern match\n");
else
printf("unknown pattern\n");
return 0;
}
~
~
~
import java.util.Scanner;
public class Pattern {
public static void main(String args[])
{
Scanner sc = new Scanner(System.in);
System.out.println("Enter a String: ");
String str = sc.nextLine();
boolean status = checkPattern(str);
System.out.println("Given Pattern Status is "+status);
}
static boolean checkPattern(String str)
{
boolean status = false;
char st[] = str.toCharArray();
int length = str.length();
for(int i=0;i<length;i=i+3)
{
if((st[i]=='x')&&(st[i+1]=='y')&&(st[i+2]=='z'))
{
status = true;
continue;
}
else
{
status = false;
break;
}
}
return status;
}
}
This can be done in O(n) complexity using a process similar to calculation of T for the KMP algorithm. If there is a pattern then there should exist a run of ascending values at the end of T with a length at least half the total length of the string. This has the added benefit of not requiring perfectly formed sets of the pattern (ie: 'xyzxyzx' would be positive since it's a repetition of 'xyz').
This is a sample of what I would expect as internal calculations:
Str: x y z x y z x y z x
t: -1 -1 -1 0 1 2 3 4 5 6
true since t ends as >= Str.length() / 2
Str: x y z d x y z x y z x
t: -1 -1 -1 -1 0 1 2 0 1 2 0
false
public static boolean hasPattern(String str){
//build T
if(str == null){
throw new NullPointerException();
}
if(str.length() < 2){
return false;
}
int t = -1;
int i1 = 0;
int i2 = 1;
while(i2 < str.length()){
if(str.charAt(i1) == str.charAt(i2)){
t++;
i1++;
}
else{
t = -1;
i1 = 0;
}
i2++;
}
//verify correct length
return (t + 1) >= (str.length() >>> 1);
}
I like the direction of your thinking. I also like your extension, where last part of the string can be the prefix of the pattern. This doesn't contradict problem definition, it makes sense, and, in fact, makes problem more interesting completely killing my o(N^(1+e)) implementation above, since the % trick can't be applied anymore.
I wonder whether this can be evolved into something which really works.
For now, let me prove that you algorithm (as written) is wrong by finding a long repeating pattern where t at the end is less than half-length. Let me do even simpler thing - let's find long string with repeating pattern where t = 0 at the end.
This can be done by "executing" your algorithm in reverse order and reconstructing the input, given t = 0 at the end. If t = 0 then it must be -1 at the previous step and so on... Since we assumed that there is a repeating pattern and t starts with -1, the t's sequence must also follow repeating cycles. Something like -1 -1 0 -1 -1 0 ... -1 -1 0
This yields input strings like
abaaba
abaabaaba
aba...abaaba
Those are counter-examples for your algorithm. I don't see an immediate way to fix it. Do you?
> try string: abacabacabaca
What's wrong with it?
It returns 'true' as designed in zortlord's interpretation of the problem.
pattern is 'abac' last 'a' follows the pattern since it is prefix of the pattern.
@0xF4 I don't see how your counter-examples work. If we have "abaaba" then we get t = {-1,-1,0,0,1,2} and for "abaabaaba" we get t={-1,-1,0,0,1,2,3,4,5,6} and in both cases the last value of t is larger or equal to half of the string.
Really? Let's try this.
Try running your program with input "abaaba" and it will return 'false':
http : //goo.gl/9AHINQ
Press Compile and Run. If you add printing your 't' - then you'll see the -1-1/0 sequence which I predicted.
BTW, I realized what is your intent, and I think I found the problem. Could you please try replacing the body of your loop with something like this:
if (str[i1] == str[i2]){
t++;
i1++;
}
else {
if (str[0] == str[i2]) {
t = 0;
i1 = 1;
}
else {
t = -1;
i1 = 0;
}
}
i2++;
and let us know if it works as expected on edge cases?
If yes, that means we finally have simple and clean O(N) time / O(1) space solution!
O(m * n)
public boolean checkPattern (String s, String p){
if (s.length() % p.length() != 0)
return false ;
for (int i = 0 ; s.length() - i >= p.length() ; i += p.length()) {
int j = 0;
while (j < p.length() && s.charAt(i + j) == p.charAt(j)) j++;
if (j != p.length()) return false ;
}
return true ;
}
Here's algorithm with worst case o(N^(1+epsilon)) time complexity and O(1) space, where epsilon is arbitrary small constant larger than zero. So algorithm is "almost linear".
The inner loop complexity is O(N) however inner loop runs only d(N) times where d(N) is well-known "divisor function" - number of divisors of N. d(n) = o(N^epsilon), so overall time complexity is o(N*N^epsilon)=o(N^(1+epsilon)).
- 0xF4 January 03, 2015