Amazon Interview Question
Software Engineer / Developersyou cannot do it in O(1) space with O(N) time if there are more than one duplicated elements in the array. If there was one, i would just sum it up and get the difference from arithmetic series of 1..N
/* The question is find and delete duplicates in an array of size N with a max value N in time complexity O(N), and space O(1)
The time complexity of this program is O(2n + n +n ) = O(4n) i.e. O(n). Space complexity = O(1).
Assuming the input array can store an element of value '2n'.
From the problem we know that the elements range 1 to n so we place elements in their corresponding positions.
Pseudo code:
for each element a[i]
if(a[i]==i+1 || a[i]>n)
continue //i.e. either inplace or duplicate
else
while(true)
if(a[i]>n) //i.e. a duplicate that is already detected
break
if(a[a[i]-1]==a[i])
a[i]+=n //i.e. a duplicate is detected
break
swap a[i] and a[a[i]-1]
if(a[a[i]-1]==a[i]) //i.e. swapped inplace
break
Complexity analysis:
Consider for an element a[i], k elements are swapped. i.e. k elements are inplace.
which means each of these k elements require atmost 1 comparison i.e. O(k) and the max value k
can take is n-1 => O(n-1 + n) == O(2n) ==> O(n)
For example, element a[0], the worst case would be, doing a maximum swaps of (n-1) which
means that n-1 elements are in place. so for each of these n-1 elements we only do one comparison i.e. a[i]==i+1
Therefore, atmost 2n comparisons are made
Once we place all elements in place, all elements greater than n are duplicates,
move duplicate elements to the end, so parse once again and swap duplicates to the end.
=> Complexity = O(2n + n + n) = O(4n) ==> O(n)
Test case:
1 5 3 8 4 3 8 5 2 // i=0,
1 4 3 8 5 3 8 5 2 // i=1 swap a[1] and a[a[1]-1] i.e. swap '5' and '4'
1 8 3 4 5 3 8 5 2 // i=1 swap a[1] and a[3] i.e. swap '4' and '8',
1 5 3 4 5 3 8 8 2 // i=1 swap a[1] and a[6] i.e. swap '8' and '5',
1 14 3 4 5 3 8 8 2 // i=1 Since, a[5-1]==5 && a[1]==5, a[i]+=n => 14
1 14 3 4 5 3 8 8 2 // i=2, since a[2] = 2+1 do nothing
1 14 3 4 5 3 8 8 2 // i=3, since a[3] = 3+1 do nothing
1 14 3 4 5 3 8 8 2 // i=4, since a[4] = 4+1 do nothing
1 14 3 4 5 12 8 8 2 // i=5 Since, a[3-1]==a[i], a[i]=n+a[i] => 12
1 14 3 4 5 12 17 8 2 // i=6 Since, a[8-1]==a[i], a[i]=n+a[i] => 17
1 14 3 4 5 12 17 8 2 // i=7 Since, a[7]==8; //inplace so continue
1 2 3 4 5 12 17 8 14 // i=8 swap a[1] and a[8] i.e. swap '14' and '2'
//All duplicates are greater than n with a value a[i]-n i.e. 12,17,14 i.e. 3,8,5
*/
#include <iostream>
using namespace std;
//!Function to swap values, pass by reference
void swap(int *x,int *y){
*x = *y + *x;
*y = *x - *y;
*x = *x - *y;
cout<<"swapping '"<<*x<<"' and '"<<*y;
}
int main(){
int a[] = {3,3,3,3,3,2,2,2,2,2,2,3,3,3,3,3,2,2,2,2,2,2,4};
int n = sizeof(a)/sizeof(int);
int numComparisons = 0;
for (int i=0;i<n;++i){
++numComparisons;
if(a[i]==i+1 || a[i]>n)
continue; //i.e. either inplace or duplicate
else{
while(i<n){
++numComparisons;
if(a[i]>n) //i.e. a duplicate that is already detected
break;
int u=a[a[i]-1];
if (u==a[i])
{
a[i]+=n; //i.e. a new duplicate is detected
break;
}
cout<<endl<<"swapping a["<<i<<"] and a["<<(a[i]-1)<<"] i.e. ";swap(&a[i],&a[a[i]-1]);
if(a[a[i]-1]==a[i])
break;
}
}
}
//Move duplicates to the end, though there are two while loops,
//they break on a same condition i.e. when both pivots meet and
//pivots are ALWAYS MOVING INWARDS so the maximum number of iterations for these while loops together is N
int endPivot = sizeof(a)/sizeof(int) -1;
int beginPivot = 0;
while(beginPivot<=endPivot){
++numComparisons;
if(a[beginPivot]>n){
while(beginPivot<=endPivot){
++numComparisons;
if(a[endPivot]<n){
swap(&a[beginPivot],&a[endPivot]);
break;
}
--endPivot;
}
}
++beginPivot;
}
//Final array
cout<<endl<<"Final Array : ";
for (int k=0;k<n;++k){
cout<<a[k]<<",";
}
//Print duplicates
cout<<endl<<"Duplicates are :: ";
for (int i=beginPivot;i<n;++i){
++numComparisons;
if(a[i]>n)
cout<<a[i]-n<<",";
}
cout<<endl<<"Number of comparisons"<<numComparisons<<endl;
}
The pseudocode and example explains it.
The logic is:
Since we know there are elements from 1 to N, if no duplicates then it should be exactly 1 to N. So if we place every element as we read into its correct place i.e. a[0]=1,a[1]=2,a[2]=3,a[3]=4,a[4]=5......etc then we should have 1 to N.
And consider, if we found an element that is already in place then we ignore it. And if we found an element that we want to move to its correct position but it already exists in the correct position then it's a duplicate. And we need something to mark this as duplicate so I increase the value by N so any value x greater than N are duplicates with value x-N.
So in the example we have 9 elements => n=9 so we have values from 1 to 9 from the problem statement.
first we start at a[0] and we find that a[0]==1 i.e. 1 is in its place.
next element a[1] which is 5, this has to be at position 4, check a[4]==5 or not, its not. so we swap a[1] with a[4].
In while loop, we check if the swapped element is in place i.e. a[1]==2? its not, so we should move a[1] i.e. '4' to its correct position i.e. a[3]. is a[3]==4?? its not, so swap a[1] and a[3] i.e. swap '4' and '8'.
again in while loop, we check if a[1]==2? its not, so we swap '8' and '5' the same way as above.
again in while loop,check if a[1]==2? its not, so we should move a[1] to the correct position i.e. a[a[1]-1] => a[5-1] => a[4], BUT a[4] is already == 5 so this is a duplicate, we mark it by increasing the value by n i.e. 9+5 =14. break out of while loop
next element a[2], a[2]==3 , so it is in place, do nothing.
next element a[3], a[3]==4 , so it is in place, do nothing....so on
first of all you are just detecting duplicates, the question says remove duplicates. You should be reading the question twice before posting intelligent answers
the problem says numbers are in range of 1...N, but not necessary contain all number from 1 to N. Without this strong assumption, your method will be incorrect.
The algorithm works to find duplicates between 1..N, the algorithm is built for the problem.
A solution is written for a problem, if a problem says "build an algorithm", then there is no problem statement at all. I hope people who make comments think once before they talk.
You replied with some non-sense, but without answering papaya's question. You made wrong assumption, pointed out by others, but you didn't try to fix or explain, rather, you just insist you were correct. How could such a thing happen on the earth??
papaya says, "the problem says numbers are in range of 1...N, but not necessary contain all number from 1 to N."
Yes, I know that. I am not assuming that.
what am saying is if there are numbers between 1 to N in an array and complexity is O(N) that means the size of array cannot be greater than N and no element is greater than N, right?
Now, if no duplicates exist then the array has to be of size N right?
Now if duplicates exist then after moving the elements to their correct positions you should be left with duplicates. If you try the following array
{3,3,3,3,3,2,2,2,2,2,2,4};
The algorithm will give,
Final Array :15,2,3,4,15,15,14,14,14,14,14,15,
Duplicates are :: 3,3,3,2,2,2,2,2,3,
Correct me if am wrong.
@extinct: I was just lazy to write the deletion code, but anyways, added the code for moving duplicates to the end of the array, now, you can delete or truncate or .... anything.
again the complexity remains the same O(n) or O(4n), space complexity O(1).
@chenna
it's not O(1) in space as you assume the array can be indexed upto N. Extra space you need N - n(number of elements).
space complexity O(N).
@MN space complexity is the additional memory required. As chennas algo uses the input array and not an additional array i.e the algo is inplace. You should think before you post.
that's what I wrote above... if your array has only n (<N) elements ...why would it have more space allocated? you should perhaps read my comment carefully
buddy..you are not getting the point. the point is not about n or N. The point is whatever may be the size of array, the algorithm doesn't use any extra space.
For example: consider n = n1, if the program takes n1+k bytes of memory. then for n = n2, the program takes n2+k bytes of memory, the algorithm takes k bytes of memory irrespective of value of n. so the space complexity = O(k) ==> O(1). I hope this helps
ok... let's work through one example here.
suppose R = {1 to 10^6} and Arr = [1,2,10,100,10^4, 10^6,10^6,10^4}.
you get n=8 and N = 10^6
now because of the fact you use "if(a[a[i]-1]==a[i])" you need Arr to be accessible upto N.
you are right you always need constant space but that could be at most N. I don't think this would qualify for a O(1) solution.
Edit : yeah, I guess I realize the merit in your argument now. It's O(1) as the space required is constant N. It does uses extra memory, so it is still not inplace but that's not what we are looking for.
Correct me if I am wrong?
I guess I was working under the assumption that N<=n because the problem doesn't make sense if N>n consider the following
N=5 and array = 1 1 1 2 1 4 3 1...... 10000 elements
Now as per the question find an algorithm O(5) for an array O(10000)
So I deduced that N<=n. Thanks
so The question is find and delete duplicates in an array of size N with a max value N in time complexity O(N), and space O(1).
I wrote the right solution, there's no 'n'. There's only N which is the size of the array and max of the array.
damn it.. someone got me confused somewhere
I think the problem stmt intends to convey that the solution should have linear time complexity. So if the length of the array is m ,m could be >> N (a relatively small number)
Let's define this frequency array F[1...N] that would hold the frequencies of the element k in the original array A. So e.g. if N = 5, and the original array is
[2,1,3,1,1,1,3,5,2,4], the frequency array F would be [4,2,2,1,1]. Observe that regardless of the size of the original array, the size of F would always be the same. So F would actually occupy constant space for any value of m.
The algo now gets very simple. A single pass over the original array A
initialize the elements of F to zero
for (i=1...N){
if( F[a[i]] == 0){
F[a[i]]++;
}else{ // if we have already encountered the *value* a[i] in an earlier iteration
a[i]=NULL;
}
}
Run the above algo on [2,1,3,1,1,1,3,5,2,4]. At the end of the pass,the array become
[2,1,3,null,null,null,null,5,null,4], which is what's the goal ..enjoy !!
Dude, space complexity would indeed be O(1). Even if the size of F is N, it's constant (because N is a finite number)and is independent of the size (m) of the original array.
In fact, I already explained the constancy of the additional space used in my original post.Put some thought and read through the post before shooting a comment
In fact, I already explained the constancy of the additional space used in my original post.Put some thought and read through the post before shooting a comment
I don't know what is the problem if the array size is >N.
If size of array is m
for(i =1 to m){
a[a[i]]=a[i];
}
when size of array is smaller than N than
for(i=1 to m){
a[a[i]%m]=a[i];
}
<pre lang="java" line="1" title="CodeMonkey50426" class="run-this"> int countVector = 0;
int length = 0;
for(int i = 0 ; i< 10 ; i++)
{
if((countVector & (1<<a[i])) != 0)
continue;
else
{
a[length++] = a[i];
countVector = countVector | 1<<a[i];
}
}
for(int i = 0 ; i<length ; i++)
cout<<a[i];
cout<<endl;</pre><pre title="CodeMonkey50426" input="yes">
</pre>
#include <stdio.h>
int main()
{
int a[]={1,2,2,3,3,3,4,5,6,6,9};
int counter, lastIndex;
int length = sizeof(a)/sizeof(int);
// Set 0 for duplicates
for(counter=1,lastIndex=0;counter < length;counter++)
{
if(a[counter] == a[lastIndex])
{
a[counter] = 0;
continue;
}
lastIndex=counter;
}
// Print the whole array
for(counter=0,lastIndex=0;counter<length;counter++)
printf("%d ", a[counter]);
printf("\n");
// Move the 0(s) to end of the list.
for(counter=1,lastIndex=0;counter<length;counter++)
{
printf("lastIndex=%d counter=%d\n",lastIndex,counter);
if(a[counter] == 0)
{
//Set the lastIndex for the firsttime.
if(lastIndex == 0) lastIndex = counter;
continue;
}
if(lastIndex == 0) continue;
a[lastIndex]=a[counter];
a[counter]=0;
//For the first time lastIndex will be set above..
lastIndex = lastIndex+1;
}
for(counter=0,lastIndex=0;counter<length;counter++)
printf("%d ", a[counter]);
getchar();
return 0;
}
void removeDups (int a[], int n)
{
for (int i = 0; i < n; i++)
{
if (a[i] != i)
{
if (a[a[i]] != a[i])
swap (a[i], a[a[i]]);
}
}
for (int i = 0, j = 0; j < n; i++, j++)
{
while (a[j] != j)
j++;
a[i] = a[j];
}
a[i] = '\0';
}
package org.com.array;
// Removing Duplicate item from the array
public class P3
{
static void duplicate(int a[])
{
int t= a.length;
int index=0;
for(int i=0;i<t;i++)
{
for(int j=i+1;j<t;j++)
if(a[i]==a[j])
index=i;
}
for(int i=index;i<t-1;i++)
{
a[i]=a[i+1];
}
t--;
for(int i=0;i<t;i++)
System.out.print(a[i] + " ");
}
public static void main(String[] args)
{
int a [] ={1,2,3,4,2,6,7,3,9,10,10};
duplicate(a);
}
}
package org.com.array;
// Removing Duplicate item from the array
public class P3
{
static void duplicate(int a[])
{int t= a.length;
int index=0;
for(int i=0;i<t;i++)
{ for(int j=i+1;j<t;j++)
if(a[i]==a[j])
index=i;}
for(int i=index;i<t-1;i++)
{a[i]=a[i+1];}
t--;
for(int i=0;i<t;i++)
System.out.print(a[i] + " ");}
public static void main(String[] args)
{int a [] ={1,2,3,4,2,6,7,3,9,10,10};
duplicate(a);}}
package org.com.array;
// Removing Duplicate item from the array
public class P3
{
static void duplicate(int a[])
{int t= a.length;
int index=0;
for(int i=0;i<t;i++)
{ for(int j=i+1;j<t;j++)
if(a[i]==a[j])
index=i;}
for(int i=index;i<t-1;i++)
{a[i]=a[i+1];}
t--;
for(int i=0;i<t;i++)
System.out.print(a[i] + " ");}
public static void main(String[] args)
{int a [] ={1,2,3,4,2,6,7,3,9,10,10};
duplicate(a);}}
We will traverse array and will make a[abs(a[i])] negative if its +ve.. if a[a[i]] is already negative .. means no is repeated we will replace it with zero and so on .. o(n) ...
- Ashish December 03, 2010in next traversala we will remove all the zeros from the array and will make all -ve values +ve .. o(n)
o(n)+o(n) =o(n)