Bloomberg LP Interview Question
Financial Software Developers
Country: United States
Interview Type: Phone Interview
Good answer, but what do you mean by changing the address space? Memory is cached by physical address (after translation), not virtual address, so it won't have any impact on performance (unless the pages of the process being switched in do not reside in system memory).
Anyway, on a context switch, the register set of the suspended thread must be saved and a new register set loaded.
So I think it depends on the problem you wish to solve: e.g. if threads need to communicate a lot, it's better to use a single process for that (since IPC is much harder to use). On the other hand, with multiple threads in one process, you get all these nasty things like race conditions and synchronization, which do not occur with multiple processes.
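To make the race-condition point concrete, here is a minimal Python sketch (the function name and parameters are made up for illustration). Several threads of one process update the same counter; because they share one address space, the update has to be guarded by a lock to be reliably correct.

```python
import threading

def count_with_threads(num_threads=4, increments=10_000):
    """Hypothetical helper: several threads bump one shared counter."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write below is not atomic; the lock makes sure
            # only one thread performs it at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always num_threads * increments. Separate processes would not need the lock, but they also would not see each other's counter at all.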
@kiran: you mention context switches... well, why would a process need to context switch if each one has a dedicated processor?
@?: Apart from the register set being restored, every process has text (code), stack and data areas. Each area consists of regions. Every time a context switch between processes occurs, the stack, data and text areas need to be changed. If a particular region of an area is unavailable, it needs to be loaded into memory. But in the case of multiple threads of a single process, all the threads share the same text and data areas; each thread only has its own stack inside the shared address space.
@rukawa: Nothing was mentioned about the processors in the question. Normally the number of processes on a machine would be in the hundreds, and it isn't a wise idea to have hundreds of processors. So context switching takes place.
@kiran, well, you are right that, on a context switch, some memory regions of the new process might need to be loaded (if they are not in memory yet).
However, these regions are not preloaded all at once when the context switch occurs, but "on demand" only, i.e. only when the processor stumbles upon an instruction that refers to a memory location which is not yet available.
Furthermore, if it's critical for performance, you can also ask the system to allocate some amount of "page-locked" memory (which cannot be paged out).
For example, kernel-level code, drivers etc. reside in page-locked memory.
For threads from the same process, it can also happen that, during a context switch, some pages need to be fetched from external storage: even though these threads share the same address-space regions, as you pointed out, they may be accessing different pages at the same time.
@? : Yes. The regions would be loaded on demand.
Apart from these, when a context switch occurs, virtual address translation must be done to locate the u-area as well as the pregion table of the process to be executed from the process control block. So these things cause extra overhead for processes compared to threads.
I think the choice depends on whether the threads share data and code or not. If the threads are independent, then multiple single-threaded processes would be better.
However, if the threads are inter-related, I would prefer one process with multiple threads.
Also, as pointed out by rukawa, if the threads are independent and one process with multiple threads is used, then one thread can bring the entire process down.
You can't get around using multiple processes if you need to write seriously scalable code, for example over a cluster. Processes can use IPC mechanisms, including those that map to a TCP port, using for example virtual transactional memory to send job results to a collector node/server. Of course these processes will probably have threads, so you may combine the two worlds to create something really efficient in a divide-and-conquer scenario.
At the end of the day, it really depends on what exactly the problem to solve is. There is no single recipe in terms of "this is always the best and that is always the poorest choice". If that were the case, computer science courses would cover far fewer topics than they normally do: if balanced trees were always and unconditionally the best choice for everything, then why bother learning linked lists, and so forth. Does this make sense?
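As a rough sketch of the collector idea above (worker processes sending job results over a TCP port), here is a small Python example. Everything in it is an assumption made for illustration: the names collect_sum and WORKER_SRC, the loopback address standing in for a real cluster network, and the "job" being a partial sum.

```python
import socket
import subprocess
import sys

# Hypothetical worker: computes a partial sum and reports it to the collector.
WORKER_SRC = """
import socket, sys
port, lo, hi = (int(a) for a in sys.argv[1:4])
partial = sum(range(lo, hi))
with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(str(partial).encode())
"""

def collect_sum(total=1000, workers=4):
    """Split sum(range(total)) across worker processes; gather results over TCP."""
    chunk = total // workers
    # Port 0 asks the OS for any free port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        server.settimeout(30)
        port = server.getsockname()[1]
        procs = []
        for i in range(workers):
            lo = i * chunk
            hi = total if i == workers - 1 else lo + chunk
            procs.append(subprocess.Popen(
                [sys.executable, "-c", WORKER_SRC, str(port), str(lo), str(hi)]))
        result = 0
        for _ in range(workers):
            conn, _addr = server.accept()
            with conn:
                conn.settimeout(30)
                data = b""
                while True:
                    part = conn.recv(1024)
                    if not part:  # worker closed its end: message complete
                        break
                    data += part
                result += int(data)
        for p in procs:
            p.wait()
    return result
```

On a real cluster the workers would run on other machines and connect to the collector's address instead of 127.0.0.1; each worker could still use threads internally.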
advantages of multiple threads in a single process:
1. context switching is fast as only PC, SP and registers are per thread.
2. inter-thread communication is faster using shared memory as threads share the same address space.
advantages of using multiple processes:
1. separate address spaces, hence one process can't tamper with another process's data; more security
2. if one process crashes, the other processes are not affected
(the question of multiple threads on multiple processors is a subset of the points already mentioned)
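The address-space difference behind both lists can be seen in a few lines. This is only a sketch: it assumes a POSIX system (os.fork is not available on Windows), and the function name is made up. A thread's write to a list is visible to the rest of the process, while a forked child only changes its own copy.

```python
import os
import threading

def thread_vs_process():
    """Hypothetical demo: who can see a write to `data`?"""
    data = []

    # A thread runs in the same address space, so its append is visible here.
    t = threading.Thread(target=data.append, args=("from thread",))
    t.start()
    t.join()
    seen_after_thread = list(data)

    # POSIX only: the forked child gets its own copy of the address space.
    pid = os.fork()
    if pid == 0:
        try:
            data.append("from child")  # changes the child's copy only
        finally:
            os._exit(0)                # leave the child immediately
    os.waitpid(pid, 0)
    seen_after_fork = list(data)       # parent's list is unchanged

    return seen_after_thread, seen_after_fork
```

Both returned lists contain only the thread's entry, which is the isolation in point 1 of the process list and the shared memory in point 2 of the thread list.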
Multiple processes, each with a single thread. Reasons:
1. Each process will have a dedicated CPU share. It's parallel processing, whereas one process with multiple threads is just an illusion of parallel processing.
2. No need for complex synchronization between processes, as each has its own address space.
3. Damage to one process will not halt the entire processing.
oh, one more point: in some cases you will make use of third party modules which you have no control of in terms of stability / failure recovery. If one such crash occurs, your application dies. Is this acceptable? Probably not. If you encapsulate such modules in a child process, you will protect the master application from major failure; actually, the application will detect that the module has failed and it will try to manage the situation, eventually respawning the child process and doing whatever it can do to recover as much as possible. Still uncomfortable? Yes, but more acceptable than a total showstopper.
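A minimal sketch of that "wrap the unstable module in a child process and respawn it" idea, in Python. The child script here is a made-up stand-in for a third-party module: it deliberately fails on its first two attempts. The names run_supervised and CHILD_SRC are hypothetical.

```python
import subprocess
import sys

# Hypothetical stand-in for an unstable third-party module: it "crashes"
# (non-zero exit status) unless the attempt number in argv is at least 2.
CHILD_SRC = """
import sys
attempt = int(sys.argv[1])
if attempt < 2:
    sys.exit(1)  # simulated crash
print("result from attempt", attempt)
"""

def run_supervised(max_attempts=5):
    """Run the child, respawning it until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SRC, str(attempt)],
            capture_output=True, text=True, timeout=30)
        if proc.returncode == 0:
            return attempt, proc.stdout.strip()
        # The child died, but the master application is still alive and can
        # decide what to do next; here it simply tries again.
    raise RuntimeError("child kept failing")
```

The master only ever sees an exit status and some output, so a crash inside the module cannot corrupt the master's own memory.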
Single process with multiple threads. The simple reason is that threads are lightweight compared to processes during a context switch. Since the threads of a process share the same address space, the address space need not be changed every time a context switch occurs. This is not the case with different processes.
- Kiran Kumar May 10, 2012