Nomura Interview Question
Applications Developers, Country: India
Closure Tables
A closure table gives you the ability to do all the same sorts of "find me all children of X, to depth N" queries as any of the other methods, just by doing a join against the closure table.
But the killer feature of the closure table is that adding or removing a parent-child relationship takes only *one* SQL query -- a query so simple that even the least-powerful SQL databases can run it as a trigger on the main table!
Let's take a look at how this works. A closure table is simply a table that maintains the "transitive closure" of the parent-child relationships in the base table. So, let's say you're modelling a directory structure, and you have a "directory" table, with a foreign key "parent_dir" pointing to each row's parent directory.
With this structure, you can only query direct (depth 1) relationships, but by adding a "closure" table with fields for "parent", "child", and "depth", you can represent the hierarchy to whatever depth is present. So, if directory C is a child of directory B, and directory B is a child of A, then the base table would look like this:
id  parent_dir  name
1   0           A
2   1           B
3   2           C
And the closure table would look like this:
parent  child  depth
1       1      0
2       2      0
3       3      0
1       2      1
2       3      1
1       3      2
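The tables above can be reproduced end to end in code. This is a minimal sketch using Python's sqlite3 (table and column names match the example; the `add_directory` helper is my own illustration, not part of the original answer). Note that the closure maintenance on insert really is a single SQL statement, as claimed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Base table: each directory points at its parent (0 = no parent).
cur.execute("CREATE TABLE directory (id INTEGER PRIMARY KEY, parent_dir INTEGER, name TEXT)")
# Closure table: one row per (ancestor, descendant) pair, including self-pairs at depth 0.
cur.execute("CREATE TABLE closure (parent INTEGER, child INTEGER, depth INTEGER)")

def add_directory(dir_id, parent_dir, name):
    cur.execute("INSERT INTO directory VALUES (?, ?, ?)", (dir_id, parent_dir, name))
    # One maintenance query: copy every ancestor path of the new row's parent,
    # extended by one level, plus the new node's own self-row at depth 0.
    cur.execute(
        """INSERT INTO closure (parent, child, depth)
           SELECT parent, ?, depth + 1 FROM closure WHERE child = ?
           UNION ALL SELECT ?, ?, 0""",
        (dir_id, parent_dir, dir_id, dir_id),
    )

add_directory(1, 0, "A")
add_directory(2, 1, "B")
add_directory(3, 2, "C")

# "All children of A, to any depth" is a plain join against the closure table.
cur.execute(
    """SELECT d.name, c.depth FROM closure c
       JOIN directory d ON d.id = c.child
       WHERE c.parent = 1 AND c.depth > 0 ORDER BY c.depth"""
)
print(cur.fetchall())  # [('B', 1), ('C', 2)]
```

Running this yields exactly the six closure rows shown in the table above. To limit the query to depth N, add `AND c.depth <= N`.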
A simple approach is to use a separate table that the external system queries. If you have a table A that the external system needs, create a table B that is an exact replica of table A. Then, every 5-10 seconds or 1-2 minutes depending on your requirements, sync the two tables with a SQL Server scheduled job. The external system queries only table B and gets near-realtime data, and table B is used exclusively by the external system.
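The answer describes a SQL Server scheduled job; as a stand-in, here is a minimal sketch of the same replica-sync idea using Python's sqlite3 (the table names `table_a`/`table_b` and the `sync_replica` helper are my own illustration). The key point is that the replica is rebuilt inside one transaction, so external readers never see a half-copied snapshot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, value TEXT)")
cur.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, value TEXT)")
cur.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "x"), (2, "y")])

def sync_replica():
    # Rebuild the replica in a single transaction; a scheduler
    # would invoke this every few seconds or minutes.
    with conn:
        cur.execute("DELETE FROM table_b")
        cur.execute("INSERT INTO table_b SELECT * FROM table_a")

sync_replica()
print(cur.execute("SELECT * FROM table_b ORDER BY id").fetchall())  # [(1, 'x'), (2, 'y')]
```

A full-rebuild sync is the simplest variant; for large tables an incremental sync (copying only rows changed since the last run) would be preferred.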
There are a few approaches:
1. Data slicing: partition tables by a date/time span (month, quarter, year, or finer) or by another column value (such as a product category: fast food, bread, etc.) to keep each table small; this spreads the load across smaller tables.
2. Server structure: use redundant data, creating multiple mirrors split by location or by business unit.
3. Using stale ("dirty") data: if the business does not demand fully up-to-date data, export part of the data to the local machine as JSON (or another format), serve requests from that copy, and write the accumulated changes back to the database afterwards.
I believe there are many other methods, but you should choose one only after you understand the real environment.
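Approach 1 above (data slicing) can be sketched as routing each row into a per-period table. This is an illustrative example, again using sqlite3; the `orders_YYYY_MM` naming scheme and the `insert_order` helper are my own assumptions, not from the original answer:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

def insert_order(order_id, order_date, product):
    # Route each row into a per-month table (orders_2014_03, ...),
    # so a query for one period only scans that small slice.
    table = f"orders_{order_date.year}_{order_date.month:02d}"
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, order_date TEXT, product TEXT)")
    cur.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (order_id, order_date.isoformat(), product))

insert_order(1, date(2014, 3, 22), "bread")
insert_order(2, date(2014, 4, 1), "fast food")
print(cur.execute("SELECT product FROM orders_2014_03").fetchall())  # [('bread',)]
```

Production databases usually offer native partitioning (e.g. partitioned tables in SQL Server or PostgreSQL) that achieves the same effect without manual table routing.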
This can also be done by making a replica of the main table, with no direct connection between the two tables, so that external queries do not affect the main table. To preserve (correct) changes, an SCD Type 2 history can be maintained between the two tables using an ETL process.
- Anonymous March 22, 2014