9 Sept 2018

Important Topics

Q: Project Team? Project Risk? Planning and organization?
pg no(-) L35

Project Team:
  1. The team is the only means of reaching the project goal.
  2. Team selection is a vital step.
  3. Special attention is needed to create teams whose members respect each other.
  4. The size of the team depends on the complexity of the project.
  5. Technical knowledge and experience also matter.
Project Risks:
  1. A project will be at risk if expectations for it are not controlled, or if the results are ill defined or misunderstood.
  2. Before a risk can be eliminated, it must be identified and addressed.
  3. Risks need to be discussed with the team, managers, and end users.
  4. Know the cause of each risk and take measures to minimize the risk factor.
  5. Risk can be assessed under three categories:
    1. business risk
    2. technology risk
    3. project size risk
  6. Control risk from the initial stage by following standards.
  7. Train your team.
  8. Technical risk arises when new technology is introduced.
Q: Explain SPR assessment
pg no(-) L29

  1. The software assessment interview is a group activity, held for each project being analyzed.
  2. The SPR consultant meets with the manager and up to seven technical staff at a time.
  3. Participants are given questions to answer.
  4. The official assessment form is filled in by the SPR consultant after assessing the group.
  5. In case of differing opinions, all points of view are recorded.
  6. These issues are then resolved.
  7. SPR assessments are structured interviews based on scripted questions.
  8. The questionnaire contains multiple-choice questions (250 / 100 sets).
  9. Once the sample projects have been analyzed, the answers are fed into a statistical tool to produce statistics and standard deviations.
  10. The weak areas are then revealed, which is essential for software process improvement.
  11. The client is then able to adopt a software process improvement program.
  12. Next, a six-stage improvement program is required.
Six-Stage Improvement Program
  1. Focus on management technologies 
  2. Focus on software processes and methodologies 
  3. Focus on new tools and approaches  
  4. Focus on infrastructure and specialization  
  5. Focus on reusability  
  6. Focus on industry leadership
      
Q: At which stage should we collect data? When to Collect Data
pg no(138-)
  1. Data collection planning needs to start when product planning starts.
  2. Data is required at many stages.
  3. Some data needs to be collected when the project starts, other data later.
  4. Data is also collected during inspections and reviews.
  5. Some data is collected when changes are applied to enhance the product.
  6. Data collection should be an essential part of the development process.
  7. Likewise, if a process/activity needs to be analyzed, a measurement list must be prepared for it.
Q: Ideal Model ? Establishing phase ? explain set context and build sponsorship ?
pg no(32-) L28

  • IDEAL Model - a life-cycle model for software process improvement
    • Based upon CMM
  • It provides a usable and understandable approach, with steps, toward software process improvement.
  • It has phases, activities, and principles.
  • It provides disciplined engineering work, which is the foundation of a long-term improvement plan.
  • The IDEAL model is named for the five phases it describes: initiating, diagnosing, establishing, acting, and learning.
Set Context and Build Sponsorship are steps of the initiating phase.
Set Context -
  1. Once the need for change is identified, management sets the context for the work to be done.
  2. "Setting context" explains where this effort fits in the current business strategy,
  3. what business goals it serves,
  4. whose ongoing work it affects,
  5. and what its benefits are.
Build Sponsorship -
  1. Effective sponsorship is required for software improvement.
  2. On the way to an improved system, many organizations face chaos and uncertainty,
    • so critical management support is required at the beginning.
  3. Sponsors are most effective if they give personal attention and stick with it.
The Establishing Phase
  1. The purpose of the establishing phase is to develop a detailed work plan.
  2. Finally, specific actions, milestones, deliverables, and responsibilities are incorporated into an action plan.
  3. Steps in the Establishing Phase:
    1. Set Priorities
    2. Develop Approach
    3. Plan Actions
Q: people CMM level 3 /level I
pg no(86-) L36

People CMM
  1. It helps to manage and develop an organization's workforce.
  2. It employs the CMM process maturity framework and consists of best practices.
  3. It is a tool that helps you successfully address the critical people issues in your organization.

Seven Principle of Workforce Management
1. Employment security
2. Selective hiring of new personnel
3. Self-managed teams and decentralization of decision making
4. Comparatively high compensation contingent on organizational performance
5. Extensive training
6. Reduced status distinctions and barriers
7. Extensive sharing of financial and performance information

Process Maturity - 5 Levels
  1. Continuous process improvement in five stages.
  2. It is a process standard and consists of best practices.
  3. It integrates the improved practices into a staged model and hence guides cultural transformation.
1st level
  1. Initially there is no consistent way of performing the work in a project,
    • so work processes are ad hoc.
  2. Since there is no well-understood way of working, managers have no proper way to estimate project completion.
  3. To meet deadlines, teams start making mistakes,
    • which can be easily detected.
  4. Projects lose control in less-mature organizations, as there is no control of cost, schedule, and quality.
3rd level
  1. The organization identifies the best practices and
    • integrates them into a common practice.
  2. Once people follow the common practices, the organization can see which practice works best in which environment.
  3. Documentation is prepared to train others in following them.
  4. Critical practices are also kept in a repository for later analysis.
Q: List the measurement scales and explain nominal and ordinal
pg no(116-) L 40

1. Nominal
2. Ordinal
3. Interval
4. Ratio
5. Absolute

1. Nominal

  1. Define classes or categories and
    1. place each entity in a specific class/category
    2. based on the value of the attribute.
  2. This is called nominal measurement.
  3. The classes are not ordered.
  4. Example
    • location of software faults (specification, code, design)

2. Ordinal

  1. The ordinal scale augments the nominal classes with ordering information about the classes.
  2. It gives an ordering to the classes, which is not possible with the nominal scale,
    • which in turn enables further analysis.
  3. Now the empirical system consists of classes
    • ordered with respect to the attribute.
  4. Example: the complexity of software modules can be described as
    • Trivial
    • Simple
    • Moderate
    • Complex
    • Incomprehensible
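The two scales can be contrasted in a short sketch. A minimal Python illustration (the category names come from the examples above; the function names are made up):

```python
# Nominal: unordered categories -- we can only test equality/membership.
FAULT_LOCATIONS = {"specification", "code", "design"}

def classify_fault(location: str) -> str:
    """Place a fault in a nominal class; no ordering is implied."""
    if location not in FAULT_LOCATIONS:
        raise ValueError(f"unknown class: {location}")
    return location

# Ordinal: the classes carry a rank, so comparisons become meaningful.
COMPLEXITY_ORDER = ["trivial", "simple", "moderate", "complex", "incomprehensible"]

def more_complex(a: str, b: str) -> bool:
    """True if module complexity a ranks above b on the ordinal scale."""
    return COMPLEXITY_ORDER.index(a) > COMPLEXITY_ORDER.index(b)
```

Note that `more_complex` would be meaningless on the nominal fault-location classes, which is exactly the difference between the two scales.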


Q: TSP ? Teambuilding Principles ? TSP BOK ? Team Operational Processes

Team Operational Processes:
  1. They are required for disciplined work by engineers.
  2. They precisely define how the work is to be done.
  3. They take the form of scripts that should be followed when doing the work.
  4. The TSP provides an operational process to enact the team-building steps.
  5. Defining such a process requires skill and effort.

TSP Foundations and Fundamentals
  • It outlines the foundation on which the TSP is built and
  • the fundamental concepts a practitioner requires to implement it.
5 knowledge areas
  1. knowledge work
  2. TSP Prerequisite Knowledge
  3. TSP Principles
  4. TSP Process Elements and Measures
  5. TSP Quality Practices
Team Foundations
  • A team is a group of people who work together; effective teams also have skills,
  • are capable of doing tasks of a highly complex nature,
  • and can carry out large-scale work consisting of many interdependent tasks.
7 knowledge areas
  1. Teams and Team building
  2. Team Types, Styles, and Dynamics
  3. Team Formation and Membership
  4. Team Member Responsibilities
  5. Team Member Roles
  6. Team Leader Role
  7. Coach Role
Project planning with TSP
  • It begins when an organization implementing a new technology starts with small pilot projects.
  • A TSP-certified coach helps them to select the team and
  • provides the PSP and TSP project planning and tracking methodology.
  • Once trained, they participate in a TSP launch.
  • A TSP launch is a series of team-building activities composed of ten meetings.
  • The TSP coach works with managers, leaders, and team members to organize each step.
  • The team presents its plan to management and,
  • after acceptance, starts working on it.
  • Its knowledge areas describe
    • required activities and guideposts
    • for implementing pilot PSP projects and launching TSP teams.
5 knowledge areas
  1. Change Management Fundamentals
  2. Piloting TSP in an Organization
  3. Preparing Management and Teams for TSP Implementation
  4. The TSP Launch Meetings
  5. The TSP Relaunch
Project Implementation and Tracking with TSP
  • to keep the data correct and accurate
  • to fix small problems before they become bigger
  • using status tracking

5 knowledge areas
  1. Weekly Meetings
  2. Checkpoints
  3. Communicating with Stakeholders
  4. Replanning
  5. Phase, Cycle, and Project Postmortems
Gathering and Using TSP Data
  • PSP relies on individual data
  • TSP methods rely on data collection and analysis 
  • Data provide a baseline for making estimates and plans
5 knowledge areas
  1. Data Recording
  2. Gathering and Using Size Data
  3. Gathering and Using Schedule Data
  4. Gathering and Using Quality Data
  5. Gathering and Analyzing Postmortem Data
Scaling Up the TSP
  • It begins with one or two small-scale pilot projects.
3 knowledge areas
  1. Organizational Implementation
  2. TSP Process Variations
  3. Large-scale TSP Teams

2 Sept 2018

Important Questions and References

Q: GQM approach ? procedure for deployment of coding standard ?
pg no(124-28 30) - L40/41

Q: TSP ? Teambuilding Principles ? TSP BOK
pg no(2/3-6/7-17) L 23/24/26

Q: Shewhart cycle? PDCA ? plan
pg no(24-31) L 27

Q: Benchmarks ? macroeconomic and economic benchmarks
pg no(43-44) L 30

Q: Ideal Model ? Establishing phase ? explain set context and build sponsorship ?
pg no(32-) L28

Q: explain iso 12207 ?
pg no(-66) L33

Q: Process format pattern ? types of process pattern ?
pg no(56-) L32

Q: Defect backlog ? Defective fix
pg no(141-42) L43

Q: Project Team? Project Risk ? planning and organization ?
pg no(-) L35

Q: Agile development ?
pg no(51-) L31

Q: Explain SPR assessment
pg no(-) L29

Q: At which stage should we collect data? When to Collect Data
pg no(138-)

Q: people CMM level 3 /level I
pg no(-)

Q: List the measurement scales and explain nominal and ordinal
pg no(-) L 40

Q: Schedule monitoring ? Baseline ?
pg no(-)


31 Aug 2018

Important Questions - 712

Questions List: 


Q. Differentiate round robin, hash partitioning and range partitioning using diagrams?
Q. NUMA architecture.

Q.Define two classes of NUMA architecture
Q. Transaction replication in brief
Q: compare conservative timestamp ordering and multiversion timestamp ordering
Q. In context of query optimizer, which search strategy is better and in which condition?
Q. Design issues of object database system ?

Q: Advantage and disadvantages of shared memory?

Q. What are the principles to ensure the consistency and reliability of transaction? Level of consistency?

Q. Suppose there is a Cust(ID, name, contact, location) table, horizontally fragmented to three sites as Cust1, Cust2, and Cust3. (pg53)(pg-29)

Q:  Define the consistency and reliability aspects of a Transaction?  

Q:  What are the Instruction / Transactions states? (pg21 q28)
PAJ algorithm?

Question Prepared: 


Q. Differentiate Horizontal and vertical partitions?
Q. Comparison of Pessimistic and optimistic techniques.
Q. Comparison of array processing and multiprocessing with diagram. 
Q. Parallel processing advantages and challenges?
Q. Architecture of parallel database system?
Q: Parallel Databases:

Q. Advantage and disadvantages of shared nothing?  

Q. Define the consistency and reliability aspects of a Transaction?
Q. Replica Transactions?
Q. Transformation rules of rewriting queries?
Q. Types and Classification of Transactions?
Q. Termination conditions of Transactions?
Q. Semantic and Structural heterogeneity?
Q. Schema Translation and schema integrity?
Q. Joining Techniques/what are the two methods of joining relations?


Unanswered

Q7. A query was given and any three possible executions were asked.
Q3: Write sql query as per conditions and give CNF and DNF forms?  

Q4: Fragmentation on given relation based on projects manages by an employee? 

7 Aug 2018

Final Terms - Cs712

Q. Differentiate round robin, hash partitioning and range partitioning using diagrams?

Round Robin: With n partitions, the i-th tuple is placed in partition (i mod n). This strategy enables sequential access to a relation to be done in parallel. Direct access to individual tuples based on a predicate, however, requires accessing the entire relation.
Range Partitioning: It distributes tuples based on value intervals of some attribute, as shown in figure 6. It is suitable for exact-match and range queries. Range partitioning can result in high variation in partition size.
Hash Partitioning: It applies a hash function to some attribute, which yields the partition number, as shown in figure 5. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node, and all other queries to be processed by all the nodes in parallel.
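A tiny Python sketch of the three strategies (function and parameter names are illustrative, not from any particular DBMS):

```python
def round_robin(tuples, n):
    """i-th tuple goes to partition (i mod n)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    """Hash the partitioning attribute to pick the partition number."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, bounds, key):
    """Value intervals: bounds[i] is the upper limit of partition i;
    anything above the last bound falls into the final partition."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for t in tuples:
        for i, b in enumerate(bounds):
            if key(t) <= b:
                parts[i].append(t)
                break
        else:
            parts[-1].append(t)
    return parts
```

For example, `range_partition([5, 15, 25], [10, 20], lambda x: x)` places one value in each of three partitions, while round robin always balances counts regardless of values.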

Q. NUMA architecture.

Hierarchical Architecture / NUMA
Hierarchical architecture (also called cluster architecture) is a combination of shared-nothing and shared-memory. The idea is to build a shared-nothing machine whose nodes are shared-memory.

Advantage: The advantage of hierarchical architecture is:
  • It combines the benefits of both shared-memory and shared-nothing.
NUMA Architectures
  • Non-uniform memory access
  • Provides the benefits of the shared-memory model in a scalable parallel architecture.
Q. Define two classes of NUMA architecture
  • Cache-Coherent NUMA (CC-NUMA): divides memory statically among all nodes.
  • Cache-Only Memory Architecture (COMA): converts per-node memory into a large cache of the shared address space.

Because shared memory and cache coherency are supported by hardware, remote memory access is very efficient, only several times (typically 4 times) the cost of local access. The strong argument for NUMA is that it does not require any rewriting of application software.

Q. Transaction replication in brief

  • Data is replicated as the transaction executes.
  • Preferred where bandwidth is higher and latency lower.
  • Begins with a snapshot; changes are sent to subscribers as they occur or at timed intervals.
  • A special type allows changes at subscribers.

Q. In context of query optimizer, which search strategy is better and in which condition?

Query optimization refers to the process of producing a query execution plan (QEP), which represents an execution strategy for the query. The selected plan minimizes an objective cost function. The query optimizer is the software module that performs query optimization.
It has three components:

  1. Search space: consists of equivalent query trees.
  2. Search strategy: can be static, dynamic, or randomized.
  3. Cost model: considers response and total times.

Search space
  • The search space is the set of alternative execution plans that represent the input query. These plans are equivalent in the sense that they produce the same result, but they differ in the execution order of operations and in the way these operations are implemented. The search space consists of equivalent query trees produced using transformation rules. The optimizer concentrates on join trees, since the join cost is the most significant.
  • For a complex query the number of equivalent operator trees can be very high. For instance, the number of alternative join trees that can be produced by applying the commutativity and associativity rules is O(N!) for N relations. Query optimizers therefore restrict the size of the search space they consider. Two restrictions are:
  1. Heuristics
    • The most common heuristic is to perform selection and projection on base relations first.
    • Another is to avoid Cartesian products.
  2. Shape of the join tree
    • Two types of join trees are distinguished:
      • Linear tree
        • At least one operand of each operator node is a base relation.
      • Bushy tree
        • May have operators with no base relations as operands (both operands are intermediate relations).
Search strategy
  • The most popular search strategy is dynamic programming (DP),
  • which starts with base relations and keeps adding relations while calculating cost.
  • DP is almost exhaustive, so it produces the best plan,
  • but it is too expensive with more than 5 relations.
  • The other option is a randomized strategy,
  • which does not guarantee the best plan.
Cost model
An optimizer's cost model includes cost functions to predict the cost of operators, statistics on the base data, and formulas to evaluate the sizes of intermediate results.
  1. Cost functions
  2. Database statistics
  3. Cardinalities of intermediate results
Cost functions:
  • The cost of a distributed execution strategy can be expressed with respect to either the total time or the response time.
  • Total time = CPU time + I/O time + transmission time
  • In a WAN, the major cost is transmission time.
  • Initially the ratio of transmission to I/O cost was 20:1; for a LAN it is about 1:1.6.
  • Response time = CPU time + I/O time + transmission time, but counting only the longest sequential path, since parallel work overlaps.
  • TCPU = time for a CPU instruction
  • TI/O = time for a disk I/O
  • TMSG = fixed time for initiating and receiving a message
  • TTR = time to transmit a data unit from one site to another
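A rough illustration of the two cost measures. The unit costs below are made-up placeholder values, not figures from the course:

```python
# Hypothetical per-unit costs (seconds): CPU instruction, disk I/O,
# message setup, and transmission of one data unit.
T_CPU, T_IO, T_MSG, T_TR = 1e-6, 1e-2, 1e-1, 1e-4

def total_time(insts, ios, msgs, units_sent):
    """Sum of ALL cost components over all sites (a throughput measure)."""
    return T_CPU * insts + T_IO * ios + T_MSG * msgs + T_TR * units_sent

def response_time(site_times):
    """Elapsed time: work at parallel sites overlaps, so only the
    slowest (longest sequential) path determines the response time."""
    return max(site_times)
```

So two sites each doing 1 second of work contribute 2 seconds of total time but only 1 second of response time, which is why the two measures favor different plans.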
Q. Design issues of object database systems?
  • Distribution design involves fragmentation and allocation. Distribution design in the object world brings new complexities. Conceptually, objects encapsulate methods together with state. In reality, methods are implemented on types and shared by all instance objects of that type. The location of objects with respect to their types therefore becomes an issue.
  • One has to decide whether fragmentation is performed only on attributes, duplicating the methods with each fragment, or whether methods can be fragmented as well.
Q: compare conservative timestamp ordering and multiversion timestamp ordering

Timestamp Ordering (TO)
• Basic TO
• Multiversion TO
• Conservative TO
Conservative TO
• Basic TO generates too many restarts.
• For example, if a site is relatively calm, its transactions will be restarted again and again.
• Synchronizing timestamps may be very costly.
• System clocks can be used if they run at comparable speeds.
• In conservative TO, operations are not executed immediately but are buffered.
• The scheduler maintains a queue for each TM.
• Operations from a TM are placed in the relevant queue, ordered, and executed later.
• This reduces but does not eliminate restarts.
Multiversion TO
  • Another attempt to reduce restarts.
  • Multiple versions of data items, with their largest read/write timestamps, are maintained.
  • A read operation is performed on the appropriate version.
  • A write is rejected if a younger transaction has already read or written the data item.
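The basic TO rule that causes these restarts can be sketched as follows (a hypothetical single-site scheduler; `True` means the operation is accepted, `False` means the transaction must restart):

```python
class BasicTO:
    """Minimal sketch of the basic timestamp-ordering rule."""

    def __init__(self):
        self.read_ts = {}   # largest timestamp that has read each item
        self.write_ts = {}  # largest timestamp that has written each item

    def read(self, ts, item):
        # A read is too late if a younger transaction already wrote the item.
        if ts < self.write_ts.get(item, 0):
            return False
        self.read_ts[item] = max(ts, self.read_ts.get(item, 0))
        return True

    def write(self, ts, item):
        # A write is too late if a younger transaction already read or wrote it.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True
```

A transaction from a "calm" site carries an old timestamp, so its operations keep returning `False` against younger activity, which is exactly the restart problem conservative TO addresses by buffering operations.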
-------------------------------------


Q. What are the principles to ensure the consistency and reliability of a transaction? Levels of consistency?

Consistency refers simply to the correctness of a transaction.
– A transaction should transform the DB from one consistent state to another consistent state.
– It is the concern of semantic integrity control and concurrency control.
– The classification of consistency for transactions uses the term "dirty data": data that has been updated by a transaction before its commitment.

Degree 3: Transaction T sees degree 3 consistency if:
1- T does not overwrite dirty data of other transactions;
2- T does not commit any writes until it completes all its writes (i.e., until the end of the transaction);
3- T does not read dirty data from other transactions;
4- Other transactions do not dirty any data read by T before T completes.

Q. Define the consistency and reliability aspects of a Transaction?

The consistency and reliability aspects of transactions are due to the following properties (see example, pg 26-27):
1- Atomicity:
also known as the "all or none" property
• In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are.
2- Consistency:
refers simply to the correctness of a transaction
• A transaction either creates a new and valid state of the data or, if any failure occurs, returns all data to its original state, i.e. the state before the transaction was started.
3- Isolation:
• A transaction cannot reveal its results to other transactions before commitment.
• A transaction in process and not yet committed must remain isolated from any other transaction.
4- Durability:
It refers to the property of a transaction which ensures that once the transaction commits, its results are permanent and cannot be erased from the database.


Q. Termination conditions of Transactions?

If a transaction can complete its task successfully, we say that the transaction commits. If a transaction stops without completing its task, we say that it aborts. When a transaction is aborted, its execution is stopped and all of its already executed actions are undone by returning the database to the state before their execution. This is also known as rollback.

Q: Advantages and disadvantages of shared memory?
Any processor has access to any memory module or disk unit through a fast interconnect. Examples of shared-memory parallel database systems include DBS3 and Volcano.

Advantages:
The two main advantages are:
• Simplicity
• Excellent load balancing, since it can be done dynamically
Drawbacks:
The three main disadvantages are:
• Cost of the high-speed interconnect
• Low extensibility
• Low availability


Q. Advantages and disadvantages of shared-nothing?

In the shared-nothing approach each processor has exclusive access to its main memory and disk unit(s). Each node can then be viewed as a local site (with its own database and software) in a distributed database system. Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management, and distributed query processing, may be reused. Examples of shared-nothing parallel database systems include Teradata's DBC and Tandem's NonStopSQL products, as well as a number of prototypes such as BUBBA and GAMMA.

Advantages:
Shared-nothing has the following advantages:
  • Cost
  • Availability
  • Extensibility

Drawbacks:
Shared-nothing has the following disadvantages:
  • More complex than shared memory
  • Difficult load balancing
-------------------------------------

Q: Parallel Databases:


  • A partitioned database is the basis for parallel execution of queries.
  • Designing supporting parallel algorithms is another challenge
    • for database operators and DB queries (which combine operators).
  • It is difficult because it requires a good trade-off between parallelism and communication cost.
  • Parallel algorithms for relational algebra operators are the building blocks for parallel query processing.
  • Parallel data processing should exploit intra-operation parallelism.
  • Distributed join algorithms are designed for high-speed networks.
  • There are three basic parallel join algorithms for partitioned databases.


Parallel join algorithm:

  • The simplest algorithm.
  • It computes the Cartesian product of relations R and S in parallel,
  • so arbitrarily complex join predicates may be supported.

1st phase

  • Each fragment of R is replicated at each node of S.
  • With broadcast this may take m message transfer times; otherwise m*n.

2nd phase

  • Each S-node j locally joins R with its fragment Sj.
  • This is done in parallel by the n nodes.
  • The local join can be done as in a centralized DBMS.
  • Depending on the local join algorithm, join processing may or may not start as soon as data is received.

Parallel associative join

  • It applies only to equi-joins, where one relation is partitioned on the join attribute.
  • Assume the equi-join predicate is on attribute A of R and attribute B of S.
  • Relation S is partitioned based on a hash function h applied to attribute B.
  • This means that all tuples of S that have the same value for h(B) are placed at the same node.
  • No knowledge of how R is partitioned is assumed.

1st phase

  • Relation R is sent to the S-nodes by applying the hash function to attribute A.
  • This phase is done in parallel at every node where a fragment Ri exists.
  • The tuples of R get distributed, but not replicated, across the S-nodes.

2nd phase

  • Each S-node j receives its subset of R (Rj) and
  • joins it locally with the fragment Sj.
  • The local join processing can be done in parallel.
  • To summarize, the parallel associative join algorithm replaces the operator R ⋈ S by ⋃(j=1..n) (Rj ⋈ Sj).

Parallel hash join

  • A generalization of the parallel associative join.
  • It also applies to equi-joins but does not require any particular partitioning of the operand relations.
  • The idea is to partition relations R and S into the same number p of mutually exclusive sets (fragments)
  • R1, R2, ..., Rp and S1, S2, ..., Sp,

such that R ⋈ S = ⋃(i=1..p) (Ri ⋈ Si)
  • The partitioning of R and S is done with the same hash function applied to the join attribute.
  • Each individual join is done in parallel, and the join result is produced at p nodes.
  • The choice of the p nodes may be made at run time depending on system load.
  • The difference from the parallel associative join is that the partitioning of S is necessary
    • and the result is produced at p nodes rather than at the S-nodes.
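The partition-then-join idea behind the parallel hash join can be sketched in Python (an in-memory toy with threads standing in for the p nodes; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_hash_join(R, S, p, key_r, key_s):
    """Both relations are hash-partitioned on the join attribute into p
    fragments; matching tuples always land in the same fragment pair,
    so the p local joins can run independently."""
    Rp = [[] for _ in range(p)]
    Sp = [[] for _ in range(p)]
    for r in R:
        Rp[hash(key_r(r)) % p].append(r)
    for s in S:
        Sp[hash(key_s(s)) % p].append(s)

    def local_join(frags):
        ri, si = frags
        # Local join at one "node" (naive nested loop for simplicity).
        return [(r, s) for r in ri for s in si if key_r(r) == key_s(s)]

    with ThreadPoolExecutor(max_workers=p) as pool:
        results = pool.map(local_join, zip(Rp, Sp))
    return [pair for part in results for pair in part]
```

Because both R and S are repartitioned by the same hash function, the union of the p local joins equals R ⋈ S, matching the equation above.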
===============

Q. Comparison of array processing and multiprocessing with diagram. (15 marks)
Parallel Processing basics:
  1. Multiprocessor Systems
  • Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
  • One computer with a global RAM shared by many processors; the processors are tightly coupled.
  • In a multiprocessor, thousands of processors are controlled by a single operating system, each processor accessing the same job queue.
Therefore, synchronization and interprocessor communication are required. Interconnection systems are:
  • Time-shared or common bus
  • Crossbar switches
  • Multiport-memory systems
  • Multistage networks

Associative Processors
They are designed to speed up the search for data items in memory. They check all memory blocks in parallel to search for matching memory blocks (e.g., a nearest-neighbor mesh).

Array Processors
A set of identical processors synchronized to perform the same instruction simultaneously on different data. They are also called SIMD.

Q. Parallel processing advantages and challenges?
Advantages:
  1. More fault tolerance
  2. Increased throughput
  3. Better price/performance
Challenges:
  1. Image processing
  2. Large-scale simulation
  3. Weather forecasting, etc.
===============

Q. Differentiate horizontal and vertical partitions?
Vertical Fragmentation:
Different subsets of attributes are stored at different places. For example, take the table EMP (eId, eName, eDept, eQual, eSal).
The interests of the local and head offices may result in the following vertical partitions of this table:
EMP1 (eId, eName, eDept)
EMP2 (eId, eQual, eSal)
Horizontal Fragmentation:
It is based on the localization of data: rows of a table are split across multiple sites. For example, the data in CLIENT(cAC#, cName, cAdr, cBal) is placed in different databases based on location, e.g. Lahore, Pindi, Karachi, Peshawar, Quetta.
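The two fragmentations of the example tables can be sketched as follows (rows modeled as dicts; the helper names are made up):

```python
def vertical_fragment(rows, attrs, key="eId"):
    """Keep the key plus a subset of attributes at a site (the key is
    repeated in every fragment so the table can be reconstructed by join)."""
    return [{k: r[k] for k in (key, *attrs)} for r in rows]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy a localization predicate
    (reconstruction is by union of the fragments)."""
    return [r for r in rows if predicate(r)]
```

For instance, `vertical_fragment(emp, ("eName", "eDept"))` yields the EMP1 fragment above, while a predicate on `cAdr` yields the per-city CLIENT fragments.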

Q. Comparison of Pessimistic and optimistic techniques.
The concurrency control mechanisms can be grouped into two broad classes: pessimistic concurrency control methods and optimistic concurrency control methods.
  1. Pessimistic algorithms synchronize the concurrent execution of transactions early in their execution life cycle, whereas
  2. optimistic algorithms delay the synchronization of transactions until their termination.
  • The timestamp-ordering algorithms can be Basic TO, Multiversion TO, or Conservative TO.

Pessimistic and Optimistic
The pessimistic approach synchronizes transactions early:
  • Locking-based
    • Centralized locking
    • Primary copy locking
    • Distributed locking
  • Timestamp Ordering (TO)
    • Basic TO
    • Multiversion TO
    • Conservative TO
  • Hybrid
Optimistic methods do this late in the execution life cycle of transactions:
  • Locking-based
  • Timestamp-ordering-based
Locking-based Concurrency Control
The basic idea is that data items accessed by conflicting operations are accessed by one operation at a time.
  • Data items are locked by the Lock Manager.
  • There are two major types of locks:
    • read lock
    • write lock
  • A transaction needs to acquire the lock first.
  • For improved accessibility, the compatibility of locks must be established.
  • Locking is the job of the DDBMS, not the user.
  • The scheduler is the Lock Manager.
  • The TM and LM interact.
Locking-based Concurrency Control Algorithm
  • Unfortunately, the basic locking algorithm alone will not properly synchronize transaction executions.
  • This is because, to generate serializable schedules, the locking and releasing operations of transactions also need to be coordinated.
The timestamp ordering (TO) class involves organizing the execution order of transactions so that they maintain transaction consistency. This ordering is maintained by assigning timestamps to both the transactions and the data items stored in the database.
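The read/write lock compatibility mentioned above is commonly the standard shared/exclusive matrix; a minimal sketch under that assumption:

```python
# Assumed standard compatibility matrix: read locks are shared,
# a write lock is exclusive.
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}

def can_grant(held_modes, requested):
    """The Lock Manager grants a lock only if the requested mode is
    compatible with every lock already held on the data item."""
    return all(COMPATIBLE[(h, requested)] for h in held_modes)
```

So any number of transactions may read an item concurrently, but a writer must wait until all other locks on the item are released.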


Q. Architecture of parallel database system?
Assuming a client-server architecture, the functions supported by a parallel database system can be divided into three subsystems, much like in a typical RDBMS. Depending on the architecture, a processor can support all or a subset of these subsystems.

  • Session Manager
    1. It acts as a transaction monitor and provides support for client-server interaction. It performs connection and disconnection between the client process and the other two subsystems.
    2. It starts and closes user sessions. In the case of an OLTP session, the session manager can trigger the execution of preloaded transaction code within the data manager module.
  • Request Manager
    1. It receives client requests for query compilation and execution. It has direct access to the database directory, which contains all the meta-information about the data and programs.
    2. The directory itself should be managed as a database in the server. Upon request, it activates the various compilation phases, triggers query execution, and returns the results and error codes to the client application.
  • Data Manager
    1. It provides all the low-level functions needed to run compiled queries in parallel, i.e.
      • execution of DB operations
      • transaction management support
      • cache management


Important concepts



Q. Types and Classification of Transactions?

Types of Transactions
Transactions have been classified according to a number of criteria. One criterion is the duration of the transaction. Transactions may be classified as
• On-line (short-life)
• Batch (long-life)
Transactions can also be classified according to their structure, into four broad categories:
– Flat (or simple) transactions
– Closed nested transactions
– Open nested transactions
– Workflow models
Online transactions are characterized by very short execution/response times and by access to a relatively small portion of the database. Examples are banking and airline reservation transactions.
Examples of batch transactions are CAD/CAM databases, statistical applications, report generation, complex queries, and image processing.
Another classification is with respect to the organization of the read and write actions. If a transaction is restricted so that all of its read actions are performed before any write action, it is called a two-step transaction. If a transaction is restricted so that a data item has to be read before it can be updated (written), the corresponding class is called restricted (or read-before-write). If a transaction is both two-step and restricted, it is called a restricted two-step transaction.
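Assuming a transaction is given as a list of (operation, item) actions, the two-step and restricted classes could be checked like this (hypothetical helpers, not part of the course material):

```python
def is_two_step(actions):
    """Two-step: all read actions come before any write action."""
    first_write = next((i for i, (op, _) in enumerate(actions) if op == "write"),
                       len(actions))
    # Everything from the first write onward must also be a write.
    return all(op == "write" for op, _ in actions[first_write:])

def is_restricted(actions):
    """Restricted (read-before-write): every item is read before it is written."""
    seen = set()
    for op, item in actions:
        if op == "write" and item not in seen:
            return False
        if op == "read":
            seen.add(item)
    return True
```

A transaction satisfying both predicates would belong to the restricted two-step class described above.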

Q. Transformation rules of rewriting queries?

Commutativity of binary operations:
R × S <=> S × R
R ⋈ S <=> S ⋈ R
Associativity of binary operations:
(R × S) × T <=> R × (S × T)
(R ⋈ S) ⋈ T <=> R ⋈ (S ⋈ T)

Query processing has four different phases; the first is query decomposition. The steps of query decomposition are normalization, analysis, simplification, and rewriting.
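The commutativity rule for joins can be demonstrated on toy relations (a naive natural join; purely illustrative):

```python
def natural_join(R, S):
    """Join two lists of dict-rows on all shared attribute names
    (naive nested loop, not an optimized implementation)."""
    out = []
    for r in R:
        for s in S:
            shared = set(r) & set(s)
            if all(r[a] == s[a] for a in shared):
                out.append({**r, **s})
    return out

R = [{"id": 1, "name": "a"}]
S = [{"id": 1, "sal": 10}, {"id": 2, "sal": 20}]
# R joined with S and S joined with R produce the same tuples,
# which is what lets the optimizer reorder joins in the search space.
```

Such equivalences are exactly what generates the alternative query trees in the optimizer's search space.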

Q. Replica Transactions?

Distributed 2PL (D2PL) expects the availability of lock managers at each site. If the database is not replicated, distributed 2PL degenerates into the primary copy 2PL algorithm. If data is replicated, the transaction implements the ROWA (Read-One-Write-All) replica control protocol.

lock request → operation → end of operation → release lock


Q. Semantic and Structural heterogeneity?

Two types of heterogeneity:
1) Semantic heterogeneity: the same concepts are modeled differently in different schemas. Examples: naming, aggregation, generalization, attribute class, default value, database identifier, schema isomorphism, missing values.
2) Structural heterogeneity: differences in the representation of corresponding schema elements. Examples: data scaling and precision, attribute integrity constraints, objects included, domain.
Heterogeneities are resolved by applying mappings on elements. We cannot ask/expect the component databases to change. The most common semantic conflict is the naming conflict, which can involve synonyms and homonyms.
Synonyms: different names modeling the same thing, like ENGINEER/EMP and Salary/Sal in our example schema.
Homonyms: the same name modeling different things, like "title" here.

Q. Schema Translation and schema integrity?

Schema Translation: involves translating the component schemas into the same data model. It helps to
• compare schema elements
• merge them.
The choice of the Common Data Model (CDM) is important in schema translation. We should prefer semantic data models, like E-R or OO. After schema translation we perform Schema Integration (SI).
Schema Integration: involves merging the component schemas into a common schema (the global schema). SI involves identifying corresponding schema elements from the different component databases and merging them. It is hampered mainly by semantic heterogeneities.

Q. Joining Techniques/what are the two methods of joining relations?

There are two methods:
1)Nested loops
2)Merge join
1) Nested loops:
It computes the product of the two relations. For each tuple of the external relation, the tuples of the internal relation that satisfy the join predicate are retrieved one by one to form the resulting relation. An index on the join attribute is a very efficient access path for the internal relation. In the absence of an index, for relations of n1 and n2 pages respectively, this algorithm has a cost proportional to n1*n2, which may be prohibitive if n1 and n2 are high.
2) Merge join:
It consists of merging two sorted relations on the join attribute, as shown in figure 1. Indices on the join attribute may be used as access paths. If the join criterion is equality, the cost of joining two relations of n1 and n2 pages respectively is proportional to n1+n2. This method is always chosen when there is an equi-join and the relations are already sorted.
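Both methods can be sketched on in-memory lists (illustrative only; real systems work on pages, and the n1*n2 vs. n1+n2 costs refer to page accesses):

```python
def nested_loop_join(ext, inner, key=lambda t: t[0]):
    """For each tuple of the external relation, scan the internal relation.
    Without an index, cost grows like n1*n2."""
    return [(e, i) for e in ext for i in inner if key(e) == key(i)]

def merge_join(R, S, key=lambda t: t[0]):
    """Merge two relations already sorted on the join key; for an
    equi-join the cost grows like n1+n2."""
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            # Emit the cross product of the runs of equal keys on both sides.
            k = key(R[i])
            i2 = i
            while i2 < len(R) and key(R[i2]) == k:
                j2 = j
                while j2 < len(S) and key(S[j2]) == k:
                    out.append((R[i2], S[j2]))
                    j2 += 1
                i2 += 1
            i, j = i2, j2
    return out
```

On sorted inputs the two functions produce the same result; the merge join simply reaches it with a single pass over each relation.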