9 Sept 2018

Important Topics

Q: Project Team? Project Risk? Planning and organization?
pg no(-) L35

Project Team:
  1. The team is the only means of reaching the project goal.
  2. Team selection is a vital step.
  3. Special attention is needed to create teams whose members respect each other.
  4. The size of the team depends on the complexity of the project.
  5. Technical knowledge and experience also matter.
Project Risks:
  1. A project will be at risk if expectations for it are not controlled, or if the results are ill defined or misunderstood.
  2. Before a risk can be eliminated, it must be identified and addressed.
  3. Risks need to be discussed with the team, managers, and end users.
  4. Know the cause of each risk and take measures to minimize the risk factor.
  5. Risk can be assessed under three categories:
    1. business risk
    2. technology risk
    3. project size risk
  6. Control risk from the initial stage by following standards.
  7. Train your team.
  8. Technical risk arises when new technology is introduced.
Q: Explain SPR assessment
pg no(-) L29

  1. The software assessment interview is a group activity, held for each project being analyzed.
  2. The SPR consultant meets with the manager and up to seven technical staff at a time.
  3. Participants are given questions to answer.
  4. The official assessment form is filled in by the SPR consultant after assessing the group.
  5. In case of differing opinions, all points of view are recorded.
  6. These issues are then resolved.
  7. SPR assessments are structured interviews based on scripted questions.
  8. The questionnaire contains multiple-choice questions (250 / 100 sets).
  9. Once the sample projects have been analyzed, the answers are fed into a statistical tool to produce statistics and standard deviations.
  10. The weak areas are then revealed, which is essential for software process improvement.
  11. The client is then able to adopt a software process improvement program.
  12. Next, a six-stage improvement program is required.
Six-Stage Improvement Program
  1. Focus on management technologies 
  2. Focus on software processes and methodologies 
  3. Focus on new tools and approaches  
  4. Focus on infrastructure and specialization  
  5. Focus on reusability  
  6. Focus on industry leadership
      
Q: At which stage should we collect data? When to Collect Data
pg no(138-)
  1. Data collection planning needs to start when product planning starts.
  2. Data is required at many stages.
  3. Some data needs to be collected when the project starts, other data later.
  4. Data is also collected during inspections and reviews.
  5. Some data is collected when changes are applied to enhance the product.
  6. Data collection should be an essential part of the development process.
  7. Likewise, if a process/activity needs to be analyzed, a measurement list must be prepared for it.
Q: Ideal Model ? Establishing phase ? explain set context and build sponsorship ?
pg no(32-) L28

  • IDEAL Model - a life-cycle model for software process improvement
    • Based upon CMM
  • It provides a usable and understandable approach, with steps, toward software process improvement.
  • It has phases, activities, and principles.
  • It provides disciplined engineering work, which is the foundation of a long-term improvement plan.
  • The IDEAL model is named for the five phases it describes: initiating, diagnosing, establishing, acting, and learning.
Set Context and Build Sponsorship are steps of the initiating phase.
Set Context -
  1. Once the need for change is identified, management sets the context for the work to be done.
  2. "Setting context" explains where this effort fits in the current business strategy,
  3. what business goals it serves,
  4. whose ongoing work it affects,
  5. and what its benefits are.
Build Sponsorship -
  1. Effective sponsorship is required for software improvement.
  2. On the way to an improved system, many organizations face chaos and uncertainty,
    • so critical management support is required at the beginning.
  3. Sponsors are most effective if they give personal attention and stick with it.
The Establishing Phase
  1. The purpose of the establishing phase is to develop a detailed work plan.
  2. Finally, specific actions, milestones, deliverables, and responsibilities are incorporated into an action plan.
  3. Steps in the Establishing Phase:
    1. Set Priorities
    2. Develop Approach
    3. Plan Actions
Q: people CMM level 3 /level I
pg no(86-) L36

People CMM
  1. It helps to manage and develop an organization's workforce.
  2. It employs the CMM process maturity framework and consists of best practices.
  3. It is a tool that helps you successfully address the critical people issues in your organization.

Seven Principle of Workforce Management
1. Employment security
2. Selective hiring of new personnel
3. Self-managed teams and decentralization of decision making
4. Comparatively high compensation contingent on organizational performance
5. Extensive training
6. Reduced status distinctions and barriers
7. Extensive sharing of financial and performance information

Process Maturity - 5 Levels
  1. Continuous process improvement in five stages.
  2. It is a process standard and consists of best practices.
  3. It integrates the improved practices into a staged model and hence guides cultural transformation.
1st level
  1. Initially there is no consistent way of performing the work in a project,
    • so work processes are ad hoc.
  2. Since there is no well-understood way of working, managers have no proper way to estimate project completion.
  3. To meet deadlines, teams start making mistakes,
    • which can be easily detected.
  4. Projects lose control in less-mature organizations, as there is no control of cost, schedule, and quality.
3rd level
  1. The organization identifies the best practices and
    • integrates them into a common practice.
  2. Once people follow the common practices, the organization can see which practice works best in which environment.
  3. Documentation is prepared to train others in following them.
  4. Critical practices are also kept in a repository for later analysis.
Q: List the measurement scales and explain nominal and ordinal
pg no(116-) L 40

1. Nominal
2. Ordinal
3. Interval
4. Ratio
5. Absolute

1. Nominal

  1. Define classes or categories and
    1. place each entity in a specific class/category
    2. based on the value of the attribute.
  2. This is called nominal measurement.
  3. The classes are not ordered.
  4. Example
    • location of software faults (specification, code, design)

2. Ordinal

  1. The ordinal scale augments the nominal classes with ordering information about the classes.
  2. It gives an ordering to the classes, which is not possible with the nominal scale,
    • which in turn enables further analysis.
  3. Now the empirical system consists of classes
    • ordered with respect to the attribute.
  4. Example: the complexity of software modules can be described as
    • Trivial
    • Simple
    • Moderate
    • Complex
    • Incomprehensible
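The two scales can be contrasted in a short sketch. A minimal Python illustration (the category names come from the examples above; the function names are made up):

```python
# Nominal: unordered categories -- we can only test equality/membership.
FAULT_LOCATIONS = {"specification", "code", "design"}

def classify_fault(location: str) -> str:
    """Place a fault in a nominal class; no ordering is implied."""
    if location not in FAULT_LOCATIONS:
        raise ValueError(f"unknown class: {location}")
    return location

# Ordinal: the classes carry a rank, so comparisons become meaningful.
COMPLEXITY_ORDER = ["trivial", "simple", "moderate", "complex", "incomprehensible"]

def more_complex(a: str, b: str) -> bool:
    """True if module complexity a ranks above b on the ordinal scale."""
    return COMPLEXITY_ORDER.index(a) > COMPLEXITY_ORDER.index(b)
```

Note that `more_complex` would be meaningless on the nominal fault-location classes, which is exactly the difference between the two scales.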


Q: TSP ? Teambuilding Principles ? TSP BOK ? Team Operational Processes

Team Operational Processes:
  1. They are required for disciplined work by engineers.
  2. They precisely define how the work is to be done.
  3. They take the form of scripts that should be followed when doing the work.
  4. The TSP provides an operational process to enact the team-building steps.
  5. Defining such a process requires skill and effort.

TSP Foundations and Fundamentals
  • It outlines the foundation on which the TSP is built and
  • the fundamental concepts a practitioner requires to implement it.
5 knowledge areas
  1. knowledge work
  2. TSP Prerequisite Knowledge
  3. TSP Principles
  4. TSP Process Elements and Measures
  5. TSP Quality Practices
Team Foundations
  • A team is a group of people who work together; effective teams also have skills,
  • are capable of doing tasks of a highly complex nature,
  • and can carry out large-scale work consisting of many interdependent tasks.
7 knowledge areas
  1. Teams and Team building
  2. Team Types, Styles, and Dynamics
  3. Team Formation and Membership
  4. Team Member Responsibilities
  5. Team Member Roles
  6. Team Leader Role
  7. Coach Role
Project planning with TSP
  • It begins when an organization implementing a new technology starts with small pilot projects.
  • A TSP-certified coach helps them to select the team and
  • provides the PSP and TSP project planning and tracking methodology.
  • Once trained, they participate in a TSP launch.
  • A TSP launch is a series of team-building activities composed of ten meetings.
  • The TSP coach works with managers, leaders, and team members to organize each step.
  • The team presents its plan to management and,
  • after acceptance, starts working on it.
  • Its knowledge areas describe
    • required activities and guideposts
    • for implementing pilot PSP projects and launching TSP teams.
5 knowledge areas
  1. Change Management Fundamentals
  2. Piloting TSP in an Organization
  3. Preparing Management and Teams for TSP Implementation
  4. The TSP Launch Meetings
  5. The TSP Relaunch
Project Implementation and Tracking with TSP
  • to keep the data correct and accurate
  • to fix small problems before they become bigger
  • using status tracking

5 knowledge areas
  1. Weekly Meetings
  2. Checkpoints
  3. Communicating with Stakeholders
  4. Replanning
  5. Phase, Cycle, and Project Postmortems
Gathering and Using TSP Data
  • PSP relies on individual data
  • TSP methods rely on data collection and analysis 
  • Data provide a baseline for making estimates and plans
5 knowledge areas
  1. Data Recording
  2. Gathering and Using Size Data
  3. Gathering and Using Schedule Data
  4. Gathering and Using Quality Data
  5. Gathering and Analyzing Postmortem Data
Scaling Up the TSP
  • It begins with one or two small-scale pilot projects.
3 knowledge areas
  1. Organizational Implementation
  2. TSP Process Variations
  3. Large-scale TSP Teams

2 Sept 2018

Important Questions and References

Q: GQM approach ? procedure for deployment of coding standard ?
pg no(124-28 30) - L40/41

Q: TSP ? Teambuilding Principles ? TSP BOK
pg no(2/3-6/7-17) L 23/24/26

Q: Shewhart cycle? PDCA ? plan
pg no(24-31) L 27

Q: Benchmarks ? macroeconomic and economic benchmarks
pg no(43-44) L 30

Q: Ideal Model ? Establishing phase ? explain set context and build sponsorship ?
pg no(32-) L28

Q: explain iso 12207 ?
pg no(-66) L33

Q: Process format pattern ? types of process pattern ?
pg no(56-) L32

Q: Defect backlog ? Defective fix
pg no(141-42) L43

Q: Project Team? Project Risk ? planning and organization ?
pg no(-) L35

Q: Agile development ?
pg no(51-) L31

Q: Explain SPR assessment
pg no(-) L29

Q: At which stage should we collect data? When to Collect Data
pg no(138-)

Q: people CMM level 3 /level I
pg no(-)

Q: List the measurement scales and explain nominal and ordinal
pg no(-) L 40

Q: Schedule monitoring ? Baseline ?
pg no(-)


31 Aug 2018

Important Questions - 712

Questions List: 


Q. Differentiate round robin, hash partitioning and range partitioning using diagrams?
Q. NUMA architecture.

Q.Define two classes of NUMA architecture
Q. Transaction replication in brief
Q: compare conservative timestamp ordering and multiversion timestamp ordering
Q. In context of query optimizer, which search strategy is better and in which condition?
Q. Design issues of object database system ?

Q: Advantage and disadvantages of shared memory?

Q. What are the principles to ensure the consistency and reliability of transaction? Level of consistency?

Q. Suppose there is a Cust(ID, name, contact, location) table, horizontally fragmented to three sites as Cust1, Cust2, and Cust3. (pg53)(pg-29)

Q:  Define the consistency and reliability aspects of a Transaction?  

Q:  What are the Instruction / Transactions states? (pg21 q28)
PAJ algorithm?

Question Prepared: 


Q. Differentiate Horizontal and vertical partitions?
Q. Comparison of Pessimistic and optimistic techniques.
Q. Comparison of array processing and multiprocessing with diagram. 
Q. Parallel processing advantages and challenges?
Q. Architecture of parallel database system?
Q: Parallel Databases:

Q. Advantage and disadvantages of shared nothing?  

Q. Define the consistency and reliability aspects of a Transaction?
Q. Replica Transactions?
Q. Transformation rules of rewriting queries?
Q. Types and Classification of Transactions?
Q. Termination conditions of Transactions?
Q. Semantic and Structural heterogeneity?
Q. Schema Translation and schema integrity?
Q. Joining Techniques/what are the two methods of joining relations?


Unanswered

Q7. A query was given and any three possible executions were asked.
Q3: Write sql query as per conditions and give CNF and DNF forms?  

Q4: Fragmentation on given relation based on projects manages by an employee? 

7 Aug 2018

Final Terms - Cs712

Q. Differentiate round robin, hash partitioning and range partitioning using diagrams?

Round Robin: With n partitions, the i-th tuple is placed in partition (i mod n). This strategy enables sequential access to a relation to be done in parallel. Direct access to individual tuples based on a predicate, however, requires accessing the entire relation.
Range Partitioning: It distributes tuples based on value intervals of some attribute, as shown in figure 6. It is suitable for exact-match and range queries. Range partitioning can result in high variation in partition size.
Hash Partitioning: It applies a hash function to some attribute, which yields the partition number, as shown in figure 5. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node, and all other queries to be processed by all the nodes in parallel.
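A tiny Python sketch of the three strategies (function and parameter names are illustrative, not from any particular DBMS):

```python
def round_robin(tuples, n):
    """i-th tuple goes to partition (i mod n)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    """Hash the partitioning attribute to pick the partition number."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, bounds, key):
    """Value intervals: bounds[i] is the upper limit of partition i;
    anything above the last bound falls into the final partition."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for t in tuples:
        for i, b in enumerate(bounds):
            if key(t) <= b:
                parts[i].append(t)
                break
        else:
            parts[-1].append(t)
    return parts
```

For example, `range_partition([5, 15, 25], [10, 20], lambda x: x)` places one value in each of three partitions, while round robin always balances counts regardless of values.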

Q. NUMA architecture.

Hierarchical Architecture / NUMA
Hierarchical architecture (also called cluster architecture) is a combination of shared-nothing and shared-memory. The idea is to build a shared-nothing machine whose nodes are shared-memory.

Advantage: The advantage of hierarchical architecture is:
  • It combines the benefits of both shared-memory and shared-nothing.
NUMA Architectures
  • Non-uniform memory access
  • Provides the benefits of the shared-memory model in a scalable parallel architecture.
Q. Define two classes of NUMA architecture
  • Cache-Coherent NUMA (CC-NUMA): divides memory statically among all nodes.
  • Cache-Only Memory Architecture (COMA): converts per-node memory into a large cache of the shared address space.

Because shared memory and cache coherency are supported by hardware, remote memory access is very efficient, only several times (typically 4 times) the cost of local access. The strong argument for NUMA is that it does not require any rewriting of application software.

Q. Transaction replication in brief

  • Data is replicated as the transaction executes.
  • Preferred where bandwidth is higher and latency lower.
  • Begins with a snapshot; changes are sent to subscribers as they occur or at timed intervals.
  • A special type allows changes at subscribers.

Q. In context of query optimizer, which search strategy is better and in which condition?

Query optimization refers to the process of producing a query execution plan (QEP), which represents an execution strategy for the query. The selected plan minimizes an objective cost function. The query optimizer is the software module that performs query optimization.
It has three components:

  1. Search space: consists of equivalent query trees.
  2. Search strategy: can be static, dynamic, or randomized.
  3. Cost model: considers response and total times.

Search space
  • The search space is the set of alternative execution plans that represent the input query. These plans are equivalent in the sense that they produce the same result, but they differ in the execution order of operations and in the way these operations are implemented. The search space consists of equivalent query trees produced using transformation rules. The optimizer concentrates on join trees, since the join cost is the most significant.
  • For a complex query the number of equivalent operator trees can be very high. For instance, the number of alternative join trees that can be produced by applying the commutativity and associativity rules is O(N!) for N relations. Query optimizers therefore restrict the size of the search space they consider. Two restrictions are:
  1. Heuristics
    • The most common heuristic is to perform selection and projection on base relations first.
    • Another is to avoid Cartesian products.
  2. Shape of the join tree
    • Two types of join trees are distinguished:
      • Linear tree
        • At least one operand of each operator node is a base relation.
      • Bushy tree
        • May have operators with no base relations as operands (both operands are intermediate relations).
Search strategy
  • The most popular search strategy is dynamic programming (DP),
  • which starts with base relations and keeps adding relations while calculating cost.
  • DP is almost exhaustive, so it produces the best plan,
  • but it is too expensive with more than 5 relations.
  • The other option is a randomized strategy,
  • which does not guarantee the best plan.
Cost model
An optimizer's cost model includes cost functions to predict the cost of operators, statistics on the base data, and formulas to evaluate the sizes of intermediate results.
  1. Cost functions
  2. Database statistics
  3. Cardinalities of intermediate results
Cost functions:
  • The cost of a distributed execution strategy can be expressed with respect to either the total time or the response time.
  • Total time = CPU time + I/O time + transmission time
  • In a WAN, the major cost is transmission time.
  • Initially the ratio of transmission to I/O cost was 20:1; for a LAN it is about 1:1.6.
  • Response time = CPU time + I/O time + transmission time, but counting only the longest sequential path, since parallel work overlaps.
  • TCPU = time for a CPU instruction
  • TI/O = time for a disk I/O
  • TMSG = fixed time for initiating and receiving a message
  • TTR = time to transmit a data unit from one site to another
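A rough illustration of the two cost measures. The unit costs below are made-up placeholder values, not figures from the course:

```python
# Hypothetical per-unit costs (seconds): CPU instruction, disk I/O,
# message setup, and transmission of one data unit.
T_CPU, T_IO, T_MSG, T_TR = 1e-6, 1e-2, 1e-1, 1e-4

def total_time(insts, ios, msgs, units_sent):
    """Sum of ALL cost components over all sites (a throughput measure)."""
    return T_CPU * insts + T_IO * ios + T_MSG * msgs + T_TR * units_sent

def response_time(site_times):
    """Elapsed time: work at parallel sites overlaps, so only the
    slowest (longest sequential) path determines the response time."""
    return max(site_times)
```

So two sites each doing 1 second of work contribute 2 seconds of total time but only 1 second of response time, which is why the two measures favor different plans.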
Q. Design issues of object database systems?
  • Distribution design involves fragmentation and allocation. Distribution design in the object world brings new complexities. Conceptually, objects encapsulate methods together with state. In reality, methods are implemented on types and shared by all instance objects of that type. The location of objects with respect to their types therefore becomes an issue.
  • One has to decide whether fragmentation is performed only on attributes, duplicating the methods with each fragment, or whether methods can be fragmented as well.
Q: compare conservative timestamp ordering and multiversion timestamp ordering

Timestamp Ordering (TO)
• Basic TO
• Multiversion TO
• Conservative TO
Conservative TO
• Basic TO generates too many restarts.
• For example, if a site is relatively calm, its transactions will be restarted again and again.
• Synchronizing timestamps may be very costly.
• System clocks can be used if they run at comparable speeds.
• In conservative TO, operations are not executed immediately but are buffered.
• The scheduler maintains a queue for each TM.
• Operations from a TM are placed in the relevant queue, ordered, and executed later.
• This reduces but does not eliminate restarts.
Multiversion TO
  • Another attempt to reduce restarts.
  • Multiple versions of data items, with their largest read/write timestamps, are maintained.
  • A read operation is performed on the appropriate version.
  • A write is rejected if a younger transaction has already read or written the data item.
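The basic TO rule that causes these restarts can be sketched as follows (a hypothetical single-site scheduler; `True` means the operation is accepted, `False` means the transaction must restart):

```python
class BasicTO:
    """Minimal sketch of the basic timestamp-ordering rule."""

    def __init__(self):
        self.read_ts = {}   # largest timestamp that has read each item
        self.write_ts = {}  # largest timestamp that has written each item

    def read(self, ts, item):
        # A read is too late if a younger transaction already wrote the item.
        if ts < self.write_ts.get(item, 0):
            return False
        self.read_ts[item] = max(ts, self.read_ts.get(item, 0))
        return True

    def write(self, ts, item):
        # A write is too late if a younger transaction already read or wrote it.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True
```

A transaction from a "calm" site carries an old timestamp, so its operations keep returning `False` against younger activity, which is exactly the restart problem conservative TO addresses by buffering operations.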
-------------------------------------


Q. What are the principles to ensure the consistency and reliability of a transaction? Levels of consistency?

Consistency refers simply to the correctness of a transaction.
– A transaction should transform the DB from one consistent state to another consistent state.
– It is the concern of semantic integrity control and concurrency control.
– The classification of consistency for transactions uses the term "dirty data": data that has been updated by a transaction before its commitment.

Degree 3: Transaction T sees degree 3 consistency if:
1- T does not overwrite dirty data of other transactions;
2- T does not commit any writes until it completes all its writes (i.e., until the end of the transaction);
3- T does not read dirty data from other transactions;
4- Other transactions do not dirty any data read by T before T completes.

Q. Define the consistency and reliability aspects of a Transaction?

The consistency and reliability aspects of transactions are due to the following properties (see example, pg 26-27):
1- Atomicity:
also known as the "all or none" property
• In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are.
2- Consistency:
refers simply to the correctness of a transaction
• A transaction either creates a new and valid state of the data or, if any failure occurs, returns all data to its original state, i.e. the state before the transaction was started.
3- Isolation:
• A transaction cannot reveal its results to other transactions before commitment.
• A transaction in process and not yet committed must remain isolated from any other transaction.
4- Durability:
It refers to the property of a transaction which ensures that once the transaction commits, its results are permanent and cannot be erased from the database.


Q. Termination conditions of Transactions?

If a transaction can complete its task successfully, we say that the transaction commits. If a transaction stops without completing its task, we say that it aborts. When a transaction is aborted, its execution is stopped and all of its already executed actions are undone by returning the database to the state before their execution. This is also known as rollback.

Q: Advantages and disadvantages of shared memory?
Any processor has access to any memory module or disk unit through a fast interconnect. Examples of shared-memory parallel database systems include DBS3 and Volcano.

Advantages:
The two main advantages are:
• Simplicity
• Excellent load balancing, since it can be done dynamically
Drawbacks:
The three main disadvantages are:
• Cost of the high-speed interconnect
• Low extensibility
• Low availability


Q. Advantages and disadvantages of shared-nothing?

In the shared-nothing approach each processor has exclusive access to its main memory and disk unit(s). Each node can then be viewed as a local site (with its own database and software) in a distributed database system. Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management, and distributed query processing, may be reused. Examples of shared-nothing parallel database systems include Teradata's DBC and Tandem's NonStopSQL products, as well as a number of prototypes such as BUBBA and GAMMA.

Advantages:
Shared-nothing has the following advantages:
  • Cost
  • Availability
  • Extensibility

Drawbacks:
Shared-nothing has the following disadvantages:
  • More complex than shared memory
  • Difficult load balancing
-------------------------------------

Q: Parallel Databases:


  • A partitioned database is the basis for parallel execution of queries.
  • Designing supporting parallel algorithms is another challenge
    • for database operators and DB queries (which combine operators).
  • It is difficult because it requires a good trade-off between parallelism and communication cost.
  • Parallel algorithms for relational algebra operators are the building blocks for parallel query processing.
  • Parallel data processing should exploit intra-operation parallelism.
  • Distributed join algorithms are designed for high-speed networks.
  • There are three basic parallel join algorithms for partitioned databases.


Parallel join algorithm:

  • The simplest algorithm.
  • It computes the Cartesian product of relations R and S in parallel,
  • so arbitrarily complex join predicates may be supported.

1st phase

  • Each fragment of R is replicated at each node of S.
  • With broadcast this may take m message transfer times; otherwise m*n.

2nd phase

  • Each S-node j locally joins R with its fragment Sj.
  • This is done in parallel by the n nodes.
  • The local join can be done as in a centralized DBMS.
  • Depending on the local join algorithm, join processing may or may not start as soon as data is received.

Parallel associative join

  • It applies only to equi-joins, where one relation is partitioned on the join attribute.
  • Assume the equi-join predicate is on attribute A of R and attribute B of S.
  • Relation S is partitioned based on a hash function h applied to attribute B.
  • This means that all tuples of S that have the same value for h(B) are placed at the same node.
  • No knowledge of how R is partitioned is assumed.

1st phase

  • Relation R is sent to the S-nodes by applying the hash function to attribute A.
  • This phase is done in parallel at every node where a fragment Ri exists.
  • The tuples of R get distributed, but not replicated, across the S-nodes.

2nd phase

  • Each S-node j receives its subset of R (Rj) and
  • joins it locally with the fragment Sj.
  • The local join processing can be done in parallel.
  • To summarize, the parallel associative join algorithm replaces the operator R ⋈ S by ⋃(j=1..n) (Rj ⋈ Sj).

Parallel hash join

  • A generalization of the parallel associative join.
  • It also applies to equi-joins but does not require any particular partitioning of the operand relations.
  • The idea is to partition relations R and S into the same number p of mutually exclusive sets (fragments)
  • R1, R2, ..., Rp and S1, S2, ..., Sp,

such that R ⋈ S = ⋃(i=1..p) (Ri ⋈ Si)
  • The partitioning of R and S is done with the same hash function applied to the join attribute.
  • Each individual join is done in parallel, and the join result is produced at p nodes.
  • The choice of the p nodes may be made at run time depending on system load.
  • The difference from the parallel associative join is that the partitioning of S is necessary
    • and the result is produced at p nodes rather than at the S-nodes.
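The partition-then-join idea behind the parallel hash join can be sketched in Python (an in-memory toy with threads standing in for the p nodes; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_hash_join(R, S, p, key_r, key_s):
    """Both relations are hash-partitioned on the join attribute into p
    fragments; matching tuples always land in the same fragment pair,
    so the p local joins can run independently."""
    Rp = [[] for _ in range(p)]
    Sp = [[] for _ in range(p)]
    for r in R:
        Rp[hash(key_r(r)) % p].append(r)
    for s in S:
        Sp[hash(key_s(s)) % p].append(s)

    def local_join(frags):
        ri, si = frags
        # Local join at one "node" (naive nested loop for simplicity).
        return [(r, s) for r in ri for s in si if key_r(r) == key_s(s)]

    with ThreadPoolExecutor(max_workers=p) as pool:
        results = pool.map(local_join, zip(Rp, Sp))
    return [pair for part in results for pair in part]
```

Because both R and S are repartitioned by the same hash function, the union of the p local joins equals R ⋈ S, matching the equation above.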
===============

Q. Comparison of array processing and multiprocessing with diagram. (15 marks)
Parallel Processing basics:
  1. Multiprocessor Systems
  • Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
  • One computer with a global RAM shared by many processors; the processors are tightly coupled.
  • In a multiprocessor, thousands of processors are controlled by a single operating system, each processor accessing the same job queue.
Therefore, synchronization and interprocessor communication are required. Interconnection systems are:
  • Time-shared or common bus
  • Crossbar switches
  • Multiport-memory systems
  • Multistage networks

Associative Processors
They are designed to speed up the search for data items in memory. They check all memory blocks in parallel to search for matching memory blocks (e.g., a nearest-neighbor mesh).

Array Processors
A set of identical processors synchronized to perform the same instruction simultaneously on different data. They are also called SIMD.

Q. Parallel processing advantages and challenges?
Advantages:
  1. More fault tolerance
  2. Increased throughput
  3. Better price/performance
Challenges:
  1. Image processing
  2. Large-scale simulation
  3. Weather forecasting, etc.
===============

Q. Differentiate horizontal and vertical partitions?
Vertical Fragmentation:
Different subsets of attributes are stored at different places. For example, take the table EMP (eId, eName, eDept, eQual, eSal).
The interests of the local and head offices may result in the following vertical partitions of this table:
EMP1 (eId, eName, eDept)
EMP2 (eId, eQual, eSal)
Horizontal Fragmentation:
It is based on the localization of data: rows of a table are split across multiple sites. For example, the data in CLIENT(cAC#, cName, cAdr, cBal) is placed in different databases based on location, e.g. Lahore, Pindi, Karachi, Peshawar, Quetta.
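The two fragmentations of the example tables can be sketched as follows (rows modeled as dicts; the helper names are made up):

```python
def vertical_fragment(rows, attrs, key="eId"):
    """Keep the key plus a subset of attributes at a site (the key is
    repeated in every fragment so the table can be reconstructed by join)."""
    return [{k: r[k] for k in (key, *attrs)} for r in rows]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy a localization predicate
    (reconstruction is by union of the fragments)."""
    return [r for r in rows if predicate(r)]
```

For instance, `vertical_fragment(emp, ("eName", "eDept"))` yields the EMP1 fragment above, while a predicate on `cAdr` yields the per-city CLIENT fragments.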

Q. Comparison of Pessimistic and optimistic techniques.
The concurrency control mechanisms can be grouped into two broad classes: pessimistic concurrency control methods and optimistic concurrency control methods.
  1. Pessimistic algorithms synchronize the concurrent execution of transactions early in their execution life cycle, whereas
  2. optimistic algorithms delay the synchronization of transactions until their termination.
  • The timestamp-ordering algorithms can be Basic TO, Multiversion TO, or Conservative TO.

Pessimistic and Optimistic
The pessimistic approach synchronizes transactions early:
  • Locking-based
    • Centralized locking
    • Primary copy locking
    • Distributed locking
  • Timestamp Ordering (TO)
    • Basic TO
    • Multiversion TO
    • Conservative TO
  • Hybrid
Optimistic methods do this late in the execution life cycle of transactions:
  • Locking-based
  • Timestamp-ordering-based
Locking-based Concurrency Control
The basic idea is that data items accessed by conflicting operations are accessed by one operation at a time.
  • Data items are locked by the Lock Manager.
  • There are two major types of locks:
    • read lock
    • write lock
  • A transaction needs to acquire the lock first.
  • For improved accessibility, the compatibility of locks must be established.
  • Locking is the job of the DDBMS, not the user.
  • The scheduler is the Lock Manager.
  • The TM and LM interact.
Locking-based Concurrency Control Algorithm
  • Unfortunately, the basic locking algorithm alone will not properly synchronize transaction executions.
  • This is because, to generate serializable schedules, the locking and releasing operations of transactions also need to be coordinated.
The timestamp ordering (TO) class involves organizing the execution order of transactions so that they maintain transaction consistency. This ordering is maintained by assigning timestamps to both the transactions and the data items stored in the database.
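The read/write lock compatibility mentioned above is commonly the standard shared/exclusive matrix; a minimal sketch under that assumption:

```python
# Assumed standard compatibility matrix: read locks are shared,
# a write lock is exclusive.
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}

def can_grant(held_modes, requested):
    """The Lock Manager grants a lock only if the requested mode is
    compatible with every lock already held on the data item."""
    return all(COMPATIBLE[(h, requested)] for h in held_modes)
```

So any number of transactions may read an item concurrently, but a writer must wait until all other locks on the item are released.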


Q. Architecture of parallel database system?
Assuming a client-server architecture, the functions supported by a parallel database system can be divided into three subsystems, much like in a typical RDBMS. Depending on the architecture, a processor can support all or a subset of these subsystems.

  • Session Manager
    1. It acts as a transaction monitor and provides support for client-server interaction. It performs connection and disconnection between the client process and the other two subsystems.
    2. It starts and closes user sessions. In the case of an OLTP session, the session manager can trigger the execution of preloaded transaction code within the data manager module.
  • Request Manager
    1. It receives client requests for query compilation and execution. It has direct access to the database directory, which contains all the meta-information about the data and programs.
    2. The directory itself should be managed as a database in the server. Upon request, it activates the various compilation phases, triggers query execution, and returns the results and error codes to the client application.
  • Data Manager
    1. It provides all the low-level functions needed to run compiled queries in parallel, i.e.
      • execution of DB operations
      • transaction management support
      • cache management


Important concepts



Q. Types and Classification of Transactions?

Types of Transactions
Transactions have been classified according to a number of criteria. One criterion is the duration of the transaction. Transactions may be classified as
• On-line (short-life)
• Batch (long-life)
Transactions can also be classified according to their structure, into four broad categories:
– Flat (or simple) transactions
– Closed nested transactions
– Open nested transactions
– Workflow models
Online transactions are characterized by very short execution/response times and by access to a relatively small portion of the database. Examples are banking and airline reservation transactions.
Examples of batch transactions are CAD/CAM databases, statistical applications, report generation, complex queries, and image processing.
Another classification is with respect to the organization of the read and write actions. If a transaction is restricted so that all of its read actions are performed before any write action, it is called a two-step transaction. If a transaction is restricted so that a data item has to be read before it can be updated (written), the corresponding class is called restricted (or read-before-write). If a transaction is both two-step and restricted, it is called a restricted two-step transaction.
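Assuming a transaction is given as a list of (operation, item) actions, the two-step and restricted classes could be checked like this (hypothetical helpers, not part of the course material):

```python
def is_two_step(actions):
    """Two-step: all read actions come before any write action."""
    first_write = next((i for i, (op, _) in enumerate(actions) if op == "write"),
                       len(actions))
    # Everything from the first write onward must also be a write.
    return all(op == "write" for op, _ in actions[first_write:])

def is_restricted(actions):
    """Restricted (read-before-write): every item is read before it is written."""
    seen = set()
    for op, item in actions:
        if op == "write" and item not in seen:
            return False
        if op == "read":
            seen.add(item)
    return True
```

A transaction satisfying both predicates would belong to the restricted two-step class described above.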

Q. Transformation rules of rewriting queries?

Commutativity of binary operations:
R × S <=> S × R
R ⋈ S <=> S ⋈ R
Associativity of binary operations:
(R × S) × T <=> R × (S × T)
(R ⋈ S) ⋈ T <=> R ⋈ (S ⋈ T)

Query processing has four different phases; the first is query decomposition. The steps of query decomposition are normalization, analysis, simplification, and rewriting.
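The commutativity rule for joins can be demonstrated on toy relations (a naive natural join; purely illustrative):

```python
def natural_join(R, S):
    """Join two lists of dict-rows on all shared attribute names
    (naive nested loop, not an optimized implementation)."""
    out = []
    for r in R:
        for s in S:
            shared = set(r) & set(s)
            if all(r[a] == s[a] for a in shared):
                out.append({**r, **s})
    return out

R = [{"id": 1, "name": "a"}]
S = [{"id": 1, "sal": 10}, {"id": 2, "sal": 20}]
# R joined with S and S joined with R produce the same tuples,
# which is what lets the optimizer reorder joins in the search space.
```

Such equivalences are exactly what generates the alternative query trees in the optimizer's search space.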

Q. Replica Transactions?

Distributed 2PL (D2PL) expects the availability of lock managers at each site. If the database is not replicated, distributed 2PL degenerates into the primary copy 2PL algorithm. If data is replicated, the transaction implements the ROWA (Read-One-Write-All) replica control protocol.

lock request → operation → end of operation → release lock


Q. Semantic and Structural heterogeneity?

Two types of heterogeneity:
1) Semantic heterogeneity: the same concepts are modeled differently in different schemas. Examples: naming, aggregation, generalization, attribute class, default value, database identifier, schema isomorphism, missing values.
2) Structural heterogeneity: differences in the representation of corresponding schema elements. Examples: data scaling and precision, attribute integrity constraints, objects included, domain.
Heterogeneities are resolved by applying mappings on elements. We cannot ask/expect the component databases to change. The most common semantic conflict is the naming conflict, which can involve synonyms and homonyms.
Synonyms: different names modeling the same thing, like ENGINEER/EMP and Salary/Sal in our example schema.
Homonyms: the same name modeling different things, like "title" here.

Q. Schema Translation and schema integrity?

Schema Translation: involves translating the component schemas into the same data model. It helps to
• compare schema elements
• merge them.
The choice of the Common Data Model (CDM) is important in schema translation. We should prefer semantic data models, like E-R or OO. After schema translation we perform Schema Integration (SI).
Schema Integration: involves merging the component schemas into a common schema (the global schema). SI involves identifying corresponding schema elements from the different component databases and merging them. It is hampered mainly by semantic heterogeneities.

Q. Joining Techniques/what are the two methods of joining relations?

There are two methods:
1)Nested loops
2)Merge join
1) Nested loops:
It computes the product of the two relations. For each tuple of the external relation, the tuples of the internal relation that satisfy the join predicate are retrieved one by one to form the resulting relation. An index on the join attribute is a very efficient access path for the internal relation. In the absence of an index, for relations of n1 and n2 pages respectively, this algorithm has a cost proportional to n1*n2, which may be prohibitive if n1 and n2 are high.
2) Merge join:
It consists of merging two sorted relations on the join attribute, as shown in figure 1. Indices on the join attribute may be used as access paths. If the join criterion is equality, the cost of joining two relations of n1 and n2 pages respectively is proportional to n1+n2. This method is always chosen when there is an equi-join and the relations are already sorted.
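Both methods can be sketched on in-memory lists (illustrative only; real systems work on pages, and the n1*n2 vs. n1+n2 costs refer to page accesses):

```python
def nested_loop_join(ext, inner, key=lambda t: t[0]):
    """For each tuple of the external relation, scan the internal relation.
    Without an index, cost grows like n1*n2."""
    return [(e, i) for e in ext for i in inner if key(e) == key(i)]

def merge_join(R, S, key=lambda t: t[0]):
    """Merge two relations already sorted on the join key; for an
    equi-join the cost grows like n1+n2."""
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            # Emit the cross product of the runs of equal keys on both sides.
            k = key(R[i])
            i2 = i
            while i2 < len(R) and key(R[i2]) == k:
                j2 = j
                while j2 < len(S) and key(S[j2]) == k:
                    out.append((R[i2], S[j2]))
                    j2 += 1
                i2 += 1
            i, j = i2, j2
    return out
```

On sorted inputs the two functions produce the same result; the merge join simply reaches it with a single pass over each relation.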