28 May 2018

Solution:

1. Write a note on fragmentation. Its advantages and disadvantages.

Note
A  file or  a  table  is broken down  into  smaller parts/sections  called  fragments.  and  those fragments are stored at different  locations. A table can be fragmented horizontally (row-wise) or vertically (column-wise).  Hence we have two major types of fragmentations horizontal and vertical. Different fragmentations of a table are placed at different locations. 
Adv
  • The  basic  objective  of  fragmentation  and  placement  at  different  places  is  to maximize the local access and to reduce the remote access since the later causes cost and delay. 
  • The concept of fragmentation should be kept hidden from the user.
  • Increases the level of concurrency
  • System throughput
  • Localization of Data and Local Autonomy
  • Distribution of data allows fast access of data due to localization
  • Parallel execution of queries
Efficiency:
Data is stored close to where it is most frequently used. In addition, data that is ,not needed by' local applications is not stored.

Parallelism:
With fragments as the unit of distribution, a transaction can be divided into several sub queries that operate on fragments. This should increase the degree of concurrency, or parallelism, in the system, thereby allowing transactions that can do so safely to execute in parallel.

Dis Adv:

  • Difficult to manage in case of non-exclusive Fragmentation (replication)
  • Maintenance of Integrity constraints.
  • Increased time/NW cost of centralized access
  • Increased chances of failure
  • Increased failure damage
Security:
Fragmentation has two primary disadvantages, which we have mentioned previously:

Performance:
The performance of global application that requires data from several fragments located at different sites may be slower.

Integrity:
Integrity control may be more difficult if data and functional dependencies are fragmented and located at different sites.

1.1 Vertical and horizontal fragmentation ?

1.2 Rules for correctness.

1.3.1 If fragments can change DDBMS semantic's ?

1.3.2 Write a note on completeness and reconstruction of VF ?lec: 13 - 20

Explained in 1.2

1.4 How can we verify the properties of completeness and reconstruction of a vertical fragmentation?(5)

1.5 How/What completeness of horizontal fragmentation is verified ?

Completeness

  • The completeness of a PHF is based on the selection predicates used. 
  • The selection predicates are complete; 
    • the resulting fragmentation is guaranteed to be complete as well. 
  • Since the basis of fragmentation algorithm is a 
    • set of complete and minimal predicates, 
  • Pr’, completeness is guaranteed as long as no mistakes are made in defining Pr’. 

  • The completeness of DHF is more difficult to define. 
  • The difficulty is due to the fact that the predicate determining the fragmentation involves two relations. 
    • let R be the member relation of a link whose owner is relation S, 
      • which is fragmented as Fs = {S1, S2, ….., Sn}
    • Let A be the join attribute between R and S then for each tuple t of R, there should be a tuple t’ o S such that t[A] = t’[A]
Reconstruction
Reconstruction of a global relation from its fragments is performed by union operator in both
primary and derived horizontal fragmentation.
Thus for a relation R with fragmentation FR = {R1, R2, …, Rn}
R = U Ri,    a'Ri    FR
Disjointness
It is easier to establish disjointness of fragmentation for primary than for derived horizontal
fragmentation.

  • Disjointness is guaranteed as long as the 
    • minterm predicates determining the fragmentation are mutually exclusive. 
  • In DHF, disjointness can be guaranteed if the join graph is simple. 
    • If it is not simple, it is necessary to investigate actual tuple values.

1.6 About Horizontal Fragmentation, Owner and member relation

Explained in 1.7

1.7 Differentiate between VF and HF OR(PHF or VHF)

Primary Horizontal Fragmentation is a table fragmentation technique in which we fragment a single table and this fragmentation is row-wise and using a set of simple conditions.
  • Given: A relation R, the set of simple predicates Pr
  • Output: The set of fragments of R = {R1, R2,…,Rw} which obey the fragmentation rules. 
  • Preliminaries :
    • Pr  should be complete
    • Pr  should be minimal


A PHF is defined by a selection operation on the owner relations of the database schema
Given a relation R, its horizontal fragments are

An important aspect of simple predicates is their completeness, another is their minimality. A set of
simple predicates Pr is said to be complete if and only if there is an equal probability of access
by every application to any tuple belonging to any minterm fragment that is defined according to
Pr.

DHF - Derived Horizontal Fragmentation
 It is defined on a member relation of a link according to a selection operation specified on
its owner. Two important points are:

  • Each link between owner and member is defined as an equi-join.
  • Equi-join can be implemented by means of semi-joins

So we are interested in defining the partitions of member based on fragmentation of its owner,
but want to see attributes only from member


 

The two fragments PAY1 and PAY2 of PAY and EMP1 and EMP2 of EMP are defined
as follows:

  • PAY1 = ... SAL >30000 (PAY)
  • PAY2 = ... SAL ≤ 30000 (PAY)
  • EMP1 = EMP ⋉ PAY1
  • EMP2 = EMP ⋉ PAY2

The result of this fragmentation is depicted in figure 2.

3. Write a note on need for distributed database. Why it's more beneficial? What is meant by optimality?

Two Most reasons are:
  • large no of users.
  • Users are physically spread across large geographical area. 

4. What are complete predicates? Minterm predicates?

Completeness of Simple Predicates 

A set of simple predicates Pr is said to be complete if and only if

  • The accesses to the tuples of the minterm fragments defined on Pr requires:-
    • That two tuples of the same minterm fragment have 
    • the same probability of being accessed by any application.
Minterm predicates is Boolean combinations of simple predicates.

  • It is conjunction of the simple predicates.
  • It is always possible to transform a Boolean expression into conjunction normal form
  • The use of minterm predicates in the design algorithms does not cause any loss of generality.
Given a set of simple predicates
Pr = {pr1, pr2, …, pm}for a relation R, the set of minterm predicates M is defined as M =
{m1, m2, ……, mz}
 



Consider relation PAY(title, sal). The following are some of the possible simple predicates that
can be defined on PAY.

Minimality of Simple Predicates

  • If a predicate influences how fragmentation is performed, 
    • (i.e., causes a fragment f to be further fragmented into, say, fi and fj)
    • then there should be at least one application that accesses fi and fj differently.
  • In other words, the simple predicate should be relevant in determining a fragmentation.
  • If all the predicates of a set Pr are relevant, then Pr is minimal.

Example :

Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”, BUDGET≤200000,BUDGET>200000}
is minimal (in addition to being complete). However,
if we add PNAME = “Instrumentation” then Pr  is not minimal.

5. Write the formula to calculate affinity matrix? - lec 18

AM is a single value that is calculated on the basis of positions of elements in AA and
their surrounding elements.


5.1 Draw CA(clustered affinity ) along with Affinity matrix ( lect 19 )

5.2 To make affinity matrix. Describe the contribution of moving attribute A4 between A1 and A2.

5.3 Affinity matrix for given queries, and contribution of moving column 3 between column 1 and column 2.

5.1 to 5.3 explained in 5.4

5.4 Attribute Usage Matrix for given queries

PROJ(jNo, jName, budget, loc)

q1: SELECT BUDGET FROM PROJ WHERE JNO=Value
q2: SELEC JNAME, BUDGET FROM PROJ
q3: SELECT JNAME  FROM PROJ WHERELOC=Value  
q4: SELECTSUM(BUDGET)  FROM PROJ WHERE LOC=Value

Let A1= jNo,  A2= jName, A3= budget,  A4= loc
If Attribute exist in query add 1 against the query otherwise, 0.


Attribute affinity matrix
  • aff(Ai,Aj) = ∑ ∑ refl(qk) accl(qk) 
    • Where refl(qk) is no of accesses to attributes (Ai,Aj) for each execution of qk at site Sl
    • and accl(qk) is application access frequency measures from Si
  • acc1(q1) = 15, acc2(q1) = 20,  acc3(q1) = 10
  • acc1(q2) = 5,   acc2(q2) = 0,    acc3(q2) = 0
  • acc1(q3) = 25, acc2(q3) = 25,   acc3(q3) = 25
  • acc1(q4) = 3,   acc2(q4) = 0,     acc3(q4) = 0
suppose refl(qk) = 1 


It will measure between only two attributes at a time.
Aff value will be calculated only if any query access both attributes at a time.
F:eg
aff(A3, A4)
= ∑k = 4 ∑l =1..3 refl(qk)accl(qk) - here 4means query 4 where both have 1 in left matrix
= 3 *1 + 0 + 0 = 3 - here add all values of query 4 aftr * with refl(qk) value 1
aff(A1, A2) = 0,  Since no qi accesses them both

aff(A2, A2) =  5 * 1 + 0  + 0  = 5
25 * 1 + 25 *1 + 25 * 1 = 75 + 5 = 80

Bond Energy Algorithm
Algo

  • Input AA
  • Output CA

-Begin
CA (•, 1) ← AA (•, 1)
CA (•, 2) ← AA (•, 2)
Index ← 3
--While index ≤ n do - n as no of columns
Begin
For (I=1  I=<index  –1 i++)
---do
       calculate cont(Ai-1, Aindex, Ai)    
---end-for
Calculate cont(Aindex-1, Aindex, Aindex+1)
Loc ← placement given by maximum cont value
For (j=index j<=loc-1 i--)
---do
    CA (•, j) ← AA (•, j-1)
---end-for
 CA (•, loc) ← AA (•, index)
Index ← index + 1
--End-while
Order the rows according to relative order of columns
-End {BEA}

  • Ordering (0-3-1) – means A3 on left of A1
  • Ordering (1-3-2) – means A3 in between A1 and A2
  • Ordering (2-3-4) – means A3 on right of A2
Ordering (0-3-1)

cont (A0, A3, A1) = 2 bond(A0, A3) + 2 bond(A3, A1) – 2bond(A0, A1)
  • bond(A0, A3) = 0
  • bond(A0, A1) = 0 
  • bond(A3, A1) = ∑(z=1 to 4)aff(Az, A3)aff(Az, A1)
    • =
    • aff(A1, A3)aff(A1, A1) + aff(A2, A3) aff(A2, A1) + aff(A3, A3)aff(A3, A1) + aff(A4,A3)aff(A4, A1)
    • = 45 * 45 + 5 * 0 + 53 * 45 + 3 * 0 = 4410 
  • Thus
cont(A0, A3, A1) = 2 bond(A0, A3) + 2 bond(A3, A1) - 2bond(A0, A1)
  • = 2 * 0 + 2 * 4410 – 2 * 0
  • = 8820
Ordering (1-3-2)
cont (A1, A3, A2) = 2 bond(A1, A3) + 2 bond(A3, A2) - 2bond(A1, A2) 
  • bond(A1, A3) = bond(A3, A1) = 4410
  • bond(A3, A2) =0+400+265+225= 890 
  • bond(A1, A2) =0+0+45*5= 225
cont(A1, A3, A2) = 2*4410+2*890–2*225
  • = 8820 + 1780 – 450
  • = 10150
Ordering (2-3-4)
cont (A2, A3, A4) = 2 bond(A2, A3) + 2 bond(A3, A4) - 2bond(A2, A4) bond(A2, A3) = 890
bond(A3, A4) = 0
bond(A2, A4) = 0
cont (A2, A3, A4) = 2 * 890 + 0 + 0 = 1780
  • Ordering (0-3-1) = 8820
  • Ordering (1-3-2) = 10150
  • Ordering (2-3-4) = 1780
Compute the contribution by placing 4th attribute at different places in CA, We need to work out
  • Ordering (0-4-1)
  • Ordering (1-4-3)
  • Ordering (3-4-2)
  • Ordering (2-4-5)

6. Explain (A1, D2, H1) (A1, D0, H1) architectures of DBMS.

Autonomy (0 1 2)

  • Tight hydrogenation
  • semi autonomous
  • total isolation

Distribution

  • 0 No distribution
  • 1 client/server
  • 2 Peer to Peer

Heterogeneity

  • 0 homogeneous
  • 1 Heterogeneous


(A1,D2,H1)
A1(semi autonomous)
D2 (Peer to Peer)
H1 (Heterogeneity)

(A1,D0,H1)
A1 (semi autonomous)
D0 (No distribution)

H1 (Heterogeneity)

7. What is autonomy? What are its types? How it's classified?

Autonomy
Degree to which member databases can operate independently Autonomy  is  how  much  independently  the  member  databases  perform  their  functionality  or make their decision.
Design Autonomy   
Freedom for designer to make decision related to design, e.g., which data model he wants to design the database in, which DBMS he thinks is suitable for the organization.  It may also include the decisions like machine to buy or operating system to be based on.
Communication Autonomy
What part of data we want to share.  It depends on the nature of bond/coupling between member databases. If they are independent or are different organization, then obviously they own their data and can decide independently. However, if there is some strong bond between the member databases or they belong to the same organization then the member DBA does not have much autonomy  and  it  is decided  centrally  somewhere what  to be contributed  from  a member database and what not.
Execution Autonomy
Component  databases  the  authority  for when  to  share  the  data when  to  shut  down  etc. Again in case of same organization, the rules are formulated at the head office level and all member databases have to follow that, otherwise they are autonomous to make this decision.
Classification 
One possible classification of DDBMSs follows:
Tight integration
  • All users get a single common view with a feeling that there are no component databases.  
Semi-Autonomous
  • Component database  are present  and  they  relate  their data  and  at  certain  time  they  share  their data to form a common view.  
Total Isolation
  • All the systems are stand alone.

8. How we remove transitive dependency?

Transitive Dependency: If for a relation R, we have FDs a àb and b à c then it means that a à c, where b and c both are non-key attributes.

9.  Describe software component of replication? ( lect 21 )

Software components used in replication are of 4 types:
  1. Snapshot
  2. Merge
  3. Distribution
  4. Log Reader.
Snapshot:
  • Runs at least once in all reps.
Merge:
  • Meld changes from different servers made since last snapshot.
Distribution:
  • Performs different activities, mainly distributes publication to subscribers.
Log reader:
  • Used in transactional rep, reads log files of all servers involved.
Push/Pull subscriptions; depends where the distributor is running.

9.1  Name the software used for replication (5)


SQL Server multiple instances used as publish-subscribe model. F eg: Personal Edition of SQL Server.
Additionally software used SS Agents and its component:
  • Snapshot
  • Merge
  • Distribution
  • Log Reader.

9.2  Discuss the disadvantages of replication ?

Storing a separate copy of database at each of two or three sites is called replication
Advantages
  • Reliability.
  • Fast response.
  • May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals.)
  • De-couples nodes (transactions proceed even if some nodes are down.)
  • Reduced network traffic at prime time (if updates can be delayed.)
Disadvantages
  • Additional requirements for storage space.
  • Additional time for update operations.
  • Complexity and cost of updating.
  • Integrity exposure of getting incorrect data if replicated data is not updated simultaneously.
  • Therefore, better when used for non-volatile data.

9.3 Describe SQL Server tools involves In replication.

  • Personal Edition of SQL Server.
  • Replication uses a publish-subscribe model for distributing data.
  • Publication is group of related data and objects to replicate together.
    • Publisher is a server that is the source of data to be replicated.
  • Subscriber is a server that receives the data replicated by the publisher.
    • Subscriber defines a subscription to a particular publication.
  • Distributor is a server that performs various tasks when moving articles from Publishers to Subscribers. 

9.4 Differentiate between fully and partially databases

Same as 9.5

9.5 Differentiate b/w Non-Replicated Databases and Partitioned Database (5 marks)

A non-replicated database (called a partitioned database) contains fragments that are allocated to
sites, and there is only one copy of any fragment on the network.

  • In case of replication, The database exists entirely at each site (fully replicated database) or 
  • Fragments are distributed to the sites in such a way that copies of a fragment may reside in multiple sites (partially replicated database)

10. Why distributed database system’s performance is better than centralized database system.(5)

Centralized database System:
Data management is carried on a single centralized system. However this data is accessed from different machine (clients). All machines are connected with each other through a communication link (network).In the centralized system the data is store and manage on the server,the data is associated with a single site, this site is basically the Server, rest of the machines are accessing data from the Server.

The Distributed Database System:

The data is managed/manipulated at multiple sites in a DDBS. There are many different architectures of a DDBS. There are a number of local DBMSs called local nodes. Each local node works independently serving multiple users that are connected to it, these users are called local nodes. At the same time, the local nodes are connected with each other. A layer is superimposed on top of all these connected local DBMSs and that layer is the DDBMS.

Improved Performance

As the data is located near the site of 'greatest demand', and given the inherent parallelism of distributed DBMSs, speed of database access may be better than that achievable from a remote centralized database. Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services as characterized by a centralized DBMS.

11. Key differences between LAN and WAN

LAN (Local Area Network):
  • Small geographical area
  • High Band width
  • Low latency 
    • it is optimized to process a very high volume of data messages with minimal delay (latency)
WAN (Wide Area Network):
  • Used inter-city country or even continental
  • gives low bandwidth high latency. (Delay) more delay comparative to other Networks.
  • Can be Broadcast and Point to Point.

12 DDBMS Components their working and Interfacing

Figure 2: Components of a Distributed DBMS (65)
  • User processor and Data processor are two major components of distributed DBMS.
  • One component handles the interaction with users and other deals with the storage. 
User Processor - It consists of 4 elements.
  • The user interface handler is responsible for interpreting user commands and formatting the result data as it is sent to the user.
  • The semantic data controller uses the integrity constraints and authorizations that are defined as part of the global conceptual schema to check if the user query can be processed.
  • The global query optimizer and decomposer determines an execution strategy to minimize a cost function and translates the global queries into local ones using global and local conceptual schema as well as global directory. The global query optimizer is responsible for generating the best strategy to execute distributed join operations.
  • The distributed execution monitor coordinates the distributed execution of the user request. The execution monitor is also called the distributed transaction manager.
Data processor - It consists of 3 elements.
  • The local query optimizer, which actually acts as the access path selector, is responsible for choosing the best path to access any data item.
  • The local recovery manager is responsible for making sure that the local database remains consistent even when failures occur.
  • The run-time support processor physically accesses the database according to the physical commands in the schedule generated by the query optimizer. The run-time support processor is the interface to the operating system and contains the database buffer manager which is responsible for main memory buffers and data accesses.

12.2 Independence of Distributed Database

  1. Local Independence
  2. Fragmentation Independence
  3. Replication Independence
  4. Hardware Independence
  5. Operating System Independence
  6. Network independence
  7. DBMS Independence
1. Local Independence (Location Transparency)
Users should not have to know where data is physically stored, but logically they percieved as the data were all stored at their own local site.

2. Fragmentation Independence (Fragmentation Transparency)
Fragmentation independence refers to the ability of end users to store logically related information at different physical locations.
The user sees only one single logical database. Data fragmentation is transparent to the user. The user does not need to know the name of the database fragment in order to retrieve them.
Note: A system supports data fragmentation if a given Table can be divided up into pieces or fragments for physical storage purposes.
Note: Fragmentation is desirable for performance reasons: Data can be stored at location where it is most frequently used, so that most operations are local and network traffic is reduced.

3. Replication Independence (Replication Transparency)
The user sees only one single logical database.
The DDBMS transparently access the database fragment.
The DDBMS manages all fragments transparently to the user.

Note: the existence of fragment copies can enhance data availability and response time, data copies can help to reduce communication and total query costs.

4. Hardware Independence
The system must run the same DBMS on different hardware platforms.

5. Operating System Independence
The system must run the same DBMS on different operating system Platforms including different operating systems on the same hardware and have (e.g.) a UNIX version and an NT version all participate in the same distributed system.

6. Network independence
The system must support a variety disparate communication networks.

7. DBMS Independence
The system must support any vendor's database product.
The DBMS instances at different sites all support the same interface - they do not necessarily all have to be copies of the same DBMS software.
The ideal distributed system should be heterogeneous (multiple forms) and should provide DBMS independence.

12.3 Write ways of user access data from DDBMS?

In a DDBS environment, three types of accesses are involved:
  • Local access: the access by the users connected to a site and accessing the data from the same site.
  • Remote access: a user connected to a site, lets say site 1, and accessing the data from site 2.
  • Global access: no matter from where ever the access is made, data will be displayed after being collected from all locations.
A user does not know from where he is getting the data. To the user it appears that the data is present on the machine on which he is working.

12.4 Inputs needs for distributed database?

  • Data requirements or querry
  • How the data will be gether

12.5 How the distributed databases are connected?

A distributed database system allows applications to access data from local and remote databases. In a homogenous distributed database system, each database is an Oracle Database. In a heterogeneous distributed database system, at least one of the databases is not an Oracle Database. Distributed databases use a client/server architecture to process information requests.

12.6 Global schema importance in DDBS.

  • The global conceptual schema is a logical description of the whole database. 
  • The conceptual level of the ANSI-SPARC architecture
  • It contains definitions of entities, relationships, constraints, security and integrity information.
  • It provides physical data independence from the distributed environment. 
  • Whereas, the global external schemas provide logical data independence.

13.  Star and bus topology
13.1 Differentiate between Bus and ring Topology

Look at 14

14. classification criteria of computer network.

The classification based on the transmission mode is 
  • Point to point (unicast) 
  • Broadcast (multi-point) 

Point to point 
One or more links between sender and receiver Link may be direct or through switching. 
Broadcast 
Common channel utilized by all nodes. 
Message received by all ownership checked. 
Generally radio or satellite based or microwave.

14.1 classify the computer networks?

Computer network A computer network is a system for communication between two are more computer, where computers are Interconnected and Autonomous. Computers are called nodes, sites, host. Other equipment at nodes. i.e. they may or may not be computers. Equipment connected via links and channels .

Types of Network:
we can classify networks on different bases, following are the criteria according to which we can classify the networks: Interconnection structure ------- topology. Transmission mode. Geographical distribution. Based on the interconnection structure there are following categories of the networks 1) Star 2) Bus 3) Ring 4) Mesh
Star topology: Many home networks use the star topology. A star network features a central connection point called a "hub" that may be an actual hub or a switch. Devices typically connect to the hub with Unshielded Twisted Pair (UTP) Ethernet. Compared to the bus topology, a star network generally requires more cable, but a failure in any star network cable will only take down one computer's network access and not the entire LAN. (If the hub fails, however, the entire network also fails.)
Ring network: In a ring network, every device has exactly two neighbors for communication purposes. All messages travel through a ring in the same direction (effectively either "clockwise" or "counterclockwise"). A failure in any cable or device breaks the loop and can take down the entire network. Rings are found in some office buildings or school campuses.
Bus network: Bus networks use a common backbone to connect all devices. A single cable, the backbone functions as a shared communication medium, that devices attach or tap into with an interface connector. A device wanting to communicate with another device on the network sends a broadcast message onto the wire that all other devices see, but only the intended recipient actually accepts and processes the message.

Meshed network: Mesh topologies involve the concept of routes. Unlike each of the previous topologies, messages sent on a mesh network can take any of several possible paths from source to destination. (Recall that in a ring, although two cable paths exist, messages can only travel in one direction.) Some WANs, like the Internet, employ mesh routing.

15. what is Primary and secondary key n composite n super key difference?

Super Key: An attribute or set of attributes whose value can be used to uniquely identify each record/tuple in a relation is super key. For example, in the EMP table ENo is a super key since its value identifies each tuple uniquely. However, ENo, EName both jointly are also the super key, likewise other combinations of attributes with ENo are all super keys.
A composite key is a combination of two or more columns in a table that can be used to uniquely identify each row in the table when the columns are combined uniqueness is guaranteed, but when it taken individually it does not guarantee uniqueness.
Candidate Key: is the minimal super key, that is a super key whose no proper subset itself is a super key. Like in the above example ENo is a super key and a candidate as well. However, the ENo, EName jointly is super key not the candidate key as it has a proper subset (ENo) that is a super key.
Primary key: The successful/selected candidate key is called the primary key, remaining candidate keys are called alternate keys.
Secondary key is the attribute(s) that are used to access data from the relation but whose value is not necessarily unique, like EName.

16. What is the purpose of integrity rules? How they are enforced on the database?(5 marks)

Integrity rules are applied on data to maintain the consistency of the database, or to make it sure that the data in the database follows the business rules. There are 2 types of integrity rules
Structural Integrity Constraints
  • Depends on data storage 
  • Declared as part of data model 
  • Stored as part of schema 
  • Enforced by DBMS itself 
Behavioral Integrity Constraint:
  • These are the rules/ semantics of the business 
  • Depends on business 
  • Enforced through application program 

17. components of relational data model.

A data model that is based on the concept of relation or table. The concept of relation has been taken from the mathematical relation. In the databases, the same relation represented in a two dimensional way is called table, so relation or table represent the same thing. The RDM has got three components:
  1. Structure à support for storage of data and RDM supports only a single structure and that is a relation or a table.
  2. Manipulation language à The language to access and manipulate data from the database. The SQL (Structured Query Language) has been accepted as the standard language for RDM. The SQL is based on relational algebra and relational calculus.
  3. Support for integrity constraints: The RDM support two major integrity constraints such that they have become a part of the data model itself. It means that any DBMS that is base on the RDM must support these two constraints and they are
    1. Entity integrity constraint
    2. Referential integrity constraint 

Solution:

Q1: Steps of Project Planning. Explain software scope. (p20)

Steps of Project Planning
• Software scope 
• Estimation 
• Risk 
• Schedule 
• Control strategy

Software Scope:
• Understand the problem and the work that must be done – in a nutshell
• Software scope describes the data and control to be processed, function, performance, constraints, interfaces, and reliability
• Project scope must be unambiguous and understandable at the management and technical levels
• A statement of software scope must be bounded – in other words
• At the beginning of a project, things are very hazy and nothing is clear
• Good and open communication is required between developers and customer to define the scope of the project
• Who is behind the request for this work?
• Who will use the solution?
• What will be the economic benefit of a successful solution?
• Is there another source for the solution?

Mid - Important Questions

1. Write a note on fragmentation. Its advantages and disadvantages.
1.1 Vertical and horizontal fragmentation ?
1.2 Rules for correctness.
1.3.1 If fragments can change DDBMS semantic's ?
1.3.2 Write a note on completeness and reconstruction of VF ?
1.4 How can we verify the properties of completeness and reconstruction of a vertical fragmentation?(5)
1.5 How/What completeness of horizontal fragmentation is verified ?
1.6 About Horizontal Fragmentation, Owner and member relation.
1.7 Differentiate between VF and HF OR(PHF or VHF)
1.8 Do PHF of empky table Location wise
3. Write a note on need for distributed database. Why it's more beneficial? What is meant by optimality?
4. What are complete predicates? Minterm predicates?
5. Write the formula to calculate affinity matrix?
5.1 Draw CA(clustered affinity ) along with Affinity matrix ( lect 19 )
5.1.1 Convert AA into CA (10 marks)
5.2 To make affinity matrix. Describe the contribution of moving attribute A4 between A1 and A2.
5.3 Affinity matrix for given queries, and contribution of moving column 3 between column 1 and column 2.
5.4 Attribute Usage Matrix for given queries
5.5 (Case)
Consider the relation Student (stID, stName, stAdr,prName,curSem)
Three application access the relation through following three queries respectively
q1. Select stID, stName, curName, from student
q2. Select stID, stName, stAdr from student where stId = value
q3. Select stName,prName from Student
Find the affinity matrix

6. Explain (A1, D2, H1) (A1, D0, H1) architectures of DBMS.
7. What is autonomy? What are its types? How it's classified?
8. How we remove transitive dependency?
9.  Describe software component of replication? ( lect 21 )
9.1 Name the software used for replication (5)
9.2 Discuss the disadvantages of replication ?
9.3 Describe SQL Server tools involves In replication.
9.4 Differentiate between fully and partially databases
9.5 Differentiate b/w Non-Replicated Databases and Partitioned Database (5 marks)
10. Why distributed database system’s performance is better than centralized database system.(5)
11. Key differences between LAN and WAN
12 DDBMS Components their working and Interfacing
12.2 Independences of Distributed Database
12.3 Write ways of user access data from DDBMS?
12.4 Inputs needs for distributed database
12.5 How the distributed databases are connected.
12.6 Global schema importance in DDBS.
13.  Star and bus topology
13.1 Differentiate between Bus and ring Topology
14.classification criteria of computer network.
14.1 classify the computer networks?
15. what is Primary and secondary key n composite n super key difference?
16. What is the purpose of integrity rules? How they are enforced on the database?(5 marks)
17. Components of relational data model.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.

MCQS

The database environment has all of the following components except: ___Users_____
A table or file is broken down in different parts and stored at different locations is called ___Fragmentation___
The _____Set-difference______ operation, denoted by " − ", allows us to find tuples that are in one relation but are not in another.
Physical or logical arrangement of network is____Topology______
Database architecture can be discussed in ______3________ dimensions.
Which term is used with distributed databases storage __Both Replication and Fragmentation__ ?
For derived horizontal fragmentation _____3____ inputs are required.
Bond energy algorithm has been designed to determine ______Groups_____.
The computation time of the Clustering algorithm is ______On2________.
Hybrid Fragmentation combines _Horizontal, vertical fragmentation_.

Important Topics

Reasons for fragmentation (70) 13

Fragmentation independence refers to the ability of end users to store logically related information at different physical locations. There are two types of fragmentation independence: vertical partitioning and horizontal partitioning. Horizontal partitioning permits different rows of the same table to be stored at different remote sites. This is commonly done by organizations that maintain several branch offices, each with an identical set of table structures. 
Vertical partitioning refers to the ability of a distributed system to fragment information such that the data columns from the same logical tables are maintained across the network. Oracle accomplishes this with Oracle views that hide specific columns and rows in a table.


Generation of clustered affinity matrix (CA) is done in three steps:

1.  Initialization. Place and fix one of the columns of AA arbitrarily into CA. Column 1 was
chosen in the algorithm.

2.  Iteration. Pick each of the remaining n-I columns (where I is the number of columns
already placed in CA) and try to place them in the remaining i+1 positions in the CA
matrix. Continue this step until no more columns remain to be placed. 
3.  Row ordering. Once the column ordering is determined, the placement of the rows should
also be changed so that their relative positions match the relative positions of the
columns. 


COM_MIN Algorithm 
Given: a relation R and a set of simple predicates Pr

Output: a complete and minimal set of simple predicates Pr' for Pr

Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.

Reasons OF DDBMS

  • Local units want control over data.
  • Consolidate data for integrated decisions
  • Reduce telecommunication costs.
  • Reduce the risk of telecommunication  failures.

Vertical Fragmentation  and Vertical Fragmentation (85)

Distributed Database Design  (68-70)

Mid - Important Questions

Q1: Steps of Project Planning. Explain software scope. 5marks
Q 2: What is the objective of Controlling the Process?
Q 31:What is Process model? What are its variations and types of process model?
Q 35: Describe Process Management?

Q 17: What are objective of Technical Review and which terms are not measured by Technical Review?

Q 11: Explain W, A and U process model? Q: Levels of Software Process Models?


Q9: Give EVTX model of Rework Activity in Inspection. 5 Marks
Q14: Explain “Data recording and report” in inspection activity with ETVX model. 5 Marks 
Q 7: Write the entry criteria of EVTX model of preparation activity in an inspection process?
Q 27: ETVX of rework of software inspection process?

Q10: Explain organization focus.
Q11: What is continuous and staged representations and their advantages.

Q15: What are attributes of process to choose or satisfy the CMMI model? 5 marks

Q6: What is KPA process definition of CMMI? 5 marks 
Q18: What are the attributes of processes according to CMMI?  

Q 1: What are PSP measures?
Q 19: Key Process Areas for PSP?
Q 39: Time management strategies in PSP

Q 15: Describes the basic element of process architecture of unit cell?
Q 16: Explain Deming’s approach?

Q 33: What are benefits of IDEF0?
Q 34: What are the rules of IDEF0?