An Electronic System for Summer Training Students Distribution in Organizations with Comparative Study of Association Rule Algorithms

Student training is considered one of the most promising ways to acquaint students with the reality of the practical work environment and with the serious, exacting work it requires. It may also give public sector organizations the chance to become acquainted with the students' abilities and skills, in addition to the benefit of encouraging young people to join summer training. To solve the problem of distributing students to organizations and to balance students' preferences against the capacity of governmental and private offices, association rule algorithms were used to mine the data and uncover essential hidden relationships in large data sets, and a distributed database was designed for summer training. Data mining was also used to produce reports indicating the precise number of students required for training by specialization in the four departments of the College of Administration and Economics (the application environment), together with the number of students nominated for training in these departments, using Oracle 11g.


Introduction
The practicum center of the University of Mosul is responsible for placing students in industry for the internship program. It has difficulty matching each organization's requirements with the student profile for several reasons. This situation can lead to a mismatch between the organization's requirements and the students' background. As a result, students face problems in giving good service to the company, while companies may face difficulties in training the students and assigning them a project. We therefore built a database for summer training students, with an integrated computer system that handles all summer training student information.
Distributed database (DDB) technology emerged as a merger of two technologies: database technology and data communication technology. These systems have started to become the dominant data management tools for highly accessed data. A distributed database is a collection of multiple logically related databases distributed over a computer network [1], as shown in figure (1), and a distributed database management system is a software system that manages a distributed database [2] while making the distribution transparent to the user. Consequently, an application can simultaneously access and modify the data in several databases in a network. Data may be distributed over a network using horizontal and vertical fragmentation, which resemble the selection and projection operations, respectively, in Structured Query Language (SQL).
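The two fragmentation styles mentioned above can be illustrated with a minimal Python sketch. The student records, column names, and helper functions here are hypothetical and are not taken from the system described in this paper; the sketch only shows how a horizontal fragment keeps whole rows (like selection) while a vertical fragment keeps chosen columns plus a key (like projection).

```python
# Hypothetical student records; the column names are illustrative assumptions.
students = [
    {"id": 1, "name": "A", "department": "MIS", "average": 78},
    {"id": 2, "name": "B", "department": "Accounting", "average": 68},
    {"id": 3, "name": "C", "department": "MIS", "average": 72},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy a predicate (like SQL selection)."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="id"):
    """Keep a subset of columns plus the key (like SQL projection)."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]


def rejoin(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# One site could hold only the MIS rows (horizontal fragment).
mis_site = horizontal_fragment(students, lambda r: r["department"] == "MIS")

# Two sites could each hold some of the columns (vertical fragments).
names_site = vertical_fragment(students, ["name", "department"])
grades_site = vertical_fragment(students, ["average"])

# The key kept in every vertical fragment lets the rows be reconstructed.
rebuilt = rejoin(names_site, grades_site)
```

Keeping the key in every vertical fragment is what makes the original rows recoverable by a join.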

Figure (1): Distributed Database
There are two main types of distributed databases: homogeneous and heterogeneous. The proposed database, built with Oracle, will achieve data integration for all sections and continuous updating of queries. Oracle Database is the industry foundation for high performance, scalable, and optimized data warehousing. Oracle Exadata Database Machine is a complete hardware and software solution that delivers extreme performance and database consolidation for data warehousing. A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources [3]. In general, the data warehouse is maintained separately from the operational databases of the organization for several reasons [4]. A data warehouse (DW) is a special database used for storing business-oriented information for future analysis and decision-making [5]. We use a data warehouse to provide integrated administrative information for planning and reporting purposes. This DW will be suitable for carrying out statistical analysis.

Columnar database management system (CDBMS)
A columnar database is a database management system (DBMS), as shown in figure (2), that stores data in columns instead of rows [6]. Generally speaking, a row-oriented focus is preferable for online transaction processing (OLTP) systems and a column-oriented focus is preferable for online analytical processing (OLAP) systems [7][8].

Figure (3): CDBMS
One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations like MIN, MAX, SUM, COUNT and AVG to be performed very rapidly [10].
Another benefit is that, because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data, as shown in figure (4) [11].
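To make the row versus column distinction concrete, the following is a small Python sketch under stated assumptions: the records and field names are hypothetical, and run-length encoding is used only as one simple example of column compression (the paper does not name a specific compression scheme).

```python
# Hypothetical row-oriented records; field names are illustrative only.
rows = [
    {"student_id": 1, "department": "MIS", "average": 78},
    {"student_id": 2, "department": "MIS", "average": 68},
    {"student_id": 3, "department": "Banking", "average": 72},
]


def to_columns(row_store):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in row_store:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# Aggregates read a single column instead of touching every full row.
average_column = columns["average"]
stats = {
    "min": min(average_column),
    "max": max(average_column),
    "sum": sum(average_column),
    "count": len(average_column),
    "avg": sum(average_column) / len(average_column),
}

# Similar values stored together compress well.
department_runs = run_length_encode(columns["department"])
```

Because each aggregate reads one contiguous list, the work is proportional to the column being analyzed rather than to the width of the whole record.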

OLAP
Dr. E.F. Codd introduced the term OLAP (Online Analytical Processing) in 1993 [12]. The objective of OLAP is to facilitate solving data analysis problems and making accurate decisions. OLAP applications and tools are those designed to ask "complex queries of large multidimensional collections of data" [13] in order to provide quick access to strategic information for the purposes of advanced analysis [14]. Some additional operators that are more common in OLAP tools, called OLAP operators, are described in [15]: 1) Drill: de-aggregates. 2) Roll: aggregates. 3) Slice & dice. 4) Pivot.
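The four operators listed above can be sketched on a tiny in-memory fact table. This is an illustrative Python sketch, not the Oracle OLAP implementation: the dimensions (department, sector, year), the measure (students), the sample values, and all function names are assumptions introduced for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
facts = [
    {"department": "MIS", "sector": "Government", "year": 2012, "students": 4},
    {"department": "MIS", "sector": "Private", "year": 2012, "students": 2},
    {"department": "Banking", "sector": "Government", "year": 2012, "students": 3},
    {"department": "Banking", "sector": "Government", "year": 2013, "students": 5},
]


def aggregate(rows, dims, measure="students"):
    """Sum the measure grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, conditions):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[d] in allowed for d, allowed in conditions.items())]


def pivot(rows, row_dim, col_dim, measure="students"):
    """Pivot: lay one dimension on rows and another on columns."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {k: dict(v) for k, v in table.items()}


# Roll-up aggregates to fewer dimensions; drill-down adds a dimension back.
rolled_up = aggregate(facts, ["department"])
drilled_down = aggregate(facts, ["department", "sector"])

only_2012 = slice_cube(facts, "year", 2012)
diced = dice_cube(facts, {"department": {"MIS"}, "sector": {"Government", "Private"}})
cross_tab = pivot(facts, "department", "sector")
```

Roll-up and drill-down are the same aggregation run at coarser or finer granularity, which is why a single `aggregate` helper serves both here.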

OLAP Technology in the Oracle Database
Oracle Database offers the industry's first and only embedded OLAP server. Oracle OLAP provides native multidimensional storage and speed-of-thought response times when analyzing data across multiple dimensions. The database provides rich support for analytics such as time series calculations, forecasting, advanced aggregation with additive and non-additive operators, and allocation operators. These capabilities make the Oracle database a complete analytical platform, capable of supporting the entire spectrum of business intelligence and advanced analytical applications [16]. The Oracle OLAP option includes features for defining, storing, and querying OLAP cubes, among others [17].
MOLAP systems are much faster in terms of data aggregation and queries; however, they generate large volumes of data. Query response time is improved because summaries of the data are pre-aggregated and responses to queries are prepared before the application is launched [20]. The MOLAP data store is built specifically to handle multidimensional queries, as shown in figure (5), and offers fast, efficient, and manageable access to multidimensional data [21].

Association rule algorithms
Association rules are used to find frequent patterns, associations, or correlations in a transaction database. Association rule mining can be used in basket data analysis, educational data mining, classification, clustering, etc. Association rule algorithms include Apriori, sampling, partitioning, and parallel algorithms [24].

Apriori Association Rule
The Apriori algorithm was first proposed by Agrawal [25]. It uses prior knowledge of frequent item set properties for association rule mining. The basic idea of the Apriori algorithm is to generate candidate item sets from a given dataset and then scan the dataset to check whether their counts are really large. The process is iterative, and the candidates of any pass are generated by joining the frequent item sets of the preceding pass. Apriori is a confidence-based association rule mining algorithm: confidence is simply the accuracy measure used to evaluate the rules produced by this algorithm. The rules are ranked according to their confidence value. If two or more rules share the same confidence, they are ordered first by their support and second by their time of discovery [26].
Support: for the association rule X → Y, the percentage of transactions in the database that contain X ∪ Y.
Confidence: for the association rule X → Y, the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X. Figure (6) shows the generation of item sets and frequent item sets where the minimum support count is 2.
To generate association rules from a frequent item set, we use the following rule: for each frequent item set L, find all nonempty subsets of L; for each such proper subset s, output the rule s → (L − s) if its confidence meets the minimum confidence threshold.
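The support and confidence definitions and the subset-based rule generation step can be sketched in Python as follows. The transactions and the 0.6 confidence threshold are hypothetical, chosen only for illustration; the function names are likewise assumptions. Counts are used for confidence so that the ratio is computed directly from the number of matching transactions.

```python
from itertools import combinations

# Hypothetical transactions; each is a set of items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the item set."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the item set."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """count(X ∪ Y) / count(X) for the rule X → Y."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


def rules_from_itemset(itemset, transactions, min_confidence):
    """For each nonempty proper subset s of L, keep s → (L − s) if confident."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), size):
            antecedent = frozenset(subset)
            consequent = itemset - antecedent
            conf = confidence(antecedent, consequent, transactions)
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))
    return rules


rules = rules_from_itemset({"A", "B", "C"}, transactions, min_confidence=0.6)
```

In this toy data, only rules with two items on the left-hand side pass the threshold, because each single item occurs in four transactions while the full item set occurs in only two.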

Figure (6): Generation of item sets & frequent item sets
The algorithm makes many passes over the database to find frequent item sets, where k-itemsets are used to generate (k+1)-itemsets. The support of each k-itemset must be greater than or equal to the minimum support threshold for it to be frequent; otherwise, it remains only a candidate item set. First, the algorithm scans the database to find the frequency of 1-itemsets, which contain only one item, by counting each item in the database. The frequent 1-itemsets are used to find the 2-itemsets, which in turn are used to find the 3-itemsets, and so on until there are no more k-itemsets [28].
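The level-wise search described above can be sketched as follows. This is a minimal, unoptimized Python illustration and not the implementation used in this study; the transactions are hypothetical, and the minimum support count of 2 simply mirrors the value mentioned in the text.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return a dict mapping each frequent item set to its support count."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support_count}

    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, k)):
                candidates.add(union)

        # Scan the database to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        all_frequent.update(frequent)
        k += 1

    return all_frequent


# Hypothetical transactions for illustration.
example = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C"},
]
frequent_itemsets = apriori(example, min_support_count=2)
```

Here item D appears only once, so it is dropped after the first pass and never takes part in any larger candidate.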

Predictive Apriori
In the case of Apriori, we can every so often find rules with higher confidence but low support for the items involved; sometimes rules are produced with large support but low confidence. Reference [30] introduced the Predictive Apriori algorithm with the concept that "larger support has to trade against a higher confidence". Predictive Apriori is also a confidence-based association rule mining (ARM) algorithm, but the rules it produces are sorted according to "expected predictive accuracy". This interestingness measure suits the requirements of a classification task [31]: it tries to maximize the expected accuracy of an association rule rather than the confidence, as in Apriori. Finding a unique association rule mining algorithm based on data characteristics

5.1
Appendix (1) shows the flowchart of the summer training students system.

Designing database
The database life cycle is shown in figure (8).

Figure (8): Database Life Cycle
A dedicated database was designed for summer training at the College of Administration and Economics. It includes many tables: each entity becomes a table in the database, and the attributes of the entity become the fields of that table. The relationships between entities are also identified, since the entity selection process should clarify the relationships that bind them. Appendix (2) shows the E-R (Entity Relationship) diagram, which gives the conceptual model of the world.

Apriori & Predictive Apriori Association rule Results
At this stage, we compare two association rule algorithms, Apriori and Predictive Apriori, in predicting student placement in organizations. We need an algorithm whose association rules involve "Government" and "Private", so we compare the results of these two algorithms. Upon examining table (1), we found that the Apriori algorithm could generate patterns that are believed to be the factors that affect the matching process. The data were grouped into two groups based on organization category. Examples of the extracted patterns are:

Appendix (3) presents the results of the Apriori association rule algorithm, which is used in the placement of students in organizations. As we increase the lower support bound, we obtain more refined rules, as shown in these paragraphs. The rules were evaluated based on confidence and support; the best rules were chosen where confidence is 90% and support is 10%. Paragraph 2 of Appendix (3) presents the results of the Predictive Apriori association rule algorithm, in which predictive accuracy is used to generate the association rules; the accuracy of the best rules starts at 0.99329 and decreases to 0.62506.

Conclusion and future work
- We found that the Apriori association rule algorithm performed best with confidence-based ranking, while Predictive Apriori performed better with accuracy-based ranking.

Figure (2): Database Management System Technology (CDBMS)
- Well-suited for data warehouses that have a large number of similar data items.
A column-based relational database is exactly what its name suggests,

Figure (4): Self-Indexing
Features of the Oracle OLAP option:
- OLAP Cube Definition, Storage, and Querying
- OLAP API and Metadata
- OLAP Cube Materialized Views
- Analytic Workspaces
- SQL Access to OLAP Cubes

3.2 OLAP Guidelines
Dr. E.F. Codd created a list of guidelines and requirements as the basis for selecting OLAP systems [18][19]:
1- Basic Features: multidimensional analysis; consistent performance; fast response times for interactive queries; drill-down and roll-up; navigation in and out of details; slice-and-dice or rotation; multiple view modes; easy scalability; time intelligence (year-to-date, fiscal period).
2- Advanced Features: powerful calculation; cross-dimensional calculations; pre-calculation or pre-consolidation; drill-through across dimensions or details; sophisticated presentation and displays; collaborative decision making; derived data values through formulas; application of alert technology; report generation with agent technology.

3.3 Types of OLAP servers
ROLAP versus MOLAP versus HOLAP
MOLAP (Multidimensional OLAP):

Figure (5): Architecture of MOLAP
The features of MOLAP are [22][23]:
a) Stores and manages warehouse data in a multidimensional DBMS.
b) Array-based storage structure.
c) Direct access to the array data structure.
d) Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
e) Can perform complex calculations: all calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return

5.3 Forms
1- System Student: The student is regarded as the most important element of the training operation, because of the prospects and attitudes he has and their relationship to the level of training and to prior planning, which is the responsibility of the administration departments, in order to attain the expected benefits and their reflection on the training operation as a definite gain. A special form has been designed for the student, linked to the student table in the summer training database, as shown in figure (9).

Figure (9): Personal Information of Student
2- A member of Teaching Staff: A member of the teaching staff forms a basic foundation of summer training, beginning with following up the student at the training site and outlining his work, in coordination with the practical supervisor at the site and according to a detailed program prepared in advance, together with follow-up in the field of training, identifying weak and strong points and eliminating the weak points. As the university lecturer acquaints himself with the nature and policy of the site work that he intends to supervise, and by doing that his

Figure (12): Fact Table
Figure (13)
- If students are from the Accounting, Banking, or MIS Department, their average is between 66 and 70, Sex=female, and they live on the left side of Mosul, then the students were placed in Alsalam Hospital, a Government organization.
- If students are from the MIS Department, their average is between 76 and 80, Sex=male, and they live on the right side of Mosul, then the students were placed at the College of Computer Science, a Government organization.
- If students are from the Admin, Banking, or MIS Department, their average is between 71 and 75, Sex=female, and they live on the right side of Mosul, then the students were placed in Medicine, a Private organization.
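The three extracted patterns above read as if-then rules, so they can be encoded as data and matched against a student record. The following Python sketch is only an illustration of that reading: the class, field names, and lookup function are hypothetical and are not part of the system described in this paper; only the rule conditions and placements come from the patterns listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementRule:
    departments: frozenset
    average_range: tuple  # inclusive (low, high)
    sex: str
    side: str  # side of Mosul: "left" or "right"
    placement: str
    sector: str

    def matches(self, student):
        """True when the student satisfies every condition of the rule."""
        low, high = self.average_range
        return (
            student["department"] in self.departments
            and low <= student["average"] <= high
            and student["sex"] == self.sex
            and student["side"] == self.side
        )


# The three patterns listed above, encoded as rules.
RULES = [
    PlacementRule(frozenset({"Accounting", "Banking", "MIS"}), (66, 70),
                  "female", "left", "Alsalam Hospital", "Government"),
    PlacementRule(frozenset({"MIS"}), (76, 80),
                  "male", "right", "College of Computer Science", "Government"),
    PlacementRule(frozenset({"Admin", "Banking", "MIS"}), (71, 75),
                  "female", "right", "Medicine", "Private"),
]


def suggest_placement(student, rules=RULES):
    """Return (placement, sector) of the first matching rule, else None."""
    for rule in rules:
        if rule.matches(student):
            return rule.placement, rule.sector
    return None


# Hypothetical student record for illustration.
suggestion = suggest_placement(
    {"department": "MIS", "average": 78, "sex": "male", "side": "right"}
)
```

A student who matches none of the encoded patterns gets `None`, which in practice would mean falling back to manual placement or to further mined rules.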

Appendix (1): Flow Chart of the Electronic System for Summer Training Students
Appendix (3): Apriori Association Rule & Predictive Apriori Algorithm Rules
The distributed database designed for summer training makes it possible for multiple applications or users to share data and to reach the data saved in the database. The data warehouse provides summer training information, highly detailed reports (showing the results of queries in multiple formats through figures and charts), and analyses of value and quality, since the derived data is prepared through extract, transform, and load processes before being loaded into the warehouse database.
Future work includes:
- Other categories of OLAP, such as ROLAP, etc., can be applied.
- Other association rule algorithms can be applied to the distribution of student placements in organizations.