Statistical Analysis and Data Mining – CSIS 657

CG • Section 8WK • 11/08/2019 to 04/16/2020 • Modified 07/28/2020

Course Description

This course provides an in-depth study of the field of statistical analysis and data mining as it relates to real-world applications. It explores the complexities of data mining algorithms, software tools, and techniques employed in modern analytics and massive data sets. The selection, application, and evaluation of statistical approaches are examined in the context of data mining.

Prerequisites

CSIS 505 and CSIS 525

Rationale

Technology’s role in society continues to expand in application and influence. The data generated through this digital frontier is growing exponentially, creating new challenges as well as exciting opportunities. Sifting through this vast amount of information requires a skillset that is part engineer and part artist. Finding data, making appropriate associations between data, constructing ways to communicate relationships within data, and applying business intelligence and analytics are fundamental to business in the digital age. Never before has there been so much information available for companies to strategize with, analyze, consume, and even market. The effective use of data mining and predictive analytics technologies will be an invaluable skillset for companies seeking to stay competitive while remaining lean and efficient.

Measurable Learning Outcomes

Upon successful completion of this course, the student will be able to:

  1. Define the data mining process in relation to Christian principles and ethics. (PLO: 1)
  2. Compare data mining algorithms, tools, and techniques. (PLO: 3)
  3. Analyze data and text mining case studies. (PLO: 3)
  4. Select appropriate statistical data mining models for various scenarios. (PLO: 2)
  5. Apply data and text mining algorithms to real-world applications. (PLO: 2)

Course Assignment

Textbook readings and lecture presentations

Course Requirements Checklist

After reading the Course Syllabus and Student Expectations, the student will complete the related checklist found in Module/Week 1.

Discussion Board Forums (4)

Discussion boards are collaborative learning experiences. Therefore, the student is required to create a thread in response to the provided prompt for each forum. Each thread must be 400–500 words and demonstrate course-related knowledge. In addition to the thread, the student is required to reply to the threads of at least 2 classmates. Each reply must be 200–300 words. Each thread and reply must include at least 2 scholarly sources other than the Reading & Study material and 2 contextually appropriate scriptural references. Each thread and reply must follow current APA format.

Tutorials (7)

The student will complete practical exercises (tutorials) designed to (1) create experience with the software used in the course, (2) provide real-world examples of problems facing a variety of business sectors, (3) build understanding of how to methodically approach testing a data mining hypothesis, and (4) foster a greater understanding of the potential value of corporate data and the impact of big data.

Quizzes (4)

The student will complete four short-answer quizzes based on readings and lecture presentations. Quizzes are open-book/open-notes and must be completed in one hour.

Project

The student will incorporate all aspects of the course into an integrated holistic project on data mining. This effort will follow the scientific method outlined in the course textbook and be completed in the following phases. Prior to beginning the project, the student will submit a 2-page summary of the issue being addressed.

Phase I

The student will submit a paper of at least 5 pages that outlines the problem, identifies the sources of data, and explains the student’s initial hypothesis. The paper must be in current APA style and include at least 8 scholarly sources other than the course textbook.

Phase II

The student will submit a 5–8-page paper that outlines the processes implemented for collecting and preparing the data for examination. This phase must be a natural continuation of Phase I, describing in detail how the student prepared the data sets used, which analytics were selected, and the reasoning for those selections. Key variables and significant descriptive statistics must be identified. The paper must be in current APA style and include at least 8 scholarly sources other than the course textbook.

Phase III

The student will submit a paper of at least 5 pages that builds on the work conducted in Phases I and II. The submitted document will be comprehensive and include the findings, analysis, and next steps recommended as a result of the information generated by the data mining model(s) utilized. The student’s analysis of the data must be presented in enough detail that a fellow data miner could repeat it. The paper must be in current APA style and include at least 5 scholarly sources other than the course textbook.

Phase IV

The student will create a persuasive slide presentation that communicates the project (Phases I–III) and the resulting recommendations. The presentation must be aimed at C-level management and other organizational decision makers. Details might include cost, schedule, impact, implementation planning, change management issues, risks, benefits, etc. This presentation must include at least 15 slides. The student will also consolidate the work from Phases I–III into a cohesive final report. The report must be at least 15 pages, follow current APA style, and include at least 10 scholarly sources other than the course textbook.