Problem Statement

  1. Check whether the columns rainfall and potential evapotranspiration in the file gen.csv have a monotonic relationship or not

  2. Use the WEKA Explorer and justify the values of

    1. MCC
    2. Kappa statistic
    3. ROC curve value

    for the pre-defined datasets shipped with WEKA, e.g. C:\Program Files\Weka-3-8-6\data\diabetes.arff

Teams Data

Notes

Data mining techniques ⇒ SEMMA (Sample, Explore, Modify, Model, Assess)

Data Analysis

  1. collect
  2. measure
  3. analyze
  4. improve
  5. control

Correlation Analysis

Correlation coefficients lie between -1 and +1. A value outside this range indicates that the data was not processed properly and the analysis needs to be redone.

When the relationship is monotonic, the Spearman rank correlation coefficient is used ⇒ monotonic means the function is either never decreasing or never increasing over its whole range; it does not have to be a straight line

When the variables are linearly related, the Pearson r correlation coefficient is used
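
A minimal sketch of the difference between the two coefficients, using made-up vectors x and y (these are illustrative values, not the gen.csv columns, whose exact column names are not given here). The relationship is monotonic but not linear, so the Spearman coefficient is 1 while Pearson's r stays below 1:

```r
# Illustrative data only: y always grows with x, but not along a straight line
x <- 1:10
y <- x^3

# Pearson's r measures linear association
pearson <- cor(x, y, method = "pearson")

# Spearman's coefficient measures monotonic association (it works on ranks)
spearman <- cor(x, y, method = "spearman")

# Direct check: once sorted by x, y never decreases
is_non_decreasing <- all(diff(y[order(x)]) >= 0)

print(pearson)
print(spearman)
print(is_non_decreasing)
```

The same cor() calls should work on two numeric columns of a data frame loaded with read.csv().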

Market Basket Analysis
Simple random forest classifiers

Correlation Coefficient | Type of Relationship | Level of Measurement | Data Distribution
Pearson’s r             | Linear               |                      |
Spearman                | Monotonic            |                      |

Performance Evaluating Parameters

For any dataset: Precision-Recall Curve (PRC), ROC Curve, Area Under the Curve (AUC), Matthews Correlation Coefficient (MCC), Kappa statistic
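
As a rough sketch of how two of these values come out of a two-class confusion matrix, the counts below are hypothetical (they are not taken from diabetes.arff or from any WEKA output). Kappa is computed here as Cohen's kappa, which is assumed to be what the WEKA summary reports as the Kappa statistic:

```r
# Hypothetical confusion-matrix counts, chosen only for illustration
tp <- 40  # true positives
fn <- 10  # false negatives
fp <- 5   # false positives
tn <- 45  # true negatives
n  <- tp + fn + fp + tn

# Matthews Correlation Coefficient: +1 perfect, 0 no better than chance, -1 total disagreement
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa: observed agreement corrected for the agreement expected by chance
p_observed <- (tp + tn) / n
p_expected <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kappa_stat <- (p_observed - p_expected) / (1 - p_expected)

print(mcc)
print(kappa_stat)
```

Both values stay close to 0 when the classifier does no better than chance, which is one way to justify them alongside plain accuracy.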

Errors in Structured Data

Mean Squared Error (MSE) and Mean Absolute Error (MAE) shall be considered for every structured dataset
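
A small sketch of both error measures, assuming two made-up numeric vectors named actual and predicted (hypothetical values, for illustration only):

```r
# Made-up observed and predicted values
actual    <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.5, 5.0, 4.0, 8.0)

# Per-observation errors
errors <- actual - predicted

# Mean Squared Error: average of the squared errors (penalises large errors more)
mse <- mean(errors^2)

# Mean Absolute Error: average of the absolute errors
mae <- mean(abs(errors))

print(mse)
print(mae)
```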

RStudio

> getwd()
> print("hello")
> print('hello')

> rep(c("a", "b"), 2)
[1] "a" "b" "a" "b"

> rep("anushka", 5)
[1] "anushka" "anushka" "anushka" "anushka" "anushka"

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> class(x)
[1] "integer"

# to check the working directory
getwd()

# to set the working directory
setwd("G:/My Drive/Semester_5/Summer_Course/tableau/dayOne")

Basics for the dataset