Basic Functionality of UAHDataScienceSC

Introduction

UAHDataScienceSC provides an educational framework for learning supervised classification through hands-on implementation and visualization. The package combines algorithm implementations with interactive learning features, visualization tools, and carefully curated test datasets to facilitate understanding of machine learning concepts.

1. Installing and Loading the Package

Install UAHDataScienceSC from CRAN:

install.packages("UAHDataScienceSC")

Then, load it into your R session:

library(UAHDataScienceSC)

2. Built-in Datasets

The package includes several datasets designed to demonstrate different aspects of machine learning algorithms. Each dataset serves specific educational purposes and highlights particular challenges in data analysis.

Flower Classification

The flower classification dataset (db_flowers) contains measurements of flower characteristics including petal length, petal width, sepal length, and sepal width. These measurements are used to classify flowers into three distinct species (setosa, versicolor, virginica), with additional unknown samples provided for testing purposes. The dataset maintains a balanced distribution of classes, making it particularly suitable for initial classification exercises.

data("db_flowers")
head(db_flowers)
#>   petal_length petal_width sepal_length sepal_width ClassLabel
#> 1     1.516451   0.2382057     4.114058    3.597033     setosa
#> 2     1.667244   0.1508315     4.703633    3.363405     setosa
#> 3     1.508161   0.2566923     5.431131    3.739184     setosa
#> 4     1.134878   0.1310670     5.368672    3.068208     setosa
#> 5     1.600494   0.2159157     5.199039    3.608380     setosa
#> 6     3.940833   1.2706230     6.210460    2.715029 versicolor
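
To confirm the balanced class distribution described above, a quick frequency count of the label column works. This is plain base R rather than a package feature:

# Tally samples per class; any unknown test samples appear under their own label
table(db_flowers$ClassLabel)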

Logic Gate Datasets

The logic gate datasets simulate binary classification problems with varying complexity. These variations help illustrate the capabilities and limitations of different classification algorithms.

AND Gate Dataset

The AND gate dataset (db_per_and) demonstrates basic binary classification with three input variables and a single output that follows logical AND rules. This dataset proves especially useful for understanding perceptron training on linearly separable patterns.

data("db_per_and.rda")
#> Warning in data("db_per_and.rda"): data set 'db_per_and.rda' not found
head(db_per_and)
#>   x1 x2 x3 y
#> 1  0  0  0 0
#> 2  0  0  1 0
#> 3  0  1  0 0
#> 4  0  1  1 0
#> 5  1  0  0 0
#> 6  1  0  1 0

OR Gate Dataset

The OR gate dataset (db_per_or) extends the binary classification concept with OR logic.

data("db_per_or.rda")
#> Warning in data("db_per_or.rda"): data set 'db_per_or.rda' not found
head(db_per_or)
#>   x1 x2 x3 y
#> 1  0  0  0 0
#> 2  0  0  1 1
#> 3  0  1  0 1
#> 4  0  1  1 1
#> 5  1  0  0 1
#> 6  1  0  1 1

XOR Gate Dataset

The XOR gate dataset (db_per_xor) presents a more challenging non-linearly separable problem.

data("db_per_xor.rda")
#> Warning in data("db_per_xor.rda"): data set 'db_per_xor.rda' not found
head(db_per_xor)
#>   x1 x2 x3 y
#> 1  0  0  0 0
#> 2  0  0  1 1
#> 3  0  1  0 1
#> 4  0  1  1 0
#> 5  1  0  0 1
#> 6  1  0  1 0
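
Because XOR is not linearly separable, a single-layer perceptron cannot classify this dataset perfectly. The sketch below uses the perceptron() function demonstrated later in this vignette, with a hypothetical test point and the step activation; with max_iter capped, training stops and returns imperfect weights:

# Attempting to train on XOR: no single weight vector separates these classes,
# so the loop runs until max_iter and the final weights still misclassify rows
weights_xor <- perceptron(
  training_data = db_per_xor,
  to_clasify = c(1, 1, 0),
  activation_method = "step",
  max_iter = 100,
  learning_rate = 0.1
)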

Vehicle Classification

The vehicle classification dataset (db2) presents a real-world application scenario combining categorical and numerical features. The dataset uses license types, wheel counts, and passenger capacity to classify vehicles into categories such as cars, motorcycles, bicycles, and trucks. This mixed-type data structure provides practical experience with handling diverse input features.

data(db2)
head(db2)
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           4          5         Car
#> 2        A           2          2  Motorcicle
#> 3        N           2          1     Bicicle
#> 4        B           6          4       Truck
#> 5        B           4          6         Car
#> 6        B           4          4         Car
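
A glance at the column types confirms the mix of categorical and numerical features; this is a plain base R check:

# CardType is a categorical feature; WheelAmount and PassAmount are numeric
str(db2)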

Extended Vehicle Classification

The extended vehicle dataset (db3) builds upon db2 by introducing additional complexity through new vehicle types and relationships, making it particularly suitable for exploring decision tree depth impacts and algorithm scalability.

data(db3)
head(db3)
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           4          5         Car
#> 2        A           2          2  Motorcicle
#> 3        N           2          1     Bicicle
#> 4        B           6          4       Truck
#> 5        B           4          6         Car
#> 6        B           4          4         Car
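
The first rows of db3 coincide with db2; the additional vehicle types appear further down. A frequency table of the class column (base R, a suggested check rather than package output) makes the extra categories visible:

# Compare the vehicle categories present in each dataset
table(db2$VehicleType)
table(db3$VehicleType)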

Regression Test

The regression test dataset (db1rl) incorporates various mathematical relationships including linear, exponential, logarithmic, and sinusoidal patterns. This diversity allows users to compare the effectiveness of different regression approaches and understand their appropriateness for various data patterns.

data("db1rl")
head(db1rl)
#>   Straight_Line Exponential Logarithmic       Sine Dependent_Variable
#> 1     0.0000000    2.718282    0.000000  0.0000000          0.0000000
#> 2     0.4210526    2.941582    6.267700  0.8371665          0.4210526
#> 3     0.8421053    3.183225    6.959898  0.9157733          0.8421053
#> 4     1.2631579    3.444718    7.365047  0.1645946          1.2631579
#> 5     1.6842105    3.727693    7.652571 -0.7357239          1.6842105
#> 6     2.1052632    4.033913    7.875619 -0.9694003          2.1052632

3. Algorithm Implementations

K-Nearest Neighbors (KNN)

The KNN implementation supports various distance calculation methods to accommodate different data types and relationship patterns. The algorithm can employ Euclidean distance for standard numerical data, Manhattan distance for grid-like patterns, cosine similarity for angular relationships, and specialized metrics like Hamming distance for categorical data. The choice of distance method significantly impacts classification results and should be selected based on data characteristics and problem requirements.

result <- knn(
  data = db_flowers,
  ClassLabel = "ClassLabel",
  p1 = c(4.7, 1.2, 5.3, 2.1),
  d_method = "euclidean",
  k = 3
)
print(result)
#> [1] "versicolor"

The interactive learning mode provides step-by-step visualization of the classification process:

result <- knn(
  data = db_flowers,
  ClassLabel = "ClassLabel",
  p1 = c(4.7, 1.2, 5.3, 2.1),
  d_method = "euclidean",
  k = 3,
  learn = TRUE,
  waiting = FALSE
)
#> 
#> EXPLANATION
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#>     - Calculate the distance (using the chosen d_method) from the value we
#>     want to classify to every other one.
#> Step 2:
#>     - Select the k closest neighbors and get their classes.
#> Step 3:
#>     - Create a scatterplot matrix with the provided values for visualization
#>     purposes.
#> Step 4:
#>     - Select the most repeated class among the k closest neighbors classes.
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#> 
#> Distance from p1 to every other p.
#>  [1] 3.8350487 3.5000299 3.7123827 3.8464081 3.5861906 1.3373557 1.5469332
#>  [8] 0.4332036 1.3151308 1.3111425 1.6646335 1.5125047 0.8410765 1.5494029
#> [15] 2.1865198 1.8132936 2.1275594 3.1125899 2.8527131 3.4039581
#> ________________________________________________________________________________
#> 
#> Step 2:
#> 
#> These are the classes of the k closest values:
#> [1] "versicolor" "virginica"  "versicolor"

#> ________________________________________________________________________________
#> 
#> Step 3:
#> 
#> Plot values.
#> ________________________________________________________________________________
#> 
#> Step 4:
#> 
#> The most represented class among the k closest neighbors is versicolor;
#> therefore, that is the new value's predicted class.

Decision Trees

The decision tree implementation offers multiple impurity measures for node splitting decisions. The entropy method bases decisions on information theory principles, while the Gini method considers misclassification probability. The error rate method provides a direct measure of classification accuracy. Each method may produce different tree structures, offering insights into various approaches to data partitioning.

tree <- decision_tree(
  data = db2,
  classy = "VehicleType",
  m = 4,
  method = "gini",
  learn = TRUE
)
#> 
#> EXPLANATION
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 0:
#>     - Set the dataframe as parent node. The original dataframe is set as node 0.
#> 
#> Step 1:
#>     - If data is perfectly classified, go to step 4.
#>     - If data is not classified, create all the possible combinations of
#>     values for each variable.
#>       Each combination establishes the division into son nodes, with "m"
#>       divisions performed.
#> Step 2:
#>     - Calculate the information gain for each combination.
#>       The "method" method is used to calculate the information gain.
#> Step 3:
#>     - Select the division that offers the most information gain for each
#>     variable.
#>     - Select the division that offers the most information gain among the
#>     best of each variable.
#>     - For each son of the division, add the node to the tree and go to step 1
#>     with the filtered dataset.
#> Step 4:
#>     - This branch is finished. The next one in preorder will be evaluated.
#> 
#> Step 5:
#>     - Print results
#> 
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#>  IMPORTANT!!
#> 
#>     - The objective is to understand how decision trees work. The stopping
#>     condition is to have PERFECT LEAVES.
#>       If "data" is not perfectly classifiable, the code WILL NOT FINISH!!
#> 
#>     - It is important to understand that the code flow is recursive,
#>       meaning the tree is traversed in preorder (first, the root node is
#>       visited, then the children from left to right).
#>       So, when the information is categorized in step 1, this order will be
#>       followed.
#> 
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> Press [enter] to continue
#> 
#> 
#> Step 0:
#> 
#> Data:
#>    CardType WheelAmount PassAmount VehicleType
#> 1         B           4          5         Car
#> 2         A           2          2  Motorcicle
#> 3         N           2          1     Bicicle
#> 4         B           6          4       Truck
#> 5         B           4          6         Car
#> 6         B           4          4         Car
#> 7         N           2          2     Bicicle
#> 8         B           2          1  Motorcicle
#> 9         B           6          2       Truck
#> 10        N           2          1     Bicicle
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 1 and 2:
#> Combinations for CardType
#>    X1  X2    X3       Gain classifier
#> 1 --- --- B A N 0.00000000   CardType
#> 2 --- A N     B 0.22333333   CardType
#> 3   A   B     N 0.37333333   CardType
#> 4 ---   A   B N 0.09555556   CardType
#> 5 --- B A     N 0.28285714   CardType
#> 
#> Combinations for WheelAmount
#>    X1  X2    X3      Gain  classifier
#> 1 --- --- 4 2 6 0.0000000 WheelAmount
#> 2 --- 2 6     4 0.2828571 WheelAmount
#> 3   2   4     6 0.5000000 WheelAmount
#> 4 ---   2   4 6 0.2600000 WheelAmount
#> 5 --- 4 2     6 0.2150000 WheelAmount
#> 
#> Combinations for PassAmount
#>     X1    X2      X3        X4        Gain classifier
#> 1  ---   ---     --- 5 2 1 4 6 0.000000000 PassAmount
#> 2  ---   --- 2 1 4 6         5 0.073333333 PassAmount
#> 3  --- 1 4 6       2         5 0.106666667 PassAmount
#> 4    1     2     4 6         5 0.273333333 PassAmount
#> 5  1 6     2       4         5 0.190000000 PassAmount
#> 6  1 4     2       5         6 0.180000000 PassAmount
#> 7  ---     1   2 4 6         5 0.173333333 PassAmount
#> 8    1   2 6       4         5 0.206666667 PassAmount
#> 9    1   2 4       5         6 0.246666667 PassAmount
#> 10 --- 2 1 6       4         5 0.154285714 PassAmount
#> 11 2 1     4       5         6 0.273333333 PassAmount
#> 12 --- 2 1 4       5         6 0.165000000 PassAmount
#> 13 ---   2 1     4 6         5 0.240000000 PassAmount
#> 14 ---   1 6     2 4         5 0.130000000 PassAmount
#> 15 ---   1 4     2 6         5 0.080000000 PassAmount
#> 16 ---   ---       2   5 1 4 6 0.054285714 PassAmount
#> 17 ---     1       2     5 4 6 0.256666667 PassAmount
#> 18   1     2       4       5 6 0.306666667 PassAmount
#> 19   1     2     5 4         6 0.273333333 PassAmount
#> 20 ---     2       4     5 1 6 0.120000000 PassAmount
#> 21   2     4     5 1         6 0.190000000 PassAmount
#> 22 ---     2   5 1 4         6 0.106666667 PassAmount
#> 23 ---     2     4 6       5 1 0.156666667 PassAmount
#> 24 ---   1 6       2       5 4 0.156666667 PassAmount
#> 25 ---   1 4       2       5 6 0.180000000 PassAmount
#> 26 ---   ---       1   5 2 4 6 0.120952381 PassAmount
#> 27 ---     1       4     5 2 6 0.146666667 PassAmount
#> 28   1     4     5 2         6 0.206666667 PassAmount
#> 29 ---     1   5 2 4         6 0.173333333 PassAmount
#> 30 ---     1     4 6       5 2 0.173333333 PassAmount
#> 31 ---     1     2 6       5 4 0.173333333 PassAmount
#> 32 ---     1     2 4       5 6 0.246666667 PassAmount
#> 33 ---   ---       4   5 2 1 6 0.065000000 PassAmount
#> 34 ---     4   5 2 1         6 0.154285714 PassAmount
#> 35 ---   1 6       4       5 2 0.090000000 PassAmount
#> 36 ---   2 6       4       5 1 0.090000000 PassAmount
#> 37 ---   2 1       4       5 6 0.273333333 PassAmount
#> 38 ---   --- 5 2 1 4         6 0.073333333 PassAmount
#> 39 ---   1 4     5 2         6 0.080000000 PassAmount
#> 40 ---   2 4     5 1         6 0.130000000 PassAmount
#> 41 ---   2 1     5 4         6 0.240000000 PassAmount
#> 42 ---   ---   1 4 6       5 2 0.006666667 PassAmount
#> 43 ---   ---   2 4 6       5 1 0.056666667 PassAmount
#> 44 ---   ---   2 1 6       5 4 0.120952381 PassAmount
#> 45 ---   ---   2 1 4       5 6 0.165000000 PassAmount
#> 46 ---   ---     2 1     5 4 6 0.223333333 PassAmount
#> 47 ---   ---     2 4     5 1 6 0.060000000 PassAmount
#> 48 ---   ---     2 6     5 1 4 0.006666667 PassAmount
#> 49 ---   ---     1 4     5 2 6 0.020000000 PassAmount
#> 50 ---   ---     1 6     5 2 4 0.056666667 PassAmount
#> 51 ---   ---     4 6     5 2 1 0.120952381 PassAmount
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 3:
#> List of best candidates (1 for each variable):
#>    Sons   Gain      Classifier   
#> X1 list,3 0.3733333 "CardType"   
#> X2 list,3 0.5       "WheelAmount"
#> X3 list,4 0.3066667 "PassAmount"
#> 
#> The division with the most information gain is chosen:
#>     - Classifier = WheelAmount
#>     - Information gain = 0.5
#>     - Sons =
#>  X1  X2  X3 
#> "2" "4" "6" 
#> Press [enter] to continue
#> 
#> 
#> Step 0:
#> 
#> Data:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        A           2          2  Motorcicle
#> 2        N           2          1     Bicicle
#> 3        N           2          2     Bicicle
#> 4        B           2          1  Motorcicle
#> 5        N           2          1     Bicicle
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 1 and 2:
#> Combinations for CardType
#>    X1  X2    X3 Gain classifier
#> 1 --- --- A N B 0.00   CardType
#> 2 ---   A   N B 0.18   CardType
#> 3   A   B     N 0.48   CardType
#> 4 --- A B     N 0.48   CardType
#> 5 --- A N     B 0.18   CardType
#> 
#> Combinations for WheelAmount
#>   V1 Gain  classifier
#> 1  2    0 WheelAmount
#> 
#> Combinations for PassAmount
#>   X..... X.2.1.       Gain classifier
#> 1    ---    2 1 0.00000000 PassAmount
#> 2      1      2 0.01333333 PassAmount
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 3:
#> List of best candidates (1 for each variable):
#>    Sons   Gain       Classifier   
#> X1 list,2 0.48       "CardType"   
#> X2 list,1 0          "WheelAmount"
#> X3 list,2 0.01333333 "PassAmount"
#> 
#> The division with the most information gain is chosen:
#>     - Classifier = CardType
#>     - Information gain = 0.48
#>     - Sons =
#>    X2    X3 
#> "A B"   "N" 
#> Press [enter] to continue
#> 
#> 
#> Steps 1 and 4:
#> Data is classified.
#> 
#> Steps 1 and 4:
#> Data is classified.
#> 
#> Steps 1 and 4:
#> Data is classified.
#> 
#> Steps 1 and 4:
#> Data is classified.
#> ________________________________________________________________________________
#> 
#> Step 5:
#> This is the structure of the decision tree:
#> Height 1 has 3 sons, divided by WheelAmount :
#> 
#> Son 1 (Whose father node is 0 ) filters by " 2 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        A           2          2  Motorcicle
#> 2        N           2          1     Bicicle
#> 3        N           2          2     Bicicle
#> 4        B           2          1  Motorcicle
#> 5        N           2          1     Bicicle
#> 
#> Son 4 (Whose father node is 0 ) filters by " 4 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           4          5         Car
#> 2        B           4          6         Car
#> 3        B           4          4         Car
#> 
#> Son 5 (Whose father node is 0 ) filters by " 6 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           6          4       Truck
#> 2        B           6          2       Truck
#> 
#> 
#> Height 2 has 2 sons, divided by CardType :
#> 
#> Son 2 (Whose father node is 1 ) filters by " A B ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        A           2          2  Motorcicle
#> 2        B           2          1  Motorcicle
#> 
#> Son 3 (Whose father node is 1 ) filters by " N ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        N           2          1     Bicicle
#> 2        N           2          2     Bicicle
#> 3        N           2          1     Bicicle
#> 
#> 
print(tree)
#> Height 1 has 3 sons, divided by WheelAmount :
#> Son 1 (Whose father node is 0 ) filters by " 2 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        A           2          2  Motorcicle
#> 2        N           2          1     Bicicle
#> 3        N           2          2     Bicicle
#> 4        B           2          1  Motorcicle
#> 5        N           2          1     Bicicle
#> 
#> Son 4 (Whose father node is 0 ) filters by " 4 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           4          5         Car
#> 2        B           4          6         Car
#> 3        B           4          4         Car
#> 
#> Son 5 (Whose father node is 0 ) filters by " 6 ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        B           6          4       Truck
#> 2        B           6          2       Truck
#> 
#> 
#> Height 2 has 2 sons, divided by CardType :
#> 
#> Son 2 (Whose father node is 1 ) filters by " A B ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        A           2          2  Motorcicle
#> 2        B           2          1  Motorcicle
#> 
#> Son 3 (Whose father node is 1 ) filters by " N ". It contains:
#>   CardType WheelAmount PassAmount VehicleType
#> 1        N           2          1     Bicicle
#> 2        N           2          2     Bicicle
#> 3        N           2          1     Bicicle
#> 
#> 
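
To see how the impurity measure shapes the resulting tree, the same call can be repeated with the entropy criterion and the two structures compared. This sketch assumes "entropy" is an accepted value for method, as the description above indicates:

# Same data and maximum division count, entropy-based gain instead of Gini;
# learn = FALSE skips the interactive walkthrough
tree_entropy <- decision_tree(
  data = db2,
  classy = "VehicleType",
  m = 4,
  method = "entropy",
  learn = FALSE
)
print(tree_entropy)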

Perceptron

The perceptron implementation includes several activation functions to model different decision boundaries. The step function provides basic binary thresholding, while continuous functions like sine, tangent, and ReLU offer smoother transitions. Advanced functions such as GELU and Swish incorporate modern neural network concepts.

weights <- perceptron(
  training_data = db_per_and,
  to_clasify = c(0, 0, 1),
  activation_method = "swish",
  max_iter = 1000,
  learning_rate = 0.1,
  learn = TRUE
)
#> 
#> EXPLANATION
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#>     - Generate a random weight for each variable.
#> Step 2:
#>     - Check if the weights classify correctly. If they do, go to step 4.
#> Step 3:
#>     - Adjust weights based on the error between the expected output and the
#>       real output.
#>     - If max_iter is reached, go to step 4. If not, go to step 2.
#> Step 4:
#>     - Return the weights and use them to classify the new value.
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> Press [enter] to continue
#> 
#> 
#> Step 1:
#> Random weights between -1 and 1 are generated for each variable:
#> [1]  0.9826067 -0.4158420 -0.4533431
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 2 and 3:
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9826067 -0.3901642 -0.4276653
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.073691 -0.2990803 -0.3365814
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.073691 -0.2990803 -0.3225581
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9936687 -0.2990803 -0.3225581
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9492578 -0.2990803 -0.3669689
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9492578 -0.2764758 -0.3443644
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9492578 -0.2764758 -0.3443644
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9492578 -0.2764758 -0.3300819
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.029215 -0.1965185 -0.2501246
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.029215 -0.1965185 -0.2501246
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.029215 -0.1790923 -0.2326984
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 1.029215 -0.1626832 -0.2162893
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9729011 -0.1626832 -0.2726033
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9729011 -0.1626832 -0.2726033
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9729011 -0.1455823 -0.2555024
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2        x3
#> 4 0.9246895 -0.1455823 -0.303714
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9246895 -0.1280809 -0.2862127
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1          x2         x3
#> 4 0.9927949 -0.05997553 -0.2181073
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1          x2         x3
#> 4 0.9927949 -0.05997553 -0.2181073
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1          x2         x3
#> 4 0.9397649 -0.05997553 -0.2711373
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1           x2         x3
#> 4 1.000347 0.0006061375 -0.2105556
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1           x2         x3
#> 4 0.9460258 0.0006061375 -0.2648764
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9460258 0.01208377 -0.2533987
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9460258 0.01147593 -0.2533987
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.9460258 0.01147593 -0.2533987
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.8998587 0.01147593 -0.2995659
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2        x3
#> 4 0.8998587 0.02381977 -0.287222
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2        x3
#> 4 0.8358859 0.02381977 -0.287222
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2        x3
#> 4 0.8358859 0.02261459 -0.287222
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2        x3
#> 4 0.8358859 0.02261459 -0.287222
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2       x3
#> 4 0.8993778 0.08610658 -0.22373
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2       x3
#> 4 0.8354481 0.08610658 -0.22373
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2       x3
#> 4 0.8354481 0.08610658 -0.22373
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2       x3
#> 4 0.7695239 0.02018236 -0.22373
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 0.833424 0.08408246 -0.1598299
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 0.833424 0.07970169 -0.1598299
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.7682598 0.01453745 -0.1598299
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.8277112 0.07398894 -0.1003784
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2          x3
#> 4 0.8723994 0.1186771 -0.05569025
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2          x3
#> 4 0.8108739 0.1186771 -0.05569025
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2         x3
#> 4 0.7594982 0.1186771 -0.1070659
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2          x3
#> 4 0.8067728 0.1659517 -0.05979138
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2          x3
#> 4 0.8067728 0.1659517 -0.05689116
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1        x2          x3
#> 4 0.8067728 0.1569672 -0.05689116
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.7370106 0.08720494 -0.05689116
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.6797168 0.02991121 -0.05689116
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.6797168 0.03124201 -0.05556036
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.7365768 0.08810198 0.00129961
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 0.686766 0.08810198 0.00129961
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 0.686766 0.08350296 0.00129961
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2         x3
#> 4 0.686766 0.07915359 0.00129961
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2          x3
#> 4 0.686766 0.07915359 0.001234587
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>         x1         x2          x3
#> 4 0.686766 0.07503936 0.001234587
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.6348303 0.02310362 0.001234587
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.6348303 0.02193509 0.001234587
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.6914805 0.07858533 0.05788483
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.6388438 0.02594863 0.05788483
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.6388438 0.02158136 0.05351756
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.6388438 0.02158136 0.05077009
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.5929237 0.02158136 0.004850011
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2          x3
#> 4 0.5929237 0.02024232 0.003510978
#> Press [enter] to continue
#> 
#> Weights do not classify correctly so they get adjusted:
#>          x1         x2         x3
#> 4 0.6528729 0.08019147 0.06346012
#> Press [enter] to continue
#> 
#> 
#> ________________________________________________________________________________
#> 
#> Step 4:
#> Predicted value: 0
#> Final weights:
#>          x1         x2         x3
#> 4 0.6528729 0.08019147 0.06346012
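
For linearly separable data such as the AND gate, the basic step activation typically converges in far fewer iterations than the smooth Swish used above. A minimal sketch, assuming "step" is an accepted activation_method value as the description above indicates:

# Step activation on the same AND data; expect convergence well before
# max_iter on this linearly separable problem
weights_step <- perceptron(
  training_data = db_per_and,
  to_clasify = c(0, 0, 1),
  activation_method = "step",
  max_iter = 1000,
  learning_rate = 0.1
)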

Regression Analysis

Linear and polynomial regression implementations allow for modeling relationships of varying complexity. Linear regression handles straightforward proportional relationships, while polynomial regression captures more complex patterns through higher-degree terms.

# Linear regression
linear_model <- multivariate_linear_regression(
  data = db1rl,
  learn = TRUE
)
#> 
#> EXPLANATION (for each independent variable)
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#>     - Calculate the mean of the dependent and independent variables.
#>     - Calculate the covariance and the variance of the dependent variable.
#>       If covariance = 0, print error message.
#> Step 2:
#>     - Calculate the intercept and the slope of the equation.
#> Step 3:
#>     - Calculate the sum of squared residuals and the sum of squared deviations
#>       of the independent variable.
#>     - Calculate the coefficient of determination.
#> Step 4:
#>     - Plot the line equation
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> An empty plot is created with appropriate limits
#> 
#> Press [enter] to continue
#> 
#> The mean of  Dependent_Variable  is 4
#> 
#> ________________________________________________________________________________
#> 
#> Step 1:
#> Straight_Line :
#>      - Mean = 4
#>      - Covariance = 6.205
#>      - Variance = 6.205
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 2 and 3
#> Straight_Line :
#>      - Intercept (a) = 0
#>      - Slope (b) = 1
#>      - Sum of squared residuals (ssr) = 117.895
#>      - Sum of squared deviations of y (ssy) = 117.895
#> They are used to calculate: Coefficient of determination (r^2) = 1
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 4
#> Straight_Line :
#> Data is plotted and the equation is represented in the legend
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 1:
#> Exponential :
#>      - Mean = 6.37
#>      - Covariance = 7.119
#>      - Variance = 8.498
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 2 and 3
#> Exponential :
#>      - Intercept (a) = 1.78
#>      - Slope (b) = 1.147
#>      - Sum of squared residuals (ssr) = 155.195
#>      - Sum of squared deviations of y (ssy) = 273.767
#> They are used to calculate: Coefficient of determination (r^2) = 0.567
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 4
#> Exponential :
#> Data is plotted and the equation is represented in the legend
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 1:
#> Logarithmic :
#>      - Mean = 7.92
#>      - Covariance = 3.445
#>      - Variance = 4.092
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 2 and 3
#> Logarithmic :
#>      - Intercept (a) = 5.699
#>      - Slope (b) = 0.555
#>      - Sum of squared residuals (ssr) = 36.341
#>      - Sum of squared deviations of y (ssy) = 385.053
#> They are used to calculate: Coefficient of determination (r^2) = 0.094
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 4
#> Logarithmic :
#> Data is plotted and the equation is represented in the legend
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Step 1:
#> Sine :
#>      - Mean = 0
#>      - Covariance = -0.389
#>      - Variance = 0.5
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> 
#> Steps 2 and 3
#> Sine :
#>      - Intercept (a) = 0.251
#>      - Slope (b) = -0.063
#>      - Sum of squared residuals (ssr) = 0.463
#>      - Sum of squared deviations of y (ssy) = 329.5
#> They are used to calculate: Coefficient of determination (r^2) = 0.001
#> 
#> Press [enter] to continue
#> 

#> ________________________________________________________________________________
#> Step 4
#> Sine :
#> Data is plotted and the equation is represented in the legend

# Polynomial regression
poly_model <- polynomial_regression(
  data = db1rl,
  degree = 4,
  learn = TRUE
)
#> 
#> EXPLANATION (for each independent variable)
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#>     - Create an empty plot with the appropriate limits.
#> Step 2:
#>     - Approximate an equation that fits the given values
#>       using the lm() function. It employs the least squares method.
#> Step 3:
#>     - Plot the line and the legend.
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> ________________________________________________________________________________
#> 
#> Step 1:
#> 
#> An empty plot is created with appropriate limits
#> 
#> Press [enter] to continue
#> 
#> The approximations of the following equations to the provided values are done
#> by adjusting the coefficients to make each the best possible fit.
#> 
#> ________________________________________________________________________________
#> Steps 2 and 3:
#> Equation (degree 4) approximation for Straight_Line -->
#>     f(x) = 0 + 1x - 0x^2 + 0x^3 - 0x^4
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> Steps 2 and 3:
#> Equation (degree 4) approximation for Exponential -->
#>     f(x) = 2.719 + 0.505x + 0.052x^2 + 0.002x^3 + 0x^4
#> 
#> Press [enter] to continue
#> 
#> ________________________________________________________________________________
#> Steps 2 and 3:
#> Equation (degree 4) approximation for Logarithmic -->
#>     f(x) = 1.544 + 7.73x - 2.978x^2 + 0.467x^3 - 0.025x^4
#> 
#> Press [enter] to continue
#> 

#> ________________________________________________________________________________
#> Steps 2 and 3:
#> Equation (degree 4) approximation for Sine -->
#>     f(x) = 0.528 - 0.541x + 0.153x^2 - 0.013x^3 - 0x^4
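
The degree-4 polynomial follows the sine column only loosely, since a quartic can bend just a few times. A sketch reusing the documented signature with a higher degree, which should track the oscillation more closely at the risk of overfitting the other columns:

# Higher-degree fit: more flexibility for the sinusoidal pattern
poly_model_6 <- polynomial_regression(
  data = db1rl,
  degree = 6,
  learn = FALSE
)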