{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": false,
                    "solution": false
                }
            },
            "source": [
                "## Module 3A: Assignment\n",
                "\n",
                "### Due date: 4 October 2024, 23:59 AEST\n",
                "\n",
                "### Weightage: 30 Marks\n",
                "\n",
                "### Assessment task: \n",
                "\n",
                "You work for an AI image company and one of the departments focuses on research into the accurate and timely diagnosis of heart disease, fertility and breast cancer among residents living in Compassvale City. Their research also looks into the risk factors prevalent in this community as whole. Through a state-funded programme, they collect a huge among of anonymous and de-identified data on its residents. However, they face the challenge of making sense of this data. As an upcoming expert in Machine Learning, your assignment is to help the department to make sense of the data and deliver the following solutions.\n",
                "\n",
                "#### Parts\n",
                "\n",
                "1. Programming Part 1: Write a programme that splits a csv file containing the big heart data, randomly and into folds, once completed, output the lengths of the datasets and folds.\n",
                "\n",
                "2. Programming Part 2: Write a programme that specifies the line of best fit for the fertility rate for workers.\n",
                "\n",
                "3. Programming Part 3: Write a programme that can classify based on radius SE and texture SE of a breast mass and determine whether the diagnosis is mailgnant or benign.\n",
                "\n",
                "4. Programming Part 4: Write a programme that can classify two different types of data points with high accuracy.\n",
                "\n",
                "### Instructions:\n",
                "\n",
                "1. Complete all four parts of the assignments by write the Python codes.\n",
                "2. Run your codes to ensure that the required outputs are delivered.\n",
                "3. Submit the assignment for grading and to get feedback.\n",
                "\n",
                "### Submission:\n",
                "Click on the submit button on the top right after you run the code and all four parts are complete.\n",
                "\n",
                "### Grading:\n",
                "This assignment is automatically graded once the submit button is clicked. Discussion of the auto grader is not permitted, and questions asked will receive no comments or feedback. Scores and automated feedback are typically provided within a minute, though longer waits can occur on occasion. Students can submit the assignment a maximum of three times. The submission with the highest marked submission is counted and assigned to your mark. In circumstances such as technical issues where a submission was not successful but counted, please contact staff members at foai4stem@rmit.edu.au for further assistance.\n",
                "\n",
                "After the due date, all assignments are reviewed by a human marker, who provides additional feedback and possible marks.\n",
                "\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "### Programming Part 1: Read the file containing the big heart data, split the data randomly and into folds.\n",
                "\n",
                "\n",
                "\n",
                "### Assessment task:\n",
                "\n",
                "Write a programme that splits a csv file containing the big heart data, randomly and into folds, once completed, output the lengths of the datasets and folds. There are 8 steps involved, you will complete the code for steps 2, 4 and 5 only.\n",
                "\n",
                "### Marks:\n",
                "This part is worth 8 Marks. "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "#### Step 1\n",
                "\n",
                "Import the libraries."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "from random import seed\n",
                "from random import randrange\n",
                "import random\n",
                "from csv import reader\n",
                "\n",
                "from tabulate import tabulate"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 2\n",
                "\n",
                "Load the csv file."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def load_csv(filename, skip = False):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 3\n",
                "\n",
                "Print the csv file's content.\n",
                "\n",
                "_Note: This function can be called for other parts in the assignment."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def print_the_dataset(dataset, contents = True, length = True):\n",
                "    if(contents):\n",
                "        print(tabulate(dataset))\n",
                "        \n",
                "    if(length):\n",
                "        print(len(dataset))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 4\n",
                "\n",
                "Split the csv heart dataset into training and test datasets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def train_test_split(dataset, split):\n",
                "    # Create an empty list for the training set\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    \n",
                "    # Define the size of the training set\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    \n",
                "    # Copy the original dataset to \n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    \n",
                "    #Loops only to the size of the training set\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "        \n",
                "        # Populate the training set, by moving the data points from the\n",
                "        # dataset\/test set to the training set\n",
                "        \n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "        \n",
                "    # Return both the training set and test set \n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 5\n",
                "\n",
                "Split the csv dataset into k folds for cross validation."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def k_fold_cross_validation(dataset, k):\n",
                "    n = len(dataset) # Length of the dataset\n",
                "    fold_size = n \/\/ k # Divide the length into smaller folds\n",
                "    folds = [] # Empty list of folds\n",
                "\n",
                "    # Shuffle the dataset\n",
                "    shuffled_dataset = dataset.copy()\n",
                "    random.shuffle(shuffled_dataset)\n",
                "\n",
                "    for i in range(k):\n",
                "        # Assign a start and end variables in respect to the fold size\n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "\n",
                "        # Generate all the test indices for the current fold        \n",
                "        test_indices = []\n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "\n",
                "        # Generate all the train indices for the all other folds        \n",
                "        train_indices = []\n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "\n",
                "        # Create a test set that is randomly populated via the test_indices\n",
                "        test_set = []\n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "            \n",
                "        # Create a train set that is randomly populated via the train_indices\n",
                "        train_set = []\n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n",
                "\n",
                "        folds.append((train_set, test_set))\n",
                "\n",
                "    return folds"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 6\n",
                "\n",
                "Seed the random value."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "seed(1)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 7\n",
                "\n",
                "Load the big heart csv file and split the data into training (80%) and test (20%) sets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "filename = 'big_heart.csv'\n",
                "\n",
                "dataset = load_csv(filename, skip = True)\n",
                "print_the_dataset(dataset)\n",
                "\n",
                "training, test = train_test_split(dataset, 0.8)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "print(len(training))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "print(len(test))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 8\n",
                "\n",
                "In addition, load the dataset and assign the data into 5 folds."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "k = 5  # Number of folds for cross-validation\n",
                "folds = k_fold_cross_validation(dataset, k)\n",
                "\n",
                "# Print the size of each fold\n",
                "for i, fold in enumerate(folds):\n",
                "    train_set, test_set = fold\n",
                "    print(f\"Fold {i+1}: Training set size: {len(train_set)}, Test set size: {len(test_set)}\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 1.1",
                    "locked": true,
                    "points": "2",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 1.2",
                    "locked": true,
                    "points": "2",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 1.3",
                    "locked": true,
                    "points": "4",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "## Programming Part 2: Linear Regression\n",
                "\n",
                "Write a programme that specifies the line of best fit for the fertility rate for workers.\n",
                "\n",
                "\n",
                "### Assessment task:\n",
                "\n",
                "Write code to analyse the relationships between the variables in a dataset between the percentage of female workers and the fertility rate of women. There are 14 steps involved, you will complete the code for steps 2, 3, 4, 5, 6, 7, 8 and 9 only.\n",
                "\n",
                "### Marks:\n",
                "\n",
                "This part is worth 8 Marks."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "#### Step 1\n",
                "\n",
                "Import the libraries."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "from math import sqrt\n",
                "from matplotlib import pyplot as plot\n",
                "from random import seed\n",
                "from random import randrange\n",
                "from csv import reader"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "#### Step 2\n",
                "\n",
                "Load the csv file."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def load_csv(filename, skip = False):\n",
                "\n",
                "    dataset = list()\n",
                "    # Opens the file in read only mode\n",
                "    \n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    \n",
                "    return dataset"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "#### Step 3\n",
                "\n",
                "Convert any string column to a float coulm."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def string_column_to_float(dataset, column):\n",
                " \n",
                "    for row in dataset:\n",
                "        # The strip() function remove white space\n",
                "        # then convert the data into a decimal number (float)\n",
                "        # and overwrite the original data\n",
                "        \n",
                "        ###\n",
                "        ### YOUR CODE HERE\n",
                "        ###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "nbgrader": {
                    "grade": false,
                    "locked": true,
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "source": [
                "#### Step 4\n",
                "\n",
                "Calculate the mean value of a list of numbers."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def mean(values):\n",
                "    mean_results = 0.0\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    return mean_results"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 5\n",
                "\n",
                "Calculate a regularisation value for the parameter."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def regularisation(parameter, lambda_value=0.01):\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "\n",
                "    return parameter"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 6\n",
                "\n",
                "Calculate least squares between x and y."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def leastSquares(dataset):\n",
                "\n",
                "    x = list()\n",
                "    y = list()\n",
                "    \n",
                "    for row in dataset:\n",
                "        x.append(row[0])\n",
                "        \n",
                "    for row in dataset:\n",
                "        y.append(row[1])\n",
                "\n",
                "    b0 = 0\n",
                "    b1 = 0\n",
                "\n",
                "    # using the formula to calculate the b1 and b0\n",
                "    numerator = 0\n",
                "    denominator = 0\n",
                "\n",
                "    \n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    \n",
                "    return [b0, b1]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 7\n",
                "\n",
                "Calculate root mean squared error."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def root_mean_square_error(actual, predicted):\n",
                "    rmse = 0.0\n",
                "    sum_error = 0.0\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    return rmse"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 8\n",
                "\n",
                "Make Predictions."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def simple_linear_regression(train, test):\n",
                "    predictions = list()\n",
                "    b0, b1 = leastSquares(train)\n",
                "    \n",
                "    # Calculate the prediction (yhat)\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    return predictions"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 9\n",
                "\n",
                "Split the data into training and test sets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def train_test_split(dataset, split):\n",
                "    train = list()\n",
                "    test = list(dataset)\n",
                "\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    return train, test"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 10\n",
                "\n",
                "Evaluate regression algorithm on training dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def evaluate_simple_linear_regression(dataset, split=0):    \n",
                "    train, test = train_test_split(dataset, split)\n",
                "    test_set = list()\n",
                "    \n",
                "    for row in test:\n",
                "        row_copy = list(row)\n",
                "        row_copy[-1] = None\n",
                "        test_set.append(row_copy)\n",
                "        \n",
                "    predicted = simple_linear_regression(train, test_set)\n",
                "    \n",
                "    actual = [row[-1] for row in test]\n",
                "    \n",
                "    rmse = root_mean_square_error(actual, predicted)\n",
                "\n",
                "    return rmse"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 11\n",
                "\n",
                "Visualise the dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def visualise_dataset(dataset):\n",
                "    test_set = list()\n",
                "    \n",
                "    for row in dataset:\n",
                "        row_copy = list(row)\n",
                "        row_copy[-1] = None\n",
                "        test_set.append(row_copy)\n",
                "    \n",
                "    sizes, prices = [], []\n",
                "    for i in range(len(dataset)):\n",
                "        sizes.append(dataset[i][0])\n",
                "        prices.append(dataset[i][1])\n",
                "        \n",
                "    plot.figure()\n",
                "    plot.plot(sizes, prices, 'x')\n",
                "    plot.plot(test_set, simple_linear_regression(dataset, test_set))\n",
                "    plot.xlabel('Fertility rate')\n",
                "    plot.ylabel('Worker percent')\n",
                "    plot.grid()\n",
                "    plot.tight_layout()\n",
                "    plot.show()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 12\n",
                "\n",
                "Seed the random value."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "seed(1)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 13\n",
                "\n",
                "Load and prepare data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "filename = 'fertility_rate-worker_percent.csv'\n",
                "dataset = load_csv(filename, skip=True)\n",
                "\n",
                "for i in range(len(dataset[0])):\n",
                "    string_column_to_float(dataset, i)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 14\n",
                "\n",
                "Evaluate algorithm."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "split = 0.6\n",
                "rmse = evaluate_simple_linear_regression(dataset, split)\n",
                "\n",
                "print('Root Mean Square Error: %.3f' % rmse)\n",
                "visualise_dataset(dataset)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 2.1",
                    "locked": true,
                    "points": "1",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 2.2",
                    "locked": true,
                    "points": "3",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 2.3",
                    "locked": true,
                    "points": "4",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Programming Part 3: Logistic Regression\n",
                "\n",
                "Write a programme that can classify a digitized image of a fine needle aspirate (FNA) of a breast mass and determine whether the diagnosis is mailgnant or benign.\n",
                "\n",
                "\n",
                "### Assessment task:\n",
                "\n",
                "Write code to produce classification graphs and determine the accurary. There are 14 steps involved, you will complete the code for steps 3, 4, 7, 8, 10, 11 and 12 only.\n",
                "\n",
                "### Marks:\n",
                "\n",
                "This part is worth 7 Marks."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 1\n",
                "Import the libraries."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "import numpy as np\n",
                "from csv import reader"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 2\n",
                "Import extra libraries, only needed for displaying the classification graph."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "import matplotlib.pyplot as plt"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 3\n",
                "\n",
                "Load the csv file."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def load_csv(filename, skip = False):\n",
                "    dataset = list()\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n",
                "    return dataset\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 4\n",
                "\n",
                "Convert string diagnosis to number.\n",
                "\n",
                "Encode mailgnant values, M to zero and benign, B values to one."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def diagnosis_column_to_number(dataset, column):\n",
                "    ###\n",
                "    ### YOUR CODE HERE\n",
                "    ###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 5\n",
                "\n",
                "Extract only the x data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def extract_only_x_data(dataset):\n",
                "    if len(dataset) == 0:\n",
                "        return\n",
                "\n",
                "    data = list()\n",
                "\n",
                "    for i in range(0, len(dataset)):\n",
                "        data.append(list())\n",
                "\n",
                "        for j in range(0, len(dataset[i]) - 1):\n",
                "            data[-1].append(float(dataset[i][j]))\n",
                "\n",
                "    return data"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 6\n",
                "\n",
                "Extract only the y data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def extract_only_y_data(dataset):\n",
                "    if len(dataset) == 0:\n",
                "        return\n",
                "\n",
                "    data = list()\n",
                "\n",
                "    for i in range(0, len(dataset)):\n",
                "        data.append(int(dataset[i][-1]))\n",
                "\n",
                "    return data"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 7\n",
                "\n",
                "Define the sigmoid function."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def sigmoid(z):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    # Return the value of the implemented sigmoid function, do not simply return z\n",
                "    return z"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 8\n",
                "\n",
                "Define the loss function."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def loss(y_hat):\n",
                "    # overwrite the loss value with your own code\n",
                "    loss = 0\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    # Return the value of the implemented loss function, do not simply return loss of zero\n",
                "    return loss"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 9\n",
                "\n",
                "Define the gradients function."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def gradients(X, y, y_hat):    \n",
                "    # number of training examples.\n",
                "    number_of_examples = X.shape[0]\n",
                "    \n",
                "    # Gradient of loss weights.\n",
                "    dw = (1\/number_of_examples)*np.dot(X.T, (y_hat - y))\n",
                "    \n",
                "    # Gradient of loss bias.\n",
                "    db = (1\/number_of_examples)*np.sum((y_hat - y)) \n",
                "    \n",
                "    return dw, db"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 10\n",
                "\n",
                "Train the dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def train(X, y, batch_size, epochs, learning_rate):\n",
                "    number_of_examples, number_of_features = X.shape\n",
                "    \n",
                "    print(number_of_examples)\n",
                "    print(number_of_features)\n",
                "    \n",
                "    # Initializing weights and bias to zeros.\n",
                "    weights = np.zeros((number_of_features,1))\n",
                "    bias = 0\n",
                "    \n",
                "    # Reshaping y.\n",
                "    y = y.reshape(number_of_examples,1)\n",
                "    \n",
                "    # Empty list to store losses.\n",
                "    losses = []\n",
                "    \n",
                "    # Training loop.\n",
                "    for epoch in range(epochs):\n",
                "        for i in range((number_of_examples-1)\/\/batch_size + 1):\n",
                "            \n",
                "            # Defining batches. SGD.\n",
                "            start_i = i * batch_size\n",
                "            end_i = start_i + batch_size\n",
                "            xb = X[start_i:end_i]\n",
                "            yb = y[start_i:end_i]\n",
                "            \n",
                "            print(xb)\n",
                "            \n",
                "            # Calculating hypothesis\/prediction.\n",
                "            y_hat = sigmoid(np.dot(xb, weights) + bias)\n",
                "            \n",
                "            # Getting the gradients of loss w.r.t parameters.\n",
                "            dw, db = gradients(xb, yb, y_hat)\n",
                "            \n",
                "            # Updating the parameters.\n",
                "            ###\n",
                "            ### YOUR CODE HERE\n",
                "            ###\n",
                "        \n",
                "        # Calculating loss and appending it in the list.\n",
                "        l = loss(sigmoid(np.dot(X, weights) + bias))\n",
                "        losses.append(l)\n",
                "        \n",
                "    # returning weights, bias and losses(List).\n",
                "    return weights, bias, losses"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 11\n",
                "\n",
                "Make the predictions."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def predict(X, w, b):\n",
                "    \n",
                "    # X Input.\n",
                "    \n",
                "    # Calculating presictions\/y_hat.\n",
                "    preds = sigmoid(np.dot(X, w) + b)\n",
                "    \n",
                "    # Empty List to store predictions.\n",
                "    pred_class = []\n",
                "    \n",
                "    # Delete the following two lines and replace it with your own\n",
                "    for i in preds:\n",
                "        pred_class.append(0)\n",
                "        \n",
                "    # if y_hat >= 0.5 round up to 1\n",
                "    # if y_hat < 0.5 round down to 0\n",
                "    \n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    \n",
                "    return np.array(pred_class)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 12\n",
                "\n",
                "Obtain the accuracy."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def accuracy(y, y_hat):\n",
                "    # overwrite the accuracy value with your own code\n",
                "    accuracy = 0\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    \n",
                "    return accuracy"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 13\n",
                "\n",
                "Output the plot."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def plot_decision_boundary(X, w, b):\n",
                "    \n",
                "    # X Inputs\n",
                "    # w weights\n",
                "    # b bias\n",
                "    \n",
                "    fig = plt.figure(figsize=(10,8))\n",
                "    plt.plot(X[:, 0][y==0], X[:, 1][y==0], \"g^\")\n",
                "    plt.plot(X[:, 0][y==1], X[:, 1][y==1], \"bs\")\n",
                "    plt.xlim([-2, 2])\n",
                "    plt.ylim([0, 2.2])\n",
                "    plt.xlabel(\"feature 1\")\n",
                "    plt.ylabel(\"feature 2\")\n",
                "    plt.title('Decision Boundary')\n",
                "    \n",
                "    # The Line is y=mx+c\n",
                "    # So, Equate mx+c = w.X + b\n",
                "    # Solving we find m and c\n",
                "    x1 = [min(X[:,0]), max(X[:,0])]\n",
                "    \n",
                "    if(w[1] != 0):\n",
                "        m = -w[0]\/w[1]\n",
                "        c = -b\/w[1]\n",
                "        x2 = m*x1 + c\n",
                "        plt.plot(x1, x2, 'y-')\n",
                "    \n",
                "    plt.show()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 14\n",
                "\n",
                "Evaluate the algorithm."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "filename = 'breast_cancer_data.csv'\n",
                "dataset = load_csv(filename, skip=True)\n",
                "\n",
                "diagnosis_column_to_number(dataset, 2)\n",
                "\n",
                "X_train_data = extract_only_x_data(dataset)\n",
                "y_train_data = extract_only_y_data(dataset)\n",
                "\n",
                "X = np.array(X_train_data)\n",
                "y = np.array(y_train_data)\n",
                "\n",
                "\n",
                "# Training \n",
                "w, b, l = train(X, y, batch_size=100, epochs=1000, learning_rate=0.01)\n",
                "# Plotting Decision Boundary\n",
                "plot_decision_boundary(X, w, b)\n",
                "\n",
                "accuracy(y, y_hat=predict(X, w, b))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 3.1",
                    "locked": true,
                    "points": "3",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 3.2",
                    "locked": true,
                    "points": "4",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Programming Part 4: Neural Network\n",
                "\n",
                "Write a programme that can classify two different type of data points with high accuracy.\n",
                "\n",
                "\n",
                "### Assessment task:\n",
                "\n",
                "Write code to develop artificial neural network using the moons dataset. There are 9 steps involved, you will complete the code for steps 2, 3, 5, 6, 7, 8 and 9 only.\n",
                "\n",
                "### Marks:\n",
                "\n",
                "This part is of 7 Marks. "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 1\n",
                "\n",
                "Import the libraries."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "import numpy as np\n",
                "from csv import reader\n",
                "from random import seed\n",
                "from random import randrange"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 2\n",
                "\n",
                "Load the csv file."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def load_csv(filename, skip = False):\n",
                "    dataset = list()\n",
                "    # Opens the file in read only mode\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    return dataset"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 3\n",
                "Split the dataset into X_train, Y_train, X_test, Y_test sets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def train_test_split(dataset, split):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "        \n",
                "    return X_train, y_train, X_test, y_test"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 4\n",
                "\n",
                "Defining the Perceptron class that contains the weights, bias, learning rate and epochs."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "class Perceptron:\n",
                "    def __init__(self, input_size, bias, learning_rate, epochs):\n",
                "        self.weights = np.zeros(input_size)\n",
                "        self.bias = bias\n",
                "        self.learning_rate = learning_rate\n",
                "        self.epochs = epochs"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 5\n",
                "\n",
                "Define the activation function."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def activation_function(x):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 6\n",
                "\n",
                "Defining the predict function with the inputs, weights and bias values."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def predict(inputs, weights, bias):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    return activation_function(weighted_sum)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 7\n",
                "\n",
                "Define the train function."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def train(X_train, y_train, learning_rate, epochs, weights, bias):\n",
                "    prediction = None\n",
                "    error = None\n",
                "\n",
                "    for _ in range(epochs):\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "\n",
                "    return weights, bias"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 8\n",
                "\n",
                "Define the accuracy for the perceptron."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "def perceptron_accuracy(y, y_hat):\n",
                "    # overwrite the accuracy value with your own code\n",
                "    accuracy = 0\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "    \n",
                "    return accuracy"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "#### Step 9\n",
                "\n",
                "Implemented the Perceptron Nerual Network."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": [
                "# Set the seed\n",
                "seed(1)\n",
                "\n",
                "# Load the csv file\n",
                "\n",
                "filename = 'moons.csv'\n",
                "dataset = load_csv(filename, skip=True)\n",
                "\n",
                "# Configure the perception with the bias, learning rate and epochs\n",
                "\n",
                "# Note the initial values are dummy and must changed for an accurate network\n",
                "\n",
                "# The split value for the training and test sets\n",
                "custom_split = 0\n",
                "\n",
                "# The bias term is a constant value added to the weighted sum of inputs\n",
                "custom_bias = -1\n",
                "\n",
                "# The learning rate controls how much the weights are adjusted during training\n",
                "custom_learning_rate = -1\n",
                "\n",
                "# The number of epochs defines how many times the perceptron will iterate over the training data\n",
                "custom_epochs = -1\n",
                "\n",
                "# Set your values here\n",
                "\n",
                "###\n",
                "### YOUR CODE HERE\n",
                "###\n",
                "\n",
                "# Split the dataset for both training and testing\n",
                "\n",
                "X_train, y_train, X_test, y_test = train_test_split(dataset, split=custom_split)\n",
                "\n",
                "perceptron = Perceptron(input_size=2, bias=custom_bias, learning_rate=custom_learning_rate, epochs=custom_epochs)\n",
                "\n",
                "# Training\n",
                "weights, bias = train(X_train, y_train, perceptron.learning_rate, perceptron.epochs, perceptron.weights, perceptron.bias)\n",
                "\n",
                "# Predictions\n",
                "y_hat = []\n",
                "\n",
                "# Testing\n",
                "for i in range(len(X_test)):\n",
                "    prediction = predict(X_test[i], weights, bias)\n",
                "    y_hat.append(prediction)\n",
                "    print(f\"Input: {X_test[i]}, Predicted: {prediction}, Actual: {y_test[i]}\")\n",
                "\n",
                "# Test for Accuracy\n",
                "perceptron_accuracy(y_test, y_hat)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 4.1",
                    "locked": true,
                    "points": "2",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true,
                "nbgrader": {
                    "grade": true,
                    "grade_id": "Part 4.2",
                    "locked": true,
                    "points": "5",
                    "solution": false
                },
                "editable": false,
                "deletable": false
            },
            "outputs": [],
            "source": [
                "###\n",
                "### AUTOGRADER TEST - DO NOT REMOVE\n",
                "###\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": []
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "collapsed": true
            },
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3 [3.7]",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text\/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.7.5"
        },
        "vscode": {
            "interpreter": {
                "hash": "1d4188c33f36905e678e92ff8f6179d9eb1f8fe291f219f4a358da3f222593f9"
            }
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}