Year 1 Sem 2 Review

CZ1007 Data Structures

Offered in: S1, S2

Course Aims: This core programming course aims to develop your understanding of data structures such as linked lists, stacks, queues and trees, which are important for building efficient programs in the C programming language and are essential for future programming and software engineering courses.

My learning outcome: The first half of this course focuses on teaching us the C programming language in order to implement the data structures in the second half. Here, I learned how to create counter- and sentinel-controlled loops as well as recursive functions in C. I was also introduced to the concept of a pointer, which stores the address of a variable.

The second half of this course focuses on the implementation of data structures in C. Here, I learned that there are different types of data structures, such as linked lists, stacks, queues and trees, and that each type serves a purpose in efficiently solving a specific problem. Learning how and when to implement a certain type of data structure is important for building efficient programs with optimized memory allocation.

Assessment:

  • Midterm 1 - 40%
  • Midterm 2 - 40%
  • Assignments - 20%

How I Prepared for Each Assessment:

  1. For Midterm 1, I redid the lab questions and finished all the practice questions provided. I did not memorize the code I wrote, but instead remembered the pseudocode. The practice questions cover all the relevant topics taught in the first half of the course: the basics of the C programming language, arrays, pointers, strings and recursion. In addition, I tried out C practice questions on coding websites such as HackerRank and LeetCode.

  2. For Midterm 2, I redid the lab questions and finished all the practice questions provided. I went through the coding questions I had redone and remembered their pseudocode. The practice questions cover all the relevant topics taught in the second half of the course: linked lists, stacks, queues and trees.

  3. For the assignments, I did the questions after the lab sessions each week, as starting an assignment right after finishing the online lectures can be quite hard. I also did some of the practice questions before starting each assignment, since they gauge your basic understanding of the topic. I took my time on each assignment and checked that the program worked on multiple test cases without any errors. If I was unsure how to solve a problem in the assignment, Google was my friend!

Reflections: The lab tutor for my lab session was Prof Hui Siu Cheung. While Prof Hui has a very funny Hong Kong accent, I like his style of teaching, which was very interactive and hands-on. He was also very friendly and patient whenever you asked him for help with any questions. I also collaborated with my friends on the assignments we were given, and we helped each other out in preparing for both midterms.

CZ1016 Introduction to Data Science

Offered in: S1, S2

Course Aims: This core course aims to develop your understanding of the fundamental algorithms of data science, such as regression, classification, clustering and anomaly detection, and how to apply them to solve common data problems.

My learning outcome: This course was very comprehensive in terms of breadth, because we learnt the basic algorithms for the common types of data science problems. Some of the algorithms that I learnt are:

  1. Regression - Linear Regression and Multivariate Regression. Here, I learnt how to create a linear model and how to define and minimize a cost function for it.

  2. Classification - Decision Trees and Random Forest. Here, I learnt how a decision tree functions. In general, a parent node in a decision tree is split so as to maximize the purity of its child nodes. I also learnt why a random forest is better than an individual decision tree, along with the metrics for judging whether a classification model is good (e.g. accuracy, true positive rate and false positive rate).

  3. Clustering - K-Means. Here, I learnt how K-Means works, along with its drawbacks. I also learnt about algorithms that can overcome those drawbacks (e.g. DBSCAN, Expectation Maximization and hierarchical clustering).

  4. Anomaly Detection - Local Outlier Factor (LOF). Here, I learnt how the LOF algorithm works and how it can be used to detect anomalies.

Assessment:

  • Midterm 1 - 40%
  • Midterm 2 - 40%
  • Assignments - 20%

How I Prepared for Each Assessment:

  1. For the LAMS quizzes, I discussed with my friends the questions where the answer was not clear-cut. If we were still unsure, we rewatched the LAMS lectures.

  2. For the assignment, the topic given was the Don’t Overfit challenge on Kaggle. I started out not knowing how to approach the problem, but with the help of the top kernels available, I decided to use a random forest algorithm. In keeping with the challenge’s theme of not overfitting, I used feature importance to select the top few features before running a grid search to find the best hyperparameters for the random forest. I also discussed our respective approaches with my friends Vincent and Tanay, which was a win-win for all of us as we all improved our scores in the end.

  3. For the assignments, try to do the questions after the lab sessions each week, as starting an assignment right after finishing the online lectures can be quite hard. You can also try the practice questions before starting an assignment, since they test your basic understanding of the topic. I recommend taking your time and checking that your program works on multiple test cases without any errors. If you are unsure how to solve a problem in the assignment, Google is your friend!

Reflections: The lab tutor for my lab session was Dr Sourav. I like the way Dr Sourav makes any topic in this course interesting by showing how real-life problems can be solved with the algorithms we learnt. He was also very friendly and full of ideas when my project group mates and I asked him for pointers whenever we were stuck on a problem. I learned a lot about random forests and how to tune their performance while working on the assignment Dr Sourav gave us, the Don’t Overfit challenge on Kaggle. In the project, I googled and read through a lot of Medium articles to understand and deploy a Convolutional Neural Network (CNN) for digit recognition. Even now, I dare not say I know everything about CNNs, but I am proud to say I have used one for the project.