Programme Structure

  • 90 ECTS programme
  • One full year in duration, beginning September and finishing August
  • Comprises:
    • Foundational taught modules (20 ECTS)
    • Advanced taught modules (40 ECTS)
    • Research/Industry Project (30 ECTS)
  • Semester 1 exams: December.
  • Semester 2 exams: April/May.
  • Assessment is based on a combination of assignments and written exams.
  • The final project is submitted in August and assessed through a combination of a written thesis, an oral examination and a demonstration.

Course Syllabi

CT422 Modern Information Management

Information Retrieval and Filtering. Text Retrieval Models: Boolean, Statistical, Linguistic. Vector Space Model, Latent Semantic Indexing, Semantic Networks, Connectionist approaches. Multi-Media Retrieval. 

  • Evaluation: Precision/Recall Measures. Relevance Feedback. 
  • Collaborative Retrieval. Distributed Information Retrieval. 
  • Parallel Information Retrieval. Data Mining. Data Warehousing. 
  • Lexical Analysis. Stemming Algorithms. Machine Learning. Indexing. 
  • HCI and Information Visualisation.
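
As a small illustration of the precision and recall measures listed under Evaluation, the sketch below computes both for a single query. The function name `precision_recall` and the document IDs are hypothetical, chosen only for this example; they are not taken from the module material.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = (relevant documents retrieved) / (documents retrieved)
    recall    = (relevant documents retrieved) / (relevant documents)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs for a single query.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.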

CT475 Machine Learning and Data Mining

Definitions of Machine Learning, Data Mining and the relationship between them; the CRISP Data Mining process model; major tasks including classification, regression, clustering, association learning, feature selection, and reinforcement learning; algorithms for these tasks that may include decision tree learning, instance-based learning, probabilistic learning, support vector machines, linear and logistic regression, and Q-learning; open-source software tools for data mining; practical applications such as sensor data analysis, healthcare data analysis, and text mining to identify spam email; ethical issues and emerging trends in data mining and machine learning.

CT5100 Data Visualisation

This module will teach the fundamentals of data visualization. It will cover basic design principles and the principles underlying human perception, color theory and narrative. It will focus on the use of open standards for the presentation of data on the Web such as HTML, CSS, SVG and JavaScript, through the use of libraries such as D3.js, jQuery.js and Dimple.js.

CT5101 Natural Language Processing

Overview of human language, linguistics, NLP. Levels of linguistic analysis, algorithms, evaluation. NLP applications (information extraction, ontology learning, question answering, opinion mining, machine translation, entity linking). NLP toolkits (GATE, UIMA, NLTK). Text analytics, approaches, applications. Future directions (linguistic linked data, linked data architectures for NLP, multilingual semantic web).

CT5102 Programming for Data Analytics

This module will provide an introduction to programming for data analytics using open source programming tools. It will focus on the R programming language and its associated powerful frameworks for data manipulation, analysis and visualisation such as caret and ggplot. Topics will include R programming fundamentals; data loading; data cleaning, transformation and merging; exploratory data analysis and visualisation; and the use of machine learning libraries for regression, time series and classification operations.

CT5103 Case Studies in Data Analytics

Case studies will be presented as standalone blocks by practitioners, researchers and academics in areas that are considered relevant to many real-world problems that may be encountered by graduates of the programme. Students are expected to use this module to conceptualise the problems involved in their end-of-year project. This module will examine a number of case studies of domains in which data analytics are increasingly applied, for example energy informatics, medical informatics and Web personalisation.

CT5104 Web Mining

This module will provide the student with the skills to extract, clean and analyse data from the Web. The focus will be on graph and network analytic approaches to Web mining. Topics include: graph theory, network modeling, social network analysis, community-finding techniques, models of information diffusion, link prediction and evaluation techniques. There will be practical sessions on using graph databases and graph visualization tools such as Gephi. The student will learn how to apply Web mining techniques to applications such as recommender systems, adaptive personalisation and authority ranking.

CT5105 Tools and Techniques for Large Scale Data Analytics

Large-scale data analytics is concerned with the processing and analysis of large quantities of data, typically from distributed sources (such as data streams on the internet). This module introduces students to state-of-the-art approaches to large-scale data analytics. Students learn about foundational concepts, software tools and techniques for the scalable storage, processing and predictive analysis of high-volume and high-velocity data, and how to apply them to practical problems. Topics will include: Definition of large-scale computational data analytics; Overview of approaches to the processing and analysis of high-volume and high-velocity data from distributed sources; Applications of large-scale data analytics; Foundations of cluster computing and parallel data processing; The Hadoop and Spark ecosystems. MapReduce; Advanced programming concepts for large-scale data analytics; Concepts and tools for large-scale data storage; Stream data analytics. Complex Event Processing (CEP); Overview of computational statistics and machine learning in the Hadoop/Spark universe; Techniques and open-source tools for large-scale predictive analytics; Privacy in the context of large-scale data analytics.

CT5107 Advanced Topics in Machine Learning and Information Retrieval

The module will build on students’ knowledge of machine learning methods. The module will cover the following topics: Probabilistic approaches to Information Retrieval; Language Modelling; Learning to Rank; Neural Networks, Deep Learning; Support Vector Machines; Ensemble Techniques; Principal Component Analysis; Dimensionality Reduction; Emerging Trends. Students will learn how to improve machine learning performance using techniques such as:

  1. feature engineering;
  2. ensemble methods, boosting and bagging.

The module will feature extensive coverage of neural networks and deep learning:

  • Moving from logistic regression to feed-forward neural networks;
  • Auto-encoding neural networks;
  • Deep learning neural networks: technical details, success stories, and tools for deep learning;
  • Advanced concepts in deep learning: sparse auto-encoders; ReLU; convolutional neural networks.

Students will undertake a substantial assignment to implement a deep neural network learning system from scratch.

CT5108 Data Analytics Project

On successful completion of this module the learner will be able to:

  1. Apply a variety of data analytic techniques to solve a real-world problem.
  2. Diagnose a problem and design a data-analytics-based solution.
  3. Conduct and report on exploratory analysis of the problem domain.
  4. Produce an in-depth report (thesis) describing the problem, the diagnosis and approaches to solving it.
  5. Demonstrate that they can research, apply and evaluate state-of-the-art techniques in data analysis.

This project requires a demonstration of in-depth analysis, problem solving and reporting of a data analytic problem.

CT561 Systems Modelling and Simulation

Simulation is a quantitative method used to support decision making and to predict system behaviour over time. This course focuses on the system dynamics approach. The course covers the fundamentals of simulation, and describes how to design and build mathematical models. Case studies used include: software project management, public health policy planning, and capacity planning.

DER5101 Linked Data

This module will teach fundamentals of Linked Data and related standards, including the main principles distinguishing Linked Data from standard database technology. It will focus on designing linked data applications and students will learn how to design ontologies, produce linked data-sets, generate links between data-sets and explain the overall architecture of data integration systems based on Linked Data. It presents techniques for querying and managing Linked Data that is available on today’s Web. A large part of the module is devoted to query processing in different setups. The module will focus on managing large-scale collections of Linked Data. It will present methods to publish relational data as Linked Data and efficient centralized processing. It then addresses advanced topics, such as efficient reasoning, and query optimisation for large-scale linked data-sets.

EE445 Digital Signal Processing

This module provides an introductory course in digital signal analysis, covering topics such as: discrete-time systems and time-domain analysis; the z-transform; frequency-domain analysis and the Discrete Fourier Transform (DFT); digital filter structures and implementation; spectral analysis with the DFT and practical considerations; digital filter design (IIR, FIR, window methods, use of analogue prototypes).
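
To make the Discrete Fourier Transform named above concrete, here is a minimal sketch of the direct O(N^2) definition, X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N). The function name `dft` and the test signals are assumptions made for illustration. In practice a fast Fourier transform routine would be used instead.

```python
import cmath


def dft(x):
    """Naive DFT straight from the definition (O(N^2) operations)."""
    n_samples = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


# A unit impulse has a flat spectrum: every bin equals 1.
impulse_spectrum = dft([1, 0, 0, 0])

# A constant signal puts all of its energy in bin 0 (the DC component).
constant_spectrum = dft([1, 1, 1, 1])
```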

MA204 Discrete Mathematics

This course deals with elementary enumeration, permutations, combinations, and graphs including Eulerian and Hamiltonian graphs.

Module Learning Outcomes. On successful completion of this module the learner should be able to:

  1. Distinguish between orderings (permutations) and subsets (combinations).
  2. Count the size of unions and intersections of sets and solve elementary recurrences.
  3. Define and apply binomial and multinomial coefficients to enumeration problems.
  4. Use tree graphs for enumeration.
  5. Use trees to write algebraic expressions in Polish and Reverse Polish notation.
  6. Define the notions of graph and of Eulerian, Hamiltonian, bipartite and tree graphs.
  7. Define the notion of graph colourings and applications to scheduling problems.

Indicative Content

The course introduces the fundamentals of Discrete Mathematics. How to count, the addition rule, the multiplication rule. The Inclusion-Exclusion formula. Permutations and combinations. The binomial coefficients and some identities. Recurrences, the Fibonacci numbers, derangements, the "Tower of Hanoi". Distributions, multinomial coefficients. Introduction to graph theory, Euler and the Koenigsberg Bridges Problem. Eulerian and Hamiltonian graphs. Tree graphs and bipartite graphs. Ordered rooted trees, Polish and Reverse Polish notation. Planarity of graphs. Euler's formula for a connected planar graph. Colouring of graphs: the Welsh-Powell algorithm. Applications to simple scheduling problems.
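
Two of the listed topics, recurrences and the Inclusion-Exclusion formula, meet in the counting of derangements (permutations with no fixed point). The sketch below counts them both ways and lets the two methods check each other. The function names are hypothetical and the code is illustrative only, not course material.

```python
from math import factorial


def derangements_recurrence(n):
    """D(n) = (n - 1) * (D(n - 1) + D(n - 2)), with D(0) = 1 and D(1) = 0."""
    if n == 0:
        return 1
    prev2, prev1 = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


def derangements_inclusion_exclusion(n):
    """D(n) = sum over k = 0..n of (-1)^k * n! / k!."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


first_values = [derangements_recurrence(n) for n in range(6)]
print(first_values)
```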

Module Resources

  • Discrete Mathematics in the Schaum Outline series
  • Discrete Mathematics; N. L. Biggs (OUP)

MA215 Mathematical Molecular Biology I

The module begins with a brief overview of some of the key concepts in molecular sequence biology, including DNA and DNA sequencing, the genetic code, the Central Dogma of molecular biology, genome biology, molecular evolution and phylogenetics. Some concepts in graph theory are introduced, followed by a demonstration of graph theoretical methods applied to the genome assembly problem, which consists of assembling collinear genome sequences from short, random fragments of the sequence that are generated in genome sequencing projects. The problem of aligning homologous (related by descent) sequences is introduced and solved using a dynamic programming algorithm. The course covers algorithms to infer evolutionary relationships (i.e. phylogenetic trees), using concepts such as evolutionary parsimony and genetic distance. Transformational grammars are introduced as well as their applications to the description of amino acid sequence motifs and the structure of RNA molecules. Depending on time, the course may include a review of concepts in systems biology and the analysis of biological networks.
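
The dynamic programming approach to sequence alignment mentioned above can be sketched as follows. This is a minimal global alignment scorer in the style of Needleman-Wunsch. The scoring scheme (match +1, mismatch -1, gap -1), the function name and the example sequences are assumptions chosen for illustration, not the module's own parameters.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


print(alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells and each is filled in constant time, so the running time is proportional to the product of the sequence lengths.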

MA461 Probabilistic Models For Molecular Biology

This course will cover the application of probabilistic modelling to several important problems in molecular biology and/or systems biology. We will begin with a review of Markov chains, including continuous-time chains and hidden Markov models. Applications of models such as these to key problems in molecular biology include the alignment of molecular sequences, the identification of genes in genomic sequences (gene-finding), finding genomic regions with shared epigenetic features, molecular phylogenetics, and the analysis of genome-wide genotype data (including the inference of population structure and the haplotype phasing problem). We will consider several such applications, moving from textbook examples to more recent developments from the current bioinformatics literature.

MP305 Modelling I

This module introduces the student to modelling techniques for four different real-world problems. The problems cover topics such as network-flow optimisation, activity networks, network analysis and game theory.

ST1100 Engineering Statistics

This course presents an introduction to the basic concepts of probability theory along with the standard techniques for statistical analysis of data (such as calculating parameter estimates and confidence intervals, working with linear regression models) with a focus on methods and data arising in engineering.

ST235 Probability

This is an introductory course to probability theory. Topics include: algebra of events, concepts of conditional probability and independence of events; random variables (rv); discrete and continuous probability distributions; expectation, variance and functions of rv-s; probability and moment generating functions; basic probability inequalities.

Module Learning Outcomes. On successful completion of this module the learner should be able to:

  1. Apply basic laws of probability theory to calculate probabilities of composite events obtained by applying set operations 
  2. Apply correct combinatorial random sampling rules and calculate probabilities 
  3. Use basic properties of probability distributions to calculate derived quantities
  4. Calculate expectations, conditional expectations and variance of a variety of r.v.-s
  5. Prove main theorems and results connecting basic probability concepts including joint and conditional rv-s
  6. Understand common properties and differences of discrete and continuous r.v.-s 
  7. Calculate expectations, variances and distributions of functions of rv-s
  8. Apply generating functions to calculate corresponding distributional properties

Indicative Content

This is an introductory course to probability theory. Topics include: algebra of events, probability spaces, conditional probability, independence of events; combinatorics and random sampling; concept of a random variable (rv); discrete and continuous probability distributions (mass, density and cumulative distribution functions); functions of rv-s; properties of expectation and variance; conditional and joint rv-s and probability distributions; probability and moment generating functions; Markov and Chebyshev inequalities; Weak law of large numbers; Central limit theorem.
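
As a worked example of expectation, variance and the Chebyshev inequality from the list above, the sketch below uses exact rational arithmetic for a fair six-sided die. The choice of distribution and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in outcomes)
variance = sum(p * (x - mean) ** 2 for x in outcomes)


def tail_probability(k):
    """Exact P(|X - mean| >= k)."""
    return sum(p for x in outcomes if abs(x - mean) >= k)


def chebyshev_bound(k):
    """Chebyshev's inequality: P(|X - mean| >= k) <= variance / k^2."""
    return variance / Fraction(k) ** 2


print(mean, variance, tail_probability(2), chebyshev_bound(2))
```

For k = 2 only the outcomes 1 and 6 lie at least two units from the mean, so the exact tail probability is 1/3, comfortably below the Chebyshev bound of 35/48.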

Module Resources

  1. C. Grinstead and L. Snell, Introduction to Probability, American Mathematical Society (free online copy)
  2. Hoel, Port, Stone, Introduction to Probability Theory, Houghton Mifflin
  3. Stirzaker, Probability and Random Variables, Cambridge

ST236 Statistical Inference

An introduction to the ideas of statistical inference from a mathematical perspective. Topics covered include: populations and samples, properties of estimators, likelihood functions, principles and methods of point estimation, interval estimates, hypothesis testing and construction of tests.

Module Learning Outcomes. On successful completion of this module the learner should be able to:

  1. Construct a full sampling distribution for a simple, small sample probability model and calculate the properties of standard estimators such as the sample mean and variance;
  2. Derive a likelihood function for random samples from a probability model and under more complex sampling schemes, e.g. mixed populations, censoring;
  3. Calculate simple unbiased estimators and calculate optimal combinations of estimators;
  4. Find maximum likelihood estimators by solving the score equation and obtain an estimate of precision based on observed and expected information;
  5. Find confidence intervals for simple problems using pivotal quantities;
  6. Calculate the size and power function for a given test procedure;
  7. Obtain a most powerful test of two simple hypotheses using the Neyman-Pearson lemma and extend this to a uniformly most powerful test of one-sided alternatives;
  8. Use the likelihood ratio procedure to derive a test of nested hypotheses for some simple statistical models.

Indicative Content

This course provides an introduction to the ideas and mathematics of statistical inference. Topics covered include:

  1. Basic notions: populations and samples, sampling distributions, estimates and estimators, the likelihood function.
  2. Point estimation: general concepts, criteria including consistency, unbiasedness, minimum variance; methods of constructing estimators, unbiased estimation and MVUE, method of moments, maximum likelihood.
  3. Interval estimation: confidence intervals, likelihood intervals.
  4. Hypothesis testing: simple and composite hypotheses, type I and type II error, size and power, most-powerful tests, Neyman-Pearson lemma, uniformly most powerful tests, likelihood ratio tests.

Module Resources

  1. Statistical Inference (2nd Ed) by Casella & Berger, Duxbury.
  2. Introduction to the Theory of Statistics, Mood, Graybill & Boes, McGraw Hill
  3. Probability and Statistical Inference by Robert V. Hogg and Elliot A. Tanis, Macmillan

ST312 Applied Statistics II

Methods and applications in applied statistical inference. This module discusses factors for consideration in experiment design and demonstrates methods in the analysis of data emerging from designed experiments. Topics covered include confounding, blocking, a completely randomized design and a randomized block design, two-way ANOVA. The module also demonstrates regression modelling for a qualitative response, i.e. methods in logistic regression and generalized linear models.

ST313 Applied Regression Models

This course gives a basic introduction to regression modelling. The topics covered include:

  1. Populations and samples, correlation and association, response and explanatory variables.
  2. Simple linear regression: estimation using least-squares, properties of estimators, inference on parameter estimates, construction and use of ANOVA table, confidence and prediction intervals, residuals and model diagnostics.
  3. Multiple regression: matrix formulation of general linear model, least-squares estimation, properties of estimators, inference on parameter estimates, ANOVA table, fitted values, residuals, the hat-matrix, predictions, diagnostics and model checking.
  4. Model choice and variable selection: testing of nested models, variable selection criteria, stepwise and best subsets variable selection methods.
  5. Categorical explanatory variables: use of indicator variables for categorical variables, test of overall significance, analysis of covariance, interaction.
  6. Practical computer lab sessions: Use of Minitab statistical software to fit regression models, statistical report writing, including a group project and presentation.
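
The least-squares estimation in topic 2 can be illustrated with a short sketch. The module's labs use Minitab, so Python appears here only for illustration, and the function name and data values are invented for the example. For simple linear regression the estimates are slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, roughly following y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_linear_regression(x, y)

# Residuals from a least-squares fit with an intercept sum to zero.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"intercept={intercept:.2f} slope={slope:.2f}")
```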

ST412 Stochastic Processes

The goal of the course is to introduce the main ideas and methods of stochastic processes with the focus on Markov chains (processes with discrete time index and finite state space). The topics include:

  1. A review of probability theory: Discrete and continuous random variables (r.v.), joint and conditional distributions, expectations, variance, sums of iid r.v.-s, conditional expectation;
  2. Probability generating functions, moment generating functions;
  3. Random sums of r.v.-s;
  4. Branching processes;
  5. Markov property and Markov chains (MC): Random walk with absorbing barriers (Gambler’s ruin); Classification of states for a finite discrete Markov chain; Stationary and limiting prob. distribution of Markov chains; Random walk in 2 and more dimensions; Mean first passage times;
  6. Poisson process (independent increments formulation; inter-arrival times formulation);
  7. Applications of stochastic processes in finance, bioinformatics, computer science.
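
As an illustration of the random walk with absorbing barriers (Gambler's ruin) listed above, the sketch below evaluates the standard closed-form probability of reaching n before 0 from a starting point i. It then checks the result against the first-step equations h(i) = p * h(i + 1) + (1 - p) * h(i - 1). Function names and parameter values are assumptions made for this example, and exact fractions are used so the check involves no rounding.

```python
from fractions import Fraction


def win_probability(i, n, p=Fraction(1, 2)):
    """P(reach n before 0 | start at i) for a walk that steps up with probability p."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(i, n)  # fair game: the probability is linear in i
    r = q / p
    return (1 - r ** i) / (1 - r ** n)


def satisfies_first_step_equations(n, p):
    """Check the boundary values and h(i) = p*h(i+1) + (1-p)*h(i-1) for 0 < i < n."""
    p = Fraction(p)
    h = [win_probability(i, n, p) for i in range(n + 1)]
    return h[0] == 0 and h[n] == 1 and all(
        h[i] == p * h[i + 1] + (1 - p) * h[i - 1] for i in range(1, n)
    )


print(win_probability(3, 10))                 # fair game
print(win_probability(1, 3, Fraction(1, 3)))  # unfavourable game
```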