Dr Michael Madden's research focuses on new theoretical advances in machine learning and data mining, motivated by important practical applications, on the basis that challenging applications foster novel algorithms, which in turn enable new applications. Specific research topics include: artificial intelligence; algorithms for classification and numeric prediction; new methods for combining domain knowledge with data mining; time series data analysis; reasoning under uncertainty; Bayesian networks; reinforcement learning; and applications in science, engineering and medicine.

Website:  http://datamining.it.nuigalway.ie/

Mining and Visualization of Historical Census Data

Data from the censuses of Ireland in 1901 and 1911 have recently been digitised and are available on the National Archives website (http://www.census.nationalarchives.ie/). This has led to renewed interest among people of Irish heritage in finding out about their families' histories. In our research group, we have unique access to a full copy of this data in structured form, and we have used it to build a map-based exploration tool (http://danu2.it.nuigalway.ie/censusmap/), as well as to develop algorithms for record matching.

This project will explore how the data from these census records and others can be organised and visualised in ways that differ from the original simple list of details, and will implement these visualisations. New techniques for record matching will be developed, in order to construct more detailed family trees from the information contained in the two censuses. Research will also be conducted into how to apply techniques from social network analysis to this domain.
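As a concrete illustration of the record-matching task, the sketch below links a 1901 census entry to candidate 1911 entries using a simple string-similarity score. The field names, the threshold, and the use of Python's `difflib` are illustrative assumptions only, not the group's actual matching algorithm, which must also cope with spelling variation and inconsistently recorded details.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_records(record_1901, candidates_1911, threshold=0.85):
    """Return 1911 candidates whose name and birthplace both
    resemble the 1901 record closely enough, best match first."""
    matches = []
    for cand in candidates_1911:
        score = (name_similarity(record_1901["name"], cand["name"])
                 + name_similarity(record_1901["birthplace"], cand["birthplace"])) / 2
        if score >= threshold:
            matches.append((score, cand))
    return sorted(matches, reverse=True, key=lambda m: m[0])

# Hypothetical records: note the spelling drift between censuses
person = {"name": "Margaret O'Brien", "birthplace": "Galway"}
census_1911 = [
    {"name": "Margret O Brien", "birthplace": "Galway"},
    {"name": "Mary Walsh", "birthplace": "Cork"},
]
matches = match_records(person, census_1911)
```

In practice a single similarity threshold is too crude; it is shown here only to make the task concrete.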

Probabilistic Model-Based Data Stream Analytics

There is an increasing need to analyse multiple streams of data in real time. The goal of this work is to develop new techniques and tools to respond to this need. Data streams have several characteristics that distinguish them from other forms of data: they arrive continuously, may be unbounded in size, and must typically be processed in a single pass under time and memory constraints.

Current techniques for data stream mining are purely data-driven, and do not include background knowledge. Accordingly, the goal of this project is to devise, research, and implement a new approach for data stream analytics: this will be based on probabilistic methods and designed to exploit any available background knowledge. This work will build on our recent research, such as that described in Enright, Madden & Madden: "Bayesian networks for mathematical models: Techniques for automatic construction and efficient inference", International Journal of Approximate Reasoning, 2012.
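To make the idea of combining background knowledge with streaming data concrete, here is a minimal sketch: a Beta prior (the background knowledge) over a sensor's fault rate is updated one observation at a time, as each item arrives in the stream. This is only a toy instance of probabilistic stream processing, not the Bayesian-network approach of the cited paper; the prior parameters and fault-rate scenario are invented.

```python
def stream_posterior(observations, alpha=2.0, beta=8.0):
    """Update a Beta(alpha, beta) prior over a sensor's fault rate
    one observation at a time. The prior encodes the background
    knowledge that faults are expected roughly 20% of the time."""
    for obs in observations:          # obs is 1 (fault) or 0 (normal)
        alpha += obs
        beta += 1 - obs
        yield alpha / (alpha + beta)  # posterior mean after this item

# Each estimate is available immediately, in a single pass
estimates = list(stream_posterior([0, 0, 1, 0, 1, 1, 1, 1]))
```

Because the posterior after each item becomes the prior for the next, memory use is constant regardless of stream length.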

Personalised Intensive Care Medicine Using Probabilistic Model-Based Data Analytics

Much of what we know about the human body is described formally as sets of mathematical equations. While these provide a succinct and accurate description of the general behaviour of the system being modelled, they are not attuned to individuals. Every patient in an Intensive Care Unit has different medical problems, so for example their response to drugs may be quite different from the normal response. Therefore, for medical models to be useful in understanding how individual patients will respond, model parameters must be "individualised". In the ICU, we can use real-time data from bed-side monitors and lab results to tune models to individual patients, but some results are infrequent and there can be errors in data (for example if leads are disconnected or monitors need recalibration).
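To illustrate what "individualising" a model parameter means, the sketch below fits the elimination-rate constant of a hypothetical one-compartment drug model, C(t) = C0·exp(−kt), to one patient's measurements. The model, the grid search, and the synthetic data are illustrative assumptions; real ICU models are far richer than a single exponential.

```python
import math

def individualise_elimination_rate(times, concentrations, c0):
    """Fit the elimination rate k in C(t) = c0 * exp(-k t) to one
    patient's measurements by minimising squared error over a grid.
    A population model would fix k; here the data tunes it per patient."""
    best_k, best_err = None, float("inf")
    for i in range(1, 1000):
        k = i / 1000.0  # candidate rates 0.001 .. 0.999 per hour
        err = sum((c0 * math.exp(-k * t) - c) ** 2
                  for t, c in zip(times, concentrations))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Synthetic patient whose true elimination rate is 0.3 per hour
times = [1, 2, 4, 8]
conc = [10 * math.exp(-0.3 * t) for t in times]
k_hat = individualise_elimination_rate(times, conc, c0=10)
```

Noisy or missing measurements, which the project description highlights, are exactly what makes the real problem hard and motivates probabilistic rather than least-squares fitting.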

In this project, the goal is to build on our existing work in order to create new knowledge-intensive data analysis methods that can improve the provision of modern evidence-based medicine. This will involve interdisciplinary research collaborations with mathematicians and end-user clinical experts. With our clinical collaborators and Ethics approval, we will curate a new dataset of anonymised patient records. This will be used to research and develop new data mining methods in which domain knowledge will be central, and to evaluate their ability to improve physicians' understanding of critical care data. The result will be a new approach whereby clinical knowledge, even though incomplete and approximate, can be encoded automatically, refined with data from the population level to patient level, and used to generate actionable and meaningful information to support decision-making.

New Techniques for Analysis of Intensive Care Unit Data

In intensive care units (ICUs), computer systems are used to monitor patients continuously, and information systems are also used to record patient histories, lab results, and medicines prescribed. At present, however, these systems are primarily used for record-keeping, rather than to assist doctors in decision-making. In our ongoing work, we are developing new techniques for analysis of these complex time-series, working with our collaborators in University Hospital Galway as well as external collaborators in University of California Berkeley. We have had success with a probability-based technique called the Dynamic Bayesian Network (DBN), as described in: "Clinical Time Series Data Analysis Using Mathematical Models and DBNs", by Enright, Madden, Madden & Laffey, AI in Medicine 2011. The goal of this project will be to extend our existing work, by replacing the system models we use with new ones and by improving the DBN-based analytical methods we have developed. Alternative methods for time series analysis of this data should also be explored.
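As a minimal illustration of DBN-style reasoning over a clinical time series, the sketch below performs forward filtering in a two-state model of a hypothetical patient, skipping the update step when a reading is missing (the infrequent-lab-results problem mentioned above). The states, probabilities, and observations are invented for illustration; they are not taken from the cited paper.

```python
def forward_filter(prior, transition, emission, observations):
    """Forward filtering in a two-state DBN (hidden patient state:
    0 = 'stable', 1 = 'deteriorating'). An observation of None
    marks a missing reading: only the predict step is applied."""
    belief = list(prior)
    history = []
    for obs in observations:
        # Predict: propagate belief through the transition model
        belief = [sum(belief[i] * transition[i][j] for i in range(2))
                  for j in range(2)]
        # Update: weight by the likelihood if a reading arrived
        if obs is not None:
            belief = [belief[j] * emission[j][obs] for j in range(2)]
            z = sum(belief)
            belief = [b / z for b in belief]
        history.append(belief)
    return history

prior = [0.9, 0.1]
transition = [[0.95, 0.05], [0.1, 0.9]]
emission = [[0.8, 0.2], [0.3, 0.7]]   # P(obs | state); obs 0 = normal, 1 = alarming
history = forward_filter(prior, transition, emission, [0, None, 1, 1])
```

After two alarming readings the filtered belief shifts towards "deteriorating", even though one intermediate reading was missing; this graceful handling of gaps is a key attraction of DBNs for ICU data.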

New Methods for Probabilistic Anomaly Detection in Real Time

There is an increasing need to analyse multiple streams of data in real time. The goal of this work is to develop new techniques and tools to respond to this need. In particular, this project will focus on the task of detecting anomalies/outliers in data in real time. This has applications in medicine, chemistry, engineering and many other disciplines. We have previously developed new methods for one-class classification, which forms the basis for anomaly detection: by characterising one group or class of data well, you can identify when new samples fall outside that group. More recently, we have been working on probabilistic data stream mining methods. This project will build on both of those strands of work, and result in new techniques for one-class classification that can be applied to data streams in real time, and that use probabilistic reasoning (e.g. Bayesian network methods).
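The one-class idea described above can be sketched in a few lines: fit a density model to the "normal" class only, then flag new samples whose density under that model falls below a threshold. The 1-D Gaussian, the data, and the threshold value here are illustrative assumptions; the group's actual one-class methods are not specified in this description.

```python
import math

def fit_gaussian(samples):
    """Fit a 1-D Gaussian to samples from the 'normal' class only."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def is_anomaly(x, mean, var, threshold=1e-3):
    """Flag x as anomalous if its density under the normal-class
    model falls below the threshold."""
    density = (math.exp(-(x - mean) ** 2 / (2 * var))
               / math.sqrt(2 * math.pi * var))
    return density < threshold

# Hypothetical sensor readings from normal operation only
normal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, var = fit_gaussian(normal)
```

Because only the normal class is modelled, no anomalous examples are needed for training, which is what makes one-class classification suitable for rare or unforeseen events in a stream.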

Distributed Data Mining With Low Cost Processors and Geolocation

Low-cost ARM-based processors, such as those found in smart phones and the Raspberry Pi, are an interesting alternative to desktop computers. We wish to develop data mining and machine learning techniques that can operate effectively on such small devices. These would have applications in sensor processing and time series analytics.

There are two possible approaches that could be explored: (1) algorithms that have low computational requirements, and so can be implemented directly on such processors; (2) approaches in which data mining models are built on a more powerful computer and then deployed to a small processor. For example, it may be computationally expensive to inductively learn a Bayesian network, Support Vector Machine or Decision Tree from data, but the resulting network/machine/tree can be represented as a small piece of code. This will build on our previous work in development of techniques and algorithms for time series data mining, and our separate work on distributed data mining. In addition, because mobile devices generally have GPS units or other ways of determining their location, this project will investigate how data from physically distributed sensors can be integrated and analysed as a whole. This could be applied, for example, to develop methods to improve the estimation of arrival/departure times in public transport, or to improve the estimation of power demands in an electricity network.
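The second approach can be illustrated directly: however expensive the learning phase was, a fitted decision tree is just nested conditionals, small enough to run on any device. The feature names and thresholds below are invented for illustration, standing in for a tree that would be learned offline on a more powerful machine.

```python
# Learned offline on a powerful computer; the fitted tree is then
# emitted as plain code compact enough for a low-cost ARM device.
def predict_fault(temperature, vibration):
    """A decision tree exported as nested conditionals. Thresholds
    are hypothetical, not from a real sensor dataset."""
    if vibration <= 0.5:
        return "ok"
    if temperature <= 70.0:
        return "ok"
    return "fault"
```

The deployed function needs no ML library at all, only a few comparisons per prediction, which is what makes this split between training and deployment attractive for small processors.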

Artificial Intelligence for Non-Player Characters in Computer Games

In recent years, our research group has been working on new techniques for incorporating machine learning into non-player characters (NPCs) in first-person shooter games. See for example, "DRE-Bot: A Hierarchical First Person Shooter Bot Using Multiple Sarsa(λ) Reinforcement Learners" by Glavin & Madden, CGAMES USA 2012. We work in particular on techniques based on reinforcement learning, in which a computer system can learn to play better through trial and error, with the goal of maximising its long-term score. We have developed a hierarchical approach to breaking a game into separate modes that require different strategies, and then using separate reinforcement learners to perform well at each strategy. We have also developed methods to transfer experience gained in simple environments to more complex ones, inspired by how humans learn.
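For readers unfamiliar with the technique, the sketch below runs tabular Sarsa(λ) on a toy four-state corridor: eligibility traces spread each reward back along recently visited state-action pairs, so the agent learns that moving right leads to the goal. The environment and parameter values are invented for illustration; DRE-Bot's learners operate on far richer game state.

```python
import random

def sarsa_lambda(episodes=300, alpha=0.1, gamma=0.9, lam=0.8, eps=0.1):
    """Tabular Sarsa(lambda) on a 4-state corridor: action 1 (right)
    moves towards the goal state 3, which yields reward 1."""
    n_states, actions = 4, [0, 1]                # 0 = left, 1 = right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        E = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s = 0
        a = (random.choice(actions) if random.random() < eps
             else max(actions, key=lambda x: Q[s][x]))
        while s < n_states - 1:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            a2 = (random.choice(actions) if random.random() < eps
                  else max(actions, key=lambda x: Q[s2][x]))
            delta = r + gamma * Q[s2][a2] - Q[s][a]   # TD error
            E[s][a] += 1.0
            for si in range(n_states):               # trace-weighted update
                for ai in actions:
                    Q[si][ai] += alpha * delta * E[si][ai]
                    E[si][ai] *= gamma * lam
            s, a = s2, a2
    return Q

random.seed(0)
Q = sarsa_lambda()
```

After training, Q prefers "right" in every non-terminal state, showing how trial and error alone, guided by long-term score, produces a sensible policy.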

This project will build on our past work in this area, and will involve research into different methods for planning, reasoning, and other forms of artificial intelligence in games, in order to improve the performance of NPCs when playing against other intelligent NPCs and against humans. This work is likely to involve participating in gameplay contests against other human and computer players.

Comparison, Clustering and Search for Chemical Spectroscopy Data

The Machine Learning and Data Mining Group has conducted successful research over many years on applying machine learning methods to the field of chemometrics (statistical analysis of chemical data). This work has led to publications, patents and a spin-out company. Our work to date has focused on mixture analysis and classification tasks. This project will extend our previous work and focus on other tasks of importance to spectroscopy:

  • Comparison: given two spectra, evaluate on a numerical scale how similar they are to each other (from 0 if completely identical to 1 if completely different, i.e. having no materials in common)
  • Clustering: given a comparison function, the difference between all pairs of spectra in a database can be computed, and the results used to identify clusters; i.e. groups of entries that are similar to each other and different from the rest
  • Search: given a comparison function and the spectrum of an unknown substance, a ranked list of the most similar substances in the database can be produced; furthermore, if clustering has been performed, the unknown substance can be positioned relative to the clusters.
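One simple candidate comparison function, shown for illustration only and not the method the project would develop, is cosine distance between spectra sampled on a common wavelength grid: it is 0 for spectra with identical shape and reaches 1 for non-negative spectra with no overlapping peaks, matching the 0-to-1 scale described above.

```python
import math

def spectral_difference(spec_a, spec_b):
    """Dissimilarity in [0, 1] between two spectra on the same
    wavelength grid: 0 for identical shape, 1 for no overlap.
    Cosine distance is insensitive to overall intensity scaling."""
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm = (math.sqrt(sum(a * a for a in spec_a))
            * math.sqrt(sum(b * b for b in spec_b)))
    return 1.0 - dot / norm

same_shape = spectral_difference([1, 2, 3], [2, 4, 6])  # scaled copy
disjoint = spectral_difference([1, 0, 0], [0, 0, 1])    # no common peaks
```

Once such a function exists, the clustering and search tasks follow directly: pairwise distances feed a clustering algorithm, and ranking a database by distance to a query spectrum gives the search result.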

Although standard methods for these tasks exist, we believe that we can improve on them, as we have done for classification tasks, by applying machine learning methods to them and by developing new methods that have underlying assumptions that are well aligned to the characteristics of the spectroscopy data.

Non-Player Characters in Computer Games that Pass the Turing Test

The Turing Test, proposed by Alan Turing over 60 years ago, is an Artificial Intelligence test that is based on a human judge interacting with an agent that may be controlled by a human or a computer: if the judge cannot reliably distinguish a computer agent from a human, it can be claimed the computer is exhibiting intelligence. This idea has been given a new lease of life by the emergence of non-player characters (NPCs) in modern computer games. In multi-player games, you may end up playing against an NPC or a human, and ideally the experience of playing against an NPC should be as good as playing against a well-matched human.

In recent years, our research group has been working on new techniques for incorporating machine learning into NPCs in first-person shooter games. See for example, "DRE-Bot: A Hierarchical First Person Shooter Bot Using Multiple Sarsa(λ) Reinforcement Learners" by Glavin & Madden, CGAMES USA 2012. This project will extend our work in this area, and will focus on machine learning algorithms and computer architectures that enable NPCs to be more human-like and to balance their performance with that of human players, to maximise the quality of the experience for human players. This work is likely to involve participating in gameplay contests against other human and computer players.