Datasets
- BBC Datasets – Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.
- Stability Topic Corpora – Text corpora for benchmarking stability analysis in topic modeling.
- Multi-View Twitter Datasets – Four pre-processed Twitter datasets, used for evaluating multi-view network analysis methods.
- News Curation Datasets – A collection of pre-processed Twitter datasets for evaluating criteria for Twitter user list curation.
- Irish Blog Network – Text and network data originating from a study of the state of the Irish blogosphere in 2011.
- Irish Economic Sentiment Collection – A sentiment analysis text corpus, compiled from articles published in three Irish online news sources in 2009.
- 3Sources Collection – A multi-view text corpus, constructed from news articles from three online news services.
- 3Sources Collection – Two datasets for evaluating dynamic clustering algorithms, originating from news articles and social bookmarking data.
- Synthetic Multi-view Datasets – A set of synthetic text datasets for the evaluation of multi-view learning algorithms.
- CBR Conference Series Dataset – Network and text data constructed from the publications of the CBR conference series (1993-2008).
- 20 Newsgroups Subsets – A large number of artificially constructed text datasets, originating from the popular 20 Newsgroups corpus.
Software
- Curatr – A Python implementation of Curatr, an online platform that provides access to the British Library Digital Collection, developed as part of the VICTEUR Project.
- Topic Ensembles – A Python reference implementation of methods for stable ensemble topic modeling with Non-negative Matrix Factorization.
- Dynamic Topic Modeling – A Python implementation of a new approach for Dynamic Topic Modeling via Non-negative Matrix Factorization.
- Topic Stability – A Python implementation of an algorithm for using stability analysis to select the number of topics for topic modeling.
- Unified Graph – A Python implementation of an approach for producing a unified graph from multiple views of a social network.
- Dynamic Community Finding – A C++ reference implementation of an algorithm for dynamic community tracking, published at ASONAM 2010.
Slides
- “Constructing Social Networks of Irish and British Fiction”, presented at the Symposium on Digital Culture, Big Data and Society (February 2018) [PDF]
- Tutorial on “Topic modelling with Scikit-learn”, presented at PyData Dublin (September 2017) [PDF] [Code]
- Tutorial on “Practical Social Network Analysis with Gephi” (June 2014) [PDF]
- “Stability Analysis for Topic Models” (May 2014) [PDF]