DICE
Research People Publications News Contact

Research

Explaining document classification decisions to users

Real-world Machine Learning applications often require explainable solutions. This need has become more apparent with the introduction of Deep Learning models, which are by nature not explainable. We work on two types of explainability. The first is explainability by design, where the model learns to extract pieces of the input text as justifications (rationales) that are short and coherent, yet sufficient for accurate prediction (usually document classification). The second is post-hoc explainability: a post-processing step that tries to identify the parts of the input responsible for the model's decision, without the model having been designed or trained for this task.
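As a rough illustration of the second, post-processing type, the sketch below scores each token by how much the classifier's output drops when that token is removed (an occlusion-style attribution). It is only one of several possible post-hoc techniques and is not necessarily the one used in this research; `toy_score` and its keyword weights are hypothetical stand-ins for a trained model.

```python
from typing import Callable, List, Tuple


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a trained classifier's score for one class."""
    weights = {"audit": 2.0, "revenue": 1.5, "report": 0.5}  # assumed values
    return sum(weights.get(tok.lower(), 0.0) for tok in tokens)


def occlusion_importance(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Importance of a token = score drop observed when that token is removed."""
    base = score_fn(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # the input without token i
        importances.append((tok, base - score_fn(reduced)))
    return importances


tokens = "The annual report covers revenue and audit findings".split()
importances = occlusion_importance(tokens, toy_score)
# The two tokens whose removal hurts the score the most.
top = sorted(importances, key=lambda pair: pair[1], reverse=True)[:2]
print(top)
```

The classifier is treated purely as a black box here, which is what makes the approach applicable to models that were never designed to explain themselves.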

Regulatory document analysis for compliance applications

Thousands of new regulatory documents are published every year. To achieve compliance, organisations must invest considerable manual effort in retrieving all relevant new regulations and understanding them. Specifically, retrieval involves trying different combinations of keywords to query general-purpose regulatory databases, and then spending time going through the results to distill only the most relevant documents. We address this problem by working on novel Information Retrieval methodologies that receive documents as input (e.g., documents describing the existing controls of an organisation, or the legislation documents of a country).
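To make the idea of using a whole document as the query concrete, here is a minimal sketch that ranks a small corpus against a query document with TF-IDF weights and cosine similarity. TF-IDF is a classical baseline swapped in for illustration, not the methodology developed in this research; the function names and the toy texts are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors using a smoothed inverse document frequency."""
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
         for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank_by_document(query_doc: str, corpus: List[str]) -> List[Tuple[int, float]]:
    """Return (corpus index, similarity) pairs, most similar first."""
    vectors = tfidf_vectors(corpus + [query_doc])
    query_vec = vectors[-1]
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors[:-1])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
regulations = [
    "Regulation on data protection and retention of customer records",
    "Guidelines for fishing quotas in coastal waters",
    "Rules on capital requirements for credit institutions",
]
control_doc = ("Internal control: customer records are retained for five years "
               "under our data protection policy")
ranking = rank_by_document(control_doc, regulations)
print(ranking[0])
```

The user never has to guess keywords: the control description itself carries the vocabulary that drives the ranking.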

Analysis of annual financial reports for auditing applications

Annual financial reports play an important role in the financial audit process. Auditors usually check the numbers in financial statements. The text of the reports could also provide valuable information, but it is very time-consuming to check. In addition, the introduction of XBRL (eXtensible Business Reporting Language) as a requirement for tagging reported financial values introduces more challenges for auditors. We aim to automate the analysis and tagging of the text of financial reports using Machine Learning. To this end, we focus on the following research challenges: (a) classification of long texts with imbalanced class distribution, and (b) Numeric Entity Recognition.
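One common way to soften challenge (a) is to weight each class inversely to its frequency during training, so that errors on rare classes cost more. The sketch below computes such weights with the usual n / (k * count) heuristic; it is offered as a generic baseline rather than the approach taken in this research, and the label names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight for class c = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy label distribution: 90 common examples, 10 rare ones (invented data).
labels = ["clean"] * 90 + ["misstatement"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With this split the rare class receives a weight of 5.0 and the common class a weight below 1, so a loss function that multiplies per-example errors by these weights pays more attention to the minority class.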

Classification of documents with missing data

In many practical applications in the financial and legal domains, thousands of documents need to be annotated with one or more of possibly tens or thousands of labels. In addition to being large, the label sets are frequently updated, which makes it impractical to maintain the correct labels for each document. Therefore, one would like to train document classifiers that assign labels automatically. Training such classifiers with machine learning methods is a challenge, not only because of the number of different labels and their volatility, but also because of their highly imbalanced distribution. As a result, it is very difficult to obtain training data that adequately cover all classes. Our research focuses on text classification with few- and zero-shot learning capabilities to handle rare and unseen classes.
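The zero-shot setting can be illustrated with a deliberately simple sketch: each label comes with a short textual description, and a document is assigned the label whose description it overlaps with most. A label added later needs only a description, not training examples. Practical systems would typically compare learned text representations rather than raw word overlap; the labels, descriptions, and function names below are hypothetical.

```python
import re
from typing import Dict, Set


def token_set(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zero_shot_label(document: str, label_descriptions: Dict[str, str]) -> str:
    """Pick the label whose description is most similar to the document."""
    doc_tokens = token_set(document)
    return max(
        label_descriptions,
        key=lambda label: jaccard(doc_tokens, token_set(label_descriptions[label])),
    )


# Labels known at the start (invented for illustration).
labels = {
    "taxation": "tax income vat duties levied by the state",
    "employment": "workers employment contracts wages working hours",
}
# A label introduced later: no training documents, only a description.
new_labels = {**labels, "data_protection": "personal data privacy processing consent"}

doc = "The controller must obtain consent before processing personal data"
predicted = zero_shot_label(doc, new_labels)
print(predicted)
```

Because nothing is trained per label, updating the label set amounts to editing the dictionary of descriptions.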

Document Intelligence Centre of Excellence
dice@iit.demokritos.gr
Patr. Gregoriou E & Neapoleos Str 27,
15341 Agia Paraskevi, Athens, Greece
