
Explainable Artificial Intelligence (XAI): How Machine Learning Predictions Become Interpretable

by Anna-Sophie Jaeger, MSc

Anna-Sophie Jaeger, MSc wrote her Master Thesis in the area of “Explaining Text Classification for ICD-Coding”.

It helps us choose our clothes, facilitates communication across language barriers, assists us in driving and simplifies navigation. Artificial intelligence (AI) permeates all these areas and has become an indispensable part of our everyday lives. These capabilities are largely due to deep artificial neural networks, which are modelled on the human brain. Which factors are decisive for their predictions usually remains in the dark: due to their architecture, the internals of deep learning models can be inspected, but not interpreted (so-called black boxes). This means that model predictions, and decisions derived from them, cannot be understood by humans without additional tooling. In certain situations, however, it is necessary to be able to explain the basis of a prediction.

For example, if a machine learning (ML) model is used to decide whether to approve a loan, the reasons for or against approval matter a great deal. Another area where transparency of AI systems is required is medicine. Experts should not blindly rely on the predictions of an ML model, because these are not always correct. However, trust in the model can be strengthened if the factors that were significant for its decision can be disclosed, thus providing traceability and transparency (see also the technical article “Trust in Artificial Intelligence”). This article shows, using the example of a text classifier in the medical field, which technical methods can be used to make model decisions more interpretable.

Table of contents

  • Use case: AI as a support in finding a diagnosis
  • What does the explanation of a prediction look like?
    • LIME
    • SHAP
  • Which explanation method is better?
  • Conclusion
  • Sources
  • Contact

Use case: AI as a support in finding a diagnosis

In the medical field, diagnosing a disease is an important part of the daily workflow. The International Statistical Classification of Diseases and Related Health Problems (ICD) is used worldwide to categorise diagnoses unambiguously. ICD codes are organised hierarchically and are divided into 19 disease chapters. Each chapter consists of several categories, which in turn are subdivided into subcategories. When a diagnosis is reported, an ICD code must be assigned to it. Due to the complexity and diversity of the codes, this assignment requires extensive domain knowledge and is therefore done by persons with medical expertise. This process is time-consuming and expensive, but it can be made more efficient with the help of machine learning techniques.

For this purpose, a model is trained specifically to predict an ICD code based on a textual description (e.g. a medical history or discharge report). In a further step, the driving factors behind the model decision are identified, e.g. by marking the most meaningful parts of the text.

There are several ways to generate such an explanation, and it is not always clear which technique is best suited for a specific application. The model used also plays a decisive role here. Explanation methods can be divided into two categories: model-agnostic methods can explain the prediction of any model, while model-specific ones are specialised in explaining the predictions of particular models.
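Such a text classifier can be sketched in a few lines. The following is a minimal illustration with scikit-learn; the training texts and chapter labels are invented for demonstration and are not real clinical data.

```python
# Minimal sketch of a text classifier for ICD chapter prediction.
# The tiny training set and the chapter labels are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "patient suffers from severe headaches and dizziness",
    "diabetic ketoacidosis with insulin deficiency",
    "chronic cough and shortness of breath",
    "elevated blood glucose and metabolic disorder",
]
# Hypothetical ICD chapter labels for the texts above
train_labels = ["chapter_6", "chapter_3", "chapter_10", "chapter_3"]

# TF-IDF features followed by a linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Per-class probabilities for a new report
probs = model.predict_proba(["ketoacidosis and insulin deficiency"])[0]
print(dict(zip(model.classes_, probs.round(2))))
```

A real system would be trained on thousands of labelled reports and a far larger label set, but the interface is the same: text in, per-chapter probabilities out. These probabilities are exactly what the explanation methods below operate on.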

What does the explanation of a prediction look like?

An explanation can take different forms and depends on the data used. For example, if the prediction for an image is to be explained, the relevant image areas are marked. In the case of tabular data, the strength of the contribution is evaluated by the importance of the columns (so-called “features”). In the concrete use case for interpreting diagnostic predictions based on text data, the explanation lies in the weighting of individual text parts or words – the more decisive a word was for the prediction of the model, the higher the weight.

Two of the most commonly used model-agnostic explanation methods are LIME and SHAP. The procedure for weighting the input data is different for LIME and SHAP and is presented below as an example.

LIME


LIME stands for Local Interpretable Model-Agnostic Explanations [1]. “Local” in this context means that LIME does not explain the entire model, but only a single prediction. The explanation generated by LIME is therefore not valid for every input, but must be generated explicitly for each one. “Interpretable” means that the explanation generated by LIME – compared to the internal states of the underlying model – is more interpretable for humans. “Model-agnostic” refers to an explanation method that can generate an explanation for any predictive model, regardless of its architecture. “Explanations” stands for the explanations that LIME produces.

LIME creates new, slightly modified instances of the given input data. The original input is perturbed, i.e. modified by removing certain words or parts of words, either randomly or according to a predefined scheme. For example, if the input text is “the patient suffers from severe headaches”, a perturbed sample could be “the patient suffers from headaches” or “patient suffers headaches”. This perturbed input data is then used to train a simpler model, a so-called white-box model, which can explain the prediction of the black-box model. Figure 1 shows a simplified illustration of how LIME works.
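The sampling step can be sketched in a few lines of plain Python; the function below is a simplified stand-in for LIME's own perturbation logic, with the drop probability and sample count chosen arbitrarily.

```python
import random

def perturb(text, n_samples=5, p_keep=0.7, seed=0):
    """Generate perturbed copies of `text` by randomly dropping words
    (a simplified version of LIME's sampling step)."""
    rng = random.Random(seed)
    words = text.split()
    samples = []
    for _ in range(n_samples):
        # Keep each word independently with probability p_keep
        kept = [w for w in words if rng.random() < p_keep]
        samples.append(" ".join(kept))
    return samples

for s in perturb("the patient suffers from severe headaches"):
    print(s)
```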


Figure 1: For each perturbed input value, a prediction is generated by means of the ML model to be interpreted (black-box model). The ML model calculates a probability value per class (p0 and p1) for an example. LIME uses these probability values to create the white-box model. The goal is for the white-box model to “learn” the probability values of the perturbed input values as well as possible. Simply put, the white-box model contains a weight for each possible input value. These weights and the corresponding input values form the explanation.
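The fitting step can be sketched as follows. Each perturbed sample is represented as a binary “word present/absent” vector, the black-box probabilities serve as targets, and a weighted linear model provides the per-word weights. The toy black-box function and the word list below are assumptions made up for this illustration.

```python
# Minimal sketch of LIME's core idea: fit a weighted linear (white-box)
# model on binary presence vectors of perturbed samples, using the
# black-box probabilities as targets.
import numpy as np
from sklearn.linear_model import Ridge

words = ["patient", "suffers", "severe", "headaches"]

def black_box(presence):
    # Toy black box: probability driven mainly by the word "headaches"
    return 0.1 + 0.8 * presence[3] + 0.05 * presence[2]

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, len(words)))   # perturbed samples (1 = word kept)
y = np.array([black_box(row) for row in X])      # black-box probabilities
weights = np.exp(-np.sum(1 - X, axis=1))         # samples closer to the original count more

white_box = Ridge(alpha=1.0).fit(X, y, sample_weight=weights)
for w, coef in sorted(zip(words, white_box.coef_), key=lambda t: -t[1]):
    print(f"{w:10s} {coef:+.3f}")
```

The learned coefficients recover the structure of the toy black box: “headaches” receives by far the largest weight, which is exactly the kind of ranking shown in the highlighted text of Figure 2.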


Figure 2: Visualisation example of LIME’s explanation of a medical diagnosis text.

The left side of the graph in Figure 2 shows the prediction probabilities of the black box model for the chapters with the highest values. The most likely chapter is chapter 3, “Endocrine, nutritional and metabolic diseases and disorders of the immune system” with a value of 0.95.

On the right side of the class probabilities is a legend for the highlighted words in the text at the bottom of the plot, which were identified by the white box model. An orange (blue) highlighted word symbolises that this word increases (decreases) the classification as Chapter 3. The darker the word is highlighted, the more it argues for or against the classification as Chapter 3. The word with the highest positive contribution in this example is “ketoacidosis”, a severe metabolic derailment with insulin deficiency, which is strongly decisive for the categorisation as Chapter 3.


SHAP

SHAP uses a different approach to explain a prediction. It is based on the so-called Shapley values by Lloyd S. Shapley [2] from game theory. The aim is to assign each player a fair share of a game’s payoff, measured by the player’s contribution to it. For example, two players A and B together may achieve a very high profit, while players A and C together achieve a lower one. To calculate the individual contribution of each player, every possible combination of players is evaluated. This procedure can also be applied to the prediction of an ML model: a player corresponds to a part of the input, and the profit corresponds to the change in the prediction probability. However, calculating every possible combination of input parts is not feasible within a reasonable runtime. How this problem was solved can be found in the original paper “A Unified Approach to Interpreting Model Predictions” by Lundberg and Lee [3].
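For a handful of players the exact computation is feasible and makes the idea concrete. The sketch below implements the classic Shapley formula directly; the payoff table is a toy game invented to match the A/B/C example above.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: weighted average of each player's marginal
    contribution over all possible coalitions of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(coalition) | {p}) - value(set(coalition)))
        phi[p] = total
    return phi

# Toy game (invented for illustration): A and B together win big, C adds little.
payoff = {frozenset(): 0, frozenset("A"): 10, frozenset("B"): 10, frozenset("C"): 0,
          frozenset("AB"): 40, frozenset("AC"): 12, frozenset("BC"): 12,
          frozenset("ABC"): 42}
phi = shapley_values(["A", "B", "C"], lambda s: payoff[frozenset(s)])
print(phi)
```

The values sum exactly to the payoff of the full coalition (42), and A and B receive far larger shares than C. For an ML model, the “players” are parts of the input text, and the number of coalitions explodes combinatorially, which is why SHAP approximates this computation.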

The following two diagrams represent the prediction of SHAP for the text example already presented in the LIME section.


Figure 3: Representation of positive (red) and negative (blue) Shapley values

For the explanation that SHAP generates, the input text is divided into text parts, and a Shapley value is calculated for each part. Figure 3 shows a number line on which these text parts are arranged. The calculation starts with the “base value”, the model’s average prediction when nothing is known about the input. The red sections of the number line mark text parts with a positive Shapley value; a positive Shapley value increases the prediction probability for the respective class. On the right side of the number line, text parts are marked in blue. These have a negative Shapley value and reduce the prediction probability for the given class.
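This number-line layout rests on the additivity property of Shapley values: the base value plus the sum of all contributions reconstructs the model’s prediction. A tiny sketch, with numbers invented for illustration rather than read from the figures:

```python
# Additivity property of Shapley values: base value + sum of all
# per-text-part contributions equals the model's prediction.
# All numbers below are invented for illustration.
base_value = 0.30                      # average prediction with no input known
shapley = {"ketoacidosis": +0.45, "insulin": +0.15,
           "headaches": -0.05, "the patient": +0.10}

prediction = base_value + sum(shapley.values())
print(round(prediction, 2))
```

With real SHAP output the same identity holds for each explained class, which is what makes the force-style plot in Figure 3 possible.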


Figure 4: More detailed representation of the Shapley values according to influence strength

Figure 4 shows the input text. The input text is divided into the same text parts as in Figure 3, but additionally shows the strength of the influence via the intensity of the colour. Text parts with the highest or lowest Shapley values are highlighted in a darker colour. The text part with the highest Shapley value is “ketoacidosis most recently in INCOMPLETE”.

Which explanation method is better?

An assessment of which explanation method is “better” can only be made in the context of a specific use case. Explanation methods can be compared with each other by consulting people with expertise in the respective domain who evaluate the generated explanations. However, there are also approaches for comparing explanation methods without expert personnel [4], as such expertise is not always available. For this purpose, different properties are evaluated, including the following:

  • The efficiency of an explanation method can be evaluated by measuring the time needed to generate an explanation.
  • Furthermore, the generated explanations should be stable. Stability in this context means that if the explanation method generates several explanations with the same input value, these explanations should be as similar as possible.
  • Perhaps most exciting is the question of whether the generated explanation is meaningful – in other words, how effective the explanation method is.
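The stability criterion can be made concrete with a simple check, for example the overlap of the top-k words across repeated explanation runs. The `explain` function below is a random stand-in for an actual explanation method (LIME or SHAP would be plugged in here); the Jaccard overlap metric is one plausible choice among several.

```python
import random

def explain(text, seed):
    """Stand-in explainer: random word weights, only to show the comparison.
    A real explainer would return LIME or SHAP weights for each word."""
    rng = random.Random(seed)
    return {w: rng.uniform(-1, 1) for w in text.split()}

def top_k(weights, k=3):
    # The k words with the largest absolute weight
    return set(sorted(weights, key=lambda w: abs(weights[w]), reverse=True)[:k])

def jaccard(a, b):
    return len(a & b) / len(a | b)

text = "the patient suffers from severe diabetic ketoacidosis"
runs = [top_k(explain(text, seed=s)) for s in range(5)]
scores = [jaccard(runs[0], r) for r in runs[1:]]
print(f"mean top-3 overlap: {sum(scores) / len(scores):.2f}")
```

A stable explanation method would score close to 1.0 on such a check; a method whose top words change from run to run would score much lower.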

Our experience on the ICD classifier use case has shown that SHAP outperforms LIME in all three respects: it runs faster, and its explanations are more stable and more effective. LIME is good at finding the “most important” word in a text, but a disadvantage is that its explanations vary strongly due to the random perturbation of the input text and are therefore not always reliable.


Conclusion

An explanation of what happens inside ML models can be useful in certain areas and necessary in others. LIME and SHAP offer the possibility to gain insight into how ML models arrive at their predictions. Although the explanation methods are not always complete and accurate, in many cases they can help end users to check ML predictions through simple highlighting in text or images.


Contact

    Mag. Stefanie Kritzinger-Griebler, PhD

    Head of Unit Logistics Informatics