(31.01.2022 - 13.02.2022)

Bot Testing

We generated a file of test stories to validate and test dialogues end-to-end. Running the TaPas-backed assistant through test stories is the best way to gain confidence that it will behave as expected in specific situations.

Test stories allow us to mimic an entire conversation and verify that, for given situations and user inputs, the model behaves in the expected manner.

Test stories are similar to the stories embedded in the training data, but include the user message as well.

Here are some examples:

[Screenshot: example test stories]
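
To make the format concrete, below is a minimal sketch that writes one test story and runs it through Rasa's test command. The intent and action names (ask_table_question, action_query_tapas) are illustrative placeholders, not the bot's actual ones, and the sketch assumes a trained model is already available.

from pathlib import Path
import subprocess

# One illustrative test story: the user message, the intent it should be
# classified as, and the action the assistant is expected to take next.
TEST_STORY = """\
stories:
- story: user asks a question about the table
  steps:
  - user: |
      what is the average heart rate?
    intent: ask_table_question
  - action: action_query_tapas
"""

Path("tests").mkdir(exist_ok=True)
Path("tests/test_stories.yml").write_text(TEST_STORY)

# Evaluate the dialogue policies against the test stories.
subprocess.run(["rasa", "test", "core", "--stories", "tests/test_stories.yml"], check=True)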

Clinical BERT

Clinical BERT is a BERT model adapted to the clinical domain by training on all note types from the electronic health records of ICU patients. It is initialised from Google's standard BERT-Base model, then further pre-trained on the prepared clinical notes. [Clinical BERT Embeddings (Online): https://arxiv.org/pdf/1904.03323.pdf]

Our chatbot project involves question answering over clinical tabular data, but the TaPas models currently available are all general-domain, since they were trained on tables extracted from Wikipedia. As part of our experimentation and exploration, the clients therefore suggested that we try converting clinical BERT into a TaPas model, to test whether TaPas accuracy on clinical tabular data could be improved further by specialising the model in the clinical domain.

However, due to limited resources and information, the experiment was not successful, as the clinical BERT embeddings and configuration differ from those of TaPas. When we ran evaluations and predictions, errors occurred because of the mismatch in type vocabulary size: clinical BERT uses a single value of 2, whereas TaPas uses the list [3, 256, 256, 2, 256, 256, 10]. Furthermore, the maximum position embeddings of TaPas go up to 1024, while clinical BERT supports a maximum of 512 only. In addition, TaPas embeddings are structured to represent tables, whereas clinical BERT is designed for plain text and paragraphs. This fundamental difference in how tables and text are tokenised and embedded is likely the main obstacle to the integration and conversion.
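
The mismatch can be seen directly in the model configurations. Below is a minimal sketch using the Hugging Face transformers library; the checkpoint names (emilyalsentzer/Bio_ClinicalBERT and google/tapas-base) are our assumptions about publicly available checkpoints, not necessarily the exact ones used in the experiment.

from transformers import BertConfig, TapasConfig

# Assumed checkpoints: a public clinical BERT and the base TaPas model.
bert_cfg = BertConfig.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
tapas_cfg = TapasConfig.from_pretrained("google/tapas-base")

# Clinical BERT has a single token-type vocabulary, while TaPas uses seven
# token-type vocabularies to encode table structure (segment, column, row, ...).
print(bert_cfg.type_vocab_size)           # 2
print(tapas_cfg.type_vocab_sizes)         # [3, 256, 256, 2, 256, 256, 10]

# The position embedding sizes differ as well.
print(bert_cfg.max_position_embeddings)   # 512
print(tapas_cfg.max_position_embeddings)  # 1024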

Adapting the clinical BERT training approach, one possible future experiment is to prepare a large amount of reliable clinical tabular data of all types and pre-train a clinical TaPas from the TaPas-Base model. Following pre-training, fine-tuning could then add support for, and increase accuracy on, SQA, WikiSQL, WTQ and TabFact. Alternatively, another experiment is to flatten tables into sentences, e.g. by concatenating "Header" + "Grid", and pass the resulting text into the clinical BERT model (see the sketch below). Neither approach has been tried yet, so the final outcome remains uncertain.
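
As an illustration of the second idea, here is a minimal sketch of flattening a table into "header is value" sentences and embedding the result with clinical BERT. The toy table, the flattening scheme and the checkpoint name are assumptions made for demonstration only.

import pandas as pd
from transformers import AutoModel, AutoTokenizer

# Toy table standing in for real clinical tabular data.
table = pd.DataFrame({"Patient": ["A", "B"], "Heart Rate": [72, 88]})

# Flatten each cell into a "header is value" sentence, row by row.
text = " ".join(
    f"{col} is {row[col]}." for _, row in table.iterrows() for col in table.columns
)
# -> "Patient is A. Heart Rate is 72. Patient is B. Heart Rate is 88."

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
embeddings = model(**inputs).last_hidden_state  # contextual embeddings of the table text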


Synthea

Synthea is a synthetic patient population simulator. Its goal is to output synthetic, realistic (but not real) patient data and associated health records in a variety of formats. The generator can produce different kinds of patient data, such as prescriptions, operations, surveys and labs, and supports several output formats, including HL7 FHIR, C-CDA and CSV.
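
For reference, here is a hedged sketch of one way to run the generator from Python. It assumes the prebuilt synthea-with-dependencies.jar has been downloaded, and that exporter settings can be overridden on the command line as described in the Synthea documentation.

import subprocess

# Generate 100 synthetic patients and export CSV in addition to the default
# FHIR output (the override syntax follows the Synthea documentation).
subprocess.run(
    [
        "java", "-jar", "synthea-with-dependencies.jar",
        "-p", "100",                      # population size
        "--exporter.csv.export=true",     # also write the CSV files
    ],
    check=True,
)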

The goal of using Synthea was to generate reliable data for further fine-tuning of the TaPas model. Since Synthea can generate CSV files, i.e. tabular data containing specific medical information, it is well suited to making TaPas familiar with clinical vocabulary. Unfortunately, fine-tuning is a complicated, detailed and time-consuming process, and during our collaboration with Infosys we did not manage to explore this option sufficiently; we believe, however, that it can be used in further work.
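
As a possible starting point for that future work, here is a hedged sketch of querying a Synthea CSV with an off-the-shelf TaPas pipeline. The file path, the truncation to 50 rows and the checkpoint google/tapas-base-finetuned-wtq are illustrative assumptions, not part of the delivered setup.

import pandas as pd
from transformers import pipeline

# Synthea writes its CSV exports under output/csv/ by default.
# TaPas expects every cell as a string, and small tables work best.
table = pd.read_csv("output/csv/patients.csv").astype(str).head(50)

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
result = tqa(table=table, query="How many patients live in Boston?")
print(result)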