Development Blog 6

(21.02.2022 - 06.03.2022)

Database

To support our post-processing pipeline, we created a database in Django that stores each possible test together with an example value and, where applicable, its accepted range. We query this database to cross-check the answers returned by TaPas, which helps us check their accuracy and reliability.

Our database has a model called ‘Labtest’, where each instance represents one test and its related information. Its fields are as follows:

name: test name
example_val: an example of accepted value
min_val: lower bound of the accepted range
max_val: upper bound of the accepted range

A diagram illustrating the model is attached below:

Post-Processing Logic

Our post-processing logic has to ensure that every response sent back to the user passes certain checks, so that the user only sees responses we consider accurate enough.

We begin by cross-checking the datatype of the TaPas response against the type we expect, as given by the ‘example_val’ stored in the database.

If the expected type is a Decimal and the response is of the same type, we run a second check to see whether the response falls within the range between ‘min_val’ and ‘max_val’.

For instance, we know that glucose levels should fall between 80 and 300 mg/dL (milligrams per deciliter). If the TaPas response falls outside this range, we prevent the answer from going back to the user, as an error may have occurred.
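The two checks described above could look roughly like the following sketch. It is not our exact implementation: `LabtestRecord`, `as_decimal` and `passes_checks` are hypothetical names, and a plain dataclass stands in for the Django model so the example runs without a database. It also assumes the TaPas answer arrives as a string.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass
class LabtestRecord:
    """Stand-in for one 'Labtest' row (hypothetical, for illustration)."""
    name: str
    example_val: str
    min_val: Optional[Decimal] = None
    max_val: Optional[Decimal] = None


def as_decimal(text: str) -> Optional[Decimal]:
    """Return text as a finite Decimal, or None if it is not numeric."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def passes_checks(answer: str, test: LabtestRecord) -> bool:
    """Apply the type check, then the range check when values are numeric."""
    expected = as_decimal(test.example_val)
    actual = as_decimal(answer)

    # Check 1: the answer must have the same type as the example value.
    if (expected is None) != (actual is None):
        return False
    if actual is None:
        return True  # both are plain text, so no range applies

    # Check 2: a numeric answer must lie within [min_val, max_val].
    if test.min_val is not None and actual < test.min_val:
        return False
    if test.max_val is not None and actual > test.max_val:
        return False
    return True


glucose = LabtestRecord("glucose", "120", Decimal("80"), Decimal("300"))
print(passes_checks("150", glucose))   # within range
print(passes_checks("450", glucose))   # outside range
print(passes_checks("high", glucose))  # wrong type
```

Treating the bounds as inclusive is a choice made for this sketch; the text above does not say whether the limits themselves count as valid.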

Aggregation with WTQ model

To extend the capability of TaPas, we chose to use the model fine-tuned on WTQ (WikiTableQuestions). The model is pre-trained using MLM (masked language modelling) and an intermediate pre-training phase, and then fine-tuned on SQA, WikiSQL, and WTQ in a chain. Fine-tuning adds a cell selection head and an aggregation head on top of the pre-trained model and then trains these randomly initialised classification heads jointly with it. The model utilises relative position embeddings, which means that the position index is reset in each table cell.
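To make the role of the two heads concrete: the cell selection head picks table cells and the aggregation head picks an operator, but the final number still has to be computed from those two predictions. The sketch below shows that last step, assuming the four operator labels documented for the Hugging Face WTQ checkpoint (NONE, SUM, AVERAGE, COUNT). The `aggregate` helper is a hypothetical name and the inputs are hand-written rather than real model output.

```python
from decimal import Decimal
from typing import List, Union


def aggregate(operator: str, cells: List[str]) -> Union[str, int, Decimal]:
    """Combine the selected cell texts according to the predicted operator."""
    if operator == "NONE":
        # No aggregation: the answer is the selected cell text itself.
        return ", ".join(cells)
    if operator == "COUNT":
        return len(cells)

    if not cells:
        raise ValueError("SUM and AVERAGE need at least one selected cell")
    values = [Decimal(c) for c in cells]  # raises if a cell is not numeric
    total = sum(values, Decimal(0))
    if operator == "SUM":
        return total
    if operator == "AVERAGE":
        return total / len(values)
    raise ValueError(f"unknown aggregation operator: {operator}")


# Hand-written example: two glucose readings selected from a table.
selected = ["90", "110"]
print(aggregate("NONE", selected))
print(aggregate("COUNT", selected))
print(aggregate("SUM", selected))
print(aggregate("AVERAGE", selected))
```

Using Decimal here keeps the result in the same type that the range check against ‘min_val’ and ‘max_val’ expects.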