Tutorial on Fuzzy String Matching with DeezyMatch

Workshop ID:

WT-13

Workshop Title:

Tutorial on Fuzzy String Matching with DeezyMatch

Organizers:

Mariona Coll Ardanuy, Kasra Hosseini, Federico Nanni, Valeria Vitale

Time (in JST and UTC):

July 25, 23:00 - July 26, 02:30 (JST)

July 25, 14:00 - 17:30 (UTC)

Maximum number of participants:

20

Description:

Fuzzy string matching is a common challenge when linking data in digital humanities projects, which often deal with noisy, historical, or non-standard text. In this tutorial, we will introduce DeezyMatch, an open-source, user-friendly Python library for fuzzy string matching and candidate ranking for entity linking, developed in the Living with Machines project. DeezyMatch integrates recent advances in deep learning and has been designed to be flexible and fast, making it ready for use in real entity linking scenarios.

Aim of the workshop/tutorial:

In this tutorial, we will introduce DeezyMatch, an open-source, user-friendly Python library for fuzzy string matching and candidate ranking, developed in the Living with Machines project. We will show how to create string pair datasets for training and testing a DeezyMatch model, and how DeezyMatch models can be used to retrieve candidate entities from a knowledge base. As motivation, we will present and discuss real digital humanities examples that require fuzzy string matching, and show how DeezyMatch can mitigate the problem of name variation in noisy, historical, or non-standard data.
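To make "candidate ranking" concrete, the sketch below ranks entries of a small knowledge base by their similarity to a noisy query, here an OCR-style error where the historical long s was read as "f". It is a minimal illustration only: it uses the edit-based `SequenceMatcher` ratio from Python's standard `difflib` module, not DeezyMatch's deep learning models or its API, and the function name `rank_candidates` and the toy place-name list are hypothetical.

```python
from difflib import SequenceMatcher


def rank_candidates(query, knowledge_base, top_n=3):
    """Return the top_n (name, score) pairs from knowledge_base most similar to query.

    Similarity is difflib's SequenceMatcher ratio (0.0 to 1.0) on lowercased
    strings, a simple non-neural baseline used here only for illustration.
    """
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in knowledge_base
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy knowledge base of place names.
kb = ["Manchester", "Manchester Airport", "Winchester", "Chester", "Leicester"]

# "Manchefter" mimics an OCR misreading of the long s in "Manchester".
for name, score in rank_candidates("Manchefter", kb):
    print(f"{name}: {score:.2f}")
```

A character-level ratio like this treats every substitution alike; learned approaches such as the one DeezyMatch takes are meant to capture which variations (for example s/f confusions) are typical of a given dataset.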

Outline:

This is a half-day tutorial that will cover the following core content:

Part 1: Introduction to DeezyMatch and motivation [60 min]

  • Introduction to fuzzy string matching and entity linking

  • Description of case studies, and of data acquisition and preparation

  • Overview of DeezyMatch

Part 2: Interactive hands-on session [80 min]

  • Demo 1: candidate ranking using a pre-trained model

  • Hands-on exercise

  • Demo 2: training a DeezyMatch model and candidate ranking

  • Hands-on exercise

Part 3: Discussion and feedback [40 min]

  • How to adapt DeezyMatch for your project

  • Questions