Fuzzy Matching Tool

A GUI-based tool for fuzzy string matching using RapidFuzz, built with CustomTkinter. Given two datasets of strings to match, the tool returns a cleaned dataset containing match scores according to the selected algorithm and output type.

Features

Data formats

Data is input as two datasets, each containing at minimum a unique ID for each row and the string to be matched. Matching takes place across the two datasets, and the result is exported as a single dataset. The supported import and export file types are currently CSV (.csv), Excel (.xlsx) and Stata (.dta).
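
As a rough illustration of the expected input shape, the sketch below reads one CSV dataset into an {ID: string} mapping using only Python's standard csv module and rejects duplicate IDs. The function load_dataset and the column names id and name are hypothetical, not the tool's own; Excel and Stata files would need a different reader (for example pandas.read_excel or pandas.read_stata).

```python
import csv
import io
from typing import Dict, TextIO


def load_dataset(f: TextIO, id_col: str = "id", text_col: str = "name") -> Dict[str, str]:
    """Read a CSV (one row per record) into an {id: string} mapping.

    The default column names "id" and "name" are hypothetical; the real
    tool may let the user pick columns differently.
    """
    records: Dict[str, str] = {}
    for row in csv.DictReader(f):
        row_id = row[id_col]
        # Each row needs a unique ID, so reject duplicates early.
        if row_id in records:
            raise ValueError(f"duplicate ID: {row_id!r}")
        records[row_id] = row[text_col]
    return records


# In-memory files stand in for the two datasets to match across.
left = load_dataset(io.StringIO("id,name\n1,I love competition economics\n2,Merger control\n"))
right = load_dataset(io.StringIO("id,name\nA,competition economics\n"))
print(left)
print(right)
```
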

Output Types

The results can be calculated and output in three ways:

Matching Algorithms

Set Ratio

Compares the sets of unique words in each string, so repeated words are ignored and, when all the words of one string appear in the other, the extra words are ignored too.

In [1]: fuzz.token_set_ratio("I love competition economics", "competition economics")
Out[1]: 100
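
The set-based comparison can be sketched in plain Python. This is a minimal sketch, not RapidFuzz's implementation: it follows the classic FuzzyWuzzy-style formulation (split on whitespace, then compare the sorted common words against each string's common-plus-remaining words) with a hand-written LCS-based ratio. The helper names lcs_len, indel_ratio and token_set_ratio are hypothetical, and RapidFuzz's optimised code may score edge cases differently.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """InDel similarity on a 0-100 scale: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_len(a, b) / total


def token_set_ratio(s1: str, s2: str) -> float:
    t1, t2 = set(s1.split()), set(s2.split())
    if not t1 or not t2:
        return 0.0
    # If one word set contains the other, the extra words are ignored entirely.
    if t1 <= t2 or t2 <= t1:
        return 100.0
    sect = " ".join(sorted(t1 & t2))
    comb1 = f"{sect} {' '.join(sorted(t1 - t2))}".strip()
    comb2 = f"{sect} {' '.join(sorted(t2 - t1))}".strip()
    # Best of: common words vs each combined string, and the two combined strings.
    return max(indel_ratio(sect, comb1), indel_ratio(sect, comb2), indel_ratio(comb1, comb2))


print(token_set_ratio("I love competition economics", "competition economics"))  # 100.0
```
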
Sort Ratio

Sorts words in strings before comparison.

In [2]: fuzz.token_sort_ratio("I love competition economics", "economics competition I love")
Out[2]: 100
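
The sorting step is what makes word order irrelevant: once both strings are normalised the same way, identical results score 100 under any InDel ratio. The helper sort_tokens below is a hypothetical illustration that splits on whitespace and uses Python's default (case-sensitive) sort; it does no case folding or punctuation stripping.

```python
def sort_tokens(s: str) -> str:
    """Split on whitespace, sort the words and re-join them with single spaces."""
    return " ".join(sorted(s.split()))


a = sort_tokens("I love competition economics")
b = sort_tokens("economics competition I love")
print(a)       # I competition economics love
print(a == b)  # True
```

Uppercase "I" sorts before the lowercase words because Python compares characters by code point.
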
Max(Set Ratio, Sort Ratio)

Uses the higher score of Set Ratio and Sort Ratio.

In [1]: fuzz.token_set_ratio("I love competition economics", "competition economics")
Out[1]: 100
In [2]: fuzz.token_sort_ratio("I love competition economics", "competition economics")
Out[2]: 85.71428571428572
In [3]: fuzz.token_ratio("I love competition economics", "competition economics")
Out[3]: 100
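
The Sort Ratio score of about 85.71 above can be checked by hand. Sorting the words of the first string (assuming whitespace splitting and a case-sensitive sort) gives "I competition economics love", and "competition economics" is a subsequence of it, so the longest common subsequence (LCS) has 21 characters. Applying the InDel normalisation 2·LCS/(|a| + |b|):

```latex
\begin{aligned}
\operatorname{sort}(s_1) &= \text{``I competition economics love''}, \qquad |\operatorname{sort}(s_1)| = 28,\quad |s_2| = 21 \\
\operatorname{LCS}\bigl(\operatorname{sort}(s_1), s_2\bigr) &= 21 \\
\text{Sort Ratio} &= 100 \cdot \frac{2 \cdot 21}{28 + 21} = 100 \cdot \frac{6}{7} \approx 85.71 \\
\text{Max(Set Ratio, Sort Ratio)} &= \max(100,\ 85.71) = 100
\end{aligned}
```
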
QRatio

Calculates the standard InDel (insertion/deletion) similarity ratio on the whole strings, without splitting them into words. In RapidFuzz it is a quick variant of fuzz.ratio.

In [1]: fuzz.QRatio('I love competition economics', 'competition economics')
Out[1]: 85.71428571428572
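The above example compares two strings of length 28 and 21 respectively. Reaching the first string from the second requires 7 insertions (the characters "I love "), so the normalised InDel distance is 7/(28 + 21) = 1/7 and the similarity ratio is 1 - 1/7 = 6/7, or about 85.71.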
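
The arithmetic above can be reproduced with a small InDel distance computed by dynamic programming over insertions and deletions only. This is a minimal sketch, not RapidFuzz's implementation; the names indel_distance and indel_similarity are hypothetical, and scoring two empty strings as 0 is an assumed convention.

```python
def indel_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # "" -> b[:j] takes j insertions
    for i, ch_a in enumerate(a, start=1):
        cur = [i]  # a[:i] -> "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1])  # matching characters cost nothing
            else:
                # Delete ch_a, or insert ch_b, whichever is cheaper.
                cur.append(1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_similarity(a: str, b: str) -> float:
    """100 * (1 - distance / (len(a) + len(b)))."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # assumed convention: empty input scores 0
    return 100.0 * (1 - indel_distance(a, b) / total)


s1 = "I love competition economics"
s2 = "competition economics"
print(indel_distance(s2, s1))    # 7
print(indel_similarity(s1, s2))  # roughly 85.71
```
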

Additional Settings

Beyond the theme toggle (switching between light and dark mode), it is also possible to toggle the visibility of several additional features:

Preview

By default, the window opens in the dark theme with the advanced options hidden, as below:

Fuzzy Matching Dark Theme

Switching on the advanced options toggle displays the additional settings, as below. The theme toggle switches the GUI between dark and light mode.

Fuzzy Matching Dark Theme