
Project Record

Reconciliation of Financial Instruments

Book-to-book reconciliation framework for instruments, quantities, prices, and currencies.

reconciliation, post-trade, controls

Purpose

Match internal and external records while surfacing breaks beyond configurable tolerances.

Context

During platform or custodian migrations, internal books (e.g. Front Arena, order management) and external sources (prime brokers, custodians, fund admins) often disagree on positions, quantities, or prices. Manual reconciliation is error-prone and does not scale. This framework was built to support a probabilistic instrument-matching pipeline so that breaks could be identified and resolved before cutover.

Problem

Records from different systems use different identifiers, rounding, and timing. Simple key-based joins miss valid matches (e.g. same ISIN, different lot sizes or price decimals) and produce false breaks. The system must tolerate small numerical differences while still flagging material discrepancies.

Matching Rule

Candidates are grouped by ISIN and currency. Within each group, records are paired by the smallest combined quantity and price distance. The distance is normalized so that tolerance thresholds can be expressed in basis points (for prices) or units (for quantities).
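The rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not the framework's actual code: the `Record` fields, the `reconcile` and `scaled_diffs` names, the default tolerances, and the greedy closest-pair-first pairing are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rec_id: str      # assumed unique across both books in this sketch
    isin: str
    currency: str
    quantity: float
    price: float


def scaled_diffs(a, b, qty_tol_units, price_tol_bps):
    """Quantity and price differences, each divided by its tolerance.

    A value of 1.0 means the difference sits exactly on the tolerance.
    """
    qty = abs(a.quantity - b.quantity) / qty_tol_units
    ref = max(abs(a.price), abs(b.price))
    bps = 0.0 if ref == 0 else abs(a.price - b.price) / ref * 1e4
    return qty, bps / price_tol_bps


def reconcile(internal, external, qty_tol_units=1.0, price_tol_bps=5.0):
    # Group both books by (ISIN, currency).
    groups = defaultdict(lambda: ([], []))
    for r in internal:
        groups[(r.isin, r.currency)][0].append(r)
    for r in external:
        groups[(r.isin, r.currency)][1].append(r)

    matches, breaks = [], []
    for ints, exts in groups.values():
        # Rank every internal/external pair in the group by combined distance.
        ranked = sorted(
            ((sum(scaled_diffs(a, b, qty_tol_units, price_tol_bps)), a, b)
             for a in ints for b in exts),
            key=lambda t: t[0],
        )
        used = set()
        for _, a, b in ranked:
            if a.rec_id in used or b.rec_id in used:
                continue
            used.update((a.rec_id, b.rec_id))
            qty, px = scaled_diffs(a, b, qty_tol_units, price_tol_bps)
            within = qty <= 1.0 and px <= 1.0
            (matches if within else breaks).append((a.rec_id, b.rec_id))
        # Records with no counterpart in the group are breaks as well.
        breaks += [(r.rec_id, None) for r in ints if r.rec_id not in used]
        breaks += [(None, r.rec_id) for r in exts if r.rec_id not in used]
    return matches, breaks


internal = [
    Record("I1", "US0378331005", "USD", 100, 150.00),
    Record("I2", "US0378331005", "USD", 250, 150.10),
]
external = [
    Record("E1", "US0378331005", "USD", 100, 150.01),
    Record("E2", "US0378331005", "USD", 200, 150.10),
]
matches, breaks = reconcile(internal, external)
```

In this toy data, I1 and E1 differ by about 0.7 bp in price and pair as a match, while I2 and E2 agree on price but differ by 50 units and surface as a break. Greedy pairing is the simplest option; a globally optimal assignment per group is a possible refinement.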

Implementation

Trade-offs

Probabilistic matching improves recall but can hide systematic data issues if tolerances are too loose. We keep an audit trail of all matches and expose parameters so compliance can tune sensitivity.


Use case: inferring cross-system instrument ID mappings from trade logs

A natural extension of the reconciliation framework is statistical record linkage over logs and entities, where instrument IDs are the entities and trade or position records are the logs. The goal is to learn a mapping between instrument IDs across systems with controllable precision, using only routine trade and position logs, with no master golden mapping required up front.

Working title (for a methods + use case piece):
Inferring Cross-System Instrument ID Mappings from Trade Logs via Probabilistic Record Linkage

Core idea in one line:
Treat each system’s instrument ID as an “entity,” each trade/position record as a “log,” then use probabilistic record linkage to score and match ID pairs based on how their logs co-occur and agree on attributes.

Problem & setting

Method (end-to-end pipeline)

A. Candidate generation (blocking)
Join events across systems within a time window (e.g. ±Δt) and within coarse buckets (currency, venue, price band, trading day). Keep ID pairs with at least k co-occurring events. Use recordlinkage indexers or Splink blocking rules to shrink the comparison space while retaining likely matches.
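A minimal standard-library sketch of this blocking step follows. It is a stand-in for the recordlinkage or Splink tooling rather than a use of either API. The event dictionary layout, the 5% logarithmic price band, and the helper names `ev`, `block_key`, and `candidate_pairs` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def ev(id_, ts, price, ccy="USD", venue="XNAS"):
    """Build one hypothetical trade event."""
    return {"id": id_, "ts": datetime.fromisoformat(ts),
            "ccy": ccy, "venue": venue, "price": price}


def block_key(event, band_width=0.05):
    """Coarse bucket: currency, venue, trading day, logarithmic price band."""
    band = math.floor(math.log(event["price"]) / math.log1p(band_width))
    return (event["ccy"], event["venue"], event["ts"].date(), band)


def candidate_pairs(events_a, events_b, delta=timedelta(seconds=2), k=2):
    """Count co-occurring events per ID pair; keep pairs with at least k."""
    buckets = defaultdict(list)
    for e in events_b:
        buckets[block_key(e)].append(e)
    counts = Counter()
    for ea in events_a:
        # Only events in the same bucket are compared at all.
        for eb in buckets.get(block_key(ea), []):
            if abs(ea["ts"] - eb["ts"]) <= delta:
                counts[(ea["id"], eb["id"])] += 1
    return {pair: n for pair, n in counts.items() if n >= k}


system_a = [
    ev("A-1", "2024-03-01T10:00:00", 150.00),
    ev("A-1", "2024-03-01T11:30:00", 150.40),
    ev("A-2", "2024-03-01T10:00:01", 20.00),
]
system_b = [
    ev("B-9", "2024-03-01T10:00:01", 150.01),
    ev("B-9", "2024-03-01T11:30:01", 150.40),
    ev("B-7", "2024-03-01T10:00:02", 20.00),
]
pairs = candidate_pairs(system_a, system_b)
```

Here only ("A-1", "B-9") survives, with two co-occurring events; ("A-2", "B-7") co-occurs once and falls below k = 2. Hard bucket edges can split true matches that straddle a price-band or day boundary, so a real pipeline would probably also probe neighbouring buckets.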

B. Features (per candidate ID pair)

C. Scoring model

D. One-to-one assignment
Convert pair scores into a weighted bipartite graph and solve for a maximum-weight matching (Hungarian / Kuhn–Munkres algorithm). Enforce 1:1, or relax to 1:many where legacy aliases exist.
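To show why a global assignment matters, the sketch below finds the maximum-weight one-to-one matching by exhaustive search. This is not the Hungarian algorithm: it gives the same optimum on tiny inputs but scales factorially, so a production run would more plausibly use a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment`. The score values, the `min_score` cut-off, and the function name are hypothetical.

```python
from itertools import permutations


def best_one_to_one(scores, left_ids, right_ids, min_score=0.0):
    """Exhaustive maximum-weight matching for small inputs.

    Pairs missing from `scores` or scoring below `min_score` stay unmatched.
    """
    # Pad the right side with None so any left ID may remain unmatched.
    padded = list(right_ids) + [None] * len(left_ids)
    best_total, best_pairs = float("-inf"), []
    for perm in permutations(padded, len(left_ids)):
        pairs = [
            (l, r) for l, r in zip(left_ids, perm)
            if r is not None and scores.get((l, r), float("-inf")) >= min_score
        ]
        total = sum(scores[p] for p in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs, best_total


scores = {
    ("A-1", "B-9"): 8.2,
    ("A-1", "B-7"): 8.0,
    ("A-2", "B-9"): 7.5,
    ("A-2", "B-7"): 1.0,
}
assignment, total = best_one_to_one(scores, ["A-1", "A-2"], ["B-9", "B-7"])
```

A greedy pass would take the single best pair ("A-1", "B-9") first and leave A-2 with its weak 1.0 link, for a total of 9.2. The global optimum instead pairs A-1 with B-7 and A-2 with B-9, for a total of 15.5.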

E. Thresholding & calibration
Choose score threshold τ for target precision (e.g. 99.5%) on a validation slice; report recall/coverage; add a “gray zone” for human review.
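One way to pick τ is sketched below, assuming a labelled validation slice of (score, is_match) pairs. The data, the fixed-width gray zone, and the function names `pick_threshold` and `triage` are illustrative assumptions. The 0.99 target is used only because the toy slice is too small to resolve 99.5%.

```python
def pick_threshold(scored, target_precision):
    """Lowest score threshold whose accepted set meets the target precision.

    Returns (tau, precision, recall), or None if no threshold qualifies.
    """
    total_pos = sum(1 for _, is_match in scored if is_match)
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    best = None
    tp = fp = 0
    for i, (score, is_match) in enumerate(ranked):
        tp += is_match
        fp += not is_match
        # Evaluate only at the end of a run of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = tp / (tp + fp)
        if precision >= target_precision:
            recall = tp / total_pos if total_pos else 0.0
            best = (score, precision, recall)
    return best


def triage(score, tau, gray_width):
    """Accept at or above tau; send the band just below it to human review."""
    if score >= tau:
        return "accept"
    if score >= tau - gray_width:
        return "review"
    return "reject"


validation = [
    (9.1, True), (8.7, True), (7.9, True), (6.2, True),
    (5.8, False), (5.1, True), (3.0, False), (1.2, False),
]
tau, precision, recall = pick_threshold(validation, target_precision=0.99)
decisions = [triage(s, tau, gray_width=1.0) for s in (7.9, 5.8, 3.0)]
```

On this slice τ lands at 6.2, accepting four of the five true matches. The true match scored 5.1 and the false pair scored 5.8 both fall in the review band, which is the kind of case the gray zone is meant to catch.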

Evaluation

Results (concise)

Operational notes

Conclusion & reuse

General recipe: logs → features → Fellegi–Sunter score → graph match. Reusable for users↔devices, products↔catalogues, etc. Publish synthetic dataset + code (e.g. Zenodo DOI) for reproducibility.
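The Fellegi–Sunter step in this recipe can be illustrated with the standard match-weight formula: each feature adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability of agreement for true matches and u for non-matches. The m and u values, the feature names, and the prior below are hypothetical. In practice they would be estimated from data (for example with EM), and the sum assumes the features are conditionally independent.

```python
import math

# Hypothetical (m, u) per feature:
#   m = P(feature agrees | same instrument)
#   u = P(feature agrees | different instruments)
PARAMS = {
    "time": (0.95, 0.10),
    "price": (0.90, 0.05),
    "size": (0.85, 0.02),
    "venue": (0.98, 0.30),
}


def match_weight(agreement, params=PARAMS):
    """Sum of per-feature log2 likelihood ratios for one candidate ID pair."""
    weight = 0.0
    for field, agrees in agreement.items():
        m, u = params[field]
        if agrees:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


def match_probability(weight, prior=0.01):
    """Posterior match probability from a weight and an assumed prior."""
    odds = prior / (1 - prior) * 2 ** weight
    return odds / (1 + odds)


w_all = match_weight({"time": True, "price": True, "size": True, "venue": True})
w_none = match_weight({"time": False, "price": False, "size": False, "venue": False})
p_all = match_probability(w_all)
p_none = match_probability(w_none)
```

Full agreement yields a large positive weight and a posterior close to 1 even under a 1% prior; full disagreement drives the posterior toward 0. These per-pair scores are what the graph-matching step would then consume.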

Abstract (drop-in)

We present a practical method to infer cross-system mappings between instrument identifiers using only routine trade/position logs. Our approach frames the problem as probabilistic record linkage: we (i) generate candidate ID pairs via temporal and categorical blocking, (ii) compute agreement features from co-occurring events (time, price, size, venue, description), (iii) estimate match probabilities with the Fellegi–Sunter model, and (iv) enforce global one-to-one consistency via maximum-weight bipartite matching. On realistic synthetic data reflecting clock skew, rounding, and corporate actions, the method attains >99% precision at useful coverage with minutes-scale runtimes. We provide an open-source implementation and synthetic dataset to support reuse across domains where entities must be reconciled from passive logs.

Minimal reproducible code plan

Blog-post companion (non-confidential)

Tooling to mention

Related Work