Emergent

Master data management for suppliers, sites, assets and emissions sources

Across Africa, organisations face intensifying pressure—from investors, regulators, customers and lenders—to disclose climate and sustainability performance with the same rigour as financials. New global standards (for example IFRS S2) and fast‑evolving national rules (such as South Africa’s carbon tax regime and Kenya’s climate law amendments) are pushing companies to move from ad‑hoc spreadsheets to robust, auditable systems. At the heart of those systems sits master data management (MDM): the discipline of creating a single, governed source of truth for the core entities that underpin ESG reporting—suppliers, sites, assets and emissions sources. Get MDM right and your carbon inventory closes on time, value‑chain data are traceable and credible, and decarbonisation decisions are based on facts rather than estimates. Get it wrong and you risk restatements, lost investor confidence and regulatory exposure.

This article sets out a pragmatic, Africa‑aware blueprint for ESG MDM: why it matters, what “good” looks like, the identifiers and standards to use, a reference data model, operating model and change roadmap—plus pitfalls to avoid.

1) Why MDM now? The disclosure and assurance context

Three shifts have made high‑quality master data non‑negotiable:

  • Global baseline rules. IFRS S2 (issued June 2023) requires entities to disclose climate‑related risks and opportunities; it explicitly expects information about business model and value chain, demanding consistent identification of facilities, assets and counterparties over time. Where organisations depend on supplier data or logistics activity, the quality and traceability of those inputs become a governance issue, not just an IT detail.
  • National/regional regulation in Africa. South Africa’s carbon tax (in force since 1 June 2019) links emissions quantification to tax liability, creating a direct financial incentive for accurate plant‑level and source‑level data; regulators also align reporting with mandatory emissions inventories. Kenya’s 2023 Climate Change (Amendment) Act establishes the regulatory basis for carbon markets and tighter reporting. Nigeria has adopted the IFRS sustainability standards with a phased timeline for corporate reporting—again raising the bar on data quality.
  • Scope 3 pressure. Value‑chain emissions now dominate many footprints. The GHG Protocol’s scope definitions, adopted widely, bring suppliers and logistics inside the boundary. New frameworks targeting Scope 3 practices (e.g., VCMI’s 2025 guidance) underscore the need for traceable activity data and clear attribution.

Implication: If suppliers, sites, assets and emissions sources are not consistently identified and governed, the organisation cannot produce assured, decision‑useful disclosures.

2) What is “master data” in ESG?

Master data are the stable reference entities used repeatedly across processes and systems. In sustainability, the core domains are:

  • Suppliers (counterparties). Legal entities providing goods/services, and (for Scope 3) often the source of activity data and emission factors.
  • Sites (locations). Physical places where economic activity occurs—owned, leased or operated (e.g., mines, factories, depots, farms, stores, logistics hubs).
  • Assets. Equipment and infrastructure whose operation drives energy use, emissions, water abstraction, waste, and safety incidents—from boilers and gensets to vehicles, conveyors and flares.
  • Emissions sources. The auditable, uniquely identified points or activities that generate emissions—fixed combustion units, mobile sources, process vents, refrigeration circuits, fugitives, purchased electricity meters, logistics legs, and so on.

Good master data form the anchor points to which you attach transactions (fuel issues, meter readings), observations (sensor telemetry), factors (emission factors), and accounting (scope allocation, ownership, consolidation method). Without persistent identifiers and controlled reference lists, you cannot prove completeness or avoid double counting.

3) Principles of high‑integrity ESG master data

1. Uniqueness. Each supplier, site, asset and emissions source has a globally unique, immutable identifier.

2. Verifiability. You can evidence the lineage: who created/changed the record, when, and based on what sources (W3C PROV is a useful model for this).

3. Quality by design. Apply ISO 8000 data quality concepts—syntactic (format), semantic (meaning) and pragmatic (fitness for purpose)—with measurable rules and stewardship.

4. Standards‑based. Use external identifiers and vocabularies where possible (e.g., LEI for legal entities; GS1 GLN for locations; ISO/IEC 81346 and ISO 14224 for asset classification; IPCC guidelines for emission methods).

5. Geospatially precise. Sites and sources must carry coordinates with CRS noted (e.g., WGS 84 / EPSG:4326) to support mapping, proximity analysis and climate hazard overlays.

6. Audit‑ready. Evidenced controls, change histories, and reference to the versions of factors or methods used (e.g., IPCC 2006 + 2019 Refinement).

7. Security and ethics. Handle supplier data and site locations responsibly (confidentiality, community sensitivities), aligned to local laws and stakeholder expectations.

4) The identifiers and standards that make ESG MDM work

Suppliers (counterparties).
Prioritise authoritative IDs:

  • LEI (Legal Entity Identifier, ISO 17442)—a 20‑character code with ownership linkage (“who is who” and “who owns whom”). Adopt LEI where available; otherwise capture national registration numbers and maintain cross‑references.
  • Category taxonomy. Classify supplier products/services with UNSPSC (or local equivalent), enabling spend‑based Scope 3 screening and hotspot analysis.
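
Because an LEI ends in two check digits (ISO 17442 uses the ISO/IEC 7064 MOD 97‑10 scheme), a steward can catch many keying errors before a record is saved. The following is a minimal Python sketch of that check; the function names and the sample prefix are hypothetical, and a passing check only shows the code is well formed, not that it is actually registered.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits.
LEI_PATTERN = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")


def _to_number(text: str) -> int:
    """Map 0-9 to themselves and A-Z to 10-35, then read the result as one integer."""
    return int("".join(str(int(ch, 36)) for ch in text))


def lei_check_digits(prefix18: str) -> str:
    """Compute the two check digits for an 18-character prefix (MOD 97-10)."""
    prefix18 = prefix18.strip().upper()
    if len(prefix18) != 18:
        raise ValueError("prefix must be 18 characters")
    remainder = _to_number(prefix18 + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """True when the code has the LEI shape and its MOD 97-10 remainder is 1."""
    lei = lei.strip().upper()
    return bool(LEI_PATTERN.match(lei)) and _to_number(lei) % 97 == 1


# Hypothetical prefix used only to exercise the arithmetic.
sample = "5493001KJTIIGC8Y1R"
candidate = sample + lei_check_digits(sample)
print(candidate, is_valid_lei(candidate))
```

A check like this belongs in the syntactic rule set; confirming that the LEI exists and is current still requires a lookup against the issuing registry.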

Sites (facilities and hubs).
Use identifiers that travel across partners and systems:

  • GS1 GLN (Global Location Number) for legal entities and physical locations; pair with precise coordinates (WGS 84 / EPSG:4326). GLN helps avoid duplicate site records when names vary.
  • Record location type (plant, warehouse, farm, office, construction site, retail outlet, logistics hub), jurisdiction, licence IDs (e.g., South African SAAELIP/NAEIS facility registration where applicable).
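
A GLN is a 13‑digit number whose last digit is a GS1 check digit, so duplicate or mistyped site identifiers can be screened with simple arithmetic. The sketch below is an illustrative Python version; the function names are assumptions, and the 13‑digit sample is used only to exercise the calculation, not claimed to be an assigned GLN.

```python
def gs1_check_digit(body: str) -> int:
    """GS1 mod-10 check digit for a numeric body (12 digits for a GLN)."""
    if not (body.isascii() and body.isdigit()):
        raise ValueError("body must contain ASCII digits only")
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(
        int(digit) * (3 if index % 2 == 0 else 1)
        for index, digit in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10


def is_valid_gln(gln: str) -> bool:
    """True when the value is 13 digits and the last digit matches the check digit."""
    return (
        len(gln) == 13
        and gln.isascii()
        and gln.isdigit()
        and gs1_check_digit(gln[:12]) == int(gln[12])
    )


print(is_valid_gln("4006381333931"))
```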

Assets (equipment and infrastructure).
Create a consistent asset hierarchy with:

  • ISO/IEC 81346 for reference designation—structuring systems and objects consistently across projects and operations.
  • ISO 14224 (petroleum, petrochemical, natural gas) for reliability and maintenance classes and failure modes—useful beyond oil & gas when heavy industrial process equipment is involved.
  • Map asset classes to emissions categories; ensure each emissions source links to a parent asset (or is itself an asset).

Emissions sources.
Treat sources as first‑class master data:

  • Assign a Source ID and Source Type (e.g., fixed combustion, mobile, process, vent, fugitive, purchased electricity meter, logistics leg).
  • Store the method and factor set used (e.g., IPCC 2006/2019; national factors; supplier‑specific primary data) with versioning.
  • For IoT‑connected sources (meters, sensors, continuous emissions monitoring systems), standardise metadata using OGC SensorThings API to simplify integration and provenance.

5) A reference data model you can implement

Below is a practical, technology‑agnostic schema you can implement in a modern MDM hub or data lakehouse. Keep it small but complete; add detail where it increases auditability or automation value.

5.1 Core entities

Supplier

  • supplier_id (internal, immutable)
  • lei (if available), local_registration_number, country
  • legal_name, trading_name
  • parent_supplier_id (corporate tree)
  • category_unspsc (primary; allow multiple)
  • esg_contact_email
  • assurance_status (e.g., none, reviewed, assured)
  • data_sharing_agreement (yes/no; link to artefact)
  • last_due_diligence_date

Site

  • site_id (internal, immutable), gln (if used)
  • legal_entity_id (links to Supplier or your own entity)
  • name, site_type (enumeration)
  • latitude, longitude, crs (e.g., EPSG:4326)
  • country, region, jurisdiction
  • permit_ids (list; e.g., emissions licence), regulator_portal_id (e.g., NAEIS facility code)
  • grid_connection (yes/no), onsite_generation (diesel, PV, etc.)
  • operational_status (active, mothballed, closed), start_date, end_date

Asset

  • asset_id (immutable), reference_designation (IEC 81346), class (ISO 14224 where relevant)
  • site_id (parent), owner (entity)
  • commissioned_date, decommissioned_date
  • energy_carrier (diesel, HFO, grid electricity, LPG, biomass, coal, etc.)
  • rated_capacity (kW, m³/h, etc.), utilisation_metric
  • maintenance_system_id (link to EAM/CMMS)

EmissionsSource

  • source_id (immutable), parent_asset_id (nullable), site_id
  • scope (1, 2, 3), source_type (fixed combustion, mobile, purchased electricity, process, vent, refrigerant leak, waste, transport leg, etc.)
  • method_reference (IPCC 2006 vX; 2019 Refinement; ISO 14064‑1; ISO 14067 for products; GLEC v3 for logistics), factor_library (e.g., IPCC EFDB record), factor_version
  • activity_unit (e.g., litres, kWh, tonne‑km), measurement_type (metered, calculated, estimated, default)
  • sensor_id (if IoT), sensorthings_endpoint (if used), data_frequency
  • valid_from, valid_to

Link tables and hierarchies

  • SupplierSite (for leased/contract‑operated facilities)
  • AssetHierarchy (parent/child)
  • SourceToFactor (binding to a specific factor record with GWP version)
  • SupplierProductCategory (for multi‑category suppliers)

5.2 Reference and code lists

  • Country codes (ISO 3166), units (SI), energy carriers, waste categories, refrigerants (incl. GWP values by IPCC assessment report), transport modes and load units (align to GLEC Framework for logistics).

5.3 Provenance and governance fields (all entities)

  • created_by, created_at, source_system
  • last_modified_by, last_modified_at
  • provenance (W3C PROV serialisation or link to lineage record)
  • quality_status (e.g., gold, silver, bronze), with rule results
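
To make the schema above concrete, here is one possible rendering of three of the entities as Python dataclasses, with a referential‑integrity check on top. It is a sketch under stated assumptions: only a subset of attributes is shown, the helper names (is_active, orphan_sources) are hypothetical, and a real MDM hub would enforce these constraints in its own storage layer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Site:
    site_id: str
    name: str
    site_type: str
    latitude: float
    longitude: float
    country: str                      # ISO 3166 code
    crs: str = "EPSG:4326"
    gln: Optional[str] = None
    permit_ids: tuple = ()


@dataclass(frozen=True)
class Asset:
    asset_id: str
    site_id: str
    energy_carrier: str
    reference_designation: Optional[str] = None


@dataclass(frozen=True)
class EmissionsSource:
    source_id: str
    site_id: str
    scope: int
    source_type: str
    activity_unit: str
    measurement_type: str
    valid_from: date
    valid_to: Optional[date] = None
    parent_asset_id: Optional[str] = None

    def is_active(self, on: date) -> bool:
        """True when the source is within its validity window on the given date."""
        return self.valid_from <= on and (self.valid_to is None or on <= self.valid_to)


def orphan_sources(sources, sites, assets):
    """Return source_ids whose site or parent asset is missing from the master lists."""
    site_ids = {s.site_id for s in sites}
    asset_ids = {a.asset_id for a in assets}
    return [
        src.source_id
        for src in sources
        if src.site_id not in site_ids
        or (src.parent_asset_id is not None and src.parent_asset_id not in asset_ids)
    ]
```

Frozen dataclasses are used here so that a record cannot be edited in place; a change produces a new version, which fits the immutability and audit expectations described earlier.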

6) Data quality: rules that matter

Leverage ISO 8000’s syntactic/semantic/pragmatic lens to write crisp, testable rules. Examples:

  • Suppliers
    • Syntactic: lei must be 20 characters if present (ISO 17442).
    • Semantic: country must match local_registration_number issuing jurisdiction.
    • Pragmatic: esg_contact_email is mandatory for critical tier suppliers.
  • Sites
    • Syntactic: latitude ∈ [‑90, 90]; longitude ∈ [‑180, 180]; crs = EPSG:4326.
    • Semantic: permit_ids present when site_type ∈ {plant, mine, refinery}.
    • Pragmatic: Sites with grid_connection = yes must have at least one emissions source with source_type = purchased electricity and scope = 2.
  • Assets
    • Syntactic: reference_designation conforms to IEC 81346 structure.
    • Semantic: energy_carrier aligns to allowed list (diesel/LPG/etc.).
    • Pragmatic: Critical assets (e.g., boilers > 1 MW) must have a mapped emissions source.
  • Emissions sources
    • Syntactic: scope ∈ {1,2,3}; method_reference points to an approved library (IPCC 2006/2019 or jurisdictional guidance).
    • Semantic: activity_unit matches source type (e.g., litres for diesel combustion; kWh for electricity).
    • Pragmatic: If measurement_type = metered, meter telemetry must be available at least monthly; if estimated, keep an estimation note and variance threshold.

Use ISO 8000‑61 as a process reference model for data quality management: define processes, outcomes and activities; assign data stewardship and escalation paths.
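
The site rules listed above can be expressed as small, testable checks. The Python sketch below is one illustrative way to do it; the dictionary keys follow the schema in this article, while the function name, the constant and the finding messages are assumptions.

```python
HIGH_IMPACT_SITE_TYPES = {"plant", "mine", "refinery"}


def check_site(site: dict, sources: list) -> list:
    """Return (rule_class, message) findings for one site record."""
    findings = []
    lat, lon = site.get("latitude"), site.get("longitude")

    # Syntactic: coordinate ranges and declared CRS.
    if lat is None or not -90 <= lat <= 90:
        findings.append(("syntactic", "latitude missing or outside [-90, 90]"))
    if lon is None or not -180 <= lon <= 180:
        findings.append(("syntactic", "longitude missing or outside [-180, 180]"))
    if site.get("crs") != "EPSG:4326":
        findings.append(("syntactic", "crs must be EPSG:4326"))

    # Semantic: permits are expected for certain site types.
    if site.get("site_type") in HIGH_IMPACT_SITE_TYPES and not site.get("permit_ids"):
        findings.append(("semantic", "permit_ids required for this site_type"))

    # Pragmatic: grid-connected sites need a Scope 2 purchased-electricity source.
    if site.get("grid_connection"):
        has_scope2 = any(
            s.get("site_id") == site.get("site_id")
            and s.get("scope") == 2
            and s.get("source_type") == "purchased electricity"
            for s in sources
        )
        if not has_scope2:
            findings.append(("pragmatic", "grid-connected site lacks a Scope 2 electricity source"))
    return findings
```

Returning findings rather than raising on the first failure lets a steward see every open issue for a record, and the rule class can feed the quality_status field.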

7) Emission calculation integrity: factors, methods and versions

Your MDM must bind every emissions record back to an explicit method and factor version. In practice:

  • Methods. For organisational inventories, follow ISO 14064‑1 and the GHG Protocol; for product footprints, ISO 14067; for national inventory alignment and default factors, IPCC 2006 Guidelines with 2019 Refinement. Store the method name, version, and a URI or document reference.
  • Factors. Use reputable libraries and track their IDs and versions. The IPCC Emission Factor Database (EFDB) is a global reference; in logistics, apply the GLEC Framework (v3 aligns with ISO 14083) to ensure consistent tonne‑km calculations across modes.
  • GWP versions. Record the IPCC assessment report used for global warming potentials (e.g., AR5 or AR6) and ensure factors and GWPs are time‑aligned to avoid mixing bases. (The 2019 Refinement supplements and updates the 2006 Guidelines rather than replacing them, so the two are used together.)
  • Formulae. Always store the calculation recipe along with results. A typical stationary combustion CO₂ calculation is Emissions = Activity_Data × Emission_Factor × Oxidation_Fraction, with methane and nitrous oxide added as needed and converted with GWP. Anchor the source of each parameter.
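
As a worked illustration of the stationary combustion recipe above, the Python sketch below applies Activity_Data × Emission_Factor × Oxidation_Fraction for CO₂, adds CH₄ and N₂O, and converts with a declared GWP basis. The class and function names are hypothetical, the emission factors are illustrative placeholders in kg/TJ rather than values to report with, and the GWP table holds AR5 100‑year values (without climate‑carbon feedbacks) that should be confirmed against whatever basis your programme mandates.

```python
from dataclasses import dataclass

# 100-year GWPs by assessment report; only AR5 is filled in for this sketch.
GWP_100 = {"AR5": {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}}


@dataclass(frozen=True)
class FactorSet:
    factor_id: str
    version: str
    gwp_basis: str
    ef_kg_per_tj: dict              # gas -> kg of gas per TJ of fuel
    oxidation_fraction: float = 1.0


def combustion_co2e_kg(activity_tj: float, factors: FactorSet) -> dict:
    """Return per-gas and total CO2e (kg), tagged with the factor and GWP versions used."""
    gwp = GWP_100[factors.gwp_basis]
    by_gas = {}
    for gas, ef in factors.ef_kg_per_tj.items():
        mass_kg = activity_tj * ef
        if gas == "CO2":
            # The oxidation fraction applies to the carbon dioxide term of the recipe.
            mass_kg *= factors.oxidation_fraction
        by_gas[gas] = mass_kg * gwp[gas]
    return {
        "factor_id": factors.factor_id,
        "factor_version": factors.version,
        "gwp_basis": factors.gwp_basis,
        "by_gas": by_gas,
        "total_co2e_kg": sum(by_gas.values()),
    }


# Illustrative factor values only; take real ones from your approved library.
diesel = FactorSet("EF-DIESEL-STATIONARY", "2024.1", "AR5",
                   {"CO2": 74100.0, "CH4": 3.0, "N2O": 0.6})
print(combustion_co2e_kg(1.0, diesel)["total_co2e_kg"])
```

Returning the factor ID, version and GWP basis alongside the number is the point of the design: the result carries its own recipe, which is what an assurer will ask for.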

8) Scope‑by‑scope: what MDM must capture

Scope 1 (direct).
Every emitting activity on your sites must be represented as a source with a coherent parent asset/site and fuel or process data:

  • Fixed combustion (boilers, gensets): link to fuel issues or meters; track fuel type and density.
  • Mobile (fleets): manage vehicle master data (VIN, fuel type, average load), route or odometer data.
  • Process emissions (e.g., calcination in cement): encode process parameters and stoichiometric factors.
  • Fugitives (refrigerants, methane): register circuits, charge sizes, leak inspections and gas types (with GWP).

Scope 2 (purchased energy).
Each point of electricity purchase (meter, account, or site‑level contract) should be an emissions source with meter IDs, supplier, tariff, location‑based vs market‑based method, and the emission factor set/date. (Many African grids lack granular location‑based factors; where unavailable, document proxies transparently.)

Scope 3 (value chain).
Your supplier and logistics master data now determine credibility:

  • Purchased goods/services: classify spend with UNSPSC; where material, collect supplier primary data or model with activity proxies (mass, energy content).
  • Logistics: standardise carrier, mode, lane, distance and load data to GLEC; store allocation method (mass‑, volume‑, cost‑ or energy‑based).
  • Capital goods, waste, business travel, employee commuting: model against documented activity data and factors, not only spend.
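
For the logistics bullet above, the core arithmetic is tonne‑km multiplied by an intensity factor, followed by allocation across consignments using the stored allocation method. The Python sketch below shows a mass‑based allocation only; it is a simplification and not a full GLEC or ISO 14083 implementation, and the function names, consignment IDs and the intensity value are hypothetical.

```python
def leg_emissions_kg(total_mass_t: float, distance_km: float,
                     factor_kg_co2e_per_tkm: float) -> float:
    """Emissions for one transport leg: tonne-km times an intensity factor."""
    return total_mass_t * distance_km * factor_kg_co2e_per_tkm


def allocate_by_mass(total_kg: float, masses_t: dict) -> dict:
    """Split leg emissions across consignments in proportion to their mass."""
    total_mass = sum(masses_t.values())
    if total_mass <= 0:
        raise ValueError("total consignment mass must be positive")
    return {cid: total_kg * mass / total_mass for cid, mass in masses_t.items()}


# Hypothetical leg: two consignments sharing a 500 km truck movement.
consignments = {"CONS-A": 12.0, "CONS-B": 8.0}
leg_total = leg_emissions_kg(sum(consignments.values()), 500.0, 0.1)
record = {
    "allocation_method": "mass",        # stored with the result for auditability
    "leg_total_kg": leg_total,
    "allocated_kg": allocate_by_mass(leg_total, consignments),
}
print(record)
```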

Some emerging frameworks allow limited use of high‑quality credits to address residual Scope 3 while reduction plans mature; governance requires transparent data on “retired credits” bound to the reporting year and project IDs (Reuters).

9) Technology: a lean, standards‑first reference architecture

  • MDM hub (golden records). A central service where supplier, site, asset and source records are mastered; supports survivorship rules, matching/merging, versioning and data quality rules.
  • Data catalogue + lineage. Catalogue master entities and their attributes; store W3C PROV relationships or integrate with lineage tooling to show source‑to‑report traceability.
  • Integration fabric.
    • Upstream: ERP (suppliers, purchasing), EAM/CMMS (assets), GIS (sites), HSE systems (permits), telemetry platforms.
    • IoT: expose sensor metadata and observations via OGC SensorThings for consistent ingestion.
  • Calculation engine. Separates master data from methods/factors; supports versioning, unit conversion and audit logs; reads master data by ID.
  • ESG reporting layer. Produces IFRS S2, jurisdictional submissions (e.g., NAEIS‑related reporting in South Africa) and customer disclosures.
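
The survivorship rules mentioned for the MDM hub can be sketched in a few lines. The Python example below builds one golden supplier record from duplicates by preferring the most trusted source system and, within a source, the most recently modified record. The source‑system names, the priority order and the field names are assumptions for illustration; commercial MDM tools offer richer, configurable equivalents.

```python
from datetime import datetime

CONTROL_FIELDS = {"source_system", "last_modified_at"}
EMPTY = (None, "")


def merge_records(records: list, source_priority: dict) -> dict:
    """Build a golden record: first non-empty value wins, in survivorship order."""
    # Newest first, then a stable sort by source trust (lower number = more trusted).
    ranked = sorted(records, key=lambda r: r["last_modified_at"], reverse=True)
    ranked.sort(key=lambda r: source_priority[r["source_system"]])

    golden = {}
    for rec in ranked:
        for key, value in rec.items():
            if key in CONTROL_FIELDS:
                continue
            if value not in EMPTY and key not in golden:
                golden[key] = value
    golden["merged_from"] = [r["source_system"] for r in ranked]
    return golden


duplicates = [
    {"source_system": "ERP", "last_modified_at": datetime(2024, 3, 1),
     "legal_name": "Example Supplies Ltd", "lei": None, "esg_contact_email": ""},
    {"source_system": "PORTAL", "last_modified_at": datetime(2024, 5, 1),
     "legal_name": "Example Supplies Limited", "lei": "LEI-PLACEHOLDER",
     "esg_contact_email": "esg@example.com"},
]
print(merge_records(duplicates, {"ERP": 0, "PORTAL": 1}))
```

Keeping merged_from on the golden record preserves the link back to contributing systems, which supports the lineage requirement described for the catalogue layer.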

Small and mid‑market tip: If budgets are tight, you can still implement the schema and rules in a lakehouse with open‑source matching and a modest workflow tool—what matters is the identifiers, reference lists and governance, not the brand‑name platform.

10) Operating model: who does what

  • Executive sponsor (CFO/COO/Chief Sustainability Officer): owns policy, funds the operating model.
  • Data owners (Procurement for suppliers; Operations for sites and assets; HSE/Environment for sources).
  • Data stewards in each domain: approve changes, monitor data quality metrics, manage deduplication.
  • Sustainability accounting team: defines methods/factors and aligns with IFRS S2/ISO/GHG Protocol.
  • Internal audit/risk: tests controls, sample‑checks lineage, confirms regulatory compliance (e.g., South Africa carbon tax linkages).

Establish an MDM Change Control Board meeting fortnightly: approve new classes, sources, or rule changes; review quality dashboards; decide on sunset of legacy IDs.

11) An Africa‑aware implementation roadmap (12 months)

Phase 0 (Weeks 0–4): Mobilise and baseline

  • Confirm disclosure scope (IFRS S2, customer requests, local laws).
  • Map systems and spreadsheets where master data live today; quantify duplicates and gaps.
  • Lock the canonical ID scheme and reference data (UNSPSC, GLN, LEI, ISO 14224 classes, CRS).

Phase 1 (Months 2–4): Foundations

  • Stand up an MDM hub (or minimal lakehouse equivalent) with the schema above.
  • Implement priority data quality rules; instrument lineage (W3C PROV).
  • Pilot two sites and one Scope 1 source class (e.g., diesel gensets), plus Scope 2 electricity.

Phase 2 (Months 5–8): Scale to value‑chain and logistics

  • Onboard top 50 suppliers by spend and emissions; collect activity data where material.
  • Standardise transport activity to GLEC, focusing on export corridors (ports, rail spurs).
  • Align factor libraries (IPCC EFDB default vs jurisdictional) and freeze versions for the reporting year.

Phase 3 (Months 9–12): Assurance‑readiness

  • Extend to all sites; close gaps on refrigerants/fugitives and waste.
  • Reconcile to tax/regulatory submissions (e.g., South Africa carbon tax, NAEIS where applicable) to ensure data flows match legal filings.
  • Dry‑run IFRS S2 disclosures with an audit trail; lock change window before year‑end close.

12) Practical examples (African contexts)

A Kenyan tea processor with smallholder suppliers.

  • Suppliers: Each cooperative gets a supplier_id; where available, capture legal registry number; classify inputs (fertilisers, packaging) with UNSPSC.
  • Sites: Factories and leaf collection centres assigned GLNs and coordinates to model flood/heat risks; power meters defined as Scope 2 sources.
  • Assets & sources: Biomass boilers as assets with source_type = fixed combustion; document moisture content methods and factor versions (2006 IPCC Guidelines, Volume 2: Energy).
  • Logistics: Tea transport to port measured in tonne‑km under GLEC, enabling consistent emissions per container and per customer.
  • Regulatory: Carbon market provisions under the 2023 Climate Change (Amendment) Act inform potential inset/offset programmes, tracked with provenance.

A Nigerian cement plant preparing for IFRS S2.

  • Sites & assets: Kiln lines and mills mastered with ISO 14224 classing (for failure/maintenance analytics) and meter hierarchies; calcination process sources documented.
  • Suppliers: Clinker and fuel suppliers onboarded with LEI or national IDs; contracts tagged to Scope 3 categories.
  • Disclosure: A data model linking plant‑level emissions to IFRS S2 strategy, risk and resilience disclosures accelerates assurance. Nigeria’s adoption timeline informs sequencing.

A South African mining company subject to carbon tax.

  • Sites: Each shaft and processing facility has coordinates, permits and NAEIS IDs.
  • Sources: Diesel for pit equipment (mobile), grid electricity (Scope 2), and process sources (venting, flaring). Methods and factors versioned; tax‑relevant emissions summarised by legal entity.

13) Controls and metrics: how you know it’s working

Track both data quality KPIs and business outcomes:

  • Completeness: ≥ 99% of active sites with coordinates and licence identifiers; ≥ 95% of active sources bound to a factor set.
  • Uniqueness: Duplicate rate < 0.5% for suppliers; < 0.2% for sites.
  • Timeliness: 95% of meters with monthly reads posted within 10 business days.
  • Lineage coverage: 100% of emissions figures traceable to method/factor version and activity data artefact.
  • Close time: Carbon inventory closed within X days of period end.
  • Assurance results: Zero significant findings tied to master data.
  • Decision utility: % of capex projects assessed with site/asset emissions baselines.
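
Two of the KPIs above, completeness and uniqueness, are straightforward to compute from the master records themselves. The Python sketch below is illustrative; the function names, the choice of key fields for duplicate detection and the sample records are assumptions.

```python
def completeness(records: list, required_fields: list) -> float:
    """Share of records in which every required field is populated."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(name) not in (None, "", []) for name in required_fields)
        for rec in records
    )
    return complete / len(records)


def duplicate_rate(records: list, key_fields: list) -> float:
    """Share of records that repeat an earlier record's normalised key."""
    if not records:
        return 0.0
    seen, duplicates = set(), 0
    for rec in records:
        key = tuple(str(rec.get(name, "")).strip().lower() for name in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)


sites = [
    {"site_id": "S1", "name": "Depot North", "latitude": -1.3, "longitude": 36.8, "permit_ids": ["P1"]},
    {"site_id": "S2", "name": "depot north ", "latitude": -1.3, "longitude": 36.8, "permit_ids": ["P1"]},
    {"site_id": "S3", "name": "Plant East", "latitude": 0.0, "longitude": 37.1, "permit_ids": ["P2"]},
    {"site_id": "S4", "name": "Store West", "latitude": None, "longitude": None, "permit_ids": []},
]
print(completeness(sites, ["latitude", "longitude", "permit_ids"]))
print(duplicate_rate(sites, ["name"]))
```

Note that a latitude of 0.0 counts as populated here; only missing or empty values fail the completeness test.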

Publish a quarterly MDM & Emissions Data Quality Scorecard to the sustainability steering committee.

14) Common pitfalls (and how to avoid them)

1. Multiple names for the same supplier. Use LEI (or authoritative national identifiers) and survivorship rules; avoid relying solely on fuzzy matching.

2. “Sites” drifting with organisational changes. Decouple site identifiers from org structure; GLN + coordinates remain stable even as ownership changes.

3. Unversioned factors. Store factor IDs (e.g., EFDB record) and GWP basis; never overwrite last year’s values.

4. Sensor sprawl without metadata. Adopt SensorThings metadata from day one; a sensor without a source and site is unauditable.

5. Asset taxonomies improvised per site. Use ISO 14224 classes (or sectoral equivalents) to ensure comparability.

6. Scope 3 shortcuts. Spend‑only approaches are fine for screening but not for mature disclosure; progress to activity data and GLEC‑aligned logistics.

7. Ignoring national systems. Where jurisdictions require registration (e.g., NAEIS and SAAELIP in South Africa), integrate IDs to prevent reconciliation pain.
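
Pitfall 3 (unversioned factors) is largely avoided if the factor store simply refuses to overwrite. A minimal Python sketch of such an append‑only registry follows; the class name, method names, factor IDs and values are all hypothetical.

```python
class FactorRegistry:
    """Append-only store of emission factors keyed by (factor_id, version)."""

    def __init__(self):
        self._records = {}

    def register(self, factor_id: str, version: str, value: float,
                 unit: str, gwp_basis: str) -> None:
        key = (factor_id, version)
        if key in self._records:
            # Never overwrite: a changed value must be published as a new version.
            raise ValueError(f"{factor_id} version {version} already registered")
        self._records[key] = {
            "factor_id": factor_id, "version": version,
            "value": value, "unit": unit, "gwp_basis": gwp_basis,
        }

    def get(self, factor_id: str, version: str) -> dict:
        """Return a copy so callers cannot mutate the stored record."""
        return dict(self._records[(factor_id, version)])

    def versions(self, factor_id: str) -> list:
        return sorted(v for (fid, v) in self._records if fid == factor_id)


registry = FactorRegistry()
registry.register("EF-GRID-EXAMPLE", "2024.1", 0.90, "kg CO2e/kWh", "AR5")
registry.register("EF-GRID-EXAMPLE", "2025.1", 0.85, "kg CO2e/kWh", "AR5")
print(registry.versions("EF-GRID-EXAMPLE"))
```

Last year's inventory continues to resolve against version 2024.1 even after 2025.1 is published, so restated and original figures can be compared like for like.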

15) Data governance essentials for ESG MDM

  • Policy: A board‑approved data policy stating that supplier, site, asset and emissions‑source data are controlled master data, subject to defined quality thresholds and change control.
  • RACI: Clear accountability: Procurement owns supplier master; Operations owns sites and assets; HSE/Environment owns emissions sources and methods.
  • Change management: Requests logged, impact assessed (downstream reports, tax returns, regulatory submissions), change window enforced near reporting close.
  • Training: Short steward training on identifiers (LEI, GLN), CRSs, factor versioning, and evidence‑keeping.
  • Audit trail: Use W3C PROV to encode “entity–activity–agent” for major changes (e.g., site re‑geocoding, factor updates).
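
A lightweight way to start on the audit trail is an append‑only log whose entries name the entity, the activity and the agent. The Python sketch below borrows those three terms from W3C PROV but is not a conformant PROV serialisation; the record layout, class name and identifiers are assumptions.

```python
import json
from datetime import datetime, timezone


def prov_record(entity_id, activity, agent, used=(), at=None):
    """One change event: which entity changed, by what activity, under which agent."""
    at = at or datetime.now(timezone.utc)
    return {
        "entity": entity_id,
        "activity": activity,
        "agent": agent,
        "used": list(used),          # evidence or inputs the activity relied on
        "ended_at": at.isoformat(),
    }


class ChangeLog:
    """Append-only log; entries are stored as JSON lines and never edited."""

    def __init__(self):
        self._lines = []

    def append(self, record: dict) -> None:
        self._lines.append(json.dumps(record, sort_keys=True))

    def history(self, entity_id: str) -> list:
        return [r for r in map(json.loads, self._lines) if r["entity"] == entity_id]


log = ChangeLog()
log.append(prov_record("site:SITE-001", "re-geocoding", "steward:operations",
                       used=["survey:2024-03"],
                       at=datetime(2024, 3, 5, tzinfo=timezone.utc)))
log.append(prov_record("factor:EF-GRID-EXAMPLE@2025.1", "factor update",
                       "team:sustainability-accounting",
                       at=datetime(2025, 1, 10, tzinfo=timezone.utc)))
print(log.history("site:SITE-001"))
```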

16) Tailoring to different African realities

  • Connectivity and power reliability. Design for intermittent sync (edge buffering of meter reads, SMS‑based fallbacks for activity capture).
  • Informal supplier ecosystems. When LEIs are unavailable, collect national IDs and proof of registration; onboard via low‑tech forms and incrementally enrich records.
  • On‑ and off‑grid mixes. Track generator assets rigorously; many facilities will have multiple parallel sources (grid, diesel, solar).
  • Regulatory diversity. Build an extensible “Regulatory ID” structure per site; today it might be NAEIS/SAAELIP, tomorrow a different registry in another country.
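
The extensible "Regulatory ID" structure suggested above can be as simple as a list of scheme, country and value triples per site, rather than one column per registry. The Python sketch below is one possible shape; the class names and the sample identifier values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RegulatoryId:
    scheme: str                       # e.g. "NAEIS" or "SAAELIP"
    country: str                      # ISO 3166 alpha-2 code
    value: str
    valid_from: Optional[date] = None


@dataclass
class SiteRegistrations:
    site_id: str
    ids: List[RegulatoryId] = field(default_factory=list)

    def lookup(self, scheme: str, country: str) -> Optional[RegulatoryId]:
        for reg in self.ids:
            if reg.scheme == scheme and reg.country == country:
                return reg
        return None

    def add(self, reg: RegulatoryId) -> None:
        """Add a registration, rejecting a second ID for the same scheme and country."""
        if self.lookup(reg.scheme, reg.country) is not None:
            raise ValueError(f"{reg.scheme}/{reg.country} already recorded for {self.site_id}")
        self.ids.append(reg)


regs = SiteRegistrations("SITE-001")
regs.add(RegulatoryId("NAEIS", "ZA", "FAC-0001"))       # hypothetical value
regs.add(RegulatoryId("SAAELIP", "ZA", "LIC-0001"))     # hypothetical value
print(regs.lookup("NAEIS", "ZA"))
```

Adding a registry in another country then needs a new row, not a schema change.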

17) Frequently asked questions

Q: Do we really need global identifiers like LEI and GLN?
A: You can run without them, but you will spend more on matching and reconciliation—and you’ll carry higher audit risk. LEI and GLN reduce duplicates and ambiguity, especially in cross‑border value chains.

Q: How do we integrate asset MDM with maintenance (EAM/CMMS)?
A: Keep the asset_id consistent across both; the ESG MDM holds classification, ownership and energy/emissions‑relevant attributes, while the EAM holds maintenance histories. ISO 14224 classing gives you a common language across plants.

Q: Which geospatial system should we standardise on?
A: Use WGS 84 / EPSG:4326 for latitude/longitude, explicitly recorded in the master data. This ensures compatibility with most GIS, hazard models and web maps.

Q: What about supplier Scope 3 data when suppliers won’t share?
A: Start with spend‑ or activity‑based estimates using reputable factors (IPCC EFDB and sectoral sources); prioritise engagement for material categories; use GLEC for logistics; document assumptions and plan to improve year‑on‑year.

18) The pay‑off: from compliance to value creation

With robust MDM:

  • Close faster, assure more easily. Clean master data and documented factor versions cut reporting cycles and reduce audit findings under IFRS S2.
  • Target decarbonisation precisely. Asset‑level performance baselines identify the cheapest emissions to abate; standard classes (ISO 14224) support benchmarking and reliability gains.
  • Unlock finance and customers. Credible site/supplier data accelerates access to sustainability‑linked finance and meets multinational buyers’ data demands.
  • Tax certainty. For South Africa in particular, the carbon tax linkage means accurate master data directly influence tax liabilities and offset compliance.

19) A concise checklist to start this quarter

1. Approve the canonical schema (suppliers, sites, assets, sources) and the ID strategy (LEI, GLN, internal IDs).

2. Publish the code lists: UNSPSC, energy carriers, units, CRS, GWP basis.

3. Stand up a minimum viable MDM (even in a spreadsheet + SQL database) with validation rules.

4. Choose and freeze factor/method versions for the current reporting year (IPCC 2006 + 2019 Refinement; sectoral where applicable).

5. Onboard top 10 sites and top 20 sources by emissions.

6. Map top 50 suppliers by spend to UNSPSC and collect primary data where material.

7. Instrument lineage and approvals using a lightweight W3C PROV representation (even a simple log with entity–activity–agent is a start).

Closing thought

In the coming years, African businesses that treat master data for suppliers, sites, assets and emissions sources as strategic infrastructure—not a compliance afterthought—will be the ones that close their books confidently, attract capital, and decarbonise at least cost. The building blocks are known, the standards are available, and the steps are achievable. Start small, standardise early, and build trust in your data—because in sustainability, as in finance, trust is your most valuable asset.

Contact Emergent Africa for a more detailed discussion or to answer any questions.