The Detailed Guide to Fetching Daily Mutual Fund NAVs with Python (MUFAP Example)
2025-08-31 • Finance
Compliance & Educational Disclaimer
This article is for education and internal prototyping. Data sources may have terms of use that restrict scraping/redistribution. For commercial or external use, obtain rights and use official APIs/feeds from the data provider or licensed vendors. Always follow your organization’s legal and compliance guidance.
Executive summary
Finance teams often need daily Net Asset Values (NAVs) to support reporting, attribution, liquidity checks, and investor updates. While many providers distribute data via APIs or licensed feeds, some publishers present tables on public webpages.
This post shows a repeatable, low-friction Python workflow to extract NAV tables from a public site and save them to Excel/CSV for analysis. We use MUFAP as a concrete example; the same approach applies to other sites that publish HTML tables.
What we’ll build
A short Python script that:
- Fetches a web page containing NAV tables
- Locates the correct table (robust header detection)
- Parses it into a clean pandas DataFrame
- Saves to CSV and XLSX for downstream tools (Excel, Power BI, warehouse)
Libraries we use (and why)
- requests — Reliable HTTP client to fetch web pages (timeouts, headers).
- beautifulsoup4 — HTML parser; turns page markup into a tree so we can locate tables/rows.
- pandas — Tabular data powerhouse; we’ll assemble rows into a DataFrame and export to CSV/Excel.
- lxml (optional but recommended) — High-performance parser that makes BeautifulSoup faster and more permissive on messy HTML.
Install once:
pip install requests beautifulsoup4 pandas lxml
Example source (MUFAP)
- Daily industry page: https://www.mufap.com.pk/Industry/IndustryStatDaily?tab=3
We’ll fetch this page and extract the main NAV table.
Step 1 — Fetch the page
import requests
URL = "https://www.mufap.com.pk/Industry/IndustryStatDaily?tab=3"
resp = requests.get(
    URL,
    headers={"User-Agent": "Mozilla/5.0 (finance-nav-fetcher)"},
    timeout=60,
)
resp.raise_for_status()
html = resp.text
print("HTTP:", resp.status_code, "| bytes:", len(html))
Tip: Some sites serve different HTML to bots. A realistic User-Agent and a reasonable timeout help reduce transient failures.
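For flaky networks or intermittent server errors, you can also let requests retry transient failures automatically via urllib3's Retry. A minimal sketch, reusing the URL from above; the retry counts and status codes here are illustrative choices, not part of the original script:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session with automatic retries for transient server/network errors.
session = requests.Session()
retry = Retry(
    total=3,                                      # up to 3 retry attempts
    backoff_factor=1,                             # waits 1s, 2s, 4s between attempts
    status_forcelist=[429, 500, 502, 503, 504],   # retry on these HTTP codes
)
session.mount("https://", HTTPAdapter(max_retries=retry))
session.headers.update({"User-Agent": "Mozilla/5.0 (finance-nav-fetcher)"})

resp = session.get(URL, timeout=60)
resp.raise_for_status()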
Step 2 — Extract the NAV table into a DataFrame
We look for <table> elements and detect headers that contain “Fund” and “NAV” within the first few rows—this makes the code resilient if banners or layout tables appear above the real content.
from bs4 import BeautifulSoup
import pandas as pd
def extract_nav_table(html: str) -> pd.DataFrame:
    soup = BeautifulSoup(html, "html.parser")  # or "lxml" if installed
    tables = soup.find_all("table")
    candidates = []
    for tbl in tables:
        # read all rows
        raw_rows = []
        for tr in tbl.find_all("tr"):
            cells = tr.find_all(["th", "td"])
            if not cells:
                continue
            texts = [c.get_text(" ", strip=True) for c in cells]
            raw_rows.append(texts)
        if not raw_rows:
            continue
        # heuristic: top rows should include "fund" and "nav"
        header_idx = None
        for i, row in enumerate(raw_rows[:5]):
            row_lower = [x.lower() for x in row]
            if "fund" in row_lower and "nav" in row_lower and len(row) >= 6:
                header_idx = i
                break
        if header_idx is None:
            continue
        headers = raw_rows[header_idx]
        ncols = len(headers)
        body = [r for r in raw_rows[header_idx + 1:] if len(r) == ncols]
        if not body:
            continue
        # optional: normalize to a familiar schema if lengths match
        expected = [
            "Fund", "Category", "Inception Date", "Offer", "Repurchase", "NAV",
            "Validity Date", "Front-end", "Back-end", "Contingent", "Market", "Trustee"
        ]
        if len(headers) == len(expected):
            headers = expected
        df = pd.DataFrame(body, columns=headers)
        candidates.append(df)
    if not candidates:
        raise RuntimeError("No matching NAV table found.")
    return pd.concat(candidates, ignore_index=True).drop_duplicates()
Run it on the page we fetched:
df = extract_nav_table(html)
print("Rows:", len(df))
print(df.head())
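The scraped values arrive as strings. Before analysis you will usually want real numeric and date types; a minimal sketch, assuming the column names from the expected list above (adjust to whatever headers your page actually has):

# Coerce price columns to numbers; unparseable cells become NaN instead of raising.
for col in ["Offer", "Repurchase", "NAV"]:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col].astype(str).str.replace(",", ""), errors="coerce")

# Parse the validity date so it sorts/filters correctly downstream.
if "Validity Date" in df.columns:
    df["Validity Date"] = pd.to_datetime(df["Validity Date"], errors="coerce")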
Step 3 — Save outputs for Excel/BI
Save both CSV and Excel so downstream tools can consume them easily. (Writing .xlsx with pandas requires an engine such as openpyxl; install it with pip install openpyxl.)
df.to_csv("navs_latest.csv", index=False)
df.to_excel("navs_latest.xlsx", index=False)
print("Wrote: navs_latest.csv, navs_latest.xlsx")
Optional: machine-friendly columns
If you prefer standardized column names for join keys or automation:
df2 = df.copy()
df2.columns = (
    df2.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
)
df2.to_csv("navs_latest_clean.csv", index=False)
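If downstream joins expect specific internal names, you can layer a rename map on top of the cleaned headers (applied before the export above). The target names here are hypothetical:

# Hypothetical internal schema; adjust keys/values to your warehouse.
rename_map = {
    "fund": "fund_name",
    "nav": "nav_per_unit",
    "validity_date": "as_of_date",
}
df2 = df2.rename(columns=rename_map)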
Production notes & guardrails
- Prefer licensed APIs for commercial use. Screen-scraped pages can change, and redistribution may be restricted.
- Set monitoring/retries. Wrap requests with retries and alerting; hold last good file if the page fails.
- Document lineage. Store the fetch date/time and source URL alongside your file for audit (see the sketch after this list).
- Access control. Keep outputs in a controlled store if they flow into investor materials.
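One lightweight way to record lineage is a sidecar JSON written next to each export. A minimal sketch, reusing URL and df from the steps above; the file name and fields are illustrative:

import json
from datetime import datetime, timezone

# Sidecar metadata so auditors can trace where and when the file came from.
meta = {
    "source_url": URL,
    "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
    "row_count": len(df),
    "output_files": ["navs_latest.csv", "navs_latest.xlsx"],
}
with open("navs_latest.meta.json", "w") as f:
    json.dump(meta, f, indent=2)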
Adapting to other sources
If another site publishes NAVs in HTML:
- Update the URL.
- Adjust the header heuristic (e.g., look for “Scheme” or “NAV per unit”); a sketch follows this list.
- Normalize columns to your internal schema and export.
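For example, a looser header matcher that accepts either vocabulary might look like this; the substrings are assumptions about the target page, so tune them to what you actually see:

def looks_like_header(row: list[str]) -> bool:
    # Accept "Fund"/"Scheme" for the name column and any "NAV" variant.
    joined = " ".join(x.lower() for x in row)
    has_name = "fund" in joined or "scheme" in joined
    has_nav = "nav" in joined  # also matches "nav per unit"
    return has_name and has_nav and len(row) >= 4

You would call this in place of the inline "fund" in row_lower and "nav" in row_lower check inside extract_nav_table.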
That’s it—your finance team gets repeatable, auditable NAV files for analysis with minimal friction, while staying mindful of policy and licensing.
Automation is the analyst’s best friend. The sooner you embrace it, the more time you’ll have for actual analysis and decision-making.