The Detailed Guide to Fetching Daily Mutual Fund NAVs with Python (MUFAP Example)
2025-08-31 • Finance
Compliance & Educational Disclaimer
This article is for education and internal prototyping. Data sources may have terms of use that restrict scraping/redistribution. For commercial or external use, obtain rights and use official APIs/feeds from the data provider or licensed vendors. Always follow your organization’s legal and compliance guidance.
Executive summary
Finance teams often need daily Net Asset Values (NAVs) to support reporting, attribution, liquidity checks, and investor updates. While many providers distribute data via APIs or licensed feeds, some publishers present tables on public webpages.
This post shows a repeatable, low-friction Python workflow to extract NAV tables from a public site and save them to Excel/CSV for analysis. We use MUFAP as a concrete example; the same approach applies to other sites that publish HTML tables.
What we’ll build
A short Python script that:
- Fetches a web page containing NAV tables
- Locates the correct table (robust header detection)
- Parses it into a clean pandas DataFrame
- Saves to CSV and XLSX for downstream tools (Excel, Power BI, warehouse)
Libraries we use (and why)
- requests — Reliable HTTP client to fetch web pages (timeouts, headers).
- beautifulsoup4 — HTML parser; turns page markup into a tree so we can locate tables/rows.
- pandas — Tabular data powerhouse; we’ll assemble rows into a DataFrame and export to CSV/Excel.
- lxml (optional but recommended) — High-performance parser that makes BeautifulSoup faster and more permissive on messy HTML.
Install once:
pip install requests beautifulsoup4 pandas lxml
Example source (MUFAP)
- Daily industry page: https://www.mufap.com.pk/Industry/IndustryStatDaily?tab=3
We’ll fetch this page and extract the main NAV table.
Step 1 — Fetch the page
import requests
URL = "https://www.mufap.com.pk/Industry/IndustryStatDaily?tab=3"
resp = requests.get(
    URL,
    headers={"User-Agent": "Mozilla/5.0 (finance-nav-fetcher)"},
    timeout=60,
)
resp.raise_for_status()
html = resp.text
print("HTTP:", resp.status_code, "| bytes:", len(html))
Tip: Some sites serve different HTML to bots. A realistic User-Agent and a reasonable timeout help reduce transient failures.
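For flaky networks or intermittent server errors, you can also let requests retry transient failures automatically via urllib3's Retry. A minimal sketch, reusing the URL from above; the retry counts and status codes here are illustrative choices, not part of the original script:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session with automatic retries for transient server/network errors.
session = requests.Session()
retry = Retry(
    total=3,                                      # up to 3 retry attempts
    backoff_factor=1,                             # waits 1s, 2s, 4s between attempts
    status_forcelist=[429, 500, 502, 503, 504],   # retry on these HTTP codes
)
session.mount("https://", HTTPAdapter(max_retries=retry))
session.headers.update({"User-Agent": "Mozilla/5.0 (finance-nav-fetcher)"})

resp = session.get(URL, timeout=60)
resp.raise_for_status()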
Step 2 — Extract the NAV table into a DataFrame
We look for <table> elements and detect headers that contain “Fund” and “NAV” within the first few rows—this makes the code resilient if banners or layout tables appear above the real content.
from bs4 import BeautifulSoup
import pandas as pd
def extract_nav_table(html: str) -> pd.DataFrame:
    soup = BeautifulSoup(html, "html.parser")  # or "lxml" if installed
    tables = soup.find_all("table")
    candidates = []
    for tbl in tables:
        # read all rows
        raw_rows = []
        for tr in tbl.find_all("tr"):
            cells = tr.find_all(["th", "td"])
            if not cells:
                continue
            texts = [c.get_text(" ", strip=True) for c in cells]
            raw_rows.append(texts)
        if not raw_rows:
            continue
        # heuristic: top rows should include "fund" and "nav"
        header_idx = None
        for i, row in enumerate(raw_rows[:5]):
            row_lower = [x.lower() for x in row]
            if "fund" in row_lower and "nav" in row_lower and len(row) >= 6:
                header_idx = i
                break
        if header_idx is None:
            continue
        headers = raw_rows[header_idx]
        ncols = len(headers)
        body = [r for r in raw_rows[header_idx + 1:] if len(r) == ncols]
        if not body:
            continue
        # optional: normalize to a familiar schema if lengths match
        expected = [
            "Fund", "Category", "Inception Date", "Offer", "Repurchase", "NAV",
            "Validity Date", "Front-end", "Back-end", "Contingent", "Market", "Trustee"
        ]
        if len(headers) == len(expected):
            headers = expected
        df = pd.DataFrame(body, columns=headers)
        candidates.append(df)
    if not candidates:
        raise RuntimeError("No matching NAV table found.")
    return pd.concat(candidates, ignore_index=True).drop_duplicates()
Run it on the page we fetched:
df = extract_nav_table(html)
print("Rows:", len(df))
print(df.head())
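The scraped values arrive as strings. Before analysis you will usually want real numeric and date types; a minimal sketch, assuming the column names from the expected list above (adjust to whatever headers your page actually has):

# Coerce price columns to numbers; unparseable cells become NaN instead of raising.
for col in ["Offer", "Repurchase", "NAV"]:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col].astype(str).str.replace(",", ""), errors="coerce")

# Parse the validity date so it sorts/filters correctly downstream.
if "Validity Date" in df.columns:
    df["Validity Date"] = pd.to_datetime(df["Validity Date"], errors="coerce")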
Step 3 — Save outputs for Excel/BI
Save both CSV and Excel so downstream tools can consume them easily. (Writing .xlsx with pandas requires an engine such as openpyxl; install it with pip install openpyxl.)
df.to_csv("navs_latest.csv", index=False)
df.to_excel("navs_latest.xlsx", index=False)
print("Wrote: navs_latest.csv, navs_latest.xlsx")
Optional: machine-friendly columns
If you prefer standardized column names for join keys or automation:
df2 = df.copy()
df2.columns = (
    df2.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
)
df2.to_csv("navs_latest_clean.csv", index=False)
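If downstream joins expect specific internal names, you can layer a rename map on top of the cleaned headers (applied before the export above). The target names here are hypothetical:

# Hypothetical internal schema; adjust keys/values to your warehouse.
rename_map = {
    "fund": "fund_name",
    "nav": "nav_per_unit",
    "validity_date": "as_of_date",
}
df2 = df2.rename(columns=rename_map)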
Production notes & guardrails
- Prefer licensed APIs for commercial use. Screen-scraped pages can change, and redistribution may be restricted.
- Set monitoring/retries. Wrap requests with retries and alerting; hold last good file if the page fails.
- Document lineage. Store the fetch date/time and source URL alongside your file for audit (see the sketch after this list).
- Access control. Keep outputs in a controlled store if they flow into investor materials.
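One lightweight way to record lineage is a sidecar JSON written next to each export. A minimal sketch, reusing URL and df from the steps above; the file name and fields are illustrative:

import json
from datetime import datetime, timezone

# Sidecar metadata so auditors can trace where and when the file came from.
meta = {
    "source_url": URL,
    "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
    "row_count": len(df),
    "output_files": ["navs_latest.csv", "navs_latest.xlsx"],
}
with open("navs_latest.meta.json", "w") as f:
    json.dump(meta, f, indent=2)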
Adapting to other sources
If another site publishes NAVs in HTML:
- Update the URL.
- Adjust the header heuristic (e.g., look for “Scheme” or “NAV per unit”); a sketch follows this list.
- Normalize columns to your internal schema and export.
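For example, a looser header matcher that accepts either vocabulary might look like this; the substrings are assumptions about the target page, so tune them to what you actually see:

def looks_like_header(row: list[str]) -> bool:
    # Accept "Fund"/"Scheme" for the name column and any "NAV" variant.
    joined = " ".join(x.lower() for x in row)
    has_name = "fund" in joined or "scheme" in joined
    has_nav = "nav" in joined  # also matches "nav per unit"
    return has_name and has_nav and len(row) >= 4

You would call this in place of the inline "fund" in row_lower and "nav" in row_lower check inside extract_nav_table.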
That’s it—your finance team gets repeatable, auditable NAV files for analysis with minimal friction, while staying mindful of policy and licensing.
Automation is the analyst’s best friend. The sooner you embrace it, the more time you’ll have for actual analysis and decision-making.