What Is Open Data?
Open data is government information published in formats that anyone can access, use, and redistribute without restrictions. It's the raw material of accountability — budget spreadsheets, procurement records, legislative votes, environmental monitoring, crime statistics, and much more.
The open data movement has transformed civic engagement. Journalists use procurement data to uncover corruption. Researchers use health data to identify disparities. Developers build apps that help citizens navigate government services. The possibilities are limited only by what governments publish and in what format.
Open Data Portals by Country
62 countries now operate national open data portals. Here are the most comprehensive and well-maintained ones:
Top-Rated Government Data Portals
| Country | Portal | Datasets | Formats | API Access | Quality |
|---|---|---|---|---|---|
| United States | data.gov | 300,000+ | CSV, JSON, XML, API | Yes | 93 |
| United Kingdom | data.gov.uk | 55,000+ | CSV, JSON, RDF, API | Yes | 91 |
| France | data.gouv.fr | 45,000+ | CSV, JSON, API | Yes | 89 |
| Canada | open.canada.ca | 95,000+ | CSV, JSON, XML, GeoJSON | Yes | 88 |
| Australia | data.gov.au | 100,000+ | CSV, JSON, API | Yes | 86 |
| South Korea | data.go.kr | 70,000+ | CSV, JSON, XML, API | Yes | 87 |
| Germany | govdata.de | 60,000+ | CSV, JSON, RDF | Partial | 82 |
| Estonia | avaandmed.eesti.ee | 12,000+ | CSV, JSON, API | Yes | 85 |
| India | data.gov.in | 320,000+ | CSV, JSON, XML, API | Yes | 74 |
| Brazil | dados.gov.br | 12,000+ | CSV, JSON | Partial | 71 |
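Several of these portals, including data.gov and data.gov.uk, are built on the open-source CKAN platform, which exposes a search endpoint called `package_search`. The sketch below shows one way to search such a portal and list the file formats each dataset offers. The base URL is an assumption (check your portal's developer documentation), and the helper names `build_search_url`, `summarize_results`, and `search_portal` are illustrative, not part of any portal's official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal exposes the CKAN Action API at this base URL.
CKAN_BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(query, rows=5, base=CKAN_BASE):
    """Return a CKAN package_search URL for a free-text query."""
    return f"{base}/package_search?{urlencode({'q': query, 'rows': rows})}"


def summarize_results(payload):
    """Extract each dataset's title and resource formats from a CKAN response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summaries = []
    for package in payload["result"]["results"]:
        # Collect the distinct, non-empty formats listed for this dataset.
        formats = sorted(
            {r["format"].upper() for r in package.get("resources", []) if r.get("format")}
        )
        summaries.append({"title": package.get("title", ""), "formats": formats})
    return summaries


def search_portal(query, rows=5):
    """Fetch and summarize search results (makes a network request)."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return summarize_results(json.load(response))
```

Calling `search_portal("air quality")` would return a short list of dataset titles with their formats, which is a quick way to see whether a portal offers CSV or JSON before you download anything.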
International Open Data Sources
Beyond national portals, international organizations publish some of the most valuable governance datasets:
Essential International Datasets
- World Bank Open Data: Economic indicators, development data, and financial statistics for 217 countries. Excellent API with consistent historical time series. The gold standard for comparative economic data.
- IMF Data: Government finance statistics, fiscal data, exchange rates, and economic forecasts. Essential for understanding public finance across countries.
- UN Data: Population, health, education, environment, and human development indicators. Covers nearly every country with standardized methodology.
- OECD.Stat: Detailed governance, economic, and social statistics for 38 member countries. Excellent for comparing policies among developed nations.
- EU Open Data Portal: Datasets from all EU institutions covering legislation, budgets, trade, agriculture, and regional development.
- OpenAid / IATI: International aid flows — who gives what to whom, for what purpose. Critical for tracking development spending.
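As a rough illustration of the World Bank API mentioned above, the following sketch builds a request URL for one country and indicator and converts the JSON response into a year-to-value mapping. It assumes the v2 Indicators API, which returns a two-element list (paging metadata, then the observations), and an annual indicator whose `date` field is a plain year. The function names are hypothetical, and `NY.GDP.MKTP.CD` (GDP in current US dollars) is used only as an example indicator code.

```python
import json
from urllib.request import urlopen

WB_BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, per_page=100):
    """Build a World Bank API URL for one country and one indicator."""
    return (
        f"{WB_BASE}/country/{country}/indicator/{indicator}"
        f"?format=json&per_page={per_page}"
    )


def parse_series(payload):
    """Turn a response into {year: value}, skipping years with no value."""
    # Expected shape: [metadata, [observation, ...]]; anything else yields {}.
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


def fetch_series(country, indicator):
    """Download and parse a time series (makes a network request)."""
    with urlopen(indicator_url(country, indicator), timeout=30) as response:
        return parse_series(json.load(response))
```

For example, `fetch_series("BR", "NY.GDP.MKTP.CD")` would request Brazil's GDP series. Years the source reports as null are dropped rather than treated as zero, so gaps stay visible in later analysis.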
Key Dataset Categories
Budget & Financial Data
The most impactful category for accountability. Look for:
- Budget documents: Revenue projections, spending allocations, actual expenditures
- Procurement records: Government contracts — who won, for how much, through what process
- Aid flows: International development aid received and distributed
- Public debt: Government borrowing, bond issuance, and debt service costs
- Tax revenue: Revenue by source, collection rates, and tax expenditures (deductions/exemptions)
Legislative Data
- Bills and laws: Full text of proposed and enacted legislation
- Voting records: How each legislator voted on each bill
- Committee proceedings: Meeting minutes, hearing transcripts, and reports
- Lobbying registries: Who is lobbying whom, on what issues, and for how much
- Campaign finance: Political donations, spending, and donor information
Public Services & Performance
- Healthcare: Hospital performance, wait times, health outcomes, disease surveillance
- Education: School performance, graduation rates, spending per student
- Crime: Reported crimes, clearance rates, police activity
- Infrastructure: Road conditions, public transit performance, utility reliability
- Environmental: Air/water quality, emissions data, permit compliance
Data Formats Explained
| Format | Use Case | Machine-Readable | Difficulty |
|---|---|---|---|
| CSV | Tabular data, spreadsheets | Yes | Easy — opens in Excel/Sheets |
| JSON | Structured data, APIs | Yes | Moderate — best with code |
| XML | Structured documents | Yes | Moderate — verbose format |
| PDF | Reports, documents | No | Easy to read, hard to analyze |
| GeoJSON/Shapefile | Geographic/spatial data | Yes | Needs GIS software |
| API | Real-time/dynamic data | Yes | Requires programming |
Beware: PDF Is Not Open Data
Many governments publish data as PDF documents and call it "open data." PDFs are designed for reading, not analysis. Extracting data from PDFs requires special tools and often introduces errors. True open data is published in machine-readable formats (CSV, JSON, XML) that can be directly imported into analysis tools. If your government publishes budgets only as PDFs, that's a red flag for transparency.
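To make "directly imported" concrete, here is a minimal sketch that loads the same two budget records from CSV and from JSON using only Python's standard library. The department names and amounts are made-up sample data, and `load_csv` and `load_json` are illustrative helper names.

```python
import csv
import io
import json

# Made-up sample data standing in for a downloaded budget file.
CSV_TEXT = """department,year,amount
Police,2023,1250000.50
Parks,2023,310000.00
"""

JSON_TEXT = """[
  {"department": "Police", "year": 2023, "amount": 1250000.5},
  {"department": "Parks", "year": 2023, "amount": 310000.0}
]"""


def load_csv(text):
    """Parse CSV text; every value arrives as a string, so cast the numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "department": row["department"],
                "year": int(row["year"]),
                "amount": float(row["amount"]),
            }
        )
    return rows


def load_json(text):
    """Parse JSON text; numbers keep their types without extra casting."""
    return json.loads(text)


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)
total = sum(row["amount"] for row in csv_rows)
```

Both loaders produce the same records in a few lines. Getting the same table out of a PDF would usually mean a dedicated extraction tool plus manual checking of every figure.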
How to Work With Open Data
For Non-Technical Users
- Download CSV files and open them in Microsoft Excel, Google Sheets, or LibreOffice Calc
- Use pivot tables to summarize and explore large datasets
- Create charts to visualize trends and patterns
- Use Looker Studio (formerly Google Data Studio; free) to create interactive dashboards
- Try Datawrapper (free tier) to create publication-ready charts and maps
For Technical Users
- Python (pandas): The most popular tool for data analysis. The pandas library reads CSV, JSON, Excel, and SQL sources. Combine it with matplotlib or seaborn for visualization.
- R: Excellent for statistical analysis and visualization. The tidyverse ecosystem (dplyr, ggplot2) makes data wrangling intuitive.
- SQL: For working with large datasets. Load data into PostgreSQL or SQLite and query efficiently.
- Jupyter Notebooks: Combine code, analysis, and narrative in shareable documents. Great for reproducible research.
- APIs: Use Python's requests library or R's httr to pull real-time data directly from government APIs.
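The SQL approach from the list above can be sketched with Python's built-in `sqlite3` module: load a procurement CSV into an in-memory SQLite database, then rank suppliers by total contract value. The column names (`contract_id`, `supplier`, `amount`) and the sample rows are assumptions for illustration; a real procurement file will have its own schema.

```python
import csv
import io
import sqlite3

# Made-up procurement records; real files will use different columns.
CONTRACTS_CSV = """contract_id,supplier,amount
C-001,Acme Paving,120000
C-002,Birch IT Services,45000
C-003,Acme Paving,80000
C-004,Cedar Catering,15000
"""


def load_contracts(csv_text):
    """Create an in-memory SQLite table and fill it from CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts ("
        "contract_id TEXT PRIMARY KEY, supplier TEXT, amount REAL)"
    )
    rows = [
        (r["contract_id"], r["supplier"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_suppliers(conn, limit=3):
    """Return (supplier, total value, contract count), largest total first."""
    query = """
        SELECT supplier, SUM(amount) AS total, COUNT(*) AS contracts
        FROM contracts
        GROUP BY supplier
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


conn = load_contracts(CONTRACTS_CSV)
ranking = top_suppliers(conn)
```

For files too large to hold in memory, passing a file path to `sqlite3.connect` instead of `":memory:"` keeps the table on disk, and the same query works unchanged.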
Open Data Quality Checklist
Not all open data is created equal. Evaluate datasets against these criteria:
- Timeliness: Is the data current? Budget data from 3 years ago has limited accountability value.
- Completeness: Are there gaps, missing fields, or unexplained null values?
- Granularity: Is the data detailed enough? Aggregate spending by department is less useful than line-item expenditures.
- Machine-readability: Can you import it directly into analysis tools without manual cleanup?
- Documentation: Is there a data dictionary explaining each field, its format, and any codes used?
- Licensing: Is the data truly open? Check for restrictive licenses that limit redistribution or commercial use.
- Consistency: Does the government use consistent formats and identifiers across datasets and over time?
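Two of these criteria, completeness and timeliness, are easy to check automatically. The sketch below is one possible approach using only the standard library: it reports the share of blank values per column and the number of days since the most recent record. The sample data, the `updated` column, and the function names are all hypothetical, and the date check assumes ISO-formatted dates (YYYY-MM-DD).

```python
import csv
import io
from datetime import date

# Made-up sample with deliberate gaps in three columns.
SAMPLE = """id,department,amount,updated
1,Police,1000,2024-01-15
2,,2500,2024-03-01
3,Parks,,2023-11-30
4,Parks,400,
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_share(rows, fields):
    """Completeness: fraction of rows where each field is blank or absent."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in rows if not (r.get(field) or "").strip()) / len(rows)
        for field in fields
    }


def days_since_latest(rows, date_field, today):
    """Timeliness: days between the newest dated record and `today`."""
    dates = [date.fromisoformat(r[date_field]) for r in rows if r.get(date_field)]
    if not dates:
        return None  # No usable dates, so timeliness cannot be assessed.
    return (today - max(dates)).days


rows = read_rows(SAMPLE)
gaps = missing_share(rows, ["id", "department", "amount", "updated"])
age_in_days = days_since_latest(rows, "updated", today=date(2024, 3, 31))
```

A high missing share in a key column, or a newest record that is months old, is a signal to read the dataset's documentation or contact the publisher before drawing conclusions.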
Start Small
Don't try to analyze everything at once. Pick one question you want to answer — "How much does my city spend on police overtime?" or "Which contractors receive the most government business?" — and find the specific dataset that answers it. Starting with a focused question is far more productive than browsing data portals aimlessly.