Mateen Kiani
Published on Sun Jul 27 2025·4 min read
Spreadsheets often store critical data that powers reports, dashboards, and data pipelines. Yet, when it comes to reading XLSX files with Python, developers sometimes overlook cell types like dates, formulas, and styles. How can you handle these varied data types seamlessly when parsing an XLSX workbook in your scripts?
The answer lies in choosing the right Python library and using its features effectively. By understanding how openpyxl, pandas, and xlrd treat cell types and memory usage, you can prevent data loss and speed up your workflows. In the next sections, we’ll uncover practical tips for each approach, ensuring you make informed decisions and avoid common surprises.
Working with Excel files is common in business, science, and education. Python scripts can automate data intake, cleanup, and analysis, saving hours of manual work. An XLSX reader helps extract rows, columns, and metadata without opening Excel itself.
Beyond basic cell values, spreadsheets often include dates, formulas, merged cells, and custom number formats. A good XLSX reader preserves these nuances so your automation respects the original document. That means your scripts can handle financial reports with correct currency symbols, log dates for time-series analysis, and update cells without breaking formulas.
Three libraries come up most often when reading Excel files in Python: openpyxl, pandas, and xlrd. Each has its strengths:
Library | Memory Use | Cell Types | Speed |
---|---|---|---|
openpyxl | Medium | Full support | Moderate |
pandas | High (DataFrame) | Values only | Fast |
xlrd | Low (legacy) | Values only | Fastest |
openpyxl reads and writes .xlsx files with full styling, formulas, and data types. pandas loads sheets into DataFrames quickly, using openpyxl for .xlsx and xlrd for legacy .xls under the hood, but drops styling and formulas. xlrd is optimized for speed and low memory, but since version 2.0 it only reads .xls files and no longer handles .xlsx.
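As a rough illustration of that split, a script could pick the pandas engine from the file extension. The helper name `pick_engine` and the mapping below are assumptions made for this sketch; they are not part of pandas or any other library.

```python
from pathlib import Path

# Hypothetical mapping for this sketch: which pandas engine to request per extension.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path: str) -> str:
    """Return the engine name to pass to pandas.read_excel for this file."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"Unsupported spreadsheet extension: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


print(pick_engine("report.XLSX"))
print(pick_engine("legacy/old.xls"))
```

You could then call `pd.read_excel(path, engine=pick_engine(path))` so the choice is explicit rather than left to defaults.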
To get started, install openpyxl. Then load a workbook and iterate rows:
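```python
from openpyxl import load_workbook

# data_only=True returns cached results instead of formula strings
wb = load_workbook('data.xlsx', data_only=True)
ws = wb.active

# Start at row 2 to skip the header row
for row in ws.iter_rows(min_row=2):
    values = [cell.value for cell in row]
    print(values)
```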
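- `data_only=True` returns the values Excel last calculated and cached instead of the formula strings. If the file was never calculated and saved by Excel, those cached values can be missing and come back as `None`.
- Check `cell.is_date` to detect date cells and format them.
- Pass `read_only=True` to `load_workbook()` for large files, then iterate with `ws.iter_rows()`.

If you prefer working with dictionaries, you can map headers to values by combining the header row with each row's values.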
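Here is a minimal sketch of the dictionary approach and of date detection. It builds a small workbook in memory so it runs without a file on disk; the helper names (`build_sample_workbook`, `rows_as_dicts`, `date_columns`) and the sample data are made up for illustration.

```python
from datetime import date
from io import BytesIO

from openpyxl import Workbook, load_workbook


def build_sample_workbook() -> BytesIO:
    """Create a tiny in-memory workbook so the example needs no file on disk."""
    wb = Workbook()
    ws = wb.active
    ws.append(["name", "joined", "score"])
    ws.append(["Ada", date(2024, 1, 15), 91.5])
    ws.append(["Linus", date(2023, 11, 2), 78])
    buffer = BytesIO()
    wb.save(buffer)
    buffer.seek(0)
    return buffer


def rows_as_dicts(source) -> list:
    """Zip the header row with every following row to get one dict per row."""
    wb = load_workbook(source, data_only=True)
    rows = wb.active.iter_rows(values_only=True)
    headers = next(rows)
    return [dict(zip(headers, row)) for row in rows]


def date_columns(source) -> list:
    """Return the headers whose first data cell is formatted as a date."""
    ws = load_workbook(source).active
    header_row, first_data_row = ws[1], ws[2]
    return [h.value for h, cell in zip(header_row, first_data_row) if cell.is_date]


records = rows_as_dicts(build_sample_workbook())
print(records[0]["name"], records[0]["joined"].year)
print(date_columns(build_sample_workbook()))
```

Checking only the first data row is a shortcut that assumes each column is formatted consistently; a real audit would likely inspect more rows.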
pandas makes reading sheets into DataFrames a breeze:
```python
import pandas as pd

df = pd.read_excel('data.xlsx', sheet_name='Sheet1', engine='openpyxl')
print(df.head())
```
You can:

- Use `dtype` or `converters` to control types.
- Use `usecols` to limit columns and save memory.
- Read only part of a sheet with `nrows` and `skiprows`.

Once loaded, pandas lets you filter, group, and export data. To save your DataFrame as JSON, call `df.to_json()`.
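To make those options concrete, here is a small sketch that writes a sample sheet to memory and reads it back with `usecols`, `dtype`, `nrows`, and `skiprows`. The column names and values are invented for the example, and it assumes both pandas and openpyxl are installed.

```python
from io import BytesIO

import pandas as pd


def build_sample_sheet() -> BytesIO:
    """Write a small DataFrame to an in-memory .xlsx file."""
    frame = pd.DataFrame({
        "order_id": ["0001", "0002", "0003", "0004"],
        "region": ["north", "south", "north", "east"],
        "amount": [120.5, 80.0, 42.25, 310.0],
        "notes": ["", "rush", "", "gift"],
    })
    buffer = BytesIO()
    frame.to_excel(buffer, sheet_name="Sheet1", index=False, engine="openpyxl")
    buffer.seek(0)
    return buffer


# Load three columns, keep order_id as text, and stop after three data rows.
df = pd.read_excel(
    build_sample_sheet(),
    sheet_name="Sheet1",
    engine="openpyxl",
    usecols=["order_id", "region", "amount"],
    dtype={"order_id": str},
    nrows=3,
)

# Skip the first two data rows (sheet rows 1 and 2) but keep the header (row 0).
tail = pd.read_excel(
    build_sample_sheet(),
    engine="openpyxl",
    dtype={"order_id": str},
    skiprows=range(1, 3),
)

print(df)
print(tail["order_id"].tolist())
```

Forcing `order_id` to `str` is a common way to protect identifiers with leading zeros from being reinterpreted as numbers.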
Beyond basic reads, you can access formulas, comments, and styles with openpyxl:
- Load with `data_only=False` (the default) and read `cell.value` as the formula string.
- Read `cell.comment` to extract a comment's author and text.
- Inspect `cell.font`, `cell.fill`, and `cell.number_format` for formatting details.

Use these features to audit complex workbooks, validate cell formulas, or generate styled reports automatically. Handling these advanced parts lets your scripts work with more than raw cell values.
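The sketch below shows one way to read a formula, a comment, and some formatting from a single cell. It creates its own workbook in memory, so the cell contents, comment text, and author name are placeholders rather than anything from a real file.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.comments import Comment
from openpyxl.styles import Font

# Build a sample workbook with a formula, a comment, and some styling.
wb = Workbook()
ws = wb.active
ws["A1"] = 10
ws["A2"] = 32
ws["A3"] = "=SUM(A1:A2)"
ws["A3"].font = Font(bold=True)
ws["A3"].number_format = "0.00"
ws["A3"].comment = Comment("Check this total", "Reviewer")
buffer = BytesIO()
wb.save(buffer)

# Default load (data_only=False): cell.value holds the formula string.
cell = load_workbook(BytesIO(buffer.getvalue())).active["A3"]
formula = cell.value
comment_text = cell.comment.text
comment_author = cell.comment.author
is_bold = cell.font.bold
number_format = cell.number_format
print(formula, comment_text, comment_author, is_bold, number_format)

# data_only=True: openpyxl never calculates formulas, and this file was not
# saved by Excel, so there is no cached result to return.
cached = load_workbook(BytesIO(buffer.getvalue()), data_only=True).active["A3"].value
print(cached)
```

The second load is a reminder that `data_only=True` depends on a spreadsheet application having stored the computed result in the file.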
Working with large spreadsheets can hit memory and speed bottlenecks. Try:
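- Use `read_only=True` in openpyxl to stream rows instead of loading all cells.
- Limit columns with `usecols` in pandas.
- Process rows in batches for out-of-core work. `pandas.read_excel()` has no `chunksize` parameter, so batch rows yourself, for example by streaming with openpyxl's read-only mode or by paging with `skiprows` and `nrows`.
- Keep `converters` light, since pandas calls each converter once per cell value.

Tip: For files over 50,000 rows, streaming modes often halve memory usage and speed up iteration significantly.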
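Since `pandas.read_excel()` does not accept a `chunksize` argument, one possible way to process a big sheet in batches is to stream it with openpyxl's read-only mode. The generator name `iter_chunks`, the chunk size, and the sample data here are assumptions for the sketch.

```python
from io import BytesIO
from itertools import islice

from openpyxl import Workbook, load_workbook


def build_large_sheet(n_rows: int) -> BytesIO:
    """Create an in-memory workbook with a header and n_rows data rows."""
    wb = Workbook()
    ws = wb.active
    ws.append(["id", "value"])
    for i in range(n_rows):
        ws.append([i, i * 2])
    buffer = BytesIO()
    wb.save(buffer)
    buffer.seek(0)
    return buffer


def iter_chunks(source, chunk_size: int):
    """Yield (header, rows) batches without loading the whole sheet at once."""
    wb = load_workbook(source, read_only=True, data_only=True)
    try:
        rows = wb.active.iter_rows(values_only=True)
        header = next(rows)
        while True:
            chunk = list(islice(rows, chunk_size))
            if not chunk:
                break
            yield header, chunk
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()


total = 0
chunk_count = 0
for header, chunk in iter_chunks(build_large_sheet(1000), chunk_size=250):
    value_index = header.index("value")
    total += sum(row[value_index] for row in chunk)
    chunk_count += 1

print(chunk_count, total)
```

Each batch is a plain list of tuples, so you can aggregate it directly or wrap it in `pd.DataFrame(chunk, columns=header)` when you want pandas operations on one batch at a time.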
Even seasoned developers run into issues:
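- Older pandas setups may try to read .xlsx files with `xlrd`, which no longer supports .xlsx. Always set `engine='openpyxl'`.
- Import errors usually mean a missing dependency. Make sure `openpyxl` and `pandas` are installed in your environment.
- Empty cells come back as `None` (openpyxl) or `NaN` (pandas) values you need to handle.

Handling these pitfalls early saves debugging time and keeps your data pipelines robust.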
Reading XLSX files in Python unlocks powerful automation and data analysis. By selecting the right library—openpyxl for full-featured reads, pandas for fast DataFrame processing, or xlrd for legacy .xls support—you control performance, accuracy, and memory use. Using openpyxl’s advanced APIs helps preserve formulas and styles, while pandas can slice and dice large sheets quickly.
Armed with these tools, you’re ready to build scripts that load, transform, and export spreadsheets into your workflows. Next time you face an Excel file, pick the method that fits your needs and tweak the parameters covered here. That way, you’ll avoid surprises and deliver reliable results every time.