A line plot is the standard way to visualize time series data in Python, and Pandas makes it straightforward to produce one. Pandas, a widely used data analysis library, provides functions for handling and plotting time-indexed data. With a line plot you can observe trends, seasonality, and other patterns in data over time, which is particularly helpful in financial analysis, forecasting, and scientific studies.
Time series data refers to data points collected or recorded at specific time intervals. These datasets are usually indexed with date or time stamps, and each record represents a snapshot at a specific point in time.
Before creating a time series line plot, make sure Pandas and Matplotlib are installed (for example with pip install pandas matplotlib), then import them:
    import pandas as pd
    import matplotlib.pyplot as plt
Let’s begin by generating a simple dataset with a date range as its index:
    date_range = pd.date_range(start='2023-01-01', end='2023-01-10')
    data = {'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32]}
    df = pd.DataFrame(data, index=date_range)
    print(df)
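To confirm that the index really is time-aware, it can help to inspect it and slice by date. The sketch below rebuilds the same illustrative dataset; the variable names index_type, index_freq, and subset are introduced here for illustration only.

```python
import pandas as pd

# Same illustrative dataset as in the text.
date_range = pd.date_range(start='2023-01-01', end='2023-01-10')
df = pd.DataFrame(
    {'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32]},
    index=date_range,
)

# date_range() returns a DatetimeIndex with a daily ('D') frequency by default.
index_type = type(df.index).__name__
index_freq = df.index.freqstr

# Label-based slicing with date strings includes both endpoints.
subset = df.loc['2023-01-03':'2023-01-05']

print(index_type, index_freq)
print(subset)
```

Because the index holds timestamps rather than plain strings, Pandas can place the dates on the x-axis and resample them later on.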
Pandas provides a built-in plot() method on DataFrame and Series objects that draws a line plot by default, using Matplotlib under the hood. When the index holds dates, they are placed on the x-axis automatically.
    df.plot(title='Temperature Over Time', figsize=(10, 5))
    plt.xlabel('Date')
    plt.ylabel('Temperature (°C)')
    plt.grid(True)
    plt.show()
This will generate a line graph showing temperature trends over the given date range.
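The plot() method also returns a Matplotlib Axes object, so the same labels can be set on that object instead of through the plt state functions. The following is a minimal sketch of that alternative, not a replacement for the approach above. The Agg backend line is an assumption made so the script runs without a display; drop it in a notebook or desktop session.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import pandas as pd

df = pd.DataFrame(
    {'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32]},
    index=pd.date_range(start='2023-01-01', end='2023-01-10'),
)

# df.plot() returns the Axes it drew on; configure the plot through it.
ax = df.plot(title='Temperature Over Time', figsize=(10, 5))
ax.set_xlabel('Date')
ax.set_ylabel('Temperature (°C)')
ax.grid(True)
```

Working with the returned Axes tends to be easier to follow once a script contains more than one figure, because each call names the plot it modifies.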
You can customize the plot to better visualize your data:
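    # Set the line style and marker once each; combining a style string
    # such as '--o' with a separate marker argument defines the marker twice.
    df.plot(
        title='Customized Temperature Plot',
        figsize=(12, 6),
        linestyle='--',
        marker='x',
        color='green',
        linewidth=2,
    )
    plt.xlabel('Date')
    plt.ylabel('Temperature (°C)')
    plt.grid(True)
    plt.show()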
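Customization can extend to the date labels on the x-axis. The sketch below assumes you want Matplotlib's own date formatters to apply, which is why it passes x_compat=True to turn off Pandas' automatic date tick handling. The two-day tick interval and the '%b %d' label format are illustrative choices, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.dates as mdates
import pandas as pd

df = pd.DataFrame(
    {'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32]},
    index=pd.date_range(start='2023-01-01', end='2023-01-10'),
)

# x_compat=True makes pandas hand plain Matplotlib dates to the axis,
# so Matplotlib date locators and formatters behave as usual.
ax = df.plot(
    x_compat=True,
    figsize=(12, 6),
    linestyle='--',
    marker='x',
    color='green',
)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))    # one tick every 2 days
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))  # e.g. "Jan 03"
ax.get_figure().autofmt_xdate()  # rotate the labels so they do not overlap
```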
When dealing with multiple variables, you can easily plot multiple lines on the same plot:
    df['Humidity'] = [65, 70, 72, 68, 67, 69, 75, 73, 74, 71]
    df.plot(title='Temperature and Humidity Over Time', figsize=(10, 5))
    plt.xlabel('Date')
    plt.ylabel('Values')
    plt.legend()
    plt.grid(True)
    plt.show()
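When the columns use different units or scales, a shared y-axis can flatten one of the lines. One option, sketched below, is the secondary_y argument of plot(), which draws the named columns against a second axis on the right. The humidity unit (%) is an assumption, since the text does not state it, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import pandas as pd

df = pd.DataFrame(
    {
        'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32],
        'Humidity': [65, 70, 72, 68, 67, 69, 75, 73, 74, 71],
    },
    index=pd.date_range(start='2023-01-01', end='2023-01-10'),
)

# Temperature stays on the left axis; Humidity gets its own axis on the right.
ax = df.plot(
    secondary_y=['Humidity'],
    title='Temperature and Humidity Over Time',
    figsize=(10, 5),
)
ax.set_xlabel('Date')
ax.set_ylabel('Temperature (°C)')
ax.right_ax.set_ylabel('Humidity (%)')  # unit assumed for illustration
```

By default Pandas appends "(right)" to the legend entry of each column drawn on the secondary axis, which keeps the two scales distinguishable.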
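Pandas lets you resample time series data to a different frequency with resample(). Downsampling, for example from daily to 2-day intervals, groups the observations into intervals and aggregates each one with a function such as the mean.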
    # Resample to every 2 days and compute mean
    resampled_df = df.resample('2D').mean()
    resampled_df.plot(title='2-Day Average Plot', figsize=(10, 5))
    plt.xlabel('Date')
    plt.ylabel('Average Values')
    plt.grid(True)
    plt.show()
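To see what resampling does to the numbers before plotting them, it can be useful to print the aggregated values. The sketch below uses the same temperature readings and computes both the mean and the maximum of each 2-day interval; the names temperature and two_day are introduced here for illustration.

```python
import pandas as pd

temperature = pd.Series(
    [30, 32, 31, 29, 28, 35, 36, 34, 33, 32],
    index=pd.date_range(start='2023-01-01', end='2023-01-10'),
    name='Temperature',
)

# Each 2-day interval holds two daily readings; agg() applies both
# functions to every interval and returns one column per function.
two_day = temperature.resample('2D').agg(['mean', 'max'])
print(two_day)
```

The ten daily readings collapse into five rows, each labeled with the first date of its interval.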
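Here is an example that reads a CSV file containing a date column. It assumes a file named weather.csv with Date and Temperature columns; parse_dates converts the Date column to timestamps and index_col makes it the index: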
    df_real = pd.read_csv('weather.csv', parse_dates=['Date'], index_col='Date')
    df_real['Temperature'].plot(title='Weather Temperature Over Time', figsize=(10, 5))
    plt.xlabel('Date')
    plt.ylabel('Temperature')
    plt.grid(True)
    plt.show()
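If you do not have a weather.csv file at hand, the same read_csv call can be tried against an in-memory string. The sketch below uses io.StringIO with a few made-up rows; the CSV contents and the name df_csv are hypothetical and only mirror the column layout assumed above.

```python
import io

import pandas as pd

# Hypothetical CSV contents with the assumed Date and Temperature columns.
csv_text = """Date,Temperature
2023-01-01,30
2023-01-02,32
2023-01-03,31
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

print(df_csv.index)
print(df_csv)
```

Checking that the index prints as a DatetimeIndex is a quick way to verify that the dates were parsed rather than left as strings.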
You can save the plot as an image file:
    fig = df.plot(title='Exported Plot').get_figure()
    fig.savefig('timeseries_plot.png')
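For images that go into reports, it is common to control the resolution and trim the surrounding whitespace. The sketch below passes dpi and bbox_inches to savefig and closes the figure afterwards; the value 150 and the name output_path are illustrative, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {'Temperature': [30, 32, 31, 29, 28, 35, 36, 34, 33, 32]},
    index=pd.date_range(start='2023-01-01', end='2023-01-10'),
)

output_path = Path('timeseries_plot.png')

fig = df.plot(title='Exported Plot').get_figure()
# dpi sets the resolution; bbox_inches='tight' trims the empty margins.
fig.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # release the figure once it has been written to disk
```

The file extension determines the output format, so changing the name to end in .pdf or .svg produces a vector image instead.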
Line plots in Pandas offer a convenient way to visualize trends over time. Whether you're analyzing sales data, weather patterns, or financial records, Pandas makes it easy to create and customize them. With resampling, multi-variable plotting, and styling options built in, it’s a practical tool in any data scientist’s toolbox.
Copyrights © 2024 letsupdateskills All rights reserved