Stream Stock Market Data from a CSV File Using Python
This lab demonstrates how to use a Python Kafka producer to stream data from a CSV file into a Redpanda topic. The script simulates real-time stock market activity by sending each CSV row to the topic as a JSON-formatted message, such as:
{"Date":"10/22/2013","Close/Last":"$40.45","Volume":"8347540","Open":"$39.95","High":"$40.54","Low":"$39.80"}
The script can loop through the data continuously, reverse the order of the data, and adjust date columns for time-series analysis.
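As a rough illustration of how a CSV row can become a message like the one above, here is a minimal sketch that uses only the Python standard library. The `SAMPLE_CSV` text and the `rows_as_json` helper are assumptions made for this example; the lab's `producer.py` may build its messages differently.

```python
import csv
import io
import json

# Hypothetical sample standing in for the lab's CSV file. The column names
# are taken from the example message shown earlier.
SAMPLE_CSV = """Date,Close/Last,Volume,Open,High,Low
10/22/2013,$40.45,8347540,$39.95,$40.54,$39.80
"""


def rows_as_json(csv_file):
    """Yield each CSV row as a compact JSON string keyed by column name."""
    for row in csv.DictReader(csv_file):
        # Compact separators keep the output free of extra whitespace.
        yield json.dumps(row, separators=(",", ":"))


messages = list(rows_as_json(io.StringIO(SAMPLE_CSV)))
print(messages[0])
```

Every value stays a string here because `csv.DictReader` does not infer types, which matches the quoted numbers in the example message.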
In this lab, you will:
- Run the producer that streams data from a CSV file directly into a Redpanda topic.
- Discover methods to alter the data stream, such as reversing the data sequence or looping through the data continuously for persistent simulations.
- Adjust date fields dynamically to represent different time frames for analysis.
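The reversing and looping behaviors can be pictured with a short sketch. The `order_rows` function and its `reverse` and `loop` parameters are hypothetical names chosen for this illustration, not the script's actual interface.

```python
import itertools


def order_rows(rows, reverse=False, loop=False):
    """Return an iterator over rows, optionally reversed and/or repeated forever."""
    rows = list(rows)  # read everything into memory first
    if reverse:
        rows.reverse()
    # itertools.cycle repeats the sequence endlessly; iter() stops after one pass.
    return itertools.cycle(rows) if loop else iter(rows)


sample = ["day 1", "day 2", "day 3"]
reversed_once = list(order_rows(sample, reverse=True))
first_five_looped = list(itertools.islice(order_rows(sample, loop=True), 5))
print(reversed_once)
print(first_five_looped)
```

Because a looping iterator never ends, a caller has to stop it explicitly, for example with `itertools.islice` as above or by interrupting the process.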
Run the lab
- Clone this repository:
    git clone https://github.com/redpanda-data/redpanda-labs.git
- Change into the clients/stock-market-activity/python/ directory:
    cd redpanda-labs/clients/stock-market-activity/python
- Set the REDPANDA_VERSION environment variable to the version of Redpanda that you want to run. For all available versions, see the GitHub releases. For example:
    export REDPANDA_VERSION=24.3.9
- Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases. For example:
    export REDPANDA_CONSOLE_VERSION=2.8.5
- Start a local Redpanda cluster:
    docker compose -f ../../../docker-compose/single-broker/docker-compose.yml up -d
- Create and activate a virtual environment:
    python3 -m venv .env
    source .env/bin/activate
- Install the required dependencies:
    pip3 install --upgrade pip
    pip3 install -r requirements.txt
- Start the producer:
    python producer.py --brokers localhost:19092
  You should see that the producer is sending messages to Redpanda:
    Message delivered to market_activity [0] offset 0
- Open Redpanda Console at localhost:8080.
  The producer sent the stock market data in the CSV file to the market_activity topic in Redpanda.
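The "Message delivered" line above resembles the output of a delivery callback. The following is a minimal sketch of that pattern, assuming a producer object with the `produce`, `poll`, and `flush` methods offered by the confluent-kafka Python client. Whether the lab's `producer.py` uses that client is an assumption, and `delivery_report` and `send_rows` are hypothetical names.

```python
import json


def delivery_report(err, msg):
    """Delivery callback: log where the message landed, or why it failed."""
    if err is not None:
        print(f"Message delivery failed: {err}")
    else:
        print(f"Message delivered to {msg.topic()} [{msg.partition()}] offset {msg.offset()}")


def send_rows(producer, topic, rows):
    """Send each row (a dict) as a JSON message and wait for delivery.

    `producer` is assumed to behave like confluent_kafka.Producer.
    """
    for row in rows:
        producer.produce(
            topic,
            value=json.dumps(row).encode("utf-8"),
            on_delivery=delivery_report,
        )
        producer.poll(0)  # serve delivery callbacks for earlier messages
    producer.flush()  # block until all queued messages are delivered
```

With the confluent-kafka package installed and the local cluster running, a call could look like `send_rows(Producer({"bootstrap.servers": "localhost:19092"}), "market_activity", rows)`, where `Producer` is imported from `confluent_kafka` and `rows` is a list of dictionaries.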
Options
The script supports several command-line options to control its behavior:
python producer.py [options]
| Option | Description |
|---|---|
| | Display the help message and exit. |
| | Specify the path to the CSV file to be processed. Defaults to |
| | Specify the topic to which events will be published. Defaults to the name of the CSV file (without its extension). |
| --brokers | Comma-separated list of the host and port for each Redpanda broker. Defaults to |
| | Specify the column in the CSV file that contains date information. By default, the script converts these dates to ISO 8601 format. If the looping option ( |
| | Read the file into memory and reverse the order of data before sending it to Redpanda. When used with the |
| | Continuously loop through the file, reading it into memory and sending data to Redpanda in a loop. When combined with the |
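Two of the behaviors described in the table, converting dates to ISO 8601 and deriving the default topic from the CSV file name, can be sketched as follows. The `to_iso8601` and `default_topic` helpers, the `%m/%d/%Y` input format (inferred from the sample message), and the file path are assumptions for illustration. The exact ISO 8601 form that the script emits is not shown in this lab and may differ.

```python
from datetime import datetime
from pathlib import Path


def to_iso8601(value, fmt="%m/%d/%Y"):
    """Parse a date string such as 10/22/2013 and return it in ISO 8601 form."""
    return datetime.strptime(value, fmt).isoformat()


def default_topic(csv_path):
    """Use the CSV file name without its extension as the topic name."""
    return Path(csv_path).stem


iso_date = to_iso8601("10/22/2013")
topic = default_topic("data/market_activity.csv")  # hypothetical path
print(iso_date)
print(topic)
```

Parsing with an explicit format string makes a malformed date fail with a `ValueError` rather than passing through unchanged.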
Clean up
To exit the virtual environment:
deactivate
To shut down and delete the containers along with all your cluster data:
docker compose -f ../../../docker-compose/single-broker/docker-compose.yml down -v