Stream Stock Market Data from a CSV File Using Python

This lab demonstrates how to use a Python Kafka producer to stream data from a CSV file into a Redpanda topic. The script simulates real-time stock market activity by pushing JSON-formatted messages into a topic. For example:

{"Date":"10/22/2013","Close/Last":"$40.45","Volume":"8347540","Open":"$39.95","High":"$40.54","Low":"$39.80"}

The script can loop through the data continuously, send the data in reverse order, and convert or shift the values in a date column for time-series analysis.
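To make the message format concrete, here is a minimal sketch of how one CSV row could be turned into a JSON message like the sample above. It is not the lab's producer.py: the inline SAMPLE_CSV text and the rows_to_json helper are hypothetical, and only the column names and values are taken from the sample message.

```python
import csv
import io
import json

# Hypothetical two-line CSV that reuses the column names from the sample message.
SAMPLE_CSV = (
    "Date,Close/Last,Volume,Open,High,Low\n"
    "10/22/2013,$40.45,8347540,$39.95,$40.54,$39.80\n"
)


def rows_to_json(csv_text):
    """Yield one compact JSON string per CSV data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # DictReader keeps every value as a string, which matches the sample.
        yield json.dumps(row, separators=(",", ":"))


messages = list(rows_to_json(SAMPLE_CSV))
print(messages[0])
```

Running the sketch prints a single line identical to the sample message. Every value stays a string because the CSV reader does no type conversion.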

In this lab, you will:

  • Run the producer that streams data from a CSV file directly into a Redpanda topic.

  • Alter the data stream by reversing the order of the data or by looping through the data continuously for long-running simulations.

  • Adjust date fields dynamically to represent different time frames for analysis.

Prerequisites

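Before running the lab, ensure you have the following installed on your host machine. Each is used in the steps below:

  • Git, to clone the repository.

  • Docker and Docker Compose, to run the local Redpanda cluster.

  • Python 3 with pip, to run the producer.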

Run the lab

  1. Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git
  2. Change into the clients/stock-market-activity/python/ directory:

    cd redpanda-labs/clients/stock-market-activity/python
  3. Set the REDPANDA_VERSION environment variable to the version of Redpanda that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_VERSION=24.2.10
  4. Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_CONSOLE_VERSION=2.7.2
  5. Start a local Redpanda cluster:

    docker compose -f ../../../docker-compose/single-broker/docker-compose.yml up -d
  6. Create a virtual environment:

    python3 -m venv .env
    source .env/bin/activate
  7. Install the required dependencies:

    pip3 install --upgrade pip
    pip3 install -r requirements.txt
  8. Start the producer:

    python producer.py --brokers localhost:19092

    You should see that the producer is sending messages to Redpanda:

    Message delivered to market_activity [0] offset 0
  9. Open Redpanda Console at localhost:8080.

The producer sent the stock market data in the CSV file to the market_activity topic in Redpanda.
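The delivery line shown in step 8 is the kind of output a delivery callback produces. The following is a minimal sketch of such a callback, assuming the (err, msg) convention used by the confluent-kafka client, where msg exposes topic(), partition(), and offset() methods. Whether producer.py uses that client is an assumption, and delivery_report and fake_msg are hypothetical names. The sketch uses a stand-in message object so that it runs without a broker.

```python
from types import SimpleNamespace


def delivery_report(err, msg):
    """Return a status line for one produced message.

    Assumes the (err, msg) callback convention of confluent-kafka;
    the lab's own script may be structured differently.
    """
    if err is not None:
        return f"Message delivery failed: {err}"
    return (
        f"Message delivered to {msg.topic()} "
        f"[{msg.partition()}] offset {msg.offset()}"
    )


# Stand-in for a delivered message, so no broker or client library is needed.
fake_msg = SimpleNamespace(
    topic=lambda: "market_activity",
    partition=lambda: 0,
    offset=lambda: 0,
)
print(delivery_report(None, fake_msg))
```

With confluent-kafka, a callback of this shape would typically print its line instead of returning it, and would be passed to Producer.produce through the on_delivery argument.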

Options

The script supports several command-line options to control its behavior:

python producer.py [options]

Each option is listed below, followed by its description.

-h, --help

Display the help message and exit.

-f, --file, --csv

Specify the path to the CSV file to be processed. Defaults to ../data/market_activity.csv.

-t, --topic

Specify the topic to which events will be published. Defaults to the name of the CSV file (without its extension).

-b, --broker, --brokers

Comma-separated list of the host and port for each Redpanda broker. Defaults to localhost:9092.

-d, --date

Specify the column in the CSV file that contains date information. By default, the script converts these dates to ISO 8601 format. If the looping option (-l) is also enabled, the script increments each date by one day on each iteration of the loop, which simulates a time series that advances day by day.

-r, --reverse

Read the file into memory and reverse the order of data before sending it to Redpanda. When used with the -l option, data is reversed only once before the looping starts, not during each loop iteration.

-l, --loop

Read the file into memory and continuously loop through it, sending the data to Redpanda on each pass. When combined with the -d option, the date in the specified column is increased with each loop iteration, simulating data that arrives over successive days. When used with -r, the data is reversed once before the first pass, and the loop then runs over the reversed data set.
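The interaction between -d, -r, and -l can be sketched as follows. This is one reading of the descriptions above, not the code in producer.py. The simulate function and its loop_count parameter are hypothetical: loop_count stands in for -l so that the sketch terminates. The MM/DD/YYYY input format is assumed from the sample message, and the real script may emit a full ISO 8601 timestamp instead of a date.

```python
from datetime import datetime, timedelta


def simulate(rows, date_column=None, reverse=False, loop_count=1):
    """Yield rows in the order described for the -d, -r and -l options.

    loop_count is a hypothetical stand-in for -l so the sketch terminates.
    """
    if reverse:
        # Reversed once, before any looping starts.
        rows = list(reversed(rows))
    for iteration in range(loop_count):
        for row in rows:
            out = dict(row)  # copy, so the input rows are never modified
            if date_column:
                # Assumed input format MM/DD/YYYY, as in the sample message.
                parsed = datetime.strptime(row[date_column], "%m/%d/%Y")
                # Shift by one day for every completed pass through the data.
                shifted = parsed + timedelta(days=iteration)
                out[date_column] = shifted.date().isoformat()
            yield out


rows = [{"Date": "10/22/2013"}, {"Date": "10/21/2013"}]
result = list(simulate(rows, date_column="Date", reverse=True, loop_count=2))
print([r["Date"] for r in result])
```

In this example the two rows are reversed once, sent as 2013-10-21 and 2013-10-22 on the first pass, and then sent again as 2013-10-22 and 2013-10-23 on the second pass.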

Clean up

To exit the virtual environment:

deactivate

To shut down and delete the containers along with all your cluster data:

docker compose -f ../../../docker-compose/single-broker/docker-compose.yml down -v