Preparation#

This page shows you how to install PyFlink using pip, conda, installing from the source, etc.

Python Version Supported#

PyFlink Version

Python Version Supported

PyFlink 1.16

Python 3.6 to 3.9

PyFlink 1.15

Python 3.6 to 3.8

PyFlink 1.14

Python 3.6 to 3.8

You could check your Python version as following:

python3 --version

Create a Python virtual environment#

Virtual environment gives you the ability to isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory tree which contains its own Python executable files and the installed Python packages.

It is useful for local development to create a standalone Python environment and also useful when deploying a PyFlink job to production when there are massive Python dependencies. It’s supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details.

Create a virtual environment using virtualenv#

To create a virtual environment using virtualenv, run:

python3 -m pip install virtualenv

# Create Python virtual environment under a directory, e.g. venv
virtualenv venv

# You can also create Python virtual environment with a specific Python version
virtualenv --python /path/to/python/executable venv

The virtual environment needs to be activated before to use it. To activate the virtual environment, run:

source venv/bin/activate

That is, execute the activate script under the bin directory of your virtual environment.

Create a virtual environment using conda#

To create a virtual environment using conda (suppose miniconda), run:

# Download and install miniconda, the latest miniconda installers are available in https://repo.anaconda.com/miniconda/

# Suppose the name of the downloaded miniconda installer is miniconda.sh
chmod +x miniconda.sh
# install miniconda
./miniconda.sh -b -p miniconda

# Activate the miniconda environment
source miniconda/bin/activate

# Create conda virtual environment under a directory, e.g. venv
conda create --name venv python=3.8 -y

The conda virtual environment needs to be activated before to use it. To activate the conda virtual environment, run:

conda activate venv

Check the installed package#

You could then perform the following checks to make sure that the installed PyFlink package is ready for use:

curl -L https://raw.githubusercontent.com/apache/flink/master/flink-python/pyflink/examples/table/word_count.py -o word_count.py
python3 word_count.py
# You will see outputs as following:
# Use --input to specify file input.
# Printing result to stdout. Use --output to specify output path.
# +I[To, 1]
# +I[be,, 1]
# +I[or, 1]
# +I[not, 1]
# +I[to, 1]
# +I[be,--that, 1]
# ...

If there are any problems, you could perform the following checks.

Check the logging messages in the log file to see if there are any problems:

# Get the installation directory of PyFlink
python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__file__)))"
# It will output a path like the following:
# /path/to/python/site-packages/pyflink

# Check the logging under the log directory
ls -lh /path/to/python/site-packages/pyflink/log
# You will see the log file as following:
#  -rw-r--r--  1 dianfu  staff    45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.local.log

Besides, you could also check if the files of the PyFlink package are consistent. It may happen that you have installed an old version of PyFlink before and multiple PyFlink versions exist at the same time for some reason.

# List the jar packages under the lib directory
ls -lh /path/to/python/site-packages/pyflink/lib
# It will output a list of jar packages as following:
#  -rw-r--r--  1 dianfu  staff   190K 10 18 20:43 flink-cep-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff   475K 10 18 20:43 flink-connector-files-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff    93K 10 18 20:43 flink-csv-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff   110M 10 18 20:43 flink-dist-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff   171K 10 18 20:43 flink-json-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff    20M 10 18 20:43 flink-scala_2.12-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff    10M 10 18 20:43 flink-shaded-zookeeper-3.5.9.jar
#  -rw-r--r--  1 dianfu  staff    15M 10 18 20:43 flink-table-api-java-uber-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff    35M 10 18 20:43 flink-table-planner-loader-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff   2.9M 10 18 20:43 flink-table-runtime-1.15.2.jar
#  -rw-r--r--  1 dianfu  staff   203K 10 18 20:43 log4j-1.2-api-2.17.1.jar
#  -rw-r--r--  1 dianfu  staff   295K 10 18 20:43 log4j-api-2.17.1.jar
#  -rw-r--r--  1 dianfu  staff   1.7M 10 18 20:43 log4j-core-2.17.1.jar
#  -rw-r--r--  1 dianfu  staff    24K 10 18 20:43 log4j-slf4j-impl-2.17.1.jar

Please make sure that the versions of all the Flink jar packages are consistent, e.g. 1.15.2 in the above example.