Airflow's official site; the install guide there is also worth consulting:
http://airflow.apache.org/
Install gcc: yum install gcc
Check the gcc version after installing: gcc -v
Create a setuptools directory under /root:
Command: mkdir setuptools
Download setuptools inside that directory:
命令:wget https://pypi.python.org/packages/source/s/setuptools/setuptools-19.6.tar.gz#md5=c607dd118eae682c44ed146367a17e26
Then extract the tarball:
Command: tar -zxvf setuptools-19.6.tar.gz
After extraction, cd into the extracted directory (see the screenshot).
Then run the build and install commands in that directory.
Build command: python setup.py build
Install command: python setup.py install
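The setuptools steps above, collected into one sequence (a sketch, assuming python on the PATH is the system Python used throughout this guide, and that you have root privileges for the install step):

```shell
# Build and install setuptools from source under /root/setuptools
mkdir -p /root/setuptools && cd /root/setuptools
wget "https://pypi.python.org/packages/source/s/setuptools/setuptools-19.6.tar.gz#md5=c607dd118eae682c44ed146367a17e26"
tar -zxvf setuptools-19.6.tar.gz
cd setuptools-19.6
python setup.py build
python setup.py install
```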
Create a pip directory under /root:
Command: mkdir pip
Download pip inside that directory:
命令:wget https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
When it finishes, the output looks as follows.
Extract the pip tarball:
Command: tar -zxvf pip-9.0.1.tar.gz
When it finishes, the output looks as follows.
cd into the extracted directory,
then run the build and install commands there.
Build command: python setup.py build
Install command: python setup.py install
Install paramiko: pip install paramiko
Install the build dependencies: yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
Install Airflow: pip install apache-airflow (the current package name)
pip install airflow is the old 1.8-era name; running it fails with the error shown at the end of this article.
Output like the following means the installation succeeded:
Successfully installed Babel-2.8.0 Flask-Babel-1.0.0 Flask-JWT-Extended-3.24.1 Flask-OpenID-1.2.5 Flask-SQLAlchemy-2.4.1 Mako-1.1.2 MarkupSafe-1.1.1 PyJWT-1.7.1 PyYAML-5.3 WTForms-2.2.1 alembic-1.4.1 apache-airflow-1.10.9 apispec-1.3.3 argcomplete-1.11.1 attrs-19.3.0 cached-property-1.5.1 cattrs-0.9.0 certifi-2019.11.28 chardet-3.0.4 click-7.1.1 colorama-0.4.3 colorlog-4.0.2 configparser-3.5.3 croniter-0.3.31 defusedxml-0.6.0 dill-0.3.1.1 docutils-0.16 flask-1.1.1 flask-admin-1.5.4 flask-appbuilder-2.2.4 flask-caching-1.3.3 flask-login-0.4.1 flask-swagger-0.2.13 flask-wtf-0.14.3 funcsigs-1.0.2 future-0.16.0 graphviz-0.13.2 gunicorn-19.10.0 idna-2.9 importlib-metadata-1.5.0 iso8601-0.1.12 itsdangerous-1.1.0 jinja2-2.10.3 json-merge-patch-0.2 jsonschema-3.2.0 lazy-object-proxy-1.4.3 lockfile-0.12.2 markdown-2.6.11 marshmallow-2.19.5 marshmallow-enum-1.5.1 marshmallow-sqlalchemy-0.22.3 numpy-1.18.1 pandas-0.25.3 pendulum-1.4.4 prison-0.1.2 psutil-5.7.0 pygments-2.6.1 pyrsistent-0.15.7 python-daemon-2.1.2 python-dateutil-2.8.1 python-editor-1.0.4 python3-openid-3.1.0 pytz-2019.3 pytzdata-2019.3 requests-2.23.0 setproctitle-1.1.10 six-1.14.0 sqlalchemy-1.3.13 sqlalchemy-jsonfield-0.9.0 sqlalchemy-utils-0.36.1 tabulate-0.8.6 tenacity-4.12.0 termcolor-1.1.0 text-unidecode-1.2 thrift-0.13.0 typing-3.7.4.1 typing-extensions-3.7.4.1 tzlocal-1.5.1 unicodecsv-0.14.1 urllib3-1.25.8 werkzeug-0.16.1 zipp-3.1.0 zope.deprecation-4.4.0
Set the environment variable: export AIRFLOW_HOME=~/airflow
Initialize the database: airflow initdb
I ran the airflow initdb command from the /usr/local/python3/bin directory.
airflow initdb generates the configuration files under ~/airflow.
(To make the variable permanent, add it to /etc/profile, and remember to run source /etc/profile afterwards.)
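For example, a minimal /etc/profile addition, with the path used in this guide:

```shell
# append to the end of /etc/profile
export AIRFLOW_HOME=~/airflow
# then reload the file in the current shell:
source /etc/profile
```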
Start the web server: airflow webserver
After it starts successfully, the output looks as follows.
Start the scheduler:
Command: airflow scheduler
Then visit IP:8080 in a browser.
At first I could not reach the page: the CentOS 7 firewall was still running. After I stopped the firewall, the page loaded normally in the browser.
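Rather than disabling the firewall outright, a safer option is to open only port 8080; on CentOS 7 with firewalld, a sketch:

```shell
# Option A: open just the Airflow webserver port (preferred)
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --reload

# Option B: stop and disable the firewall entirely (what I did; less secure)
systemctl stop firewalld
systemctl disable firewalld
```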
Out of the box, Airflow is configured against a SQLite database, but real projects generally use MySQL, so the database connection settings need to change.
Edit Airflow's configuration file, airflow.cfg.
Set this line as I did:
executor = LocalExecutor
And put your own username and password in this line:
sql_alchemy_conn = mysql://root:123456@localhost:3306/airflow
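One caveat: a bare mysql:// URI makes SQLAlchemy load the MySQLdb driver. If you only have PyMySQL installed (the later error messages in this article come from pymysql), the URI usually needs the driver named explicitly; a hedged sketch:

```ini
# airflow.cfg -- assumes the PyMySQL driver (pip install pymysql)
sql_alchemy_conn = mysql+pymysql://root:123456@localhost:3306/airflow
```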
Initializing the database now fails with a message telling you there is no airflow database:
Command: airflow initdb
So, as the message suggests, create a database named airflow on your MySQL server:
Command: create database airflow;
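A sketch of creating the database from the shell; declaring utf8mb4 up front also heads off the "Incorrect string value" error that appears later in this article when saving Chinese text (credentials assumed as configured above):

```shell
mysql -uroot -p -e "CREATE DATABASE airflow DEFAULT CHARACTER SET utf8mb4;"
```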
List all databases to confirm:
Command: show databases;
Then initialize the database again:
Command: airflow initdb
The error above is fixed by editing MySQL's my.cnf file:
Command: vi /etc/my.cnf
Add this as the last line: explicit_defaults_for_timestamp = 1
Then restart MySQL:
Command: service mysqld restart
That produces the following error.
It is caused by MySQL's strict SQL mode; remove the offending mode (usually STRICT_TRANS_TABLES). You can inspect the current modes with: select @@sql_mode;
Remove that mode by editing my.cnf again.
After the change, restart MySQL.
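Putting both MySQL fixes together, the my.cnf change might look like this; the sql_mode value below is an assumed example with the strict mode removed -- keep whatever other modes select @@sql_mode; reported on your server:

```ini
# /etc/my.cnf
[mysqld]
explicit_defaults_for_timestamp = 1
sql_mode = NO_ENGINE_SUBSTITUTION
```

Restart MySQL afterwards (service mysqld restart) for the change to take effect.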
Then re-run the database initialization command.
After it finishes, query the airflow database again: a batch of new tables has appeared, which means the initialization succeeded.
This plugin matters: once your DAGs get complex, driving everything from the command line is very inconvenient, so it is worth installing a web page to manage them; it greatly improves development efficiency.
Figures 2 and 3 show the effect once the plugin is installed. Looks great, doesn't it?
Here is how to install it. No screenshots this time; the installation was quite painful, as you can imagine.
https://github.com/lattebank/airflow-dag-creation-manager-plugin
The plugin also bundles an authentication layer, but the flask-login version it relies on does not match the one Airflow pulls in automatically, so first uninstall the existing flask-login:
Uninstall command: pip uninstall flask-login
Install command: pip install flask-login==0.2.11
(Pinning any package to a specific version works the same way.) After extracting the plugin, create a plugins folder under your Airflow home directory and copy everything under the plugin's plugins/ directory into it.
Create the folder: mkdir plugins
Copy command: cp -r plugins/* /root/airflow/plugins/ (adjust the source path to wherever you extracted the plugin)
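The whole copy step as one sketch; the extraction path here is an assumption, so adjust it to where you actually unpacked the repo:

```shell
cd /root/airflow-dag-creation-manager-plugin-master   # assumed extraction path
mkdir -p /root/airflow/plugins
cp -r plugins/* /root/airflow/plugins/
```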
Edit the airflow.cfg configuration file
and add the settings below. Note that the last line must point to where you extracted the plugin's DAG templates.
-- content to add to airflow.cfg:
[dag_creation_manager]
dag_creation_manager_line_interpolate = basis
dag_creation_manager_queue_pool = mydefault:mydefault|mydefault
dag_creation_manager_queue_pool_mr_queue = mydefault:mydefault
dag_creation_manager_category = custom
dag_creation_manager_task_category = custom_task:#ffba40
dag_creation_manager_default_email = xxx@qq.com
dag_creation_manager_need_approver = False
dag_creation_manager_can_approve_self = True
dag_creation_manager_dag_templates_dir = /root/airflow/plugins/dcmp/dag_templates
Once the plugin's authentication is enabled, many features are hidden, so I am not enabling it here. If you do want it, configure the [webserver] section of airflow.cfg.
With authentication disabled:
authenticate = False
auth_backend = dcmp.auth.backends.password_auth
With authentication enabled (strongly discouraged; many features disappear when it is on):
authenticate = True
auth_backend = dcmp.auth.backends.password_auth
I had it enabled at the time, as shown in the figure.
Before the first run you need to upgrade the existing Airflow database:
Command: python /root/airflow/plugins/dcmp/tools/upgradedb.py
(adjust the path to wherever your plugin lives)
When it succeeds, the output looks as follows:
[2020-03-16 11:31:00,705] {settings.py:253} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=5873
[2020-03-16 11:31:00,855] {dag_converter.py:32} INFO - loading dag template: dag_code
[2020-03-16 11:31:00,955] {dag_converter.py:32} INFO - loading dag template: dag_code
[2020-03-16 11:31:01,022] {models.py:71} WARNING - Run python {AIRFLOW_HOME}/plugins/dcmp/tools/upgradedb.py first
sql:
CREATE TABLE IF NOT EXISTS `dcmp_dag` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dag_name` varchar(100) NOT NULL,
`version` int(11) NOT NULL,
`category` varchar(50) NOT NULL,
`editing` tinyint(1) NOT NULL,
`editing_user_id` int(11) DEFAULT NULL,
`editing_user_name` varchar(100) DEFAULT NULL,
`last_editor_user_id` int(11) DEFAULT NULL,
`last_editor_user_name` varchar(100) DEFAULT NULL,
`updated_at` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `dag_name` (`dag_name`),
KEY `category` (`category`),
KEY `editing` (`editing`),
KEY `updated_at` (`updated_at`)
) DEFAULT CHARSET=utf8mb4;
[2020-03-16 11:31:01,133] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
CREATE TABLE IF NOT EXISTS `dcmp_dag_conf` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dag_id` int(11) NOT NULL,
`dag_name` varchar(100) NOT NULL,
`action` varchar(50) NOT NULL,
`version` int(11) NOT NULL,
`conf` text NOT NULL,
`creator_user_id` int(11) DEFAULT NULL,
`creator_user_name` varchar(100) DEFAULT NULL,
`created_at` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
KEY `dag_id` (`dag_id`),
KEY `dag_name` (`dag_name`),
KEY `action` (`action`),
KEY `version` (`version`),
KEY `created_at` (`created_at`)
) DEFAULT CHARSET=utf8mb4;
[2020-03-16 11:31:01,149] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD editing_start datetime(6);
[2020-03-16 11:31:01,164] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD INDEX editing_start (editing_start);
[2020-03-16 11:31:01,183] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD last_edited_at datetime(6);
[2020-03-16 11:31:01,194] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD INDEX last_edited_at (last_edited_at);
[2020-03-16 11:31:01,215] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag_conf CHANGE conf conf mediumtext NOT NULL;
[2020-03-16 11:31:01,231] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
CREATE TABLE IF NOT EXISTS `dcmp_user_profile` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`is_data_profiler` tinyint(1) NOT NULL,
`is_approver` tinyint(1) NOT NULL,
`updated_at` datetime(6) NOT NULL,
`created_at` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `is_superuser` (`is_superuser`),
KEY `is_data_profiler` (`is_data_profiler`),
KEY `is_approver` (`is_approver`),
KEY `updated_at` (`updated_at`),
KEY `created_at` (`created_at`)
) DEFAULT CHARSET=utf8mb4;
[2020-03-16 11:31:01,255] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD approved_version int(11) NOT NULL;
[2020-03-16 11:31:01,438] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD INDEX approved_version (approved_version);
[2020-03-16 11:31:01,459] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD approver_user_id int(11) DEFAULT NULL;
[2020-03-16 11:31:01,468] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD approver_user_name varchar(100) DEFAULT NULL;
[2020-03-16 11:31:01,491] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD last_approved_at datetime(6);
[2020-03-16 11:31:01,514] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag ADD INDEX last_approved_at (last_approved_at);
[2020-03-16 11:31:01,538] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag_conf ADD approver_user_id int(11) DEFAULT NULL;
[2020-03-16 11:31:01,547] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag_conf ADD approver_user_name varchar(100) DEFAULT NULL;
[2020-03-16 11:31:01,574] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag_conf ADD approved_at datetime(6);
[2020-03-16 11:31:01,594] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_dag_conf ADD INDEX approved_at (approved_at);
[2020-03-16 11:31:01,614] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
sql:
ALTER TABLE dcmp_user_profile ADD approval_notification_emails text NOT NULL;
[2020-03-16 11:31:01,624] {base_hook.py:84} INFO - Using connection to: id: dag_creation_manager_plugin_sql_alchemy_conn. Host: 127.0.0.1, Port: 3306, Schema: airflow, Login: root, Password: XXXXXXXX, extra: None
()
All of the above simply creates a few tables; compare against the database and you will see the new tables.
If python /root/airflow/plugins/dcmp/tools/upgradedb.py does not run successfully, clicking the DAG Creation Manager menu will throw an error.
The screenshot below shows that error: it tells you plainly that a table does not exist.
Click the Create button; the page looks as follows.
-- Click Save; output like the following means the save succeeded.
-- You can verify in the database: the MySQL connection is already configured, so just query the airflow database in MySQL, as shown.
-- Query the pool table to confirm.
-- Create a user.
-- Query the users table in the database; the new row shows up as follows.
-- Create a connection.
-- After it is created successfully, it shows as follows.
-- Verify in the database: it has been added successfully.
-- Create a variable.
-- Clicking Save adds it successfully, shown as follows.
-- Query the variable table in the database; shown as follows.
-- Create an XCom.
-- Clicking Save succeeds, shown as follows.
-- After saving, query it in the database.
-- Creating a pool failed with the following error:
Failed to create record. (pymysql.err.InternalError) (1366, "Incorrect string value: '\xE8\xBF\x99\xE6\x98\xAF…' for column 'description' at row 1") [SQL: INSERT INTO slot_pool (pool, slots, description) VALUES (%(pool)s, %(slots)s, %(description)s)] [parameters: {'pool': 'wzx_pool', 'slots': 128, 'description': '这是我建造的泳池'}] (Background on this error at: http://sqlalche.me/e/2j85)
-- Solution
-- Don't use Chinese in the description; English works fine.
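If you do want Chinese descriptions, an alternative (my own workaround, not from the original walkthrough) is to convert the affected table to utf8mb4; the table name comes from the error message above:

```shell
mysql -uroot -p airflow -e "ALTER TABLE slot_pool CONVERT TO CHARACTER SET utf8mb4;"
```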
Problems encountered while running pip install airflow:
pip install airflow
Collecting airflow
Downloading https://files.pythonhosted.org/packages/98/e7/d8cad667296e49a74d64e0a55713fcd491301a2e2e0e82b94b065fda3087/airflow-0.6.tar.gz
Complete output from command python setup.py egg_info:
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'long_description_content_type'
warnings.warn(msg)
running egg_info
creating pip-egg-info/airflow.egg-info
writing pip-egg-info/airflow.egg-info/PKG-INFO
writing top-level names to pip-egg-info/airflow.egg-info/top_level.txt
writing dependency_links to pip-egg-info/airflow.egg-info/dependency_links.txt
writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
reading manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'
writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-T5rfvr/airflow/setup.py", line 32, in <module>
raise RuntimeError('Please install package apache-airflow instead of airflow')
RuntimeError: Please install package apache-airflow instead of airflow
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-T5rfvr/airflow/
You are using pip version 9.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The error message itself suggests installing apache-airflow instead.
-- The install command also failed once with a read-timeout error:
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
When this error appears, upgrading pip resolved it for me:
You are using pip version 9.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
-- Upgrade pip:
[root@localhost airflow]# pip install --upgrade pip
Collecting pip
Downloading https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4MB)
100% |████████████████████████████████| 1.4MB 33kB/s
Installing collected packages: pip
Found existing installation: pip 9.0.1
Uninstalling pip-9.0.1:
Successfully uninstalled pip-9.0.1
Successfully installed pip-20.0.2
-- Then this error appeared:
raise BackendUnavailable(data.get('traceback', ''))
BackendUnavailable
-- Error at startup:
No such file or directory: 'gunicorn': 'gunicorn'
-- Solution:
ln -s /usr/local/python3/bin/gunicorn /usr/bin/gunicorn
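The symlink works; an equivalent fix is to put the Python 3 bin directory on PATH so the webserver can find gunicorn (install prefix assumed as elsewhere in this guide):

```shell
# one-off symlink:
ln -s /usr/local/python3/bin/gunicorn /usr/bin/gunicorn
# or, more generally, extend PATH (e.g. in /etc/profile):
export PATH=$PATH:/usr/local/python3/bin
```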
-- Install the file-upload helper (note the package name is lrzsz):
Command: yum install lrzsz
Then run rz and pick the file to upload.