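The optimization starts from the following points.
1. Switch the cache backend to Redis
Install the Redis client: pip install redis
Edit the configuration file incubator-superset/superset/config.py
and set CACHE_CONFIG as follows, with a 24-hour expiry: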
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24,  # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
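
As a quick sanity check on the values above, the following minimal sketch splits the Redis URL into host, port and database index and confirms the timeout in seconds. The helper name describe_redis_url is hypothetical and is not part of Superset or Flask-Caching.

```python
from urllib.parse import urlparse


def describe_redis_url(url):
    """Split a redis:// URL into host, port and database index."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError(f"expected a redis:// URL, got {url!r}")
    # The path component ("/0") carries the database index; default to 0.
    db = int(parsed.path.lstrip("/") or 0)
    return {"host": parsed.hostname, "port": parsed.port or 6379, "db": db}


CACHE_DEFAULT_TIMEOUT = 60 * 60 * 24  # 24 hours expressed in seconds

print(describe_redis_url("redis://localhost:6379/0"))
print(CACHE_DEFAULT_TIMEOUT)
```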
2. Switch the Celery broker from SQLite to Redis
Configuration file: /root/incubator-superset/superset/config.py
Change the following settings:
BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
CELERY_ACKS_LATE = True
# Disable UTC and use the Asia/Shanghai time zone
CELERY_ENABLE_UTC = False
CELERY_TIMEZONE = 'Asia/Shanghai'
Full configuration:
from celery.schedules import crontab  # needed for the beat schedule below


class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ENABLE_UTC = False
    CELERY_TIMEZONE = 'Asia/Shanghai'
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute=0, hour='*/2'),  # every 2 hours
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 10,
                'since': '7 days ago',
            },
        },
    }


CELERY_CONFIG = CeleryConfig
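3. Use cache warm-up with an explicit strategy to cache dashboard data ahead of time.
1. Edit the configuration file: /root/incubator-superset/superset/config.py
Replace CELERYBEAT_SCHEDULE with the following. This strategy refreshes, every hour, the cached data of the top 5 dashboards from the last seven days.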
CELERYBEAT_SCHEDULE = {
    'cache-warmup-hourly': {
        'task': 'cache-warmup',
        'schedule': crontab(minute=0, hour='*'),  # hourly
        'kwargs': {
            'strategy_name': 'top_n_dashboards',
            'top_n': 5,
            'since': '7 days ago',
        },
    },
}
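
To make the difference between hour='*' and hour='*/2' concrete, here is a small pure-Python sketch that expands the hour field into the hours at which the task would fire (with minute=0). The function expand_hour_field is a hypothetical illustration and covers only these two forms; it is not Celery's own parser.

```python
def expand_hour_field(field):
    """Expand a cron hour field of the form '*' or '*/n' into matching hours."""
    if field == "*":
        return list(range(24))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 24, step))
    raise ValueError(f"unsupported hour field: {field!r}")


print(expand_hour_field("*"))    # every hour of the day
print(expand_hour_field("*/2"))  # every second hour, starting at 0
```

With hour='*' the warm-up runs 24 times a day, with hour='*/2' only 12 times, so the schedule should be chosen with the cache timeout in mind.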
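2. Set the default web server address and port in the configuration file. The background task requests data through this address in order to cache it, so the port must match the one used when starting Superset from the command line.
Configuration file: /root/incubator-superset/superset/config.py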
SUPERSET_WEBSERVER_ADDRESS = '0.0.0.0'
SUPERSET_WEBSERVER_PORT = 7668
Note: in 0.35, setting ADDRESS to 0.0.0.0 raises an error. Add the scheme prefix, i.e. http://0.0.0.0
Upstream bug: https://github.com/apache/incubator-superset/issues/8461
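
The workaround amounts to making sure the address carries a scheme before the port is appended. The sketch below illustrates the idea only; warmup_base_url is a hypothetical helper and is not the code Superset uses internally.

```python
def warmup_base_url(address, port):
    """Build a base URL, adding http:// when the address has no scheme."""
    if not address.startswith(("http://", "https://")):
        address = "http://" + address
    return f"{address}:{port}"


print(warmup_base_url("0.0.0.0", 7668))
print(warmup_base_url("http://0.0.0.0", 7668))
```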
Then start Celery from the command line. Start the worker (-c sets the number of concurrent worker processes):
celery worker --app=superset.tasks.celery_app:app --broker=redis://localhost:6379/0 --pool=prefork -O fair -c 4
Start the beat scheduler:
celery beat --app=superset.tasks.celery_app:app
Celery task monitoring UI (optional)
Install: pip install flower
Start: celery flower --port=7788 --broker=redis://localhost:6379/0
4. Raise the maximum number of entries the cache may hold.
File to modify:
env_superset/lib/python3.7/site-packages/flask_caching/__init__.py
Change this line:
config.setdefault("CACHE_THRESHOLD", 50000)
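
Because the line above uses setdefault, a value supplied through the cache configuration should take precedence, which would avoid patching site-packages. The fragment below is a sketch under that assumption. Note also that the Flask-Caching documentation describes CACHE_THRESHOLD as applying to the SimpleCache and FileSystemCache backends, so it is worth verifying whether it has any effect with the redis backend.

```python
# Sketch for superset/config.py: set the threshold in the config instead of
# editing flask_caching/__init__.py. Assumption: Flask-Caching only applies
# its default when the key is absent from the supplied config.
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24,  # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    'CACHE_THRESHOLD': 50000,  # maximum number of cached items
}
```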
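5. Query time may differ between connection methods.
In testing, Superset connected to AWS Athena through athena+rest was slightly slower than through jdbc.
Most of the query time was spent returning the data to the front end for display, which I take to be parsing overhead.
See the officially supported connection methods at http://superset.apache.org/installation.html#database-dependencies and compare their query times.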
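
One way to run such a comparison is to time the same query through each connection and keep the best of several runs. The sketch below shows only the timing part; time_call and fake_query are hypothetical names, and fake_query stands in for a real query function that would execute SQL through one of the connection methods.

```python
import time


def time_call(run_query, repeats=3):
    """Return the best wall-clock duration in seconds over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return min(durations)


def fake_query():
    """Placeholder for a real query; sleeps briefly to simulate work."""
    time.sleep(0.01)


print(f"best of 3: {time_call(fake_query):.4f}s")
```

Taking the minimum rather than the average reduces the influence of one-off delays such as connection setup.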
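6. Run under Gunicorn with gevent (coroutine) workers
Do not start Superset with superset run in production. Run it under Gunicorn with gevent workers instead; see the a-proper-wsgi-http-server section of the official documentation.
Install gunicorn and gevent: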
pip install gunicorn
pip install gevent
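- Mind the versions here: superset 0.30 requires gunicorn<19.9.0, but those Gunicorn releases break on Python 3.7 because of a backward-incompatible change (async became a reserved keyword), and the problem was only fixed in 19.9.0.
If starting superset fails on from gunicorn.workers.async import AsyncWorker, upgrade gunicorn and superset: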
[Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 135, in load_class
mod = import_module('.'.join(components))
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 724, in exec_module
File "<frozen importlib._bootstrap_external>", line 860, in get_code
File "<frozen importlib._bootstrap_external>", line 791, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/geventlet.py", line 27
from gunicorn.workers.async import AsyncWorker
^
SyntaxError: invalid syntax
]
Error references:
https://github.com/apache/incubator-superset/issues/8349
https://github.com/benoitc/gunicorn/issues/1822
Start commands:
- Note: Superset's application log is printed only when running without -D.
# -D: run in the background (daemonize)
# -w: number of worker processes
# --log-level debug: log level
# --access-logfile /tmp/superset_access.log: path of the access log
# --error-logfile /tmp/superset_error.log: path of the error log

# Run in the foreground:
gunicorn -w 10 -k gevent --timeout 120 -b 0.0.0.0:7668 --limit-request-line 0 --limit-request-field_size 0 "superset.app:create_app()"

# Run in the background:
# superset 0.35.0
gunicorn \
    -w 10 \
    --timeout 120 \
    -b 0.0.0.0:7668 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --forwarded-allow-ips='*' \
    --log-level debug \
    --access-logfile /tmp/superset_access.log \
    --error-logfile /tmp/superset_error.log \
    superset:app -D

# superset 0.36.0
gunicorn \
    -w 10 \
    --timeout 120 \
    -b 0.0.0.0:7668 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --forwarded-allow-ips='*' \
    --log-level debug \
    --access-logfile /tmp/superset_access.log \
    --error-logfile /tmp/superset_error.log \
    "superset.app:create_app()" -D
Stop command: kill $(ps aux | grep "gunicorn" | grep -v 'grep' | awk '{print $2}')
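
The long command line can also be kept in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the flags used above; the file name gunicorn.conf.py and the pidfile path are assumptions chosen for illustration.

```python
# gunicorn.conf.py (hypothetical file name): settings equivalent to the flags above.
bind = "0.0.0.0:7668"            # -b
workers = 10                     # -w
worker_class = "gevent"          # -k gevent
timeout = 120                    # --timeout
limit_request_line = 0           # --limit-request-line (0 = unlimited)
limit_request_field_size = 0     # --limit-request-field_size (0 = unlimited)
forwarded_allow_ips = "*"        # --forwarded-allow-ips
loglevel = "debug"               # --log-level
accesslog = "/tmp/superset_access.log"   # --access-logfile
errorlog = "/tmp/superset_error.log"     # --error-logfile
daemon = True                    # -D
pidfile = "/tmp/superset_gunicorn.pid"   # assumed path; records the master PID
```

It would be started with gunicorn -c gunicorn.conf.py "superset.app:create_app()". With a pidfile in place, stopping the server can be narrowed to kill $(cat /tmp/superset_gunicorn.pid), which avoids killing unrelated gunicorn processes that the ps and grep pipeline would also match.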