superset6

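The tuning starts from the following points.

1. Switch the cache backend to Redis

Install the Redis client: pip install redis

In the configuration file incubator-superset/superset/config.py, set CACHE_CONFIG to the following, so that cached entries expire after 24 hours: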

CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
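
As a quick check of the expiry arithmetic, the sketch below builds the same dictionary from a timeout given in hours. The helper name build_cache_config is hypothetical and not part of Superset; only the dictionary keys are taken from the configuration above.

```python
def build_cache_config(hours, redis_url="redis://localhost:6379/0",
                       key_prefix="superset_results"):
    """Return a cache config dict whose default timeout is `hours` hours.

    Hypothetical helper for illustration; the keys mirror CACHE_CONFIG above.
    """
    return {
        "CACHE_TYPE": "redis",
        "CACHE_DEFAULT_TIMEOUT": 60 * 60 * hours,  # seconds
        "CACHE_KEY_PREFIX": key_prefix,
        "CACHE_REDIS_URL": redis_url,
    }


CACHE_CONFIG = build_cache_config(hours=24)
print(CACHE_CONFIG["CACHE_DEFAULT_TIMEOUT"])  # 86400
```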

2. Switch the Celery broker from SQLite to Redis

Configuration file: /root/incubator-superset/superset/config.py. Change the following settings:

BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
CELERY_ACKS_LATE = True
# Disable UTC and use the Asia/Shanghai time zone
CELERY_ENABLE_UTC = False
CELERY_TIMEZONE = 'Asia/Shanghai'

Full configuration:

from celery.schedules import crontab  # needed for CELERYBEAT_SCHEDULE below

class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ENABLE_UTC = False
    CELERY_TIMEZONE = 'Asia/Shanghai'
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute=0, hour='*/2'),  # every 2 hours, on the hour
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 10,
                'since': '7 days ago',
            },
        },
    }

CELERY_CONFIG = CeleryConfig

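3. Use a warm-up strategy to cache dashboard data ahead of time

1. Edit the configuration file: /root/incubator-superset/superset/config.py

Replace CELERYBEAT_SCHEDULE with the following. With this strategy, the cache is refreshed every hour for the top 5 dashboards of the last seven days: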

CELERYBEAT_SCHEDULE = {
    'cache-warmup-hourly': {
        'task': 'cache-warmup',
        'schedule': crontab(minute=0, hour='*'),  # hourly
        'kwargs': {
            'strategy_name': 'top_n_dashboards',
            'top_n': 5,
            'since': '7 days ago',
        },
    },
}
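
To make the difference between hour='*' and hour='*/2' concrete, here is a simplified sketch that expands a cron hour field into the hours at which a task would fire. It is an illustration only, not Celery's own parser: the function expand_hour_field is hypothetical and handles just '*', '*/n' and a single integer.

```python
def expand_hour_field(field):
    """Expand a cron hour field into the list of matching hours (0-23).

    Simplified: supports only '*', '*/n' and a single integer such as '3'.
    """
    if field == "*":
        return list(range(24))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 24, step))
    return [int(field)]


print(len(expand_hour_field("*")))    # 24 runs per day (hourly)
print(len(expand_hour_field("*/2")))  # 12 runs per day (every 2 hours)
print(expand_hour_field("*/2")[:4])   # [0, 2, 4, 6]
```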

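2. Set the default web server address and port in the configuration file. The background warm-up task requests dashboards through this address in order to cache their data, so the port must match the one Superset is started with on the command line.

Configuration file: /root/incubator-superset/superset/config.py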

SUPERSET_WEBSERVER_ADDRESS = '0.0.0.0'
SUPERSET_WEBSERVER_PORT = 7668

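Note: in Superset 0.35, setting the address to 0.0.0.0 raises an error. Add the scheme prefix, i.e. http://0.0.0.0

Upstream bug: https://github.com/apache/incubator-superset/issues/8461

Then start Celery from the command line. Start the worker (-c sets the number of concurrent worker processes):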
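
The sketch below illustrates why the scheme prefix matters when a URL is assembled from an address and a port. The helper warmup_base_url is hypothetical; Superset's actual URL construction may differ, so treat this only as a picture of the workaround.

```python
def warmup_base_url(address, port):
    """Build a base URL, adding http:// when the address has no scheme.

    Hypothetical helper for illustration; not Superset's own code.
    """
    if not address.startswith(("http://", "https://")):
        address = "http://" + address
    return "{}:{}/".format(address, port)


print(warmup_base_url("0.0.0.0", 7668))         # http://0.0.0.0:7668/
print(warmup_base_url("http://0.0.0.0", 7668))  # http://0.0.0.0:7668/
```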

celery worker --app=superset.tasks.celery_app:app --broker=redis://localhost:6379/0 --pool=prefork -O fair -c 4

Start the beat scheduler:

celery beat --app=superset.tasks.celery_app:app

Flower, a web UI for monitoring Celery tasks (optional)

Install: pip install flower

Start: celery flower --port=7788 --broker=redis://localhost:6379/0

4. Increase the configured cache threshold (the maximum number of cached items)

File to edit:

env_superset/lib/python3.7/site-packages/flask_caching/__init__.py

Line to change:

config.setdefault("CACHE_THRESHOLD", 50000)
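
An edit under site-packages is lost whenever the package is reinstalled. Because the line above uses setdefault, a value already present in the config should take precedence, so a possible alternative is to set the key in config.py instead. This is an untested sketch that assumes Superset passes CACHE_CONFIG through to Flask-Caching unchanged. As far as I know, the Flask-Caching documentation describes CACHE_THRESHOLD as applying to the SimpleCache and FileSystemCache backends, so it may have no effect with Redis.

```python
# Assumed addition to superset/config.py: carry the threshold in CACHE_CONFIG
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 24,
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
    "CACHE_THRESHOLD": 50000,
}

# setdefault only fills in a missing key, so the value above is kept
config = dict(CACHE_CONFIG)
config.setdefault("CACHE_THRESHOLD", 500)
print(config["CACHE_THRESHOLD"])  # 50000
```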

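5. Query time may differ between connection methods

In testing, when Superset connects to AWS Athena, athena+rest was slightly slower than jdbc.

Most of the query time was spent returning the data to the front end for display, which I take to be parsing overhead.

See the officially supported connection methods at http://superset.apache.org/installation.html#database-dependencies and compare their query times.

6. Start Superset under Gunicorn with coroutine (gevent) workers

Do not use superset run in production. Start Gunicorn with coroutine workers instead; see the a-proper-wsgi-http-server section of the official documentation.

Install gunicorn and gevent: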
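
One simple way to compare connection methods is to time the same query several times through each and compare the medians. The sketch below uses only the standard library; the two fake_* functions are stand-ins (assumptions) for real query calls through each driver.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-ins for real queries; replace with calls through each connection
def fake_rest_query():
    time.sleep(0.02)


def fake_jdbc_query():
    time.sleep(0.01)


print("rest median (s):", time_query(fake_rest_query))
print("jdbc median (s):", time_query(fake_jdbc_query))
```

Using the median rather than the mean keeps a single slow run (for example a cold cache) from dominating the comparison.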

pip install gunicorn

pip install gevent

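  • Mind the versions here: Superset 0.30 requires gunicorn<19.9.0, but Gunicorn is broken on Python 3.7 because of a backward-incompatible change (async became a reserved keyword), and this was only fixed in 19.9.0.

If Superset fails to start with an error at from gunicorn.workers.async import AsyncWorker, upgrade gunicorn and Superset: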

[Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 135, in load_class
    mod = import_module('.'.join(components))
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
   File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 724, in exec_module
  File "<frozen importlib._bootstrap_external>", line 860, in get_code
  File "<frozen importlib._bootstrap_external>", line 791, in source_to_code
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/geventlet.py", line 27
    from gunicorn.workers.async import AsyncWorker
                              ^
SyntaxError: invalid syntax
]

Error references:

https://github.com/apache/incubator-superset/issues/8349 https://github.com/benoitc/gunicorn/issues/1822
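
The version constraint described above can be written as a small check. This is a sketch under the assumption that version strings are purely numeric and dotted; both function names are hypothetical.

```python
def parse_version(text):
    """Turn '19.9.0' into (19, 9, 0); assumes numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def gunicorn_breaks_on(python_version, gunicorn_version):
    """True when Python is 3.7 or newer and gunicorn is older than 19.9.0."""
    return (parse_version(python_version) >= (3, 7)
            and parse_version(gunicorn_version) < (19, 9, 0))


print(gunicorn_breaks_on("3.7.4", "19.8.0"))  # True
print(gunicorn_breaks_on("3.7.4", "19.9.0"))  # False
print(gunicorn_breaks_on("3.6.9", "19.8.0"))  # False
```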

Start commands:

  • Note: Superset application logs are printed to the console only when running without -D.

# -D: run in the background (daemon mode)
# -w: number of worker processes
# --log-level debug: log level
# --access-logfile /tmp/superset_access.log: path of the access (request) log
# --error-logfile /tmp/superset_error.log: path of the error log

Run in the foreground:
gunicorn  -w 10 -k gevent --timeout 120 -b 0.0.0.0:7668 --limit-request-line 0 --limit-request-field_size 0 "superset.app:create_app()"



Run in the background:
Superset 0.35.0:
gunicorn \
    -w 10 \
    --timeout 120 \
    -b 0.0.0.0:7668 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --forwarded-allow-ips="*" \
    --log-level debug \
    --access-logfile /tmp/superset_access.log \
    --error-logfile /tmp/superset_error.log \
    superset:app -D
    
    
Superset 0.36.0:
gunicorn \
    -w 10 \
    --timeout 120 \
    -b 0.0.0.0:7668 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --forwarded-allow-ips="*" \
    --log-level debug \
    --access-logfile /tmp/superset_access.log \
    --error-logfile /tmp/superset_error.log \
    "superset.app:create_app()" -D
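
The same options can also be kept in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the flags of the background command above; the file name gunicorn.conf.py is an assumption, and the setting names are Gunicorn's documented equivalents of those flags.

```python
# gunicorn.conf.py (file name is an assumption)
bind = "0.0.0.0:7668"                    # -b
workers = 10                             # -w
timeout = 120                            # --timeout
limit_request_line = 0                   # --limit-request-line
limit_request_field_size = 0             # --limit-request-field_size
forwarded_allow_ips = "*"                # --forwarded-allow-ips
loglevel = "debug"                       # --log-level
accesslog = "/tmp/superset_access.log"   # --access-logfile
errorlog = "/tmp/superset_error.log"     # --error-logfile
daemon = True                            # -D
```

It would then be started with: gunicorn -c gunicorn.conf.py "superset.app:create_app()"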

Stop command: kill $(ps aux | grep "gunicorn" | grep -v 'grep' | awk '{print $2}')