Ceph tuning

Operating system tuning

linux kernel

  1. I/O scheduler (default: deadline)
    Use noop for SSDs and deadline for HDDs.

    There are three ways to change the disk scheduler:

    Method 1:
    Write the value directly into sysfs; the change is lost after a reboot.

    echo noop | tee /sys/block/sd*/queue/scheduler
    echo deadline | tee /sys/block/sd*/queue/scheduler

    Method 2:
    Set it on the kernel command line by editing /etc/default/grub:

    GRUB_CMDLINE_LINUX="elevator=noop"

    Then run update-grub2 to regenerate the GRUB configuration; the change takes effect after a reboot.
    This sets the scheduler for every disk in the system, so it is only suitable for machines where all disks are of the same type.

    Method 3:
    For machines with mixed disk types, create a udev rule file (/etc/udev/rules.d/60-ssd-scheduler.rules):

    # set noop scheduler for non-rotating (SSD) disks
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
  2. Read-ahead (default: 128 KB)
    The default Linux read-ahead is too small for RADOS workloads; 8192 KB is recommended.

    echo 8192 | tee /sys/block/sd*/queue/read_ahead_kb
  3. pid_max (default: 196608)
    OSD daemons spawn a very large number of threads, so raise the system-wide PID limit.

    echo 4194303 > /proc/sys/kernel/pid_max
  4. CPU frequency governor (default: performance)
    Keep the CPUs running in performance mode.

    echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null
  5. Swap
    With the default settings the kernel may start swapping even while plenty of memory is still free, which hurts Ceph cluster performance.

    echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
  6. CPU pinning
    Bind OSD processes to specific CPU cores; this has both benefits and drawbacks. A combined sketch for applying and verifying the settings in this list follows below.
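
    The kernel-level settings above (apart from the sysctl and GRUB entries) do not survive a reboot, so they are often applied together from a boot script or a configuration-management tool. The block below is a minimal sketch, not a definitive script: it assumes every data disk shows up as /dev/sd* and simply uses deadline everywhere; choose noop or deadline per disk type as described in item 1.

    # apply the kernel-level tuning from this list
    for disk in /sys/block/sd*; do
        echo deadline > "$disk/queue/scheduler"        # I/O scheduler (noop for SSD, deadline for HDD)
        echo 8192     > "$disk/queue/read_ahead_kb"    # read-ahead
    done
    echo 4194303 > /proc/sys/kernel/pid_max            # PID limit for the many OSD threads
    echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null
    sysctl -w vm.swappiness=0                          # immediate; the /etc/sysctl.conf entry keeps it across reboots

    # verify
    cat /sys/block/sd*/queue/scheduler
    cat /proc/sys/kernel/pid_max /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    sysctl vm.swappiness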

network

  1. Jumbo frames
    Reduce the impact of packet fragmentation when moving large amounts of data (a quick end-to-end check follows below).
    ifconfig eth0 mtu 9000
    echo "MTU=9000" | tee -a /etc/sysconfig/network-script/ifcfg-eth0
    systemctl restart network
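
    A quick check, sketched under the assumption that the interface is eth0 and that 172.16.8.242 is another node on the same network (substitute a real peer address): every hop, including the switches, must accept the larger frames, so test end to end rather than only locally.

    ip link show eth0 | grep -o 'mtu [0-9]*'   # should now report mtu 9000
    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
    ping -M do -s 8972 -c 3 172.16.8.242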

Hardware acceleration

  1. Reduce memory copies
    Enable the NIC's TCP segmentation offload (TSO) feature (a quick check follows below).
    ethtool -K em1 tso on
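
    To confirm the setting (assuming the same interface name em1), list the offload features with lowercase -k, which prints the current state:

    ethtool -k em1 | grep tcp-segmentation-offload   # expect: tcp-segmentation-offload: on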

Ceph configuration tuning

[global]
fsid = c74e7a1b-b4aa-490b-b60e-8a8656d08226
mon initial members = ssd-node241
mon host = 172.16.8.241
public network = 172.16.8.0/24
cluster network = 172.16.8.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 2
osd pool default min size = 1
max open files = 131072
debug bluestore = 0/0
debug bluefs = 0/0
debug bdev = 0/0
debug rocksdb = 0/0
rbd cache = false
osd pool default pg num = 256
osd op num shards = 8
osd_op_num_threads_per_shard = 2
[client]
#rbd cache size = 268435456
#rbd cache max dirty = 134217728
#rbd cache max dirty age = 5
rbd_default_features = 3
rbd cache writethrough until flush = True

[mon]
mon data = /var/lib/ceph/mon/ceph-$id
mon_allow_pool_delete = true

[osd]
enable experimental unrecoverable data corrupting features = bluestore rocksdb zs
bluestore fsck on mount = true
osd objectstore = bluestore
bluestore = true
osd data = /var/lib/ceph/osd/ceph-$id
osd mkfs type = xfs
osd mkfs options xfs = -f
osd max write size = 512
osd client message size cap = 2147483648
osd deep scrub stride = 131072
osd op threads = 8
osd disk threads = 4
osd map cache size = 1024
osd map cache bl size = 128
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
osd recovery op priority = 4
osd recovery max active = 10
osd max backfills = 4
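
Most of the values in the [osd] section can be inspected on a running daemon, and many can be changed at runtime without a restart. A minimal sketch, assuming a daemon osd.0 is running locally with its admin socket in the default location (injectargs changes are not written back to ceph.conf):

    # current value of a single option
    ceph daemon osd.0 config get osd_max_backfills

    # inspect a group of related options
    ceph daemon osd.0 config show | grep -E 'osd_recovery_max_active|osd_max_backfills'

    # change an option on every OSD at runtime
    ceph tell 'osd.*' injectargs '--osd-max-backfills 4'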
