So, I got persuaded to switch from a “server that does everything” to a “compute server + storage server” setup.

The two are connected via a DAC on an Intel X520 (10 GbE) network card.

Compute is 10.0.0.1, storage is 10.255.255.254, and I left the usable hosts in between free for future expansion.

Before I start using it, I’m wondering whether I chose the right protocols for sharing data between them.

I set up both NFS and iSCSI.

With iSCSI, I create an image, share it with the compute server, format it as btrfs, and use it as a native drive. The files are not accessible from anywhere else.

With NFS, I just mount the share, and the files can also be accessed from other computers.

Speed:

I timed how long it takes to write a 2 GB dummy file full of zeroes.

/iscsi# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 0.88393 s, 2.3 GB/s

real    0m2.796s
user    0m0.051s
sys     0m0.915s
/nfs# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 2.41414 s, 848 MB/s

real    0m3.539s
user    0m0.038s
sys     0m1.453s
/sata-smr-wd-green-drive-for-fun# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 10.1339 s, 202 MB/s

real    0m46.885s
user    0m0.132s
sys     0m2.423s
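dd stops its clock when it finishes writing, before the trailing `sync` runs, so its reported rate largely reflects how fast data went into the kernel’s page cache. The `real` time does include the `sync`. A small sketch that divides the 2,048,000,000 bytes by `real` for each run above; the function name `effective_rate` is just for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recompute the throughput of the three runs above from wall-clock ("real")
# time, which includes the trailing sync, instead of dd's own figure.
bytes=2048000000   # 250000 blocks of bs=8k (8192 bytes), as dd reported

# effective_rate LABEL SECONDS: print MB/s and Gbit/s for $bytes in SECONDS.
effective_rate() {
  awk -v label="$1" -v b="$bytes" -v t="$2" \
    'BEGIN { printf "%-8s %6.1f MB/s %5.2f Gbit/s\n", label, b / t / 1e6, b * 8 / t / 1e9 }'
}

effective_rate iscsi 2.796
effective_rate nfs 3.539
effective_rate sata 46.885
```

This gives about 732.5 MB/s (5.86 Gbit/s) for iSCSI, 578.7 MB/s (4.63 Gbit/s) for NFS and 43.7 MB/s (0.35 Gbit/s) for the SMR drive. `real` also counts process startup, so these are slight underestimates, but they put iSCSI back under the 10 Gbit/s line rate. Whether “synced” data has actually reached the storage server’s disks may still depend on how the iSCSI target caches writes.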

What I see from these results:

The slow SATA drive writes at 1.6 Gbit/s, but then, for some reason, the computer takes a very long time to acknowledge the operation.

NFS transferred it at 6.8 Gbit/s, which is what I expected from an NVMe array. The same command run directly on the storage server gives a similar speed.

iSCSI transfers at 18.4 Gbit/s, which is not possible with my drives or the 10 Gbit/s link. It’s probably using some native filesystem trickery to detect “it’s just a file full of zeroes, just tell the user it’s done”.
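One way to test the zero-detection theory is to repeat the write with incompressible data and let dd flush before it reports a rate. If compression is enabled anywhere in the stack (btrfs on the client, or ZFS on the storage server), blocks of zeroes may take up almost no space, which would inflate the numbers. A minimal sketch, assuming GNU coreutils dd (`conv=fdatasync`, `iflag=fullblock`, `status=none`); `make_random_src` and `bench_write` are hypothetical helper names, and the source file should live outside the filesystem being measured (for example on tmpfs under /dev/shm, if there is enough RAM):

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_random_src PATH [MIB]: write MIB MiB (default 256) of random,
# incompressible data to PATH, generated once so that /dev/urandom's
# speed is not part of the measurement.
make_random_src() {
  local path=$1 mib=${2:-256}
  dd if=/dev/urandom of="$path" bs=1M count="$mib" iflag=fullblock status=none
}

# bench_write SRC DIR: copy SRC to DIR/ddfile and print dd's summary line.
# conv=fdatasync makes dd flush the file before it stops the clock, so the
# reported rate includes getting the data to stable storage.
bench_write() {
  local src=$1 dir=$2
  dd if="$src" of="$dir/ddfile" bs=1M conv=fdatasync 2>&1 | tail -n 1
  rm -f "$dir/ddfile"
}

# Example (placeholder paths):
#   make_random_src /dev/shm/random.bin 2048
#   bench_write /dev/shm/random.bin /iscsi
#   bench_write /dev/shm/random.bin /nfs
```

If iSCSI drops to something at or below the 10 Gbit/s line rate with this test, the earlier 18.4 Gbit/s was mostly caching or zero handling rather than the network.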

The biggest advantage of NFS is that I can share a whole directory and access the files directly. Also, sharing another disk image via iSCSI requires a service restart, which means I have to take down the compute server.

But with iSCSI I own the disk, so I can do whatever I want: no need to worry about permissions, I am root, I can chown everything.

So… after this long introduction and explanation, which protocol would you use for:

  • /var/lib/mysql (a database): inside a disk image shared via iSCSI, or via NFS?

  • Virtual machine images: copy them into another image that’s then shared via iSCSI? Maybe NFS is much better for this case; otherwise, with iSCSI I would have a single giant disk image containing other disk images…

  • Lots of small files, like WordPress: maybe NFS would add too much overhead? But it would be much easier to back up as an NFS share than as a disk image.
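For the WordPress case, bulk throughput says little: every small file costs metadata operations, and over NFS each of those typically involves at least one network round trip. A rough sketch for comparing the two mounts; the function name `bench_small_files` is hypothetical, 5000 files of 4 KiB is an arbitrary stand-in for a real site, and GNU `date +%s%N` is assumed for millisecond timing:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Milliseconds since the epoch (GNU date's %N gives nanoseconds).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# bench_small_files DIR [COUNT]: time creating COUNT 4 KiB files under DIR
# (followed by a sync), then time deleting them again. Prints both in ms.
bench_small_files() {
  local dir="$1/smallfiles.$$" count=${2:-5000} i t0 payload
  payload=$(head -c 4096 /dev/zero | tr '\0' 'x')   # 4 KiB of text
  mkdir -p "$dir"

  t0=$(now_ms)
  for ((i = 0; i < count; i++)); do
    printf '%s' "$payload" > "$dir/file$i.txt"
  done
  sync
  echo "create+sync: $(( $(now_ms) - t0 )) ms for $count files"

  t0=$(now_ms)
  rm -rf "$dir"
  sync
  echo "delete+sync: $(( $(now_ms) - t0 )) ms"
}
```

Running `bench_small_files /nfs` and `bench_small_files /iscsi` gives a like-for-like comparison; copying a real WordPress tree and timing the backup tool against it would be more representative still.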

  • @[email protected] (1 year ago)

    I haven’t ever run an iSCSI setup, but…

    I don’t know what your application is, but if you’re planning on running a MySQL database on this, I suspect a throughput test isn’t going to be representative of its performance, since latency may matter a lot and throughput much less. You may want to test latency specifically.
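A rough way to follow that advice is to measure small synchronous writes, which resemble a database commit more closely than a bulk copy does. A minimal sketch, assuming GNU dd: `oflag=dsync` opens the output with O_DSYNC, so each 4 KiB write must be durable before the next one starts, and the average latency per write falls out of the elapsed time. The function name and the default of 1000 writes are arbitrary; a dedicated tool such as fio would report latency percentiles rather than just an average.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sync_write_latency DIR [COUNT]: issue COUNT 4 KiB writes with O_DSYNC
# (each must reach stable storage before dd continues) and print the
# average time per write in microseconds. Random data avoids compression
# shortcuts; the figure also includes dd's startup, which is small.
sync_write_latency() {
  local file="$1/latency.$$" count=${2:-1000} start end
  start=$(date +%s%N)
  dd if=/dev/urandom of="$file" bs=4k count="$count" oflag=dsync status=none
  end=$(date +%s%N)
  rm -f "$file"
  echo "$(( (end - start) / count / 1000 )) us per 4 KiB synchronous write"
}
```

Comparing the result on `/iscsi`, `/nfs` and the storage server’s local disk shows how much each layer adds to every durable write, which is closer to what a database pays per commit than the bulk numbers are.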

    ponders

    I would guess that iSCSI exposes write barriers. That is, btrfs can say “all writes prior to this point must become durable before writes subsequent to this point”, without requiring that any data be committed to disk at the time the write barrier is issued.

    But I believe that the Linux file API has a more limited set of ways in which it can provide ordering without durability. There’s no fwritebarrier(), just fsync(), and that forces a change to become durable.

    Depending upon how MySQL works, that might have a significant impact on performance.

    Also, NFSv3, which I assume you are using, has locking and caching behavior that differs from NFSv4, and I don’t know for sure how it will interact with something like MySQL, which may care a lot about precise write ordering.

    Disk images will also rely on write ordering to avoid corruption on power loss.

    googles

    Yeah.

    https://dev.mysql.com/doc/refman/8.2/en/disk-issues.html

    Using NFS with MySQL

    You should be cautious when considering whether to use NFS with MySQL. Potential issues, which vary by operating system and NFS version, include the following:

    • MySQL data and log files placed on NFS volumes becoming locked and unavailable for use. Locking issues may occur in cases where multiple instances of MySQL access the same data directory or where MySQL is shut down improperly, due to a power outage, for example. NFS version 4 addresses underlying locking issues with the introduction of advisory and lease-based locking. However, sharing a data directory among MySQL instances is not recommended.

    • Data inconsistencies introduced due to messages received out of order or lost network traffic. To avoid this issue, use TCP with hard and intr mount options.

    • Maximum file size limitations. NFS Version 2 clients can only access the lowest 2GB of a file (signed 32 bit offset). NFS Version 3 clients support larger files (up to 64 bit offsets). The maximum supported file size also depends on the local file system of the NFS server.

    Using NFS within a professional SAN environment or other storage system tends to offer greater reliability than using NFS outside of such an environment. However, NFS within a SAN environment may be slower than directly attached or bus-attached non-rotational storage.

    If you choose to use NFS, NFS Version 4 or later is recommended, as is testing your NFS setup thoroughly before deploying into a production environment.
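To see which NFS version and options a client actually negotiated (the comment above assumes NFSv3), the kernel’s mount table lists them. A minimal sketch that reads /proc/mounts, or a file given as an argument for testing, and flags NFS mounts that are not version 4 or later, not `hard`, or not using TCP, following the advice quoted above. The function name `check_nfs_mounts` is hypothetical; `nfsstat -m` and `findmnt` show the same information interactively.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_nfs_mounts [MOUNTS_FILE]: for each nfs/nfs4 entry, print
# "MOUNTPOINT: ok" or "MOUNTPOINT: check ..." listing what looks off.
# Defaults to the live mount table in /proc/mounts.
check_nfs_mounts() {
  local mounts=${1:-/proc/mounts}
  awk '
    $3 == "nfs" || $3 == "nfs4" {
      n = split($4, opt, ",")
      vers = ""; hard = 0; tcp = 0
      for (i = 1; i <= n; i++) {
        if (opt[i] ~ /^vers=/) vers = substr(opt[i], 6)
        if (opt[i] == "hard") hard = 1
        if (opt[i] == "proto=tcp" || opt[i] == "proto=tcp6") tcp = 1
      }
      problems = ""
      if (vers == "" || vers + 0 < 4) problems = problems " vers=" (vers == "" ? "?" : vers)
      if (!hard) problems = problems " not-hard"
      if (!tcp) problems = problems " non-tcp"
      print $2 ": " (problems == "" ? "ok" : "check" problems)
    }' "$mounts"
}
```

Note that the `intr` option mentioned in the quote is, according to the Linux nfs(5) man page, ignored by kernels after 2.6.25, so the sketch does not check for it.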

    That’s kind of hand-wavy, but it does reinforce my concern about sticking a MySQL database on the thing.

    I don’t have an answer for you as to which to use – it’s been a while since I’ve worked on network filesystem stuff, and I’m kinda shaking loose rusty bits trying to recall this – but in general I would be a little concerned about data integrity of both disk images and MySQL databases stored over a network. One can build a system that does it correctly, but I would try to do what I can to research potential issues there.

    I would also probably test your actual workload if you’re concerned about performance, because it may differ a lot from what a simple throughput test might suggest for those uses.

    • @[email protected] (OP, 1 year ago)

      Yes, after more thought, the database is much better on iSCSI. I can just create a 10 GB image, share that, and get backups from daily ZFS snapshots.
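A minimal sketch of the daily-snapshot idea, assuming the 10 GB image is a ZFS volume or lives in a ZFS dataset on the storage server; the dataset name in the example is a placeholder, `snapshot_daily` is a hypothetical helper, and `DRY_RUN=1` only prints the command. A snapshot taken while MySQL is running is at best crash-consistent, so InnoDB will go through crash recovery when it is restored; test-restoring one is worthwhile.

```shell
#!/usr/bin/env bash
set -euo pipefail

# snapshot_daily DATASET: take a date-stamped ZFS snapshot of DATASET,
# e.g. the zvol or dataset holding the iSCSI-exported MySQL image.
# With DRY_RUN=1 the command is printed instead of executed.
snapshot_daily() {
  local dataset=$1 snap
  snap="${dataset}@daily-$(date +%Y-%m-%d)"
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "zfs snapshot $snap"
  else
    zfs snapshot "$snap"
  fi
}

# Example (hypothetical dataset name), e.g. from a daily cron job:
#   snapshot_daily tank/iscsi/mysql
```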