Filesystem
The cluster filesystem is based on GlusterFS in a shared root configuration. The master server shares its filesystem at the root level, and all nodes connect to it and merge into it at boot time. As part of the boot process, an algorithm analyzes the master filesystem and compares it to the ramdisk that the booting compute node is running from. All conflicts and duplications are resolved at this stage, and the node then boots into a unified filesystem that is identical for all nodes.
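The comparison step can be pictured with a short Python sketch. This is only an illustration, not the boot-time algorithm itself, which is not described in this document: the function name classify_paths and the four result categories are hypothetical, and only regular files are considered. It walks two directory trees and reports which files exist on one side only, which are identical, and which differ in content.

```python
import os
import filecmp


def classify_paths(master_root, node_root):
    """Sort relative file paths into master-only, node-only, identical, and conflicting.

    Hypothetical helper for illustration; the real boot process may work differently.
    """
    def relative_files(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
        return found

    master = relative_files(master_root)
    node = relative_files(node_root)

    result = {
        "master_only": sorted(master - node),
        "node_only": sorted(node - master),
        "identical": [],   # duplications: same path, same content
        "conflict": [],    # conflicts: same path, different content
    }
    for rel in sorted(master & node):
        same = filecmp.cmp(
            os.path.join(master_root, rel),
            os.path.join(node_root, rel),
            shallow=False,  # compare file contents, not just size and mtime
        )
        result["identical" if same else "conflict"].append(rel)
    return result
```

How each category is then resolved (for example, which side wins a conflict) is a policy decision that this sketch deliberately leaves open.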
1. Server Configuration
The server configuration is controlled by a single config file at /etc/glusterfs/glusterfs-server.vol. This can be modified with any text editor, and can be viewed in the web interface under the Cluster Storage section.
The configuration file is made up of functional areas called “translators”. Each translator is like a communication layer that takes input from the translator above it, performs its function, and then hands the output to the next layer down. By default, the configuration file has only minimal translators implemented and is quite simple.
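To make the layering idea concrete, here is a toy model in Python. It is not GlusterFS code: the class names Translator, PosixStore and CountingLayer are invented for this example, and a dictionary stands in for the exported directory. Each layer receives a call, does its own work, and hands the call to the layer beneath it.

```python
class Translator:
    """One layer in the stack; forwards calls to the layer below (its subvolume)."""

    def __init__(self, name, subvolume=None):
        self.name = name
        self.subvolume = subvolume

    def read(self, path, trace):
        trace.append(self.name)  # record that this layer saw the call
        return self.subvolume.read(path, trace)


class PosixStore(Translator):
    """Bottom layer: holds the data (a dict stands in for a real directory)."""

    def __init__(self, name, files):
        super().__init__(name)
        self.files = files

    def read(self, path, trace):
        trace.append(self.name)
        return self.files[path]


class CountingLayer(Translator):
    """Hypothetical middle layer that counts reads before passing them down."""

    def __init__(self, name, subvolume):
        super().__init__(name, subvolume)
        self.calls = 0

    def read(self, path, trace):
        self.calls += 1
        return super().read(path, trace)
```

Stacking CountingLayer("counter", PosixStore("storage1", {...})) and calling read() on the top layer passes the request through "counter" first and "storage1" second, which mirrors how a volume names the layer beneath it with the subvolumes keyword.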
The config file that follows is typical:
##############################################
### GlusterFS Server Volume Specification ##
##############################################
# Primary Volume
volume storage1
  type storage/posix # POSIX FS translator
  option directory / # Export this directory
end-volume
#
#volume iothreads #iothreads can give performance a boost
# type performance/io-threads
# option thread-count 8
# option cache-size 32MB
# subvolumes storage1
#end-volume
#
volume server
  type protocol/server
  option transport-type socket
  option address-family inet
  subvolumes storage1
  option auth.addr.storage1.allow 192.168.1.* # Allow access to "storage1" volume
end-volume
and can be broken down as follows:
1. We define an export of / as volume “storage1”
2. “iothreads” is a performance translator that is currently disabled (commented out)
3. “storage1” is exported by a server daemon over TCP/IP, and all clients whose addresses match 192.168.1.* are allowed to connect
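The same breakdown can be produced mechanically. The following Python sketch reads a volume specification and returns each volume's type, options, and subvolumes. It is a simplified reader written for this document, not the parser GlusterFS uses: the function name parse_volfile is hypothetical, and it assumes only the keywords that appear in the file above (volume, type, option, subvolumes, end-volume) with "#" starting a comment.

```python
def parse_volfile(text):
    """Parse a volume specification into {volume_name: {type, options, subvolumes}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        words = line.split()
        keyword = words[0]
        if keyword == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif keyword == "end-volume":
            current = None
        elif current is None:
            raise ValueError(f"statement outside a volume block: {line!r}")
        elif keyword == "type":
            current["type"] = words[1]
        elif keyword == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif keyword == "subvolumes":
            current["subvolumes"] = words[1:]
        else:
            raise ValueError(f"unrecognized statement: {line!r}")
    return volumes
```

Run against the server file above, this yields two volumes, "storage1" and "server"; the commented-out "iothreads" block is skipped because every one of its lines begins with "#".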
Many other performance translators are available and are explained in more detail on the GlusterFS website. It is possible to greatly expand the functionality of the storage back end by adding translators and using a more complex topology, but that is beyond the scope of this document.
Please contact us for help expanding the storage back end to include additional storage nodes, network data striping, InfiniBand high-speed interconnects, and/or configuration file tuning for faster I/O and more throughput.
2. Client Configuration
##############################################
### GlusterFS Client Volume Specification ##
##############################################
### Add client feature and attach to remote subvolume
volume master1
  type protocol/client
  option transport-type socket
  option address-family inet
  option remote-host 192.168.1.60 # IP address of the remote brick
  option remote-subvolume storage1 # name of the remote volume
end-volume
### Add io-cache feature
volume io-cache
  type performance/io-cache
  option cache-size 64MB # default is 32MB
  option page-size 128KB # default is 128KB
  subvolumes master1
end-volume
Just as with the server config, there are optional performance translators that can modify the behavior of the I/O. “io-cache”, for example, uses physical memory on the client to cache data that has been read from the server. Repeated reads of the same data are then served from the cache, which is much faster, and for some applications this can greatly improve performance.
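The effect of cache-size and page-size can be illustrated with a small read cache in Python. This is a sketch of the general technique, not the io-cache implementation: the class name PageCache is invented here, and evicting the least recently used page is an assumption made for the example. Data is cached in fixed-size pages, and the total number of cached pages is bounded by cache size divided by page size.

```python
from collections import OrderedDict


class PageCache:
    """Read cache keyed by (path, page number), evicting least recently used pages."""

    def __init__(self, backend_read, cache_size, page_size):
        self.backend_read = backend_read  # callable(path, offset, length) -> bytes
        self.page_size = page_size
        self.max_pages = cache_size // page_size
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_page(self, path, page_number):
        key = (path, page_number)
        if key in self.pages:
            self.hits += 1
            self.pages.move_to_end(key)  # mark as most recently used
            return self.pages[key]
        self.misses += 1
        data = self.backend_read(path, page_number * self.page_size, self.page_size)
        self.pages[key] = data
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data
```

With the values from the file above (64MB cache, 128KB pages), such a cache would hold up to 512 pages. Only the first read of a page reaches the backend; later reads of that page are answered from memory until the page is evicted.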
There are other translators that provide aggregate writes (combining frequent small writes into fewer large ones), POSIX file locks, multi-path high availability, and the ability to group multiple storage servers into a single namespace.
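As an illustration of what aggregating writes means, the Python sketch below buffers small contiguous writes and sends them to the backend as one larger write. It shows the general idea only and does not reflect how any GlusterFS translator is implemented: the class name WriteAggregator and the threshold parameter are hypothetical.

```python
class WriteAggregator:
    """Buffers small contiguous writes and flushes them as one larger write."""

    def __init__(self, backend_write, threshold):
        self.backend_write = backend_write  # callable(offset, data)
        self.threshold = threshold          # flush once this many bytes are buffered
        self.start = None                   # file offset of the first buffered byte
        self.buffer = bytearray()

    def write(self, offset, data):
        contiguous = self.start is not None and offset == self.start + len(self.buffer)
        if self.buffer and not contiguous:
            self.flush()                    # a gap: send what is buffered first
        if not self.buffer:
            self.start = offset
        self.buffer += data
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.backend_write(self.start, bytes(self.buffer))
            self.buffer = bytearray()
            self.start = None
```

Three 2- to 4-byte writes at adjacent offsets with a threshold of 8 reach the backend as a single 8-byte write. The trade-off is that buffered data is not yet on the server, so a real implementation must also flush when the file is closed or synced.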