Current problems and solutions
As usual, the main problem is the reliability of cluster operation. Because the cluster contains computers running SunOS, Solaris and Digital UNIX, the difficulty was to make all of them work together. To simplify the problem, all Sun computers are being migrated from SunOS to Solaris. In the near future, all DEC Alphas must be moved from VMS to Digital UNIX once physicists finish their data processing jobs on the VMS systems.
A clearer distribution of functions between servers may further improve the reliability and performance of the cluster. For example, cross-mounted directories can overload a network segment or the cluster hub, which decreases overall performance.
Introducing a separate RAID disk server, built either on dedicated hardware or on an ordinary (and cheaper) computer, may give higher disk reliability and faster computations. In this scheme all large disks are concentrated on the RAID server, and all other working servers have system disks only. The same machine may also serve as the NFS and FTP server.
Network management includes the NIS/NIS+ and DNS services and LAN monitoring; these functions are to be performed by a separate server. It is also reasonable to set up separate servers for magnetic tape devices such as Exabyte and DLT drives, and for E-mail and WWW.
To keep the cluster running when the electric power supply is unstable, high-power matrix UPS (uninterruptible power supply) units are needed.
The reliability and speed of the LAN (currently based mainly on thin Ethernet) can be improved by introducing ATM (fiber-optic, 155 Mb/s) at the inter-laboratory institute level and between LNP buildings, and by recabling the cluster with 100 Mb/s links. Upgrading the LAN physical medium between buildings to fiber-optic links is also foreseen. Inside the buildings, the LNP network is to be recabled with twisted pair in a star configuration for PC connection.
Computer security is becoming a more serious problem. It is very important to protect the data obtained in expensive experiments, and the results of their processing, from unauthorized destruction, whether intentional or not. The use of ssh and Kerberos, more complex passwords and better user consulting will help to prevent unauthorized access to the cluster.
Adding network laser printers in the buildings, unifying E-mail on PINE, and introducing network fax will improve the service for users of the computer centre.
Personnel problems are traditional. The measures to address them are recruiting and training system administrators and programmers, and attracting specialists from other LNP departments.