Richard Jones, University of Connecticut
last updated May 8, 2013
This document describes the daemons that must be running on a client machine in order to submit jobs from there to OSG resources through the globus and Condor-G remote job submission and monitoring tools. Provision is also made for gridftp access to OSG compute and storage elements so that users can install software and databases needed by jobs, and access output data saved by the jobs on OSG storage resources.
This document is a condensation of the globus firewall howto. That reference document describes how to configure the firewall on a client machine in order to enable full access to OSG resources, and it also explains what the various components of the stack are responsible for and what takes place over the different ports and protocols that must be enabled in the firewall. What follows is a summary of the essential information from that article.
The OSG virtual data toolkit (vdt) supports two mechanisms for job submission and management: Condor-G, and globus GRAM. Globus GRAM comes in two flavors, the web services (WS) interface (v.4) and the pre-WS interface (v.2). These two GRAM flavors are redundant, and only the pre-WS interface is mandatory for OSG sites, so at UConn we only support the pre-WS (v.2) interface. Nowadays the Condor-G mechanism is the preferred method for most OSG users, so the focus of this document is on that option, but GRAM is also very useful for issuing simple commands to remote servers and for testing the system.
Using Condor-G entails setting up a simplified condor server that runs on a single host and has no worker nodes. Its only purpose is to receive jobs that are submitted by local users, push them to remote OSG sites for execution, and monitor their progress on behalf of the user. Condor-G is automatically installed as a part of the vdt client stack. It can run either as root or as a particular user. To start it as a non-root user, the option --non-root must be used with the vdt-control condor startup command, as follows.
$ vdt-control --non-root --on condor
This command starts four daemons with the privileges of the user that started it.
Jobs to be run on the OSG are submitted to the condor grid universe. Grid jobs flow through condor_schedd to globus job managers on remote sites. To handle the transfer of job files to the remote site, condor_schedd starts a temporary process called gahp_server, which manages the transfer of the job executable and any other files submitted with it. This process communicates with the globus job manager on the server side over a set of ephemeral tcp ports that must be opened in the firewalls in both directions between the client host and OSG compute elements. This port range can be chosen by the system administrator. The default configuration uses the following range of ephemeral ports, which are defined in the file $CONDOR_LOCATION/etc/condor_config.
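The exact macro names depend on the condor version, but in a typical condor_config the port range is restricted with entries along the following lines; this is only a sketch, using the 20000-20499 range assumed throughout this document.

LOWPORT = 20000
HIGHPORT = 20499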
A separate gahp_server is started on the client for each user submitting jobs over Condor-G. Tcp connections are initiated in both directions between the client and server during the lifetime of a job. Each gahp_server uses about 10-12 ephemeral tcp ports during initial job transfer, and again during delivery of job results back to the client at job completion. The default range of 500 ports is shared with globus tools as well.
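For illustration, a minimal grid universe submit description file for the pre-WS (gt2) interface might look like the following sketch; the gatekeeper contact string grid.example.edu/jobmanager-condor and the file names are placeholders, not actual UConn-OSG values.

universe            = grid
grid_resource       = gt2 grid.example.edu/jobmanager-condor
executable          = myjob.sh
transfer_executable = true
output              = myjob.out
error               = myjob.err
log                 = myjob.log
queue

The job would then be submitted with condor_submit in the usual way, for example:

$ condor_submit myjob.sub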
The globus GRAM job submission protocol does not involve running any daemons on the client, as the Condor-G mechanism does, but the communication between the client and server is similar because both of them use the globus gatekeeper. Instead of preparing a condor job description file, users of globus GRAM simply issue executable commands directly on the command line using the tools "globusrun" and "globus-job-run". Job details, such as the number of copies of the job to run and other parameters, are specified using command-line arguments or provided in an input file through the -file option. The command "globus-job-get-output" is used to fetch results after job completion. Optionally the user can request that stdout and/or stderr be continuously streamed back to files on the client host during job execution.
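For example, a quick test of the pre-WS GRAM interface could be run as follows; the gatekeeper contact string grid.example.edu is a placeholder for an actual OSG compute element.

$ globusrun -a -r grid.example.edu
$ globus-job-run grid.example.edu/jobmanager-fork /bin/hostname

The first command simply authenticates to the gatekeeper, while the second runs a trivial job through the fork job manager.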
As in the case of Condor-G, this functionality requires ephemeral ports to be opened in the client firewall for tcp connections initiated from both the client and the server. The ephemeral port range used on the client is set in the environment of the user who runs the jobs, in the following environment variables.
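A sketch of the relevant settings for a bash shell, again assuming the 20000-20499 range used throughout this document:

$ export GLOBUS_TCP_PORT_RANGE=20000,20499
$ export GLOBUS_TCP_SOURCE_RANGE=20000,20499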
To configure a client to use the UConn-OSG globus gatekeeper service, firewall rules must be added to accept incoming connections with the following specifications.
| protocol | source addr | source port | destination port |
|---|---|---|---|
| tcp | 137.99.79.129/25 | * | 20000-20499 |
| tcp | 137.99.19.1/25 | * | 20000-20499 |
The PORT_RANGE numbers define an inclusive range of ephemeral tcp ports that must be able to accept incoming tcp connections from OSG servers. The SOURCE_RANGE numbers define what ports will be used on the client for outgoing tcp connections to OSG compute elements. Usually firewall rules only care about the first of these two settings.
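On a linux client that uses iptables directly, rules along the following lines would implement the table above; this is only a sketch and should be adapted to whatever firewall framework the client actually uses.

$ iptables -A INPUT -p tcp -s 137.99.79.129/25 --dport 20000:20499 -j ACCEPT
$ iptables -A INPUT -p tcp -s 137.99.19.1/25 --dport 20000:20499 -j ACCEPT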
The OSG vdt provides a number of tools for file transfer to/from compute and storage elements on the grid. The most frequently used of these are the gridftp user tools globus-url-copy and uberftp, and the SRM tool suite for managing grid storage. Normally jobs stage in their input files during job initiation and return their output files to the submission directory on the submit host upon completion. However, there are cases where this mechanism is inappropriate, and there are files that should have a lifetime on the remote resource that is independent of the life cycle of any one job.
The globus-url-copy application is the primary tool for gridftp transfers. In cases where filesystem operations are needed, such as directory creation and listing or file removal, the uberftp command provides a suitable interface. Both of these tools employ the standard ftp protocol with separation of control and data paths. Data transfer takes place between ephemeral tcp ports on the client and server. Connections can be initiated from either end, so the client needs to be able to accept tcp connections to ephemeral ports from the server. The range of ephemeral ports used for gridftp data transfers is set by the following environment variables.
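As a sketch for a bash shell, using the same range assumed above:

$ export GLOBUS_TCP_PORT_RANGE=20000,20499
$ export GLOBUS_TCP_SOURCE_RANGE=20000,20499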
Their meaning is the same as explained above in section 1.2. A single active transfer with p parallel streams requires p ports from the above PORT_RANGE, so one should avoid making the port range too small and starving gridftp sessions of available ports.
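As an illustration, a gridftp upload followed by a remote directory listing might look like the following; the storage element host name se.example.edu and the paths are placeholders.

$ globus-url-copy file:///home/user/data.tar gsiftp://se.example.edu/osgfs/store/user/data.tar
$ uberftp se.example.edu "ls /osgfs/store/user"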
SRM tools are specific to grid storage elements, which are used to hold large files to be read by grid jobs and large output files from grid jobs. SRM tools provide global access to files stored on grid storage elements that is optimized for large payloads and high throughput. SRM transactions include reserving space for future uploads, uploads and downloads of data files, listing of directories, setting of access permissions and ACLs on stored files, and control of archival to near-line storage in cases where an HSM is available.
SRM transactions do not actually move data; rather they are used to schedule data transfers (which are then delegated to gridftp transactions that actually move the data) and to manipulate metadata on the storage element. SRM tools perform their work by connecting to the SRM service on the storage element (typically at port 8443) and exchanging SOAP messages with the service over http. The only client firewall requirement for this to work is that outgoing connections to remote port 8443 on the server be allowed through the firewall on the client end.
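For example, with the srm client tools distributed in the vdt, an upload and a directory listing might be issued as follows; the host, port, and SFN path are placeholders, and the exact URL form depends on the storage element in use.

$ srmcp file:///home/user/data.tar "srm://se.example.edu:8443/srm/v2/server?SFN=/osgfs/store/user/data.tar"
$ srmls "srm://se.example.edu:8443/srm/v2/server?SFN=/osgfs/store/user"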
For simplicity, I have shared the same ephemeral port range between the Condor-G daemons, the globus GRAM components, and the gridftp tools. Each of these applications can take up 10-12 ephemeral ports per active request. A single user may have several requests active at once, and a single host may have several active users at once. The default setting above of 500 ephemeral ports should be sufficient, even for a client node hosting a dozen busy users.