Bccd-ng-notes-march-08

From Education

How to handle multiple MPI libraries
	mpich
	lam
	openmpi
	bccd-switch-mpi (changes env variables and .profile); with NFS-shared home directories it can write the change once rather than updating each node remotely
	build binaries for all supported MPIs; how do we handle apt-get builds for more than one MPI?
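A minimal sketch of what bccd-switch-mpi might do, assuming it manages a marked block in ~/.profile so that only one MPI is on the PATH at a time. The install prefixes under /usr/lib, the marker comments, and the BCCD_MPI variable are hypothetical placeholders, not the actual BCCD layout.

```python
"""Hypothetical sketch of bccd-switch-mpi: rewrite a marked block in ~/.profile."""
import re
from pathlib import Path

# Assumed install prefixes; the real BCCD layout may differ.
MPI_PREFIXES = {
    "mpich": "/usr/lib/mpich",
    "lam": "/usr/lib/lam",
    "openmpi": "/usr/lib/openmpi",
}

BEGIN = "# >>> bccd-switch-mpi >>>"
END = "# <<< bccd-switch-mpi <<<"


def mpi_block(name: str) -> str:
    """Return the shell snippet that selects one MPI (KeyError if unsupported)."""
    prefix = MPI_PREFIXES[name]
    return "\n".join([
        BEGIN,
        f"export BCCD_MPI={name}",  # hypothetical variable other tools could read
        f"export PATH={prefix}/bin:$PATH",
        f"export LD_LIBRARY_PATH={prefix}/lib:$LD_LIBRARY_PATH",
        END,
    ])


def switch_mpi(profile_text: str, name: str) -> str:
    """Replace the managed block if present, otherwise append it."""
    block = mpi_block(name)
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(profile_text):
        # A function replacement avoids any backslash processing of the block.
        return pattern.sub(lambda _m: block, profile_text, count=1)
    sep = "" if not profile_text or profile_text.endswith("\n") else "\n"
    return profile_text + sep + block + "\n"


def switch_profile(path: Path, name: str) -> None:
    """On an NFS-shared home this single write is visible on every node."""
    text = path.read_text() if path.exists() else ""
    path.write_text(switch_mpi(text, name))
```

Without shared home directories, the same rewrite would have to be pushed to each node's .profile separately.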

Community Mode
	login/inittab listens for community leaders; if one or more is present, it prompts the user to join one or go standalone.
	
	Standalone - local password file, no shared disk; the user is prompted for the username they would like to use, we create the account, and they go.  Polls every N seconds for a community and updates the list of available ones.
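A minimal sketch of the standalone node's list of available communities, assuming each leader periodically announces its name and that a community drops off the list once it has been silent for a while. The class name, the 30-second default, and the injectable clock are assumptions for illustration.

```python
"""Hypothetical sketch: a standalone node's view of available communities."""
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CommunityList:
    # Drop a community not heard from within this many seconds
    # (assumed to be a few multiples of the polling interval N).
    stale_after: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _last_seen: Dict[str, float] = field(default_factory=dict, init=False)

    def heard(self, name: str) -> None:
        """Record an announcement from a community leader."""
        self._last_seen[name] = self.clock()

    def available(self) -> List[str]:
        """Prune stale entries and return the current names, sorted."""
        now = self.clock()
        self._last_seen = {n: t for n, t in self._last_seen.items()
                           if now - t <= self.stale_after}
        return sorted(self._last_seen)
```

The poll loop would call heard() for each announcement received and refresh the login prompt from available().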
			
	Community Member - join auth protocol, mount distributed file system, user creates creds, workshop leader approves. 
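A minimal sketch of the leader-approval bookkeeping, assuming requests are tracked per username. JoinRequests and its methods are hypothetical names; the join auth protocol itself and credential handling are not modeled here.

```python
"""Hypothetical sketch of the workshop leader's approval queue."""
from typing import Dict, List

PENDING, APPROVED, REJECTED = "pending", "approved", "rejected"


class JoinRequests:
    def __init__(self) -> None:
        self._state: Dict[str, str] = {}  # username -> state

    def request(self, username: str) -> str:
        """A member asks to join; a repeat request keeps any earlier decision."""
        return self._state.setdefault(username, PENDING)

    def pending(self) -> List[str]:
        """Usernames the workshop leader still has to review."""
        return sorted(u for u, s in self._state.items() if s == PENDING)

    def decide(self, username: str, approve: bool) -> None:
        """Leader approves or rejects a pending request."""
        if self._state.get(username) != PENDING:
            raise ValueError(f"no pending request for {username!r}")
        self._state[username] = APPROVED if approve else REJECTED

    def may_mount(self, username: str) -> bool:
        """Only approved members get to mount the distributed file system."""
        return self._state.get(username) == APPROVED
```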

	Community Leader - starts bccd-community and is prompted for which directory should be exported as /bccd/... and the name of the community.  Tool to sync locally created users with the Edu-Grid LDAP server.  Must have a persistent local disk to save usernames/passwords, etc.

	Shared file system
		Optional; enables "community mode".  pkbcast and bccd-switch-mpi work the same with and without it.  Command line tool to join later.
			
		When you boot up in community mode, you name your island; then, if the bccd login program detects one island, it just joins; if more than one island is available, it gives the user a choice of which to join.
			
		nodes have to go up and down easily.
			
		on starting a community, the leader is prompted for the community name and the usernames to use, either as a manual list or as a prefix with a range.
				how to make this persistent?  Then the BCCD community must have a writable disk of some sort!
				
				detected with bccd-community-start, if there is an "image" available option 
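A minimal sketch of expanding the leader's username entry, assuming either a comma-separated manual list or a hypothetical prefix[first-last] range syntax such as student[01-20]; the bracket notation is an illustration, not an existing BCCD format.

```python
"""Hypothetical sketch: expand the leader's username spec into a list."""
from typing import List


def expand_usernames(spec: str) -> List[str]:
    """Accept 'alice,bob,carol' or 'prefix[first-last]', e.g. 'student[1-3]'.

    The bracket syntax is assumed for illustration only.
    """
    spec = spec.strip()
    if spec.endswith("]") and "[" in spec:
        prefix, _, rng = spec[:-1].partition("[")
        first, _, last = rng.partition("-")
        width = len(first)  # keep zero padding: ws[08-10] -> ws08, ws09, ws10
        return [f"{prefix}{i:0{width}d}" for i in range(int(first), int(last) + 1)]
    return [name.strip() for name in spec.split(",") if name.strip()]
```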
				
		mpirun is a wrapper that builds the machines file in the format the selected MPI expects, based on the environment variables.  We sanitize the file before calling the real mpirun.
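A minimal sketch of the wrapper's machines-file step, assuming the selected MPI is named by a hypothetical BCCD_MPI environment variable. The line formats are the usual ones for each implementation (MPICH machinefile host:n, LAM boot schema host cpu=n, Open MPI hostfile host slots=n); the sanitizing rules are an assumption.

```python
"""Hypothetical sketch of the mpirun wrapper's machines-file step."""
import os
import re
from typing import Dict, List, Mapping, Tuple

# One host-list line per implementation.
FORMATS = {
    "mpich": "{host}:{n}",
    "lam": "{host} cpu={n}",
    "openmpi": "{host} slots={n}",
}

_HOSTNAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]*$")


def sanitize(hosts: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drop malformed names, non-positive counts and repeated hosts; keep order."""
    seen: Dict[str, int] = {}
    for host, n in hosts:
        host = host.strip()
        if _HOSTNAME.match(host) and n > 0 and host not in seen:
            seen[host] = n
    return list(seen.items())


def machines_file(hosts: List[Tuple[str, int]],
                  env: Mapping[str, str] = os.environ) -> str:
    """Render hosts for the MPI named by BCCD_MPI (assumed; defaults to openmpi)."""
    fmt = FORMATS[env.get("BCCD_MPI", "openmpi")]
    return "".join(fmt.format(host=h, n=n) + "\n" for h, n in sanitize(hosts))
```

The wrapper would write this text to a temporary file and pass it to the real mpirun with that implementation's own option.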
				
		when clients run login, they are prompted to join a community (if one or more exists).  If one exists and nobody logs in, then we join it automatically after 1 minute or so.
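A minimal sketch of the login-time decision, assuming a hypothetical ask() UI hook that returns the user's choice, an empty string for standalone, or None on timeout. Falling back to standalone when several communities exist and nobody answers is an assumption; the notes only cover the single-community case.

```python
"""Hypothetical sketch of the login-time community decision."""
from typing import Callable, List, Optional

STANDALONE = None  # returned when the node should run standalone


def choose_community(available: List[str],
                     ask: Callable[[List[str], float], Optional[str]],
                     timeout: float = 60.0) -> Optional[str]:
    """Return the community to join, or STANDALONE.

    ask(choices, timeout) is an assumed UI hook: the user's pick,
    "" for an explicit standalone choice, or None if nobody answered.
    """
    if not available:
        return STANDALONE
    answer = ask(available, timeout)
    if answer is None:
        # Nobody logged in: auto-join only when exactly one community exists.
        return available[0] if len(available) == 1 else STANDALONE
    if answer == "":
        return STANDALONE
    return answer if answer in available else STANDALONE
```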

		NFS and LDAP?
			
		bccd-* tools:
			allow-all - no
			build-info - yes
			checkem - subsumed under mpirun wrapper
			deny-all - subsumed under login/community 
			join-group - subsumed under login/community 
			leave-group - subsumed under login/community 
			snarf-hosts - subsumed under mpirun wrapper
			syncdir - no
			
			pkbcast - 
			
		Edu-Grid interface on the BCCD (web pointer)
		
		Torque - front-end to Edu-Grid queues and ultimately a local queue that comes with the community
		
		USB boot of this image.  Use the USB key with not only the image but credentials which allow community membership.
		
		boot modes
			automode - subsumed under login/community
			c3mode - subsumed under login/community
			intelfb - replaced by Knoppix
			i810fb - replaced by Knoppix
			nohotplug - replaced by Knoppix
			quickboot - yes
			runinram - no, NFS root later
			startdhcp - subsumed under login/community
			
			Later
				NFS root to replace runinram

	Review high-level BCCD software list
		Yes 
			gnu - gcc, gfortran, gdb, gprof, gcov, gmp
			ATLAS, GotoBLAS
			OpenMP
			icc/MKL/idb - check with Wilf about license issues
			Java - license ok now, PJ (from RIT)
			Condor
			Apache
			Ganglia running and configured on community leader for monitoring emerging islands of activity.
			R and Rmpi
			Octave
			PAPI/PERFCTR
			FFTW
			Gromacs 
			Firefox 
			mpich, lam, openmpi
			Torque - check license compatibility 
			Python
			Ruby
			C3 tools
			xpdf
			xmpi
			sl
			robotfindskitten
			xgaliga

			Later			
				PVFS/Lustre
				Sage
				POVray
				Blender 
				Ogre

		No
			openmosix
			wulfd
			rdesktop
			
	Testing plan - Alex and Kevin with old ACLs

	Liberation model - Skylar's working on it now
		
Kevin's Notes
	Charliep: Material is kind of a stack
		- 3 layers
	Paul: Sprints
	Material
		- Condor (next level)
		- BCCD (fundamental)
		-
	
	BCCD:
	 - pkbcast
	  - some kind of broadcast "I'm here"
	  - static keys? absolutely not.  keys must be created /at least/ once per boot
	  - supposed to clobber old keys
	  - what happens if machine reboots to new ip address?
	  - each machine keeps own list of who's out there
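A minimal sketch of the per-node list each machine keeps, assuming every "I'm here" announcement carries a stable host ID (the MAC address is one candidate) plus the current IP and this boot's public key. Key generation and the broadcast itself are left out; PeerTable and its methods are hypothetical names.

```python
"""Hypothetical sketch of the per-node peer list behind pkbcast."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Peer:
    ip: str
    public_key: str  # regenerated at least once per boot, never static


class PeerTable:
    """Each node keeps its own list, keyed by a stable host ID, not by IP."""

    def __init__(self) -> None:
        self._peers: Dict[str, Peer] = {}

    def on_announce(self, host_id: str, ip: str, public_key: str) -> bool:
        """Handle an "I'm here" broadcast; return True if the stored key changed.

        Keying on host_id means a node that reboots onto a new address
        replaces its old entry instead of appearing twice.
        """
        old = self._peers.get(host_id)
        self._peers[host_id] = Peer(ip, public_key)  # clobber old IP and key
        return old is None or old.public_key != public_key

    def forget(self, host_id: str) -> None:
        """Remove a node, e.g. after it stops announcing."""
        self._peers.pop(host_id, None)

    def ips(self) -> List[str]:
        """Current addresses, e.g. for building a machines file."""
        return sorted(p.ip for p in self._peers.values())
```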
	
	  - user wants to run mpi
		- need to grab current list of machines file
		-
	
	BCCD: new features
	 - OpenMP
	 - gfortran
	 - icc?  get their permission
	 - java?  <---- license? okay /if/ extending?
	 - mpich
	 - GotoBLAS ... what hoops?  ask J (John B, the TACC guy)