

questions about optimizing memory use

  1. Jonathan DuBois

I’m running some tests with the following input file, roughly based on the example in the manual. I’m getting out-of-memory errors for every node configuration I’ve tried so far. I’ve tried running on up to 120 cores with 2 GB/core, but it still runs out of memory. The number of atoms isn’t that large (~20,000), so I suspect I’m doing something wrong in the setup. Any advice?
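
    For scale, here is my rough back-of-envelope estimate of the Hamiltonian size for the deck below (the per-atom orbital count and the neighbor count are my own assumptions):

      # Rough size estimate for the tight-binding Hamiltonian (all values assumed).
      atoms    = 20_000
      orbitals = 20                     # sp3d5s* (10 orbitals) x 2 spin components
      n        = atoms * orbitals       # basis size: 400,000
      blocks   = 5                      # on-site block + ~4 neighbor blocks (a guess)
      nnz      = n * orbitals * blocks  # nonzero complex entries in sparse H
      print(f"basis size: {n:,}; sparse H: ~{nnz * 16 / 1e9:.1f} GB")
      # ~0.6 GB for H itself, so the blow-up presumably happens in the eigensolver
      # rather than in the matrix assembly.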

    Regards, Jonathan DuBois

    Structure
    {
      Material
      {
        name              = Bi2Te3
        tag               = substrate
        regions           = (1)
        crystal_structure = Bi2Te3
      }

      Domain
      {
        name          = structure1
        type          = pseudomorphic
        base_material = substrate
        dimension     = (50,50,50)

        periodic = (false, false, false)

        crystal_direction1 = (1,0,0)
        crystal_direction2 = (0,1,0)
        crystal_direction3 = (0,0,1)

        regions              = (1)
        output               = (xyz)
        geometry_description = simple_shapes
      }

      Geometry
      {
        Region
        {
          shape         = cuboid
          region_number = 1
          priority      = 4
          min           = (  0.0,  0.0, 0.5) // in nm
          max           = ( 10.0, 10.0, 5.5)
        }
      }
    }

    Solvers
    {
      solver
      {
        name              = my_structure
        type              = Structure
        domain            = structure1
        active_atoms_only = true
        structure_file    = structure.vtk
        unit_cell_file    = unit_cell.vtk
        output_format     = vtk
      }
      solver
      {
        name           = my_schroedi
        type           = Schroedinger
        domain         = structure1
        active_regions = (1)
        tb_basis       = sp3d5sstar_SO
        job_list       = (assemble_H, passivate_H, calculate_band_structure)
        output         = (energies, eigenfunctions_VTK)
        charge_model   = electron_hole
        automatic_threshold   = true
        chem_pot              = 0.0
        temperature           = 300
        eigen_values_solver   = krylovschur
        number_of_eigenvalues = 10
        shift                 = 0.5
        k_space_basis         = cartesian
        k_points              = [(0,0,0)]
      }
    }

    Global
    {
      solve           = (my_structure,my_schroedi)
      messaging_level = 2
      logfile         = structure.log
      database        = ../../materials/all.mat
    }
    


  2. Jean Michel D Sellier

    Dear Jonathan,

The first thing I can say is that your input deck does not specify any parallelization scheme. You will have to add a Partitioning section right after the Region section in the Geometry section. You can do it in two ways. The first is the following (note that the numbers are just examples):

    
      Partitioning
      {
        x_extension = (-0.5, 10.5)   // range to partition along x, in nm (example numbers)
        y_extension = (-0.5, 10.5)
        z_extension = ( 0.0,  6.0)
        num_geom_CPUs = 120
      }
    
    

In this specific example, the user wants to parallelize the device across 120 CPUs. The keyword x_extension defines the range of the system to be partitioned in the x direction (y_extension and z_extension do the same for y and z). The code will automatically determine what spatial parallelization scheme to use.
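
    To give a feel for what an automatic decomposition could look like, here is a toy Python sketch (purely illustrative, not NEMO’s actual algorithm): it picks the most balanced CPU grid and cuts each extension into uniform slices.

      # Toy spatial decomposition: factor the CPU count into a balanced 3D grid,
      # then split each extension uniformly. Illustrative only.
      def choose_grid(num_cpus):
          """Pick a factorization nx*ny*nz == num_cpus that is as balanced as possible."""
          best = None
          for nx in range(1, num_cpus + 1):
              if num_cpus % nx:
                  continue
              rest = num_cpus // nx
              for ny in range(1, rest + 1):
                  if rest % ny:
                      continue
                  nz = rest // ny
                  spread = max(nx, ny, nz) - min(nx, ny, nz)
                  if best is None or spread < best[0]:
                      best = (spread, (nx, ny, nz))
          return best[1]

      def slice_boundaries(lo, hi, n):
          """Uniform slice boundaries between lo and hi, like an extension cut n ways."""
          return [lo + (hi - lo) * i / n for i in range(n + 1)]

      nx, ny, nz = choose_grid(120)              # e.g. (4, 5, 6) for 120 CPUs
      print(slice_boundaries(-0.5, 10.5, nx))    # boundaries along x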

If you want to specify the partition scheme yourself, you will have to use a syntax like the following:

    
      Partitioning
      {
        // slice boundaries along x, in nm (example numbers);
        // y and z presumably take analogous *_partition_nodes keywords
        x_partition_nodes = (-0.5, 3.0, 6.5, 10.5)
      }
    
    

The keyword x_partition_nodes specifies the boundaries of the slices used for the spatial parallelization.
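
    To make the role of the boundaries concrete, here is a hypothetical Python sketch of how a coordinate could be mapped to its slice given such a list of nodes:

      import bisect

      # Example slice boundaries along x (the role x_partition_nodes plays; values assumed).
      x_nodes = [-0.5, 1.5, 3.5, 5.5]   # three slices

      def slice_index(x, nodes):
          """Return the index of the slice that owns coordinate x."""
          if not nodes[0] <= x <= nodes[-1]:
              raise ValueError(f"x={x} is outside the partitioned range")
          # bisect_right finds the first node greater than x; subtract 1 for the slice.
          return min(bisect.bisect_right(nodes, x) - 1, len(nodes) - 2)

      print(slice_index(2.0, x_nodes))  # -> 1
      print(slice_index(5.5, x_nodes))  # -> 2 (the right edge belongs to the last slice)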

Concerning the parameters in the Schroedinger solver, try what you have now, but if it doesn’t work I would suggest using the following ones:

    
      max_number_iterations = 20000     // iteration cap for the eigensolver
      convergence_limit = 1e-8
      monitor_convergence = true
      preconditioner = mumps            // parallel sparse direct solver, does the factorization
      ncv = 42                          // dimension of the Krylov working subspace

      shift = 1.10                      // target energy for the shift-and-invert transformation
      solver_transformation_type = sinvert

      job_list       =  (passivate_H,calculate_band_structure)
      eigen_values_solver   = krylovschur
      number_of_eigenvalues = 20

      eps_orthog_refinement = never     // skip orthogonalization refinement

      output         = (energies,eigenfunctions_Point3D,eigenfunctions_Silo,eigenfunctions_xyz)

      tb_basis = sp3d5sstar_SO
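
    To see what these parameters do, here is a small SciPy stand-in for the shift-and-invert technique they configure (SciPy instead of the SLEPc-style solvers NEMO uses; the matrix and values are arbitrary): sigma plays the role of shift, and the factorization of (H - sigma*I) is where the memory goes.

      # Shift-and-invert eigensolving, illustrated with SciPy (arbitrary test matrix).
      import numpy as np
      import scipy.sparse as sp
      import scipy.sparse.linalg as spla

      n = 2000
      a = sp.random(n, n, density=1e-3, format="csr", random_state=0)
      h = (a + a.T) * 0.5 + sp.diags(np.linspace(-2.0, 2.0, n))  # sparse symmetric "H"

      # eigsh with sigma factorizes (H - sigma*I) and returns the eigenvalues
      # closest to sigma, analogous to krylovschur + sinvert with shift = 1.10.
      vals = spla.eigsh(h, k=20, sigma=1.10, return_eigenvectors=False)
      print(np.sort(vals))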
    
    

    I hope this helps,

    JM and Jim

P.S.: next time you post an input deck, please wrap it in {{{ and }}} (without the spaces between the braces).


  3. Jonathan DuBois

OK, thanks. I thought the parallelization would be done automagically, based on the available CPUs, if I didn’t specify the partition.

Now I see that our queuing system isn’t playing nice with the static NEMO binary. We use srun here, so I typically submit jobs like this:

    srun -p pdebug -n 60 ../../nemostatic structure.in > out

but NEMO doesn’t seem to be aware of the other CPUs, as I now get errors like this after adding a Partitioning section:

     Partitioning
     {
       x_extension = (-0.5, 5.5)
       y_extension = (-0.5, 5.5)
       z_extension = (-0.5, 5.5)
       num_geom_CPUs = 24
     }
    


    SimpleShapes: number of CPU is less than the number of partitions: (1


  4. James Fonseca

Hi Jonathan, Just to eliminate a potential issue, I would make sure the number of processors you’re requesting with srun (60) equals num_geom_CPUs. (I realize the error is basically saying the opposite.)

    I’m not really sure how transferable the static executable is going to be to different systems. I think it may depend on how mpiexec is built, but I’m just guessing.
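
    A quick way to check whether srun and the MPI library the binary was linked against actually agree is a trivial rank probe (assuming mpi4py, or any MPI hello-world, is available on your cluster). If every process reports size 1, the launcher and the MPI runtime are not talking to each other:

      # check_mpi.py -- launch exactly as you launch NEMO, e.g.:
      #   srun -p pdebug -n 60 python check_mpi.py
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      # A working pairing prints 60 distinct ranks; a broken one prints
      # "rank 0 of 1" sixty times.
      print(f"rank {comm.Get_rank()} of {comm.Get_size()}")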

Ideally, tomorrow, you will have the capability to submit jobs from nanoHUB onto the RCAC clusters here at Purdue. Thanks, Jim


  5. Jonathan DuBois

I tried setting the CPU count equal to num_geom_CPUs, but I got the same error. I think the problem is, as you suggest, with how mpiexec was built. I would attempt to build from source, but the resource does not seem to be present at https://nanohub.org/resources/13244

    Thanks for all your help.


  6. James Fonseca

Hi Jonathan, Sorry about that. I’m not sure why it wasn’t there. I’ve put up the latest revision.

    Jim


