Modeler, a tool for computational modeling of linear polymer systems

The computational study of intermolecular relationships of a given material can be used as a route for predicting quantities impossible or difficult to be determined experimentally. Furthermore properties of new materials can also be predicted by techniques of this type, when they are still in the modeling phase. This technique reproduces the classical dynamic relationships between the constituent elements of the material, atoms or unicorpuscular approximations of molecules, from interaction potential models called force fields. This work aims to develop a tool that performs the composition of linear polymeric chain systems through a self-avoided walk. For this, the concept of self-experimentation of long walks (SAWLC) was used, together with the Python language to develop MpolSys Modeler. This tool is a non-overlapping polymer chain generator, which in turn generates outputs that can be used as input to Moltemplate. To validate the tool's results, experiments were carried out in which the numbers and polymerization chains of the simulated polymer were varied, observing the overlap or not of the molecules that make up the simulation. At the end of the simulations, there were positive results that indicate a promising usage of the tool for the creation of polymers with a high number of chains and degrees of polymerization.


Introduction
Developing computational molecular dynamics (CMD) techniques is an arduous task that is associated with many possibilities of error during the modeling process. This is due to the need to determine the spatial arrangement of the model elements in the initial condition, correct description of the connections between particles and accurate selection of the reference simulation potentials (force fields). Because of this, the input composition of multipurpose simulation systems such as Largescale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code with a focus on materials modeling developed by Plimpton et al. (2020), is a task that is almost impossible to accomplished manually.
Moreover, as in all other fields of science, but especially in molecular dynamics, the reproducibility of experiments is the only way to guarantee the reliability of the method, data and the assertions resulting from successful research. Humbert et al.
(2019) emphasizes the importance of data reproducibility by discussing the role of post-processing tools for CMD, an argument that applies even to pre-processing tools.
Gartner III and Jayaraman (2019), reviewed modeling and simulating polymers by CMD studies. They pointeds out results from a wrong model, or a poorly parameterized simulation, do not appear to be obvious errors and can, even in these cases be understood as acceptable.
In this sense, and in the same line of thought as Humbert et al. (2019), the tools for CMD preprocessing, act as an interface between the developer of the molecular model and the solver used for simulation, besides facilitate the input development processes, they reduce the probability of error in the modeling stage and maximize their reproducibility. Among those available for LAMMPS, the VMD TopoTools, Avogadro, Packmol and Moltemplate, the latter adopted in this research.  Jewett et al. (2021). Such tools, however, do not exempt the researchers from the setup of their models, because this process is highly dependent on the system to be studied and what is to be studied.
In this work we also present MPolSys Modeler acronym for Multiple Polymer Chain Systems, a complementary tool to Moltemplate for the composition of linear polymeric chain systems whose intermediate and extreme monomeric components (tail and head) have been previously created using data from some force field. The system aims to facilitate access to the selfavoided walk techniques to determine the positions of the elementary components, as well as to quantify the effectiveness of the modeling technique.
This work aims to present the main features of the current version of the software and the results of functionality tests performed.

Determining the shape of the chains
In our system, the shapes of the modeled chains are determined by movements of arbitrary direction governed by a kernel that determines them via self-avoiding three-dimensional wormlike chains (SAWLCs) (Kratky & Porod's, 1949). When a particle system is spatially represented, the space occupied by each component of the system constitutes a prohibitive condition for the occurrence of a Research, Society and Development, v. 10, n. 16, e359101624007, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i16.24007 second particle in the volume occupied by the first, requiring that individual positions be unique. However, the arrangement of particles in a real system, a polymer in an arbitrary solvent for example, does not obey any deterministic condition and is completely random (Gujrati & Leonove, 2010). Because of this, the initialization of the system to be modeled should, in principle, obey both conditions so that the potential energy of the system is not excessively high to the point of causing an error in simulation time, nor so low that demanded excessive computing time or made convergence difficult (Birta & Arbez, 2013).
By definition, SAWLCs are theoretically capable of meeting both requirements, as they perform random walks (in R³ in our case) avoiding positions that have already been covered before, and simultaneously maintaining some semi-flexibility, as is common in polymer chains. The kernel for producing the positions in the created system was developed by Stefko and Douglass (2017) in the Laboratory of Experimental Biophysics of École polytechnique fédérale de Lausanne, Switzerland, the system uses Eigen/C++ library for matrices, vectors, and numeric computational routines (Guennebaud & Jacob, 2010).
The Stefko and Douglas numeric method is a discretized version of Kratky and Porod's (1949) SAWLCs continuous model, it considers the chain as a series of segments of the same length. The changes in direction of each of the successive segments are determined by the zenith angles θ and azimuth φ. While the azimuth varies randomly between 0 and 2π, the zenith varies according to a probability distribution function that simulates the stiffness of the polymer. If the persistence length is taken as lp, then the zenith is determined by Where X is a random variable normally distributed between 0 and 1. In our experiments the persistence length was considered equal to 5 Å.

Chemical bond size correction
Due to the maximum bond diameter generated by PolymerCpp is unity, MPolSys Modeler provides an intermonomeric bond distance correction tool, positon_crr function. When active, the user must provide an ideal separation distance for each of the monomers belonging to the modeled chains. It is desirable that this distance is generally consistent with those force field standards used in the study, this reduces the total energy of the system, reducing the pre-simulation energy minimization time, or even totally eliminating the need for this step.
The correction method operates on the Cartesian distance L between the monomers considering a pair of coordinates as a constant k determined by Equations (2) and (3).

= ∆ + (4)
If the corrected bond length is expressed by L', then the length correction factor in one of the orthogonal directions of the system is given by Equation (5).
Where ′ is the corrected length which is given by Equation (6).

Determining the complexity of the system
MPolSys Modeler aims not only to model chains of arbitrary lengths, but to compose entire systems for the simulation of multiple chains by CMD. However, PolymerCpp does not perform operations between different chains, leaving the risk that components of distinct chains will overlap.   The chains are distributed spirally and not circularly concentric to avoid the risk of unwanted bias during the simulation, caused by a regular distribution of units of the same chemical nature (Ovchinnikov & Karplus, 2012), (Calio et al., 2020), (Tribello et al., 2012).
The sequencing of repeat units is carried out through the sequence method used by MPolSys Modeler, which receives the chemical forms of the tail, head and intermediate monomers, besides the polymerization degree (PD) and the number of desired chains (NC). This process is sufficient to determine the positions of all monomers in the system to be simulated.
Unfortunately, this approach does not prioritize simulation volume reduction, which is an important computational aspect, but a method of chain translation and simulation volume optimization is currently under development and will replace this in later versions.
The transfer of the model to Moltemplate, is done through the GenPoly class belonging to Moltemplate itself. MPolSys Modeler already provides the matrix adjustment of chain coordinate format in a suitable shape compatible with that system. The software also performs automatic scaling of the simulation volume as the PolymerCpp kernel increases system complexity.

Modeling test
The system was tested under 6 different modeling conditions, prioritizing the combination of low PD with a large NC, and a small number of chains with a high degree of polymerization. Table 1 shows the combinations of GP and NC used in the automated tests that were performed. Research, Society andDevelopment, v. 10, n. 16, e359101624007, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i16.24007  For each configuration, 500 computational experiments were performed to measure the measured tail to head distance R, and the radius of gyration Rg, the verifyWLC method was used in the process (Stefko & Douglass, 2017). The values of the mean radius and mean turning radius were used to compose a histogram of the 500 experiments performed. In addition, the system's adherence to the wormlike chains theoretical model (WLC) was also tested in a scalability assessment. A bond distance of 1.0 Å was used as a standard for the experiments.
Polyethylene chains with 2 and 3 monomers modeled according to the standard configurations of COMPASS force field were chosen. The characteristics of the functional groups modeled and extracted from Compass can be seen in Table 2, a graphical representation of the groups can also be seen in Figure 3 (Sun, 1998;Kondratyuk & Pisarev, 2019).

Results and Discussion
In Figures 4 and 5 show a two-view representation of the PD-10 polyethylene for a system with 10 and 100, chains. In a), the front views of each system and in b) the perspective views. Source: Authors, generated with OVITO (Stukowski, 2009).
The systems were visually inspected to detect particle superposition, considering the rendering radius close to the atomic radii of carbon and hydrogen. Because of the helical character of the arrangement of the chains, a single inspection of the innermost chains is enough to guarantee the non-overlapping of the particles. Source: Authors, generated with OVITO (Stukowski, 2009).
Due to the disproportion between the size of the chains and the NC = 500 system, these views were not placed here.
The adherence of ∆〈 2 〉 1/2 values to the optimal WLC theoretical model, however, the same only occurs for ∆〈 2 〉 1/2 values in longer chains, indicating a greater effectiveness of the system for chains with linear lengths greater than 6 Å.
In Figures (9) and (10) can be seen a two-view representation of the PD-100 polyethylene for a system with 10 and 100. Source: Authors, generated with OVITO (Stukowski, 2009).
Histograms retain the same normal profile shown in simpler systems. With the increase in the radius of the chains, the Research, Society andDevelopment, v. 10, n. 16, e359101624007, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i16.24007 resolution of the mean values of ∆〈 2 〉 1/2 and ∆〈 2 〉 1/2 is lost, however, they still maintain a significant proximity to zero. The adherence of numerical data to the theoretical WLC model maintained a similar behavior, even with the increase in scale. Figure 12. Histograms of ∆〈 2 〉 1/2 and ∆〈 2 〉 1/2 for a system NC − 100/PD − 100 and results of numerical experiments adherence with the theoretical prediction in a scalability condition.
The uniformity of results for systems of multiple complexities favors the software's behavioral stability for a wide range of use, both for polymerization degrees and number of chains. Figure 14. Histograms of ∆〈 2 〉 1/2 and ∆〈 2 〉 1/2 for a system NC − 10/PD − 1000 and results of numerical experiments adherence with the theoretical prediction in a scalability condition.

Final Considerations
Based on the results obtained through the simulations carried out, it is possible to verify that MPolSys Modeler was successful in using the SAWLC's methodology to ensure that the polymer chains do not overlap. Thus, it can be used in simulation systems with a range of polymerization degrees and chains.
In parallel to this, it was also possible to notice, based on the generated adhesion graphs, that the tool performs better when it is used to generate systems with chain lengths greater than 6 Å. It should be noted that the tool can also be used for systems that have chains with lengths less than 6 Å, however adherence to the theoretical model will be lower. This is because of a limitation imposed by the SAWLC method.
After analyzing the weightings performed above, it is possible to suggest some increments that improved the performance of MPolSys Modeler. The first consideration is regarding the replacement of the chain overlapping check, which is done through visual localization, by an algorithm that performs this check. This would bring relevant improvements regarding the tool's reliability. Another gap for improvement is the replacement of the chain replication method by a methodology that uses the SAWLCs concept to perform the creation of unique chains. This will cause more realistic simulations, making the chains different from each other. Finally, there is also a possibility of optimization in incrementing the steps in the chain's composition arrangement helix. This optimization will make the system have a more cubic layout.