RNAstructure Python Interfaces

RNAstructure tools can be used from the Python scripting language through the RNAstructure Python Interface, which is a Python extension module.

Most RNAstructure classes have Python bindings, including:

  • RNA
  • ProbScan
  • Dynalign_object
  • Multilign_object
  • Oligowalk_object

The file components of the extension are listed below:

• RNAstructure.py The main RNAstructure Python class.
• Error_handling.py An internal utility that turns RNAstructure numeric error codes into Python exceptions.
• RNAstructure_wrap.py The Python-code interface between RNAstructure.py and the native (C++) binary library.
(This is generated by SWIG.)
• _RNAstructure_wrap.so (or _RNAstructure_wrap.dll) Native (C++) binary library that adds Python bindings to the RNAstructure class library.
The name of this file depends on the operating system and the version of Python.
In Python 3, C++ extensions have long platform-specific suffixes, such as _RNAstructure_wrap.cpython-36m-x86_64-linux-gnu.so
For simplicity, it is always called _RNAstructure_wrap.so in the documentation below.
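
To see which file names your own interpreter would accept for the native library, the following minimal sketch lists them. It assumes Python 3 (importlib.machinery is not available in Python 2), and the helper names are illustrative only; they are not part of RNAstructure.

```python
import importlib.machinery
import sysconfig


def native_extension_suffixes():
    """Return the file suffixes this interpreter accepts for C/C++ extensions."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


def expected_library_names(module_name="_RNAstructure_wrap"):
    """Return the candidate file names for the given extension module."""
    return [module_name + suffix for suffix in native_extension_suffixes()]


if __name__ == "__main__":
    # The suffix a build for this interpreter would normally use.
    print("Preferred suffix:", sysconfig.get_config_var("EXT_SUFFIX"))
    for name in expected_library_names():
        print(name)
```

If the library in RNAstructure/exe does not match any of the printed names, it was probably built for a different Python version.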

In source-only releases of RNAstructure, the Python scripts (*.py) can be found in the python_interface directory. They are copied into the RNAstructure/exe directory when the binary library (_RNAstructure_wrap.so) is compiled as described below.

In binary releases of RNAstructure, all components can be found in the RNAstructure/exe directory. However, due to many system-dependent factors (e.g. Python version and flavor, processor architecture, etc.), the pre-compiled binary library (_RNAstructure_wrap.so) may not be compatible with your specific Python environment, in which case it must be compiled as described below.

Compiling the RNAstructure Python Interface

To compile the RNAstructure Python Interface binary library (_RNAstructure_wrap.so), both Python and the necessary Python development tools (header files, etc) must be installed on your system. (See the Troubleshooting section for details.)

Open a terminal and run the following command from within the root RNAstructure directory:
# Enter this shell command from the RNAstructure directory
make python


Compiled Python native (C++) extensions are, in general, only compatible with a single version of Python. Because of this, the default behavior of the `make python` command is to build two libraries -- one for Python 2.x and another for Python 3.x.
If you only want to build one of these libraries, or if you wish to target an alternative version of Python, you can do so by specifying the PYTHON environment variable, as shown below.
This is also a useful strategy if python is not available in your PATH. For example:
# Examples for using a different version of Python or specifying a path
make python PYTHON=python3              # Use Python 3.x
make python PYTHON=/usr/bin/python2.7 # Specify full path


The library and all python files required by the extension will be copied to the RNAstructure/exe directory.
If errors are encountered when building the interface, please see the Troubleshooting section below.

Using the RNAstructure Python Interface

In order to use the RNAstructure Python Interface from a Python script or the Python interpreter, all components listed above must be located in a directory where Python will look for modules/extensions. This can be done by either (A) Setting the PYTHONPATH environment variable to include the directory where the extension files are located or (B) Moving the required files into a directory listed in Python's sys.path. The extension files are typically located in RNAstructure/exe, so the following should work in most cases:

# Set PYTHONPATH to include the location of the RNAstructure module
# path/to/RNAstructure/exe can be a relative or absolute path.
# If running python from within the RNAstructure root directory, just use "exe"
export PYTHONPATH=path/to/RNAstructure/exe
# or export PYTHONPATH+=:path/to/RNAstructure/exe # preserves existing PYTHONPATH
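
As an alternative to setting PYTHONPATH in the shell, the directory can be added to sys.path from within Python before the import. The sketch below also checks that the components listed earlier are present. The function names are hypothetical helpers written for this example, and matching the native library by the prefix "_RNAstructure_wrap." is an assumption based on the naming described above.

```python
import os
import sys

REQUIRED_SCRIPTS = ("RNAstructure.py", "Error_handling.py", "RNAstructure_wrap.py")
LIBRARY_PREFIX = "_RNAstructure_wrap."  # e.g. _RNAstructure_wrap.so or .dll


def missing_components(directory):
    """Return a list describing required extension files not found in directory."""
    present = set(os.listdir(directory)) if os.path.isdir(directory) else set()
    missing = [name for name in REQUIRED_SCRIPTS if name not in present]
    if not any(name.startswith(LIBRARY_PREFIX) for name in present):
        missing.append(LIBRARY_PREFIX + "* (native library)")
    return missing


def add_to_sys_path(directory):
    """Put directory at the front of sys.path after checking its contents."""
    directory = os.path.abspath(directory)
    missing = missing_components(directory)
    if missing:
        raise IOError("Missing in %s: %s" % (directory, ", ".join(missing)))
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

A script would call add_to_sys_path("path/to/RNAstructure/exe") once and then run import RNAstructure as usual.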


The following commands can be entered into a Python interpreter or written to a python script file to demonstrate the RNAstructure module:
#### Python Code ####

# First import the RNAstructure module
import RNAstructure

# Access the in-line documentation
help(RNAstructure)

# Each class and method has in-line documentation.
help(RNAstructure.RNA)
help(RNAstructure.RNA.FoldSingleStrand)
help(RNAstructure.ProbScan.probability_of_multibranch_loop)

# Each class can be instantiated with its fromString and fromFile methods
p = RNAstructure.RNA.fromString("GGGGAAACCCC")
q = RNAstructure.RNA.fromFile("../tests/time/ivslsu.seq")
m = RNAstructure.Dynalign_object.fromString("CGCGCGUUCGGCGCGC","GCGCGCUUCAGCGCGC")

# The classes have structure calculation methods identical to those in the C++ classes
p.FoldSingleStrand()
p.PartitionFunction()
m.Dynalign()

# The multiple-sequence classes such as Dynalign have multiple RNA
# objects that can be accessed.
rna2 = m.GetRNA2() # returns an RNA

# They have IO methods to write structures to disk
p.WriteCt("foo.ct")
m.GetRNA2().WriteCt("bar.ct")

# RNA and ProbScan objects are iterable. To iterate over the sequence:
for nuc in p:
    print(nuc)

# Or over the nucleotide indices:
for i in p.iterIndices():
    print(p.GetNucleotide(i)) # identical behavior to the previous loop

# It's also possible to get the pairing information and to manipulate it within Python
pairs = [(i,p.GetPair(i)) for i in p.iterIndices()]

# Or pair probabilities
p.GetPairProbability(1,10) # returns 0.437
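
As an illustration of manipulating pairing information in Python, the sketch below converts a list of (index, partner) tuples, shaped like the pairs list in the example above, into dot-bracket notation. It assumes that indices are 1-based and that a partner value of 0 means "unpaired"; check help(RNAstructure.RNA.GetPair) to confirm this for your version. The function name and the sample data are made up for this example and are not RNAstructure output. Pseudoknotted pairs are not distinguished.

```python
def pairs_to_dot_bracket(pairs):
    """Convert [(i, j), ...] to a dot-bracket string.

    Assumes i is a 1-based nucleotide index and j is the index of its
    pairing partner, or 0 if nucleotide i is unpaired.
    """
    symbols = []
    for i, j in pairs:
        if j == 0:
            symbols.append(".")   # unpaired
        elif j > i:
            symbols.append("(")   # 5' side of a pair
        else:
            symbols.append(")")   # 3' side of a pair
    return "".join(symbols)


# Hand-written sample: a hairpin with four pairs and a three-nucleotide loop.
example = [(1, 11), (2, 10), (3, 9), (4, 8), (5, 0), (6, 0), (7, 0),
           (8, 4), (9, 3), (10, 2), (11, 1)]
print(pairs_to_dot_bracket(example))  # prints ((((...))))
```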

Troubleshooting

Notes for Compiling the Interface

  1. Your system should be configured to allow python to be invoked directly as `python`.
    If that is not the case, please do ONE of the following:
    1. Set your PATH to include the location of the python executable.
    2. Create a symlink (in ~/bin or /usr/bin etc) so `python` points to the desired executable.
    3. Define the PYTHON environment variable to point to the desired executable.
      export PYTHON=python3          # Example using Python 3.x
      export PYTHON=/usr/bin/python  # Example using full path
    4. Specify PYTHON directly on the Make command-line:
      make python  PYTHON=/usr/bin/python3
  2. To build the native python extension, you must also have the python development
    headers and libraries installed. If you get an error about a missing Python.h file, it is likely you'll have to install the development tools using the appropriate command below:
    • For apt (Ubuntu, Debian...):
      sudo apt-get install python-dev   # for python2.x installs
      sudo apt-get install python3-dev  # for python3.x installs
    • For yum (CentOS, RHEL...):
      sudo yum install python-devel
    • For dnf (Fedora...):
      sudo dnf install python2-devel  # for python2.x installs
      sudo dnf install python3-devel # for python3.x installs
    • For zypper (openSUSE...):
      sudo zypper in python-devel   # for python2.x installs
      sudo zypper in python3-devel # for python3.x installs
    • For Cygwin:
      # If apt-cyg is installed
      apt-cyg install python2-devel   # for python2.x installs
      apt-cyg install python3-devel   # for python3.x installs
      # Or use the Cygwin installer to select and install the python2-devel 
      # or python3-devel package.
  3. There are two distinct ways to compile the RNAstructure Python Interface. The first is recommended, but if it fails on your system, try the second method.
    1. RNAstructure Build (Recommended): This uses the normal RNAstructure Makefile build system to
      compile the object files and final shared extension library. It should work well in most cases.
      make python # compile using RNAstructure build system
      # or 
      make python PYTHON=python3 # to specify name or path of python
    2. Python distutils Build: This uses Python's distutils package to automatically configure and build
      the object files and final shared extension library. It uses Python's common extension
      paradigm (setup.py).
      make python_dist  # compile using distutils build system (setup.py)
      # or 
      make python_dist PYTHON=python3 # to specify name or path of python
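
To check whether the development headers mentioned in item 2 are installed for a particular interpreter, you can ask that interpreter where it expects Python.h to be. This is a minimal sketch using the standard sysconfig module; the function name is illustrative, and the reported directory may differ from the one the RNAstructure Makefile actually uses.

```python
import os
import sys
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


if __name__ == "__main__":
    header = python_header_path()
    print("Interpreter:", sys.executable)
    print("Header file:", header)
    if os.path.isfile(header):
        print("Python.h found.")
    else:
        print("Python.h is missing; install the development package.")
```

Run the script with the same executable that you pass as PYTHON to make, so that the result describes the interpreter being targeted.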

Notes for Compiling the Interface on Windows

  1. If you choose to use the distutils method (see #3 above), python might use the wrong compiler name. This results in an error: /bin/bash: gcc: command not found
    Workaround:
    Create a symlink named "gcc" in ~/bin or /usr/bin etc that points to /usr/bin/x86_64-w64-mingw32-g++ (or whatever your compiler is named -- this is for MinGW-w64).
    You may also need to do this for "g++".
  2. If compiling from the native Windows python executable (as opposed to cygwin's python),
    you may need to choose which compiler to use.
    Add the following lines to RNAstructure/setup.cfg or /usr/lib/pythonXX/distutils/distutils.cfg (neither of which may exist prior to you creating them).
    # Python config settings (add to setup.cfg or distutils.cfg)
    [build_ext]
    compiler=mingw32
    
    [build]
    compiler=mingw32

    (Also see WindowsCompilers on wiki.python.org)
  3. Compiler error:
    pyport.h:351:24: fatal error: sys/select.h: No such file or directory
    On Windows, you may need to edit pyconfig.h in the Python include directory (for example /usr/include/python2.7/pyconfig.h). You can find this directory by running
    make python-debug

    from the root RNAstructure directory. Look for the PY_INCLUDE_DIR definition in the output.
    Once you have located pyconfig.h, open it in a text editor and search for the following variables and undefine them: HAVE_SYS_SELECT_H and  HAVE_SYS_TERMIO_H
    i.e.: put these statements below the corresponding #define's
    #define HAVE_SYS_SELECT_H 1  // already in the file
    #undef HAVE_SYS_SELECT_H     // Add this line.
    
    // ... a few lines later ...
    
    #define HAVE_SYS_TERMIO_H    // already in the file
    #undef HAVE_SYS_TERMIO_H     // Add this line.

  4. There is a bug in cygwinccompiler.py (in python's distutils directory).
    It gives the following error:
      File "/usr/lib/python2.7/distutils/cygwinccompiler.py", line 189, in link
    	libraries.extend(self.dll_libraries)
      TypeError: 'NoneType' object is not iterable

    Workaround: find and edit cygwinccompiler.py (use the path as shown in the error message)
    Go to the line number mentioned in the error. The code should be similar to
    # Additional libraries
    libraries.extend(self.dll_libraries)

    Change it to this:
    # Additional libraries
    if self.dll_libraries: libraries.extend(self.dll_libraries)

  5. Import fails at runtime: "ImportError: No such file or directory"
    See #3 in the next section.

Notes on Running the Python Interface

  1. Python scripts that wish to use the RNAstructure extension should have the following import statement:
    import RNAstructure
  2. The environment variable PYTHONPATH must include the directory that contains all of the following files.
    These files are normally placed in the RNAstructure/exe directory, so PYTHONPATH should point there, unless
    you have installed the files in another location.
    • RNAstructure.py
    • Error_handling.py
    • RNAstructure_wrap.py
    • _RNAstructure_wrap.so (aka _RNAstructure_wrap.dll on Windows)
  3. On Windows: the python distutils build system creates an interface binary that is dynamically linked with system libraries.
    If the system libraries are not available on the PATH, you will get an import error, similar to this:
      _RNAstructure_wrap = swig_import_helper()
      File "/home/rna/rna/RNAstructure/exe/RNAstructure_wrap.py", line 16, in swig_import_helper
    	return importlib.import_module('_RNAstructure_wrap')
      File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    	__import__(name)
      ImportError: No such file or directory

    Workaround: Add system libraries to your path.
    PATH+=:'/usr/x86_64-w64-mingw32/sys-root/mingw/bin' 

    The above path may not be appropriate for your system. You can list the
    required dynamically linked files with the following shell command:
    ldd exe/_RNAstructure_wrap.dll

    In the output, look for these files (if present) and make sure their parent directory is in your PATH:
    • libgcc_s_seh-1.dll
    • libwinpthread-1.dll
    • libstdc++-6.dll
    The following files may also be present, but can be ignored:
      ntdll.dll  kernel32.dll  KERNELBASE.dll  msvcrt.dll
      USER32.dll  GDI32.dll  LPK.dll  USP10.dll  libpython2.7.dll
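
The PATH check described in item 3 can also be done from Python. The sketch below looks for the three runtime libraries named above in each PATH directory. The function name is hypothetical, and the search only tests whether a file exists; it does not verify that the library found is the one the interface was linked against.

```python
import os

RUNTIME_DLLS = ("libgcc_s_seh-1.dll", "libwinpthread-1.dll", "libstdc++-6.dll")


def locate_on_path(filenames, path=None):
    """Map each file name to the first PATH directory containing it, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    directories = [d for d in path.split(os.pathsep) if d]
    found = {}
    for name in filenames:
        found[name] = next(
            (d for d in directories if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


if __name__ == "__main__":
    for name, directory in locate_on_path(RUNTIME_DLLS).items():
        print(name, "->", directory if directory else "NOT FOUND on PATH")
```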