RNAstructure tools can be used from Python through the RNAstructure Python Interface, a Python extension module.
Most RNAstructure classes have Python bindings, including:
- RNA
- ProbScan
- Dynalign_object
- Multilign_object
- Oligowalk_object
The file components of the extension are listed below:
• RNAstructure.py: The main RNAstructure Python module (the one loaded by `import RNAstructure`).
• Error_handling.py: An internal utility that turns RNAstructure numeric error codes into Python exceptions.
• RNAstructure_wrap.py: The Python-code interface between RNAstructure.py and the native (C++) binary library. (This file is generated by SWIG.)
• _RNAstructure_wrap.so or _RNAstructure_wrap.dll: The native (C++) binary library that adds Python bindings to the RNAstructure class library. The name of this file depends on the operating system and the Python version. In Python 3, C++ extensions have long platform-specific suffixes, like _RNAstructure_wrap.cpython-36m-x86_64-linux-gnu.so. For simplicity, it is always called _RNAstructure_wrap.so in the documentation below.
In source-only releases of RNAstructure, the Python scripts (*.py) can be found in the python_interface directory. They are copied into the RNAstructure/exe directory when the binary library (_RNAstructure_wrap.so) is compiled as described below.
In binary releases of RNAstructure, all components can be found in the RNAstructure/exe directory. However, because of many system-dependent factors (e.g. Python version and flavor, processor architecture, etc.), the pre-compiled binary library (_RNAstructure_wrap.so) may not be compatible with your specific Python environment. In that case, it must be compiled as described below.
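A pre-compiled library only works with the interpreter it was built for. It can therefore help to compare the files in RNAstructure/exe with the file names your Python will actually try to import. The sketch below assumes Python 3 and relies only on the standard-library list importlib.machinery.EXTENSION_SUFFIXES. The helper name native_library_report is hypothetical and is not part of RNAstructure.

```python
import glob
import importlib.machinery
import os


def native_library_report(exe_dir):
    """Compare the _RNAstructure_wrap native libraries found in exe_dir with the
    file names the running interpreter is able to import (hypothetical helper)."""
    # Every name this interpreter would try, e.g. "_RNAstructure_wrap.cpython-36m-x86_64-linux-gnu.so".
    wanted = ["_RNAstructure_wrap" + s for s in importlib.machinery.EXTENSION_SUFFIXES]
    # Every native-library file actually present (RNAstructure_wrap.py has no leading underscore).
    present = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(exe_dir, "_RNAstructure_wrap*")))
    # Files whose names this interpreter accepts.
    usable = [name for name in wanted if name in present]
    return {"wanted": wanted, "present": present, "usable": usable}
```

Suppose "present" lists files but "usable" is empty. Then the library in exe was most likely built for a different Python and needs to be recompiled. A matching name is necessary but not sufficient: a generic ".so" built for another Python version can still fail at import time.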
Compiling the RNAstructure Python Interface
To compile the RNAstructure Python Interface binary library (_RNAstructure_wrap.so), both Python and the necessary Python development tools (header files, etc.) must be installed on your system. (See the Troubleshooting section for details.)
Open a terminal and run the following command from within the root RNAstructure directory:
make python
Compiled Python native (C++) extensions are, in general, only compatible with a single version of Python. Because of this, the default behavior of the `make python` command is to build two libraries: one for Python 2.x and another for Python 3.x.
If you only want to build one of these libraries, or if you want to target an alternative version of Python, set the PYTHON variable as shown below. This is also useful if `python` is not available in your PATH. For example:
make python PYTHON=python3
make python PYTHON=/usr/bin/python2.7
The library and all Python files required by the extension will be copied to the RNAstructure/exe directory.
If errors are encountered when building the interface, please see the Troubleshooting section below.
Using the RNAstructure Python Interface
To use the RNAstructure Python Interface from a Python script or the Python interpreter, all components listed above must be located in a directory where Python looks for modules and extensions. You can arrange this in either of two ways: (A) set the PYTHONPATH environment variable to include the directory where the extension files are located, or (B) move the required files into a directory listed in Python's sys.path. The extension files are typically located in RNAstructure/exe, so one of the following should work in most cases (replace path/to/RNAstructure/exe with the actual path):
# Set PYTHONPATH (replaces any existing value):
export PYTHONPATH=path/to/RNAstructure/exe
# Or append to an existing PYTHONPATH:
export PYTHONPATH="$PYTHONPATH:path/to/RNAstructure/exe"
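A script can get the same effect as option (A) or (B) by adding the directory to sys.path before the import. The sketch below assumes Python 3. It first checks that the four components listed above are present, accepting any native-library suffix the running interpreter reports, and then prepends the directory to sys.path. Both helper names are hypothetical and are not part of RNAstructure.

```python
import importlib.machinery
import os
import sys

# Pure-Python components that must sit next to the native library.
REQUIRED_PY_FILES = ("RNAstructure.py", "Error_handling.py", "RNAstructure_wrap.py")


def missing_components(exe_dir):
    """Return the RNAstructure interface files that are absent from exe_dir (hypothetical helper)."""
    missing = [f for f in REQUIRED_PY_FILES
               if not os.path.isfile(os.path.join(exe_dir, f))]
    # The native library may carry any extension suffix this interpreter accepts.
    if not any(os.path.isfile(os.path.join(exe_dir, "_RNAstructure_wrap" + s))
               for s in importlib.machinery.EXTENSION_SUFFIXES):
        missing.append("_RNAstructure_wrap (native library)")
    return missing


def add_to_sys_path(exe_dir):
    """Prepend exe_dir to sys.path (only once) so that `import RNAstructure` can find it."""
    exe_dir = os.path.abspath(exe_dir)
    if exe_dir not in sys.path:
        sys.path.insert(0, exe_dir)
    return exe_dir
```

To use it in a script, call missing_components("path/to/RNAstructure/exe") and stop if the returned list is non-empty. Otherwise, call add_to_sys_path with the same directory before `import RNAstructure`.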
The following commands can be entered into a Python interpreter or written to a Python script file to demonstrate the RNAstructure module:
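import RNAstructure
# Browse the built-in documentation (pass the method itself to help(), do not call it).
help(RNAstructure)
help(RNAstructure.RNA)
help(RNAstructure.RNA.FoldSingleStrand)
help(RNAstructure.ProbScan.probability_of_multibranch_loop)
# Create RNA objects from a sequence string or a sequence file, and a Dynalign object from two sequences.
p = RNAstructure.RNA.fromString("GGGGAAACCCC")
q = RNAstructure.RNA.fromFile("../tests/time/ivslsu.seq")
m = RNAstructure.Dynalign_object.fromString("CGCGCGUUCGGCGCGC","GCGCGCUUCAGCGCGC")
# Predict structures and base-pair probabilities.
p.FoldSingleStrand()
p.PartitionFunction()
m.Dynalign()
rna2 = m.GetRNA2()
# Write the predicted structures to CT files.
p.WriteCt("foo.ct")
rna2.WriteCt("bar.ct")
# Iterate over nucleotides and pairs (print(x) works in both Python 2 and 3).
for nuc in p: print(nuc)
for i in p.iterIndices(): print(p.GetNucleotide(i))
pairs = [(i,p.GetPair(i)) for i in p.iterIndices()]
print(p.GetPairProbability(1,10))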
Troubleshooting
Notes for Compiling the Interface
- Your system should be configured to allow python to be invoked directly as `python`.
If that is not the case, please do ONE of the following:
- Set your PATH to include the location of the python executable.
- Create a symlink (in ~/bin or /usr/bin etc) so `python` points to the desired executable.
- Define the PYTHON environment variable to point to the desired executable.
export PYTHON=python3
export PYTHON=/usr/bin/python
- Specify PYTHON directly on the make command line:
make python PYTHON=/usr/bin/python3
- To build the native Python extension, you must also have the Python development headers and libraries installed. If you get an error about a missing Python.h file, you will likely need to install the development tools using the appropriate command below:
- For apt (Ubuntu, Debian...):
sudo apt-get install python-dev
sudo apt-get install python3-dev
- For yum (CentOS, RHEL...):
sudo yum install python-devel
- For dnf (Fedora...):
sudo dnf install python2-devel
sudo dnf install python3-devel
- For zypper (openSUSE...):
sudo zypper in python-devel
sudo zypper in python3-devel
- For Cygwin:
apt-cyg install python2-devel
apt-cyg install python3-devel
- There are two distinct ways to compile the RNAstructure Python Interface. The first is recommended, but if it fails on your system, try the second method.
- RNAstructure Build (Recommended): This uses the normal RNAstructure Makefile build system to
compile the object files and final shared extension library. It should work well in most cases.
make python
make python PYTHON=python3
- Python distutils Build: This uses Python's distutils package to automatically configure and build the object files and final shared extension library. It follows Python's common extension paradigm (setup.py).
make python_dist
make python_dist PYTHON=python3
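Both build methods use whichever interpreter PYTHON points to, so it is worth confirming that interpreter's details first. The sketch below uses only the standard sys and sysconfig modules. It reports the interpreter path to pass as PYTHON=..., its version, and whether Python.h exists in its include directory. The function name build_environment is hypothetical. The Makefile may locate headers differently, so a mismatch with the PY_INCLUDE_DIR reported by `make python-debug` is worth investigating but is not a definitive diagnosis.

```python
import os
import sys
import sysconfig


def build_environment():
    """Collect interpreter details relevant to building the RNAstructure Python Interface
    (hypothetical helper)."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Full path that could be passed as PYTHON=... on the make command line.
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # Directory where this interpreter expects Python.h to live.
        "include_dir": include_dir,
        # False most likely means the python-dev / python-devel package is not installed.
        "has_python_h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for key, value in build_environment().items():
        print("%-13s %s" % (key, value))
```

Run the script with the exact interpreter you intend to target (for example, save it as check_env.py and run `python3 check_env.py`). The reported values then describe that interpreter.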
Notes for Compiling the Interface on Windows
- If you use the distutils method (the Python distutils Build in the previous section), Python might use the wrong compiler name. This results in the error: /bin/bash: gcc: command not found
Workaround:
Create a symlink named "gcc" in ~/bin or /usr/bin etc that points to /usr/bin/x86_64-w64-mingw32-g++ (or whatever your compiler is named -- this is for MinGW-w64).
You may also need to do this for "g++".
- If you are compiling with the native Windows Python executable (as opposed to Cygwin's Python), you may need to choose which compiler to use.
Add the following lines to RNAstructure/setup.cfg or /usr/lib/pythonXX/distutils/distutils.cfg (create the file if it does not already exist):
[build_ext]
compiler=mingw32
[build]
compiler=mingw32
(Also see WindowsCompilers on wiki.python.org)
- Compiler error:
pyport.h:351:24: fatal error: sys/select.h: No such file or directory
On Windows, you may need to edit pyconfig.h in the Python include directory (for example, /usr/include/python2.7/pyconfig.h). You can find this directory by running
make python-debug
from the root RNAstructure directory. Look for the PY_INCLUDE_DIR definition in the output.
Once you have located pyconfig.h, open it in a text editor, search for the macros HAVE_SYS_SELECT_H and HAVE_SYS_TERMIO_H, and undefine them.
That is, add these statements below the corresponding #define lines:
#undef HAVE_SYS_SELECT_H
#undef HAVE_SYS_TERMIO_H
- There is a bug in cygwinccompiler.py (in python's distutils directory).
It gives the following error:
File "/usr/lib/python2.7/distutils/cygwinccompiler.py", line 189, in link
libraries.extend(self.dll_libraries)
TypeError: 'NoneType' object is not iterable
Workaround: find and edit cygwinccompiler.py (use the path as shown in the error message)
Go to the line number mentioned in the error. The code should be similar to
libraries.extend(self.dll_libraries)
Change it to this:
if self.dll_libraries: libraries.extend(self.dll_libraries)
- Import fails at runtime: "ImportError: No such file or directory"
See the "On Windows" item in the next section (Notes on Running the Python Interface).
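The pyconfig.h edit described above can also be scripted. This sketch rests on two assumptions. First, sysconfig.get_config_h_filename() returns the pyconfig.h of the interpreter running the script, which should be the one the Makefile uses; compare it with PY_INCLUDE_DIR from `make python-debug`. Second, add_undefs is a hypothetical helper that inserts an #undef line right after each matching #define. It returns the edited text instead of writing the file, so you can review the result and keep a backup before overwriting a system header.

```python
import re
import sysconfig

# Macros that the note above says must be undefined on Windows.
MACROS = ("HAVE_SYS_SELECT_H", "HAVE_SYS_TERMIO_H")


def pyconfig_path():
    """Path of pyconfig.h for the running interpreter (alternative to `make python-debug`)."""
    return sysconfig.get_config_h_filename()


def add_undefs(header_text, macros=MACROS):
    """Return header_text with "#undef NAME" inserted after each "#define NAME ..." line
    for the given macro names (hypothetical helper; does not touch any file)."""
    out = []
    for line in header_text.splitlines(True):
        out.append(line)
        m = re.match(r"\s*#\s*define\s+(\w+)", line)
        if m and m.group(1) in macros:
            if not line.endswith("\n"):
                # Last line had no newline; add one so the #undef starts on its own line.
                out[-1] = line + "\n"
            out.append("#undef " + m.group(1) + "\n")
    return "".join(out)
```

Running add_undefs twice adds duplicate #undef lines. That is harmless to the compiler, but it is another reason to edit a fresh copy of the header.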
Notes on Running the Python Interface
- Python scripts that use the RNAstructure extension should include the following import statement:
import RNAstructure
- The environment variable PYTHONPATH must include the directory that contains all of the following files.
These files are normally placed in the RNAstructure/exe directory, so PYTHONPATH should point there, unless
you have installed the files in another location.
- RNAstructure.py
- Error_handling.py
- RNAstructure_wrap.py
- _RNAstructure_wrap.so (aka _RNAstructure_wrap.dll on Windows)
- On Windows: the python distutils build system creates an interface binary that is dynamically linked with system libraries.
If the system libraries are not available on the PATH, you will get an import error, similar to this:
_RNAstructure_wrap = swig_import_helper()
File "/home/rna/rna/RNAstructure/exe/RNAstructure_wrap.py", line 16, in swig_import_helper
return importlib.import_module('_RNAstructure_wrap')
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No such file or directory
Workaround: Add system libraries to your path.
PATH+=:'/usr/x86_64-w64-mingw32/sys-root/mingw/bin'
The above path may not be appropriate for your system. You can list the
required dynamically linked files with the following shell command:
ldd exe/_RNAstructure_wrap.dll
In the output, look for these files (if present) and make sure their parent directory is in your PATH:
- libgcc_s_seh-1.dll
- libwinpthread-1.dll
- libstdc++-6.dll
The following files may also be present, but can be ignored:
ntdll.dll kernel32.dll KERNELBASE.dll msvcrt.dll
USER32.dll GDI32.dll LPK.dll USP10.dll libpython2.7.dll
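Checking the ldd output by hand is error-prone, so the sketch below parses it and reports the parent directory of each required MinGW runtime DLL. It assumes that ldd prints POSIX-style lines of the form "name => /path/to/name (0x...)" and prints "name => not found" for a DLL it cannot resolve. The helper name runtime_dll_dirs is hypothetical.

```python
import posixpath

# Runtime DLLs that, per the note above, must be reachable through PATH.
RUNTIME_DLLS = ("libgcc_s_seh-1.dll", "libwinpthread-1.dll", "libstdc++-6.dll")


def runtime_dll_dirs(ldd_output, wanted=RUNTIME_DLLS):
    """Parse `ldd` output and map each wanted DLL to its parent directory,
    or to None if ldd reported it as not found (hypothetical helper)."""
    found = {}
    for line in ldd_output.splitlines():
        name, sep, rest = line.strip().partition(" => ")
        if not sep or name not in wanted:
            continue  # system DLLs such as ntdll.dll can be ignored
        path = rest.split(" (")[0].strip()  # drop the trailing "(0x...)" load address
        found[name] = None if path == "not found" else posixpath.dirname(path)
    return found
```

You could pass it the text from subprocess.check_output(["ldd", "exe/_RNAstructure_wrap.dll"]).decode() and then add each directory it returns to PATH. A DLL missing from the result, or mapped to None, was not resolved by ldd at all.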