Exploring end-to-end sequence to sequence ensemble model for predicting RNA secondary structure

Ammartayakun, Aukkawut

Student Work


Public Deposited

The problem of predicting RNA secondary structure has long been studied, both to help understand the underlying folding logic and to support applications such as designing primers to detect specific diseases. The main challenge of RNA secondary structure prediction is the size of its search space. This work explores how neural networks can either approximate the base-pairing distribution or generate the structure as an output sequence from the input RNA sequence in an encoder-decoder format. The former approach uses an end-to-end pre-trained Graph Convolutional Network (GCN) and a convolutional neural network (CNN). This work also uses CONTRAfold, a statistical, rule-based context-free model, to improve the GCN model by supplying its attention-like pairing distribution as an edge feature for the GCN. The models are trained on the bpRNA dataset and evaluated on the bpRNA and bpRNA-new datasets. Results for the end-to-end GCN model show that graph neural networks can learn a pairing distribution distinct from that of CONTRAfold. Moreover, its performance is comparable to that of state-of-the-art models.

This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.

Creator