RDP-Tools

RDP-Tools is a Python package for collapsing and inflating (de-collapsing) Ribosome Profiling data files obtained from RiboSeq.Org’s Data Portal and RDP (RiboSeq Data Portal).

It also facilitates querying the RiboSeq Data Portal database to programmatically identify samples of interest and subsequently facilitate file download.

Features

File Collapsing:

  • FASTQ to Collapsed FASTA

File Inflating:

  • FASTA to Inflated FASTA

  • FASTA to Inflated FASTQ

  • BAM to Inflated BAM

Programmatic Database Interaction

  • Query Metadata

  • Fetch Download Links

  • Return Database Fields

Installation

Install RDP-Tools by running:

pip install RiboSeq-DP-Tools

or:

docker pull quay.io/jackcurragh/rdp-tools

Collapse

RDP-Tools collapse functionality takes a FASTQ file of Ribo-Seq reads and returns a file in FASTA format that is significantly smaller without losing any sequence information.

Each identical Ribo-Seq read is only stored once and the number of occurrences of each read is stored in the FASTA header. Due to the nature of the short Ribo-Seq reads and their tendency to originate from a relatively low diversity set of genomic regions (CDSs) this collapsing significantly reduces file sizes and processing load. It is advised that collapsing is carried out post-QC (eg. with FastQC) as quality information is dropped from the outputted files.

Usage

To collapse a FASTQ file, run:

RDP-Tools collapse input.fastq

Inflate

Although collapsing Ribo-Seq reads lightens the computational load of data processing it can not be done at the expense of the usefulness of the resulting files. In order to ensure the files on RiboSeq.Org are usable in downstream applications, RDP-Tools also provides ‘inflation’ functionality. Essentially, this returns a file from its collapsed state, where each unique read gets one entry, back to where each entry in the file represents one sequenced RPF.

Usage

To inflate a FASTA file, run:

RDP-Tools inflate input.fasta

To inflate a FASTA to a FASTQ file (perfect sequencing quality is assigned), run:

RDP-Tools inflate input.fasta -o output.fastq/fq

To inflate a BAM file, run:

RDP-Tools inflate input.bam

Database

RDP-Tools also lets the user query the RDPs database of existing Ribo-Seq data. This facilitates the programmatic finding and fetching of Ribo-Seq read and alignment files.

Usage

To get the all of the fields related to Ribo-Seq samples;

RDP-Tools database sample-fields 

To get the metadata for all Ribo-Seq samples whose tissue is labeled as Lung, do:

RDP-Tools database samples --query TISSUE=Lung 

To specify the fields returned for the samples that match a query, do:

RDP-Tools database samples --query TISSUE=Lung --fields TISSUE,CELL_LINE

To increase or decrease the number of entries returned (default: 100), do:

RDP-Tools database samples --query TISSUE=Lung --fields TISSUE,CELL_LINE --limit 1000000

Contribute

Support

If you are having issues, please let us know. Our email is riboseqXYZ@gmail.com (remove the XYZ).

License

The project is licensed under the MIT license.