Because transcript abundance varies greatly, from zero to tens of thousands of copies, T-to-C coverage of crosslinking sites on highly expressed genes would be systematically higher than T-to-C coverage of crosslinking sites on lowly expressed genes. To avoid this bias, we normalized T-to-C RPMs to length-normalized transcript abundances (RPKM) from our mRNA-seq libraries. Two percent of Ts with T-to-C RPM coverage in gPAR-CLIP libraries were located on genes that lacked mRNA-seq coverage and were therefore removed from further analysis. To adjust for the additional kilobase normalization factor used in RPKM, ratios of gPAR-CLIP RPM to mRNA-seq RPKM were multiplied by a factor of 1,000.

Calculation of RBP crosslinking sites

Generation of read clusters from gPAR-CLIP libraries

All six gPAR-CLIP libraries were aggregated into one large dataset to generate read clusters.
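The abundance normalization described above (T-to-C RPM divided by the gene's mRNA-seq RPKM, then multiplied by 1,000) can be illustrated with a minimal sketch. The function name, the dictionary layout, and the example values are hypothetical and chosen only for illustration; they are not taken from the authors' pipeline.

```python
def normalize_tc_coverage(tc_rpm_by_position, gene_of_position, rpkm_by_gene):
    """Return T-to-C RPM / gene RPKM * 1,000 for each T position.

    Positions on genes without mRNA-seq coverage (missing or zero RPKM)
    are dropped, mirroring the filtering step described in the text.
    """
    normalized = {}
    for position, rpm in tc_rpm_by_position.items():
        rpkm = rpkm_by_gene.get(gene_of_position[position], 0.0)
        if rpkm <= 0:
            continue  # no mRNA-seq coverage, so the ratio is undefined
        normalized[position] = rpm * 1000.0 / rpkm
    return normalized


# Hypothetical input: two positions with equal RPM on genes whose
# abundance differs tenfold, plus one position on an uncovered gene.
tc_rpm = {("chrI", 1050): 4.0, ("chrI", 2210): 4.0, ("chrII", 77): 1.5}
genes = {("chrI", 1050): "GENE_A", ("chrI", 2210): "GENE_B", ("chrII", 77): "GENE_C"}
rpkm = {"GENE_A": 200.0, "GENE_B": 20.0}  # GENE_C lacks mRNA-seq coverage

result = normalize_tc_coverage(tc_rpm, genes, rpkm)
```

In this toy input the two retained positions have the same raw RPM, but the position on the less abundant gene receives a tenfold higher normalized value, which is the bias correction the normalization is meant to provide.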

A read cluster was defined as a continuous stretch of nucleotides covered by at least one gPAR-CLIP read harboring one or two T-to-C conversion events. This step resulted in 84,136 gPAR-CLIP clusters and 1,915 Puf3p PAR-CLIP clusters.

Defining crosslinking site boundaries

Manual inspection of read clusters revealed long regions covered by gPAR-CLIP reads containing one or more distinct peaks, indicative of distinct crosslinking sites. To distinguish between read peaks within long read clusters and to trim low read coverage surrounding strong single peaks, we fit a Gaussian-smoothed curve to each read cluster and used the inflection points of this curve to define the boundaries of individual crosslinking sites.
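One way to picture the boundary-finding step is the sketch below. It smooths a cluster's per-nucleotide coverage with a Gaussian kernel and reports each concave stretch (the region between two inflection points) that contains a local maximum. The kernel width `sigma`, the truncation at three standard deviations, the zero padding, and all function names are assumptions made for illustration; the text does not state the parameters or implementation the authors used.

```python
import math


def gaussian_kernel(sigma):
    """Discrete Gaussian weights out to 3 sigma, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, kernel):
    """Convolve values with the kernel, treating out-of-range positions as zero."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out


def split_cluster(coverage, sigma=2.0):
    """Return (start, end) pairs, end exclusive, in cluster coordinates."""
    kernel = gaussian_kernel(sigma)
    pad = len(kernel) // 2 + 1
    # Coverage outside a cluster is zero by definition, so pad with zeros
    # to let the smoothed curve taper off instead of ending abruptly.
    padded = [0.0] * pad + list(coverage) + [0.0] * pad
    s = smooth(padded, kernel)
    m = len(s)
    # Discrete second derivative; its sign changes mark inflection points.
    curv = [0.0] * m
    for i in range(1, m - 1):
        curv[i] = s[i - 1] - 2.0 * s[i] + s[i + 1]

    sites = []
    i = 1
    while i < m - 1:
        if curv[i] < 0:
            a = i
            while i < m - 1 and curv[i] < 0:
                i += 1
            b = i  # the concave stretch is [a, b) in padded coordinates
            peak = max(s[a:b])
            # Keep the stretch only if it holds a local maximum, which
            # excludes concave shoulders on the flank of a larger peak.
            if peak > s[a - 1] and peak > s[b]:
                start, end = max(a - pad, 0), min(b - pad, len(coverage))
                if start < end:
                    sites.append((start, end))
        else:
            i += 1
    return sites


# Hypothetical cluster with two peaks (at indices 4 and 14).
coverage = [1, 2, 5, 9, 12, 9, 5, 2, 1, 1, 1, 2, 5, 9, 12, 9, 5, 2, 1]
sites = split_cluster(coverage)
```

For this toy cluster the sketch yields two non-overlapping sites, one around each peak, and the low-coverage flanks and valley are trimmed away. A larger `sigma` would merge nearby peaks, so the choice of kernel width directly controls how finely clusters are split.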

This step resulted in 91,290 gPAR-CLIP crosslinking sites and 1,915 Puf3p PAR-CLIP crosslinking sites.

Calculating read coverage of crosslinking sites

From the set of RBP crosslinking sites derived from all gPAR-CLIP libraries, we determined read coverage for each site in each individual library by calculating the average RPM covering each nucleotide in the crosslinking site. This coverage was divided by the RPKM of the associated gene and multiplied by 1,000 to enable direct comparison of RBP occupancy of crosslinking sites between growth conditions.

Assigning FDR to each crosslinking site

A small fraction of T-to-C mismatches in gPAR-CLIP reads likely represents sequencing error rather than crosslinking events, so crosslinking sites derived from such errors were removed.

We repeated the crosslinking site generation steps using mRNA-seq reads with one or two T-to-C mismatches, which represent the rate of T-to-C sequencing error for the Illumina HiSeq platform. For each gPAR-CLIP and mRNA-seq crosslinking site, we calculated the T-to-C conversion rate as the number of reads with T-to-C conversion events divided by the total number of reads covering Ts. gPAR-CLIP and mRNA-seq crosslinking sites were then binned into groups based on total read coverage.
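The conversion-rate calculation and the coverage binning can be sketched as follows. The bin edges, site identifiers, and read counts are hypothetical, since the text does not give the bin boundaries that were used; the sketch covers only the two steps stated above and does not attempt the FDR calculation itself.

```python
from bisect import bisect_right
from collections import defaultdict


def conversion_rate(tc_reads, total_reads):
    """Fraction of reads covering Ts in a site that carry a T-to-C conversion."""
    if total_reads <= 0:
        raise ValueError("site has no reads covering Ts")
    return tc_reads / total_reads


def bin_sites_by_coverage(sites, bin_edges):
    """Group sites by total read coverage.

    sites: iterable of (site_id, tc_reads, total_reads).
    bin_edges: sorted lower bounds; a site goes into the bin with the
    largest lower bound that does not exceed its total read coverage.
    """
    bins = defaultdict(list)
    for site_id, tc_reads, total_reads in sites:
        idx = bisect_right(bin_edges, total_reads) - 1
        if idx < 0:
            continue  # coverage below the lowest bin edge
        rate = conversion_rate(tc_reads, total_reads)
        bins[bin_edges[idx]].append((site_id, rate))
    return dict(bins)


# Hypothetical sites: (identifier, reads with T-to-C, total reads covering Ts).
example_sites = [
    ("clip_1", 8, 10),
    ("clip_2", 3, 4),
    ("mrna_1", 1, 50),
    ("mrna_2", 2, 400),
]
edges = [1, 10, 100]  # assumed lower bounds of the coverage bins

binned = bin_sites_by_coverage(example_sites, edges)
```

Grouping sites by coverage before comparing conversion rates keeps the comparison between gPAR-CLIP sites and mRNA-seq background sites restricted to sites with similar read depth.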
