Because some reads mapped to multiple positions in the genome, and thus inappropriately lowered the deduced copy number in regions with low sequence complexity, we removed all 1 kb windows with an RPKM lower than 2 (RPKM value of one copy = 2.29) prior to change point analysis. Breakpoints with a posterior probability >0.95 were used. Copy number was assigned to each segment from the average RPKM of the windows between adjacent breakpoints, expressed as a multiple of the one-copy value (2.29 ± 1.15 RPKM = 1 copy, 4.58 ± 1.15 RPKM = 2 copies, etc.). Genes spanning two segments were not used in gene expression analysis. For RNA-Seq data, we counted the number of uniquely mapped reads within all unique exons of the Drosophila Flybase [45] Release 5.12 annotation (Oct. 2008) and, for each annotated gene, calculated the number of reads in all of its unique exons per kb of total unique exon length per million mapped reads (RPKM).
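
The window filter and copy-number assignment described above can be illustrated with a minimal Python sketch. The constants 2.29, 1.15 and 2 come from the text; the function names and the choice to return None for a segment that falls outside every tolerance band are assumptions made for illustration, not the authors' code.

```python
ONE_COPY_RPKM = 2.29   # RPKM of a single-copy region (from the text)
TOLERANCE = 1.15       # half-width of each copy-number band (from the text)
MIN_WINDOW_RPKM = 2.0  # windows below this were removed (from the text)


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kb of sequence per million mapped reads."""
    return read_count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def filter_windows(window_rpkms):
    """Drop 1 kb windows whose RPKM is lower than the cutoff."""
    return [value for value in window_rpkms if value >= MIN_WINDOW_RPKM]


def assign_copy_number(segment_mean_rpkm):
    """Return the copy number whose band contains the segment mean.

    Hypothetical behaviour: None is returned when no band matches.
    """
    copies = round(segment_mean_rpkm / ONE_COPY_RPKM)
    if copies < 1:
        return None
    if abs(segment_mean_rpkm - copies * ONE_COPY_RPKM) <= TOLERANCE:
        return copies
    return None
```

For example, a segment with a mean RPKM of 4.6 falls in the 4.58 ± 1.15 band and is assigned two copies.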

The RPKM calculation was done for each RNA-Seq library separately, and RPKM values were then averaged across biological replicates (r² = 0.98 between replicates). Non-expressed genes are not useful for ratiometric analysis and were therefore excluded. We used RPKM values for intergenic regions to determine expression thresholds. For intergenic regions, RPKM values were calculated from the total number of reads between adjacent gene models. Only 5% of intergenic regions in S2 cells have an RPKM value greater than or equal to 4. Therefore, we called genes with RPKM values of at least 4 in S2 cells as expressed, with an estimated type I error rate of 5%.
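
As a rough illustration of how an expression cutoff can be derived from intergenic signal, the sketch below picks the smallest observed intergenic RPKM at which no more than a fraction alpha of intergenic regions would be called expressed. The function names and the exact search procedure are assumptions; the text only reports the outcome (a cutoff of 4 at a 5% error rate).

```python
from bisect import bisect_left


def expression_threshold(intergenic_rpkm, alpha=0.05):
    """Smallest observed value t such that the fraction of intergenic
    regions with RPKM >= t is at most alpha (None if no value qualifies)."""
    values = sorted(intergenic_rpkm)
    n = len(values)
    for candidate in values:
        # Number of intergenic regions at or above the candidate cutoff.
        at_or_above = n - bisect_left(values, candidate)
        if at_or_above / n <= alpha:
            return candidate
    return None


def is_expressed(gene_rpkm, threshold):
    """A gene is called expressed when its RPKM is no less than the cutoff."""
    return gene_rpkm >= threshold
```

With the S2 cell data described above, such a procedure would be expected to land near the reported cutoff of 4.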

All microarray data (except CGH) and statistical tests were processed and analyzed in R/Bioconductor [46]. For the ChIP-chip experiments, we used quantile normalization based on the input channel. The distributions of raw and normalized intensities were checked to make sure that normalization was appropriate (i.e., that the skew was maintained). We used the average ChIP/input ratio from biological replicates (r² = 0.40–0.54 between replicates). The ChIP/input ratios in RNAi and mock treated cells were used for K-means clustering with 3 nodes and a Euclidean similarity metric. Genes on the X chromosome and on autosomes were clustered separately using Cluster3.0 and then visualized using Tree-View [47]. For expression profiling, we normalized using loess within each 12-plex and quantile normalization between 12-plexes.
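
The text does not spell out how the input channel was used in quantile normalization. One plausible reading, shown here purely as an assumption, is that each channel's intensities are mapped by rank onto a reference distribution taken from the input channel. The function name is hypothetical and ties are broken by position rather than averaged.

```python
def quantile_normalize_to_reference(values, reference):
    """Replace each value with the reference value of the same rank.

    Assumption: `reference` holds input-channel intensities and has the
    same length as `values`. Ties in `values` keep their original order.
    """
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    target = sorted(reference)
    # Probe indices ordered from lowest to highest intensity.
    order = sorted(range(len(values)), key=lambda index: values[index])
    normalized = [0.0] * len(values)
    for rank, index in enumerate(order):
        normalized[index] = target[rank]
    return normalized
```

After this step the normalized channel has exactly the reference distribution, so any check on distribution shape has to be made against the raw intensities.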

Average probeset log2 intensities were calculated in both channels for each gene. Correlations between array intensities and RPKM values were estimated by Spearman's rank correlation coefficient. Comparisons of the distributions of DNA densities or expression values among different chromosomes and different copy numbers were performed using two-sample Kolmogorov-Smirnov tests (KS tests). Normalization is inherently problematic when a large fraction of the genome changes expression, as in the RNAi experiments.
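
For readers unfamiliar with the two statistics named above, the following standard-library Python sketch computes Spearman's rank correlation (Pearson correlation of average ranks) and the two-sample KS statistic D (the largest gap between the two empirical distribution functions). It is illustrative only: the analysis itself was run in R/Bioconductor, the function names are invented here, and no p-value is computed.

```python
from bisect import bisect_right


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Because Spearman's coefficient depends only on ranks, it is unaffected by the different scales of log2 array intensities and RPKM values.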
