Mar 31, 2009
My newly published papers in 2009
2. REMAS: a new regression model to identify alternative splicing events from exon array data. Zheng H, Hang X, Zhu J, Qian M, Qu W, Zhang C, Deng M. BMC Bioinformatics. 2009 Jan 30;10 Suppl 1:S18. PMID: 19208117
3. Exon array data analysis and new algorithms for alternative splicing identification. HANG Xing-Yi, DENG Ming-Hua, SUN Zhi-Xian, ZHANG Cheng-Gang. Bulletin of the Academy of Military Medical Sciences. (In Chinese.) Accepted.
The second paper is based on a collaboration with Peking University, and I am one of the co-first authors. I gave an oral presentation on the same topic at APBC (Asia Pacific Bioinformatics Conference) 2009. Notably, this paper received one of the two Best Paper Awards at APBC 2009.
Finally, I pray for my ongoing project on stroke. Good luck!
Mar 28, 2009
Differentiation in bioinformatics
I have been reading more papers from PLoS CB and MSB. Compared with Bioinformatics and NAR, I find that PLoS CB and MSB prefer work that inspires new applications in functional biology. The new computational biology and systems biology are blazing a promising trail in functional biology. However, bioinformatics will never fade away; it will play a growing role as a technical skill set.
Mar 26, 2009
Mar 24, 2009
Nature Reviews Genetics research highlights
ALTERNATIVE SPLICING: Deciding between the alternatives
"altered splicing patterns might be more important than expression changes in determining complex human traits."
ORIGINAL RESEARCH PAPERS
Yu, Y. et al. Dynamic regulation of alternative splicing by silencers that modulate 5′ splice site competition. Cell 135, 1224–1236 (2008)
Heinzen E. L. et al. Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 6, e1000001 (2008).
GenBlastA: enabling BLAST to identify homologous gene sequences.
- Genome Res. 2009 Jan;19(1):143-9. Epub 2008 Oct 6.
School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6 Canada.
BLAST is an extensively used local similarity search tool for identifying homologous sequences. When a gene sequence (either protein sequence or nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that contains also hundreds to thousands of random HSPs ("noises"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed for extracting relevant HSPs that represent candidate homologous genes from the entire HSP report. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionalities.
PMID: 18838612
- Url: http://genome.sfu.ca/projects/genBlastA/
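To make the idea of sorting HSPs into per-gene groups concrete for myself, here is a minimal Python sketch. It is not the genBlastA algorithm: the abstract does not give the edge length metric, so instead of a shortest-path search I use a plain greedy pass that chains collinear HSPs lying close together on the target. The `HSP` record, `group_hsps`, and the `max_gap` and `min_score` thresholds are all hypothetical names and values of my own, and I assume every HSP lies on the same strand of a single target sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HSP:
    """One high-scoring pair (hypothetical record, coordinates are 1-based)."""
    q_start: int   # start on the query gene
    q_end: int     # end on the query gene
    t_start: int   # start on the target genome
    t_end: int     # end on the target genome
    score: float   # alignment score reported by the search


def group_hsps(hsps: List[HSP], max_gap: int = 10_000,
               min_score: float = 50.0) -> List[List[HSP]]:
    """Greedily chain HSPs into candidate-gene groups.

    Assumptions (mine, not from the paper): low-scoring HSPs are noise,
    and two HSPs belong to the same gene when the second one starts after
    the first one ends on both query and target, at most max_gap bases
    downstream on the target.
    """
    # Drop likely noise, then walk along the target from left to right.
    kept = sorted((h for h in hsps if h.score >= min_score),
                  key=lambda h: h.t_start)
    groups: List[List[HSP]] = []
    for h in kept:
        last = groups[-1][-1] if groups else None
        collinear = (
            last is not None
            and 0 <= h.t_start - last.t_end <= max_gap
            and h.q_start >= last.q_end
        )
        if collinear:
            groups[-1].append(h)   # extend the current candidate gene
        else:
            groups.append([h])     # start a new candidate gene
    return groups


if __name__ == "__main__":
    demo = [
        HSP(1, 100, 1000, 1300, 80.0),
        HSP(101, 200, 2000, 2300, 90.0),
        HSP(1, 100, 500_000, 500_300, 70.0),
        HSP(50, 60, 3000, 3030, 10.0),   # below min_score, treated as noise
    ]
    for i, group in enumerate(group_hsps(demo), start=1):
        print(i, [(h.t_start, h.t_end) for h in group])
```

A greedy pass like this breaks down when HSPs overlap or when two paralogous genes sit next to each other, which is presumably where a proper graph formulation with a biologically motivated edge length earns its keep.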
Mar 23, 2009
Mar 18, 2009
Another potential new drug for stroke
The FDA’s Cardiovascular and Renal Drugs Advisory Committee voted in favour of approving prasugrel, an oral antiplatelet agent developed by Daiichi Sankyo and Eli Lilly.
The advisory committee, however, agreed that the benefit of preventing myocardial infarction, cardiovascular death and stroke outweighs the risk of major bleeding, such as haemorrhages, and voted 9–0 in favour of prasugrel approval.
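Inspiration from the Captain and Tiantian
Yesterday Tiantian said the same thing: when your ideal is just getting off the ground, it cannot survive much turmoil; you can only "coax" your people along, winning them over and uniting them with your sincerity. XY also said: "I go out and have fun with my employees; I can't be detached from them!" That is how it is.
Today, as I left the lab and saw my junior labmates busy at work, a thought suddenly struck me: whoever you are, when you are approaching or achieving your ideal, never take a demanding attitude. Keep a tolerant and loving heart, and strive to be fair. A society made up only of the strong is bound to be unstable.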
Thanks for the gifts from Harvard B.
By the way, I have shared your "caillers" with my lab colleagues, very tasty.
Mar 16, 2009
NGS Alignment Programs and comments from Sanger experts
Software
Currently, this page only includes software I am familiar with. Most of these programs aim to align next-generation sequencing (NGS) data and have been developed since 2007. I may extend the list when I have time. Several notes:
- The programs are listed in alphabetical order within each category.
- Features shown in brackets are optional and may affect efficiency.
- The version number shown for each program is the one I have checked, but may not be the latest.
Indexing Reads with Hash Tables
- Cross_match [1.080730]. The latest cross_match has been substantially improved for short read alignment. Its speed is comparable to other aligners and might be the best choice for local alignment.
- Platform: Illumina; 454
- Features: gapped alignment (maximum 2 gaps in the fast mode); local alignment
- Availability: source code free for academic use
- Eland [1.0]. Probably the first short read aligner. Eland substantially influences many aligners in this category and still outperforms many followers. Although it is not the fastest any more, it is close to the fastest and has the smallest memory footprint. Eland itself works for 32bp single-end reads only. Additional Perl scripts in GAPipeline extend its ability.
- Platform: Illumina
- Features: PET mapping; mapping quality; SNP caller; counting suboptimal occurrences.
- Advantages: fast; lightweight
- Availability: free source code for machine buyers.
- MAQ [0.7.1, PMID: 18714091]. This is my program to align short reads and to call variants. It has been used in several high-profile papers.
- Platform: Illumina; SOLiD (partial)
- Features: PET mapping; quality aware; gapped alignment for PET; mapping quality; adapter trimming; partial occurrences counting; SNP caller
- Advantages: feature rich; publication proved
- Limitation: up to 128bp reads; no gapped alignment for single-end reads
- Availability: GPL
- RMAP [0.41, PMID: 18307793]. One of the earliest short read aligners.
- Platform: Illumina
- Features: quality aware; [gapped alignment]; best unique hits
- Availability: GPL
- SeqMap [1.0.8, PMID: 18697769]. An Eland-like program.
- Platform: Illumina
- Features: [gapped alignment]
- Limitation: not counting suboptimal hits
- Availability: GPL
- SHRiMP [1.10]. Q-gram based algorithm.
- Platform: SOLiD; Illumina; 454
- Features: SOLiD mapping; gapped alignment; potential support for mapping quality
- Limitations: a little slow
- Availability: GPL
- ZOOM [1.2.5, PMID: 18684737]. An Eland-like algorithm improved by the use of spaced seeds. ZOOM supports longer reads and is faster than Eland, although it uses more memory. ZOOM is feature rich, but some features may come at the cost of speed.
- Platform: Illumina; SOLiD
- Features: PET mapping; SOLiD mapping; [gapped alignment]; [mapping quality]; [quality aware]
- Advantage: fast; feature rich
- Limitation: up to 224bp reads; gapped alignment comes with cost
- Availability: commercial
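The programs in this category share one basic idea: hash the reads, then scan the genome against that hash table. Below is a minimal Python sketch of the idea, written under my own simplifying assumptions rather than taken from any of the tools listed. The function names, the `seed_len` and `max_mismatches` parameters, and the use of only the first bases of each read as the seed are all hypothetical. Real aligners use several seed templates (or spaced seeds, as ZOOM does) so that a mismatch inside the seed does not cause a missed hit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_read_index(reads: List[str], seed_len: int) -> Dict[str, List[int]]:
    """Hash every read by its first seed_len bases (the seed)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for read_id, seq in enumerate(reads):
        index[seq[:seed_len]].append(read_id)
    return index


def scan_genome(genome: str, reads: List[str], seed_len: int = 8,
                max_mismatches: int = 2) -> List[Tuple[int, int, int]]:
    """Slide along the genome and verify reads whose seed matches exactly.

    Returns (read_id, 0-based genome position, mismatch count) tuples.
    Forward strand only; no gaps; a mismatch inside the seed is missed.
    """
    index = build_read_index(reads, seed_len)
    hits: List[Tuple[int, int, int]] = []
    for pos in range(len(genome) - seed_len + 1):
        seed = genome[pos:pos + seed_len]
        for read_id in index.get(seed, ()):
            read = reads[read_id]
            window = genome[pos:pos + len(read)]
            if len(window) < len(read):
                continue  # read would run off the end of the genome
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((read_id, pos, mismatches))
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTTGACCAGTAGGCATCA"
    reads = ["TTGACCAGTA", "TTGACCAGTC"]
    for hit in scan_genome(genome, reads, seed_len=6):
        print(hit)
```

The memory use here grows with the number of reads rather than with the genome size, which fits the small footprint noted for Eland above. The next category flips the roles and hashes the genome instead.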
Indexing Genome with Hash Tables
- BFAST [0.3.1].
- Platform: Illumina; SOLiD
- Availability: academic free
- MOM [0.1, PMID: 19228804].
- Platform: Illumina; (?)
- Features: counting suboptimal occurrences; local alignment
- Availability: free
- Mosaik [0.9.891]. Although I have not tried personally, Mosaik has been used in several high-profile publications and delivers good performance.
- Platform: Illumina; 454
- Advantages: long reads
- Availability: academic free binary
- NovoAlign [2.0]. NovoAlign competes with MAQ on speed and feature set, and may be more accurate than MAQ. It also implements several features missing in MAQ.
- Platform: Illumina
- Features: PET mapping; gapped alignment; mapping quality; quality aware; adapter trimming; MAQ format
- Advantages: highly accurate; gapped alignment; feature rich
- Requirements: >8GB RAM for paired-end mapping against the human genome.
- Availability: proprietary; academic free binary (no multi-threading support)
- PASS [0.5, PMID: 19218350].
- Platform: Illumina; SOLiD; 454
- Features: PET mapping
- Advantages: long reads
- Requirement: >15GB RAM against human genome
- Availability: source code free to academic users
- RazerS [20081029]. Based on the SeqAn library.
- Availability: free source code
- SOAPv1 [1.11, PMID: 18227114]. The first published short read aligner.
- Platform: Illumina
- Features: PET mapping; adapter trimming; gapped alignment; SNP caller; counting occurrences
- Advantages: feature rich
- Requirements: >14GB RAM against human genome
- Availability: GPL
Merge Sorting
- Slider [0.6, PMID: 18974170]. A very clever short read aligner specifically designed for Illumina reads. It is able to use the second best base call, which potentially improves the accuracy on SNP finding.
- Platform: Illumina
- Features: Using second base
- Advantages: fast; potentially more accurate on SNP discovery
- Requirements: >160GB disk space
- Availability: free source code
Indexing Genome with Suffix Array/BWT
- Bowtie [0.9.9]. This is probably the fastest short read aligner to date. Under the default options Bowtie does not guarantee that it finds the best hit, nor does it tell whether the hit it finds is unique, but this behaviour can be improved at the cost of speed.
- Platform: Illumina
- Features: partial PET mapping; quality aware; [mapping quality]
- Advantages: very fast
- Availability: GPL
- BWA [0.4.5]. Another aligner written by me. Given high-quality reads, it is an order of magnitude faster than MAQ while achieving similar alignment accuracy.
- Platform: Illumina; SOLiD (in development)
- Features: PET mapping; gapped alignment; mapping quality; counting suboptimal occurrences; SAM output
- Advantages: fast
- Limitations: slow for long reads and reads with high error rate
- Availability: GPL
- SOAPv2 [2.0.1]. A marvelous program developed by the group who wrote BWT-SW. The current version of SOAPv2 performs best for reads containing no more than 2 mismatches.
- Platform: Illumina
- Features: PET mapping; mapping quality; counting occurrences
- Advantages: fast
- Availability: academic free binary
- vmatch [SpringerLink].
- Availability: academic free binary
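For my own understanding of this category, here is a toy Python sketch of exact matching by backward search over a Burrows-Wheeler transform. It is a textbook illustration, not the implementation used by Bowtie, BWA or SOAPv2: the function names are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored in full, so it is only suitable for tiny strings. The real tools add mismatches and gaps on top of this and compress the index heavily.

```python
from typing import Dict, List, Tuple

Index = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def bwt_index(text: str) -> Index:
    """Build a suffix array, C table and occurrence table for text."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)
    counts: Dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def find_exact(pattern: str, index: Index) -> List[int]:
    """Return sorted 0-based start positions of exact pattern matches."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    idx = bwt_index("ACGTACGA")
    print(find_exact("ACG", idx))
```

Each base of the read costs only two table lookups, independent of genome size, which is one way to see why the aligners in this category are fast on high-quality reads and slow down once many mismatches have to be explored.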
Recommendation
First of all, as I am the key developer of two short read aligners (BWA and MAQ), it is really hard for me to give an unbiased evaluation. Please bear this fact in mind when reading through my comments below.
For Illumina reads, I would recommend my program BWA. BWA implements most of the major features of a practical aligner. It is relatively small in memory and highly efficient with little tradeoff on accuracy. BWA outputs alignments in the SAM format. Users may use SAMtools to sort/merge alignments and to make variant calls. One potential concern about BWA is that it has not been widely used so far. It may be less robust than publication-proven aligners such as Eland and MAQ.
Mapping inconsistent read pairs with NovoAlign is recommended for PET-based structural variation detection, where alignment accuracy is the leading factor in reducing false positive calls. NovoAlign is the most accurate aligner to date.
Affymetrix and Genisphere Launch Best-in-Class Solution for microRNA Expression Research
http://www.affymetrix.com/products_services/arrays/specific/mi_rna.affx
Mar 14, 2009
Congratulations on my paper!
BMC Genomics
Mar 13, 2009
Mar 12, 2009
I think the editors of Bioinformatics are angry.
See http://bioinformatics.oxfordjournals.org/cgi/reprint/25/6/701.
Breaking news today
Washington, DC, Mar. 11—President Obama has signed a spending bill that includes a slight increase in funding for the National Institutes of Health for fiscal year 2009, which began Oct. 1, 2008. The NIH 2009 budget is $30.3 billion, a 3.2% increase from last year.
I just wonder if the budget increase is enough to offset expected inflation?
Obama lifts embryonic stem cell research ban
Washington, DC, Mar. 9—U.S. President Barack Obama signed an executive order that lifted the ban on federal funding for embryonic stem cell research.
From BioTechniques.
Sequencing literature watch
Lister R, Ecker JR.
Genome Res. 2009 Mar 9. [Epub ahead of print]
PMID: 19273618
2. Massively parallel sequencing of the polyadenylated transcriptome of C. elegans.
Hillier LW, Reinke V, Green P, Hirst M, Marra MA, Waterston RH.
Genome Res. 2009 Mar 6. [Epub ahead of print]
PMID: 19181841
Molecular Cell, Volume 33, Issue 5, 547-558, 13 March 2009
Article
Single-Stranded DNA Orchestrates an ATM-to-ATR Switch at DNA Breaks
Bunsyo Shiotani (1) and Lee Zou (1,2)
(1) Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA; (2) Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Summary
ATM and ATR are two master checkpoint kinases activated by double-stranded DNA breaks (DSBs). ATM is critical for the initial response and the subsequent ATR activation. Here we show that ATR activation is coupled with loss of ATM activation, an unexpected ATM-to-ATR switch during the biphasic DSB response. ATM is activated by DSBs with blunt ends or short single-stranded overhangs (SSOs). Surprisingly, the activation of ATM in the presence of SSOs, like that of ATR, relies on single- and double-stranded DNA junctions. In a length-dependent manner, SSOs attenuate ATM activation and potentiate ATR activation through a swap of DNA-damage sensors. Progressive resection of DSBs directly promotes the ATM-to-ATR switch in vitro. In cells, the ATM-to-ATR switch is driven by both ATM and the nucleases participating in DSB resection. Thus, single-stranded DNA orchestrates ATM and ATR to function in an orderly and reciprocal manner in two distinct phases of DSB response.
A good search engine for PubMed using computational linguistics methods
High-Performance Gene Name Normalization with GENO.
Wermter J, Tomanek K, Hahn U.
Bioinformatics. 2009 Feb 2. [Epub ahead of print]
PMID: 19188193
http://bioinformatics.oxfordjournals.org/cgi/content/full/25/6/815
Today's Nature Technology Feature: The digital generation:
Next-generation sequencing is pushing gene-expression profiling further into the digital age. But analog methods still have plenty of wind left. Nathan Blow looks at the looming battle over the cell’s transcriptome.
A war without gunsmoke? A clash inside a mirage? It's my time.
Products:
SOLiD™ System Sequencing Whole Transcriptome Analysis
Illumina Genome Analyzer Applications Transcriptome Analysis
1. mRNA-seq
2. Tag profiling
3. Small RNA discovery and analysis
Roche 454 Life Science SAGE based expression profiling on GS FLX system
“For digital gene expression you are just counting the number of times you hit a gene and then assuming that that represents the number of copies of the transcript that you have in your population,” says Chad Nusbaum.
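The counting that Nusbaum describes is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the gene coordinates and read positions are made up, `count_reads_per_gene` and `reads_per_million` are hypothetical helper names, each read is reduced to a single mapped position, and the reads-per-million scaling is just one common way to compare libraries of different depth.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (chromosome, start, end); both ends inclusive. Toy annotation format.
Gene = Tuple[str, int, int]


def count_reads_per_gene(reads: Iterable[Tuple[str, int]],
                         genes: Dict[str, Gene]) -> Counter:
    """Count how many mapped read positions fall inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in reads:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def reads_per_million(counts: Counter, total_mapped: int) -> Dict[str, float]:
    """Scale raw counts by library size so that runs can be compared."""
    if total_mapped == 0:
        return {name: 0.0 for name in counts}
    return {name: n * 1_000_000 / total_mapped for name, n in counts.items()}


if __name__ == "__main__":
    genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 500, 900)}
    reads = [("chr1", 150), ("chr1", 160), ("chr1", 600), ("chr2", 150)]
    counts = count_reads_per_gene(reads, genes)
    print(counts)
    print(reads_per_million(counts, total_mapped=len(reads)))
```

The "assuming" in the quote is the important part: a raw count equals transcript abundance only if reads are sampled evenly, so longer genes and biased library preparation will distort these numbers unless they are corrected for.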
In addition to looking at RNA expression patterns, RNA-seq can allow researchers to discover new classes of RNA, detect point mutations in expressed transcripts, identify fusion transcripts or uncover new alternative splicing events.
Comment: Are these so-called "point mutations in expressed transcripts" SNPs?
For this reason a number of researchers expect microarrays to migrate towards more targeted applications in the future, perhaps associated with biomarker validation or other diagnostic applications.
Comment: This is the future direction for microarrays.
Pay attention to cytogenetics and procreation
Mar 11, 2009
Ideals
A breakthrough in synthetic biology.
Synthetic biology, the "brother" of systems biology, welcomes a breakthrough: a self-reproducing synthetic ribosome, which is an important step in the quest to create artificial life forms. I had almost forgotten that the ribosome is the protein factory of the organism. I can foresee this advancing biologic drug discovery far more than the use of bacteria has.
George has shocked me for a second time. I just wonder whether he will select me as one of the volunteers in his personal genomics project.
Mar 8, 2009
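I really envy immortals like Li Bai and Tang Yin. On comparison, I find that the ancients had a much better chance of approaching the realm of the immortals than people today!
Song of the Peach Blossom Hermitage, by Tang Yin
In Peach Blossom Cove stands the Peach Blossom Hermitage; in the Peach Blossom Hermitage lives the Peach Blossom Immortal;
The Peach Blossom Immortal plants peach trees, and picks the blossoms to trade for wine money.
Sober, he simply sits before the flowers; drunk, he comes back to sleep beneath them;
Half drunk, half sober, day after day; flowers bloom and flowers fall, year after year.
I only wish to grow old and die among flowers and wine; I do not wish to bow before carriages and horses;
Carriage dust and horses' hooves are the pleasures of the rich; wine cups and flowering branches are the lot of the poor.
Compare wealth and rank with the poor man: one is on the ground, the other in the sky;
Compare poverty with carriages and horses: they are driven about, while I have my leisure.
Others laugh at me for being so mad; I laugh at them for failing to see through it all;
Do you not see the tombs of the heroes of Wuling: no flowers, no wine, ploughed into fields!
Third month of the Yichou year of the Hongzhi reign. Master of the Peach Blossom Hermitage, Tang Yin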
Fixing garbled Chinese characters in an SSH session
# Run on the remote host; the SSH client terminal must be set to the same encoding.
export LANG=zh_CN.GB18030
export LANGUAGE=zh_CN.GB18030:zh_CN.GB2312:zh_CN