Splice life to understand yourself: 2009/03

Mar 31, 2009

My newly published papers in 2009

1. Transcription and splicing regulation in human umbilical vein endothelial cells under hypoxic stress conditions by exon array. Hang X, Li P, Li Z, Qu W, Yu Y, Li H, Shen Z, Zheng H, Gao Y, Wu Y, Deng M, Sun Z, Zhang C. BMC Genomics. 2009 Mar 25;10(1):126. PMID: 19320972

2. REMAS: a new regression model to identify alternative splicing events from exon array data. Zheng H, Hang X, Zhu J, Qian M, Qu W, Zhang C, Deng M. BMC Bioinformatics. 2009 Jan 30;10 Suppl 1:S18. PMID: 19208117

3. Exon array data analysis and new algorithms for alternative splicing identification. HANG Xing-Yi, DENG Ming-Hua, SUN Zhi-Xian, ZHANG Cheng-Gang. Bulletin of the Academy of Military Medical Sciences (in Chinese). Accepted.

The second paper came out of a collaboration with Peking University, and I am one of the co-first authors. I gave an oral presentation on the same topic at APBC (Asia Pacific Bioinformatics Conference) 2009. Notably, this paper was one of the two that received a Best Paper Award at APBC 2009.

Finally, please pray for my ongoing stroke project. Wish me luck!

Mar 28, 2009

Differentiation in bioinformatics

Keep up with the new trends in in silico biology.

Read more papers from PLoS CB and MSB. Compared with Bioinformatics and NAR, I find that PLoS CB and MSB prefer work that inspires new applications in functional biology. The new computational biology and systems biology are blazing a promising trail in functional biology. However, bioinformatics will never fade away; it will play a growing role as a technical skill set.

Mar 26, 2009

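A good omen for the Year of the Ox: my wife won a golden ox in a prize draw, and a Chow Tai Fook one at that!

Really, everyone badly needs a golden ox these days, especially in a Year of the Ox that falls during the financial crisis.
Photo: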

Mar 24, 2009

Nature Reviews Genetics research highlights

Nature Reviews Genetics, volume 10, February 2009

ALTERNATIVE SPLICING: Deciding between the alternatives

"altered splicing patterns might be more important than expression changes in determining complex human traits."

ORIGINAL RESEARCH PAPERS
Yu, Y. et al. Dynamic regulation of alternative splicing by silencers that modulate 5′ splice site competition. Cell 135, 1224–1236 (2008).
Heinzen, E. L. et al. Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 6, e1000001 (2008).

GenBlastA: enabling BLAST to identify homologous gene sequences.

Sequence analysis has been the central topic of bioinformatics all along.

Genome Res. 2009 Jan;19(1):143-9. Epub 2008 Oct 6.

GenBlastA: enabling BLAST to identify homologous gene sequences.

She R, Chu JS, Wang K, Pei J, Chen N.

School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6 Canada.

BLAST is an extensively used local similarity search tool for identifying homologous sequences. When a gene sequence (either protein sequence or nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that contains also hundreds to thousands of random HSPs ("noises"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed for extracting relevant HSPs that represent candidate homologous genes from the entire HSP report. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionalities.
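To make the idea of grouping HSPs a bit more concrete for myself, here is a minimal sketch. It is NOT the genBlastA algorithm: I do not reproduce its edge length metric or its shortest-path formulation. Instead I swap in a much simpler technique, dynamic-programming chaining of collinear HSPs with a toy gap penalty. The HSP class, the best_chain function, the max_gap parameter, the 0.001 penalty and the example coordinates are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    q_start: int   # start on the query gene
    q_end: int     # end on the query gene
    t_start: int   # start on the target genome
    t_end: int     # end on the target genome
    score: float   # alignment score of this fragment


def best_chain(hsps, max_gap=10_000):
    """Return the highest-scoring chain of collinear HSPs.

    Toy stand-in for HSP grouping: two HSPs may be chained when they keep
    the same order on query and target and the target gap is small.
    """
    if not hsps:
        return []
    order = sorted(hsps, key=lambda h: h.t_start)
    best = [h.score for h in order]   # best chain score ending at each HSP
    prev = [None] * len(order)        # back-pointers for traceback
    for i, h in enumerate(order):
        for j in range(i):
            p = order[j]
            gap = h.t_start - p.t_end
            if p.q_end <= h.q_start and 0 <= gap <= max_gap:
                cand = best[j] + h.score - 0.001 * gap  # hypothetical penalty
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i is not None:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


# Three fragments of one candidate gene plus one distant "noise" hit.
hsps = [
    HSP(1, 100, 5_000, 5_300, 80.0),
    HSP(101, 200, 6_000, 6_300, 75.0),
    HSP(201, 300, 7_500, 7_800, 90.0),
    HSP(50, 120, 900_000, 900_210, 30.0),
]
chain = best_chain(hsps)
print([(h.t_start, h.t_end) for h in chain])
```

The three nearby fragments end up in one group and the distant hit is left out, which is the "signal versus noise" separation the abstract describes, only in a far cruder form.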

PMID: 18838612

Url: http://genome.sfu.ca/projects/genBlastA/

Mar 23, 2009

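Notes

When idle, stay away from the Gugan (骨干) board; it is mostly bored people.
When idle, do not read the Qimei (其美) posts; they are mostly vulgar.
When idle, visit Sina (新浪网) less; it is mostly fantasy posts.

Difficulties are common; read more and think more, and there will be a way. Do not rush anything; patient exploration will find a way.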

Mar 18, 2009

Another potential new drug for stroke

Nature Reviews Drug Discovery Volume 8 | March 2009 | 183

The FDA’s Cardiovascular and Renal Drugs Advisory Committee voted in favour of approving prasugrel, an oral antiplatelet agent developed by Daiichi Sankyo and Eli Lilly.

The advisory committee, however, agreed that the benefit of preventing myocardial infarction, cardiovascular death and stroke outweighs the risk of major bleeding, such as haemorrhages, and voted 9–0 in favour of prasugrel approval.

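Inspiration from the Commander and Tiantian

Today I watched "My Chief and My Regiment" (《我的团长我的团》). Mike said, angry yet moved: "You are the officer who loves his soldiers most of any I have met, because you have nothing." Indeed, apart from his ideals Long Wenzhang really has nothing. To have a regiment of his own and to fight the Japanese invaders he told countless lies, but all of them were for his ideals. "Anyone not eager to go home has something wrong with his head", haha! You have to move your subordinates with the plainest and most sincere words. In fact, as the commander, Long Wenzhang bears the greatest pressure and responsibility. The series shows this in passing in several places: a commander has to realize his own ideals while answering for his brothers' lives, and for a cannon-fodder regiment that pressure is enormous. So when Mike drove off, he insisted on walking back, probably to unwind properly. He also said to Mike with a wry smile: "I envy you for being able to say that: this is not within my duties."

Yesterday Tiantian said the same thing: when your ideal is just getting started it cannot withstand much turmoil, so you can only "coax" your people and use your sincerity to win them over and unite them. XY also said: "I need to go out and have fun with my employees; I cannot be detached from them!" That is how it is.

When I left the lab today and saw my junior labmates busy at work, a thought suddenly struck me: whoever you are, when you approach or achieve your ideal, never take a demanding attitude; keep a tolerant and loving heart, and strive to be fair. A society made up only of the strong is bound to be unstable.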

Thanks for the gifts from Harvard B.

Hi, B. I really appreciate that you brought me so many "treasures" from Uncle Sam: the Declaration of Independence, Swiss "caillers" and especially "The Last Lecture", which will inspire me to live each day of my life with purpose and joy.

By the way, I shared your "caillers" with my lab colleagues; they were very tasty.

Mar 16, 2009

NGS Alignment Programs and comments from Sanger experts

Quoted from http://www.sanger.ac.uk/Users/lh3/NGSalign.shtml

Software

Currently, this page only includes software I am familiar with. Most of the programs aim to align next-generation sequencing (NGS) data and have been developed since 2007. I may extend the list when I have time. Several notes:

- The programs are listed in alphabetical order in each category.
- Features shown in brackets are optional and may affect efficiency.
- The version number shown for each program is the one I have checked, but may not be the latest.

Indexing Reads with Hash Tables

Cross_match [1.080730]. The latest cross_match has been substantially improved for short read alignment. Its speed is comparable to other aligners and might be the best choice for local alignment.
- Platform: Illumina; 454
- Features: gapped alignment (maximum 2 gaps in the fast mode); local alignment
- Availability: academic free source codes
Eland [1.0]. Probably the first short read aligner. Eland substantially influences many aligners in this category and still outperforms many followers. Although it is not the fastest any more, it is close to the fastest and has the smallest memory footprint. Eland itself works for 32bp single-end reads only. Additional Perl scripts in GAPipeline extend its ability.
- Platform: Illumina
- Features: PET mapping; mapping quality; SNP caller; counting suboptimal occurrences.
- Advantages: fast; lightweight
- Availability: free source codes for machine buyers.
MAQ [0.7.1, PMID: 18714091]. This is my program to align short reads and to call variants. It has been used in several high-profile papers.
- Platform: Illumina; SOLiD (partial)
- Features: PET mapping; quality aware; gapped alignment for PET; mapping quality; adapter trimming; partial occurrences counting; SNP caller
- Advantages: feature rich; publication proved
- Limitation: up to 128bp reads; no gapped alignment for single-end reads
- Availability: GPL
RMAP [0.41, PMID: 18307793]. One of the earliest short read aligners.
- Platform: Illumina
- Features: quality aware; [gapped alignment]; best unique hits
- Availability: GPL
SeqMap [1.0.8, PMID: 18697769]. An Eland-like program.
- Platform: Illumina
- Features: [gapped alignment]
- Limitation: not counting suboptimal hits
- Availability: GPL
SHRiMP [1.10]. Q-gram based algorithm.
- Platform: SOLiD; Illumina; 454
- Features: SOLiD mapping; gapped alignment; potential support for mapping quality
- Limitations: a little slow
- Availability: GPL
ZOOM [1.2.5, PMID: 18684737]. An Eland-like algorithm improved by the use of spaced seeds. ZOOM supports longer reads and is faster than Eland, although it uses more memory. ZOOM is feature rich, but some features may come at the cost of speed.
- Platform: Illumina; SOLiD
- Features: PET mapping; SOLiD mapping; [gapped alignment]; [mapping quality]; [quality aware]
- Advantage: fast; feature rich
- Limitation: up to 224bp reads; gapped alignment comes with cost
- Availability: commercial

Indexing Genome with Hash Tables

BFAST [0.3.1].
- Platform: Illumina; SOLiD
- Availability: academic free
MOM [0.1, PMID: 19228804].
- Platform: Illumina; (?)
- Features: counting suboptimal occurrences; local alignment
- Availability: free
Mosaik [0.9.891]. Although I have not tried personally, Mosaik has been used in several high-profile publications and delivers good performance.
- Platform: Illumina; 454
- Advantages: long reads
- Availability: academic free binary
NovoAlign [2.0]. NovoAlign competes with MAQ on speed and feature set, and may be more accurate than MAQ. It also implements several features missing in MAQ.
- Platform: Illumina
- Features: PET mapping; gapped alignment; mapping quality; quality aware; adapter trimming; MAQ format
- Advantages: highly accurate; gapped alignment; feature rich
- Requirements: >8GB RAM for paired-end mapping against the human genome.
- Availability: proprietary; academic free binary (no multi-threading support)
PASS [0.5, PMID: 19218350].
- Platform: Illumina; SOLiD; 454
- Features: PET mapping
- Advantages: long reads
- Requirement: >15GB RAM against human genome
- Availability: free source codes to academic users
RazerS [20081029]. Based on the SeqAn library.
- Availability: free source codes
SOAPv1 [1.11, PMID: 18227114]. The first published short read aligner.
- Platform: Illumina
- Features: PET mapping; adapter trimming; gapped alignment; SNP caller; counting occurrences
- Advantages: feature rich
- Requirements: >14GB RAM against human genome
- Availability: GPL

Merge Sorting

Slider [0.6, PMID: 18974170]. A very clever short read aligner specifically designed for Illumina reads. It is able to use the second best base call, which potentially improves the accuracy on SNP finding.
- Platform: Illumina
- Features: Using second base
- Advantages: fast; potentially more accurate on SNP discovery
- Requirements: >160GB disk space
- Availability: free source codes

Indexing Genome with Suffix Array/BWT

Bowtie [0.9.9]. This is probably the fastest short read aligner to date. Although under the default option Bowtie does not guarantee to find the best hit or tell if the hit it finds is unique, it is possible to improve this behaviour at the cost of speed.
- Platform: Illumina
- Features: partial PET mapping; quality aware; [mapping quality]
- Advantages: very fast
- Availability: GPL
BWA [0.4.5]. Another aligner written by me. Given high-quality reads, it is an order of magnitude faster than MAQ while achieving similar alignment accuracy.
- Platform: Illumina; SOLiD (in development)
- Features: PET mapping; gapped alignment; mapping quality; counting suboptimal occurrences; SAM output
- Advantages: fast
- Limitations: slow for long reads and reads with high error rate
- Availability: GPL
SOAPv2 [2.0.1]. A marvelous program developed by the group who wrote BWT-SW. The current version of SOAPv2 performs best for reads containing no more than 2 mismatches.
- Platform: Illumina
- Features: PET mapping; mapping quality; counting occurrences
- Advantages: fast
- Availability: academic free binary
vmatch [SpringerLink].
- Availability: academic free binary

Recommendation

First of all, as I am the key developer of two short read aligners (BWA and MAQ), it is really hard for me to give an unbiased evaluation. Please bear this fact in mind when reading through my comments below.

For Illumina reads, I would recommend my program BWA. BWA implements most of the major features of a practical aligner. It is relatively small in memory and highly efficient with little tradeoff on accuracy. BWA outputs alignment in the SAM format. Users may use SAMtools to sort/merge alignments and to make variants calls. One potential concern about BWA is it has not been widely used at the moment. It may be less robust than those publication-proved aligners such as Eland and MAQ.
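Since BWA writes its alignments in SAM, here is a minimal sketch of what reading that output could look like. It relies only on the 11 mandatory tab-separated SAM columns and the 0x4 "unmapped" flag bit. The function names, the min_mapq threshold of 20 and the example records are my own assumptions for illustration; for real data SAMtools is the proper tool, as recommended above.

```python
# Names of the 11 mandatory SAM columns, in file order.
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")


def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    return rec


def confident_hits(lines, min_mapq=20):
    """Yield mapped alignments whose mapping quality is at least min_mapq."""
    for line in lines:
        if line.startswith("@"):      # header lines start with '@'
            continue
        rec = parse_sam_line(line)
        unmapped = rec["flag"] & 0x4  # bit 0x4 marks an unmapped read
        if not unmapped and rec["mapq"] >= min_mapq:
            yield rec


# Made-up records: one confident hit, one ambiguous hit, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t101\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t0\tchr1\t250\t3\t8M\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]
kept = [rec["qname"] for rec in confident_hits(example)]
print(kept)
```

Filtering on mapping quality like this is one simple way to keep only reads the aligner considers uniquely placed.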

Mapping inconsistent read pairs with NovoAlign is recommended for PET-based structural variation detection where alignment accuracy is the leading factor on reducing false positive calls. NovoAlign is the most accurate aligner to date.

Affymetrix and Genisphere Launch Best-in-Class Solution for microRNA Expression Research

Benefits include fastest labeling assay on the market and industry-leading array content covering 71 organisms.

http://www.affymetrix.com/products_services/arrays/specific/mi_rna.affx

Mar 14, 2009

Congratulations on my paper!

Editorially accepted manuscripts: Transcription and splicing regulation in human umbilical vein endothelial cells under hypoxic stress conditions by exon array
BMC Genomics

Mar 13, 2009

Bookmark for "Fundamentals of Biostatistics"

1-30.

Mar 12, 2009

I think the editors of Bioinformatics are angry.

They are angry about the messy state of submissions on microarray analysis and have laid down explicit requirements for papers on these topics. Please read the editorial before you submit a similar paper to Bioinformatics.

See http://bioinformatics.oxfordjournals.org/cgi/reprint/25/6/701.

Breaking news today

NIH gets slight boost in 2009 budget
Washington, DC, Mar. 11—President Obama has signed a spending bill that includes a slight increase in funding for the National Institutes of Health for fiscal year 2009, which began Oct. 1, 2008. The NIH 2009 budget is $30.3 billion, a 3.2% increase from last year.

I just wonder if the budget increase is enough to offset expected inflation?
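A back-of-the-envelope check of that question, as a minimal sketch. The $30.3 billion and 3.2% figures come from the news item above; the 3% inflation rate is purely a hypothetical number I plugged in, not a forecast, and the function name is my own.

```python
def real_growth(nominal_growth, inflation):
    """Growth in purchasing power after adjusting for inflation."""
    return (1 + nominal_growth) / (1 + inflation) - 1


budget_2009 = 30.3        # billion USD, from the news item
nominal = 0.032           # 3.2% increase, from the news item
budget_2008 = budget_2009 / (1 + nominal)   # implied previous budget

assumed_inflation = 0.03  # hypothetical rate, for illustration only
change = real_growth(nominal, assumed_inflation)

print(f"implied 2008 budget: {budget_2008:.2f} billion USD")
print(f"real change at {assumed_inflation:.0%} inflation: {change:.2%}")
```

Under this assumption almost all of the increase is eaten by inflation; any rate above 3.2% would turn it into a real cut.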

Obama lifts embryonic stem cell research ban
Washington, DC, Mar. 9—U.S. President Barack Obama signed an executive order that lifted the ban on federal funding for embryonic stem cell research.

From Biotechniques.

Sequencing literature watch

1. Finding the fifth base: Genome-wide sequencing of cytosine methylation.
Lister R, Ecker JR.
Genome Res. 2009 Mar 9. [Epub ahead of print]
PMID: 19273618

2. Massively parallel sequencing of the polyadenylated transcriptome of C. elegans.
Hillier LW, Reinke V, Green P, Hirst M, Marra MA, Waterston RH.
Genome Res. 2009 Mar 6. [Epub ahead of print]
PMID: 19181841

Molecular Cell, Volume 33, Issue 5, 547-558, 13 March 2009
Article
Single-Stranded DNA Orchestrates an ATM-to-ATR Switch at DNA Breaks
Bunsyo Shiotani (1) and Lee Zou (1,2)
(1) Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
(2) Department of Pathology, Harvard Medical School, Boston, MA 02115, USA

Summary

ATM and ATR are two master checkpoint kinases activated by double-stranded DNA breaks (DSBs). ATM is critical for the initial response and the subsequent ATR activation. Here we show that ATR activation is coupled with loss of ATM activation, an unexpected ATM-to-ATR switch during the biphasic DSB response. ATM is activated by DSBs with blunt ends or short single-stranded overhangs (SSOs). Surprisingly, the activation of ATM in the presence of SSOs, like that of ATR, relies on single- and double-stranded DNA junctions. In a length-dependent manner, SSOs attenuate ATM activation and potentiate ATR activation through a swap of DNA-damage sensors. Progressive resection of DSBs directly promotes the ATM-to-ATR switch in vitro. In cells, the ATM-to-ATR switch is driven by both ATM and the nucleases participating in DSB resection. Thus, single-stranded DNA orchestrates ATM and ATR to function in an orderly and reciprocal manner in two distinct phases of DSB response.

A good search engine for PubMed using computational linguistics methods

http://www.semedico.org/

High-Performance Gene Name Normalization with GENO.
Wermter J, Tomanek K, Hahn U.
Bioinformatics. 2009 Feb 2. [Epub ahead of print]
PMID: 19188193
http://bioinformatics.oxfordjournals.org/cgi/content/full/25/6/815

Today's Nature Technology Feature: The digital generation:

NATURE|Vol 458|12 March 2009
Next-generation sequencing is pushing gene-expression profiling further into the digital age. But analog methods still have plenty of wind left. Nathan Blow looks at the looming battle over the cell’s transcriptome.

A war without gunsmoke? A clash in a mirage? It's my time.

Products:
- SOLiD™ System Sequencing: Whole Transcriptome Analysis
- Illumina Genome Analyzer Applications: Transcriptome Analysis
  1. mRNA-seq
  2. tag profiling
  3. small RNA discovery and analysis
- Roche 454 Life Sciences: SAGE-based expression profiling on the GS FLX system

“For digital gene expression you are just counting the number of times you hit a gene and then assuming that that represents the number of copies of the transcript that you have in your population,” says Chad Nusbaum.
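The counting idea in that quote is simple enough to sketch. Below is a minimal example, assuming each read has already been assigned to a gene (or to None if it hit nothing). The gene names, the digital_expression function and the reads-per-million normalisation are my own illustrative choices, not anything prescribed by the article.

```python
from collections import Counter


def digital_expression(read_to_gene):
    """Count reads per gene and scale the counts to reads per million (RPM)."""
    counts = Counter(g for g in read_to_gene if g is not None)
    total = sum(counts.values())
    # Each gene maps to (raw count, reads per million assigned reads).
    return {g: (n, n * 1_000_000 / total) for g, n in counts.items()}


# Made-up assignments: one entry per sequenced read.
hits = ["GAPDH", "GAPDH", "ACTB", None, "GAPDH", "ACTB", "TP53", "GAPDH"]
table = digital_expression(hits)
for gene, (n, rpm) in sorted(table.items()):
    print(gene, n, round(rpm))
```

The raw count is the "digital" signal; the assumption in the quote is that it is proportional to the number of transcript copies in the sample.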

In addition to looking at RNA expression patterns, RNA-seq can allow researchers to discover new classes of RNA, detect point mutations in expressed transcripts, identify fusion transcripts or uncover new alternative splicing events.

Comment: Are these so-called "point mutations in expressed transcripts" SNPs?

For this reason a number of researchers expect microarrays to migrate towards more targeted applications in the future, perhaps associated with biomarker validation or other diagnostic applications.

So that is the direction microarrays are heading.

Tree-Planting Day

Today is this year's Tree-Planting Day. Remember to do something green today.

Pay attention to cytogenetics and reproduction

A very promising field for the health industry. The Affymetrix SNP 6.0 array can handle both copy number and cytogenetic analysis and SNP genotyping.

Mar 11, 2009

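Ideals

I am glad that after so many years my ideal is still in my heart, and yesterday it was powerfully rekindled (mY gENE)! I firmly believe that once I am resolute my strength is formidable, because long ago I once grasped the seventh sense. It is very, very hard to sustain, but I did it once, and as long as I am willing I can certainly do it again. So I recently rewatched Saint Seiya from beginning to end. It is a must-see educational series on combining teamwork with individual struggle, and it keeps the heart young. I must admit, though, that I still prefer the Gold Saints; a grassroots hero like Seiya is not my idol, only a special case. In many matters, the bearing a person shows at first sight already settles the verdict.

A breakthrough in synthetic biology.

Synthetic biology, the "brother" of systems biology, has welcomed a breakthrough in making a self-reproducing synthetic ribosome, which is an important step in the quest to create artificial life forms. I had almost forgotten that the ribosome is the protein factory of the organism. I can foresee this boosting biologic drug discovery far more than using bacteria does.

George has shocked me for the second time. I just wonder whether he would select me as one of the volunteers in his personal genomics project.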

Mar 8, 2009

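I really envy immortals like Li Bai and Tang Yin. On comparison, I find that the ancients had a much better chance of approaching the realm of the immortals than people today!

A poem as a gift:
Song of the Peach Blossom Hermitage (桃花庵歌), by Tang Yin
In Peach Blossom Valley stands the Peach Blossom Hermitage; in the Peach Blossom Hermitage lives the Peach Blossom Immortal;
The Peach Blossom Immortal plants peach trees, and picks peach blossoms to trade for wine money.
Sober, he just sits before the flowers; drunk, he comes back to sleep beneath them;
Half drunk, half sober, day after day; blossoms open and fall, year after year.
I would rather grow old and die among flowers and wine than bow before carriages and horses;
Carriage dust and horses' heels are the pleasure of the rich; wine cups and flowering branches are the lot of the poor.
Compare the rich and noble with the poor: one is on the ground, the other in the sky;
Compare the poor and humble with carriages and horses: they must rush about while I have leisure.
Others laugh at me for being so mad; I laugh at others for failing to see through it;
Do you not see the tombs of the heroes at Wuling? With no flowers and no wine, they have been ploughed into fields!

Tang Yin, Master of the Peach Blossom Hermitage, third month of the Yichou year of the Hongzhi reign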

Fixing garbled Chinese characters in SSH sessions

Add the following lines to .bash_profile:
export LANG=zh_CN.GB18030
export LANGUAGE=zh_CN.GB18030:zh_CN.GB2312:zh_CN
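Why the garbling happens at all: the server sends bytes in one encoding and the terminal decodes them as another. Here is a minimal sketch of that mismatch using Python's built-in codecs. The sample string and the choice of Latin-1 as the "wrong" client charset are just assumptions for illustration; the actual mismatch on a given machine may differ.

```python
text = "中文测试"                     # sample Chinese text ("Chinese test")

raw = text.encode("gb18030")         # bytes as a GB18030 server would send them
garbled = raw.decode("latin-1")      # a client that assumes the wrong charset
restored = raw.decode("gb18030")     # a client whose charset matches the server

print(repr(garbled))                 # unreadable characters
print(restored)                      # the original text again
```

Setting LANG as above makes both sides agree on GB18030, which corresponds to the last line of the sketch.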

Mar 3, 2009

A great resource for gene expression in the brain

http://www.brain-map.org/
