New Non-parametric Analysis Algorithm for Detecting Differentially Expressed Genes with Replicated Microarray Data

Previous nonparametric statistical methods for constructing the test and null statistics require having at least 4 arrays under each condition. In this paper, we provide an improved method of constructing the test and null statistics which only requires 2 arrays under one condition if the number of arrays under the other condition is at least 3. The conventional testing method defines the rejection region by controlling the probability of Type I error. In this paper, we propose to determine the critical values (or the cut-off points) of the rejection region by directly controlling the false discovery rate. Simulations were carried out to compare the performance of our proposed method with several existing methods. Finally, our proposed method is applied to the rat data of Pan et al. (2003). It is seen from both simulations and the rat data that our method has lower false discovery rates than those from the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2003).

The study was published as:

Shunpu Zhang (2006) “An Improved Nonparametric Approach for Detecting Differentially Expressed Genes with Replicated Microarray Data,” Statistical Applications in Genetics and Molecular Biology: Vol. 5 : Iss. 1, Article 30.
Available at: http://www.bepress.com/sagmb/vol5/iss1/art30


Even the US presidential hopeful jumps on the Web 2.0 bandwagon

Barack Obama looks to be diving into this whole “Web 2.0” thing head first, what with his own Facebook profile, Flickr account, and YouTube account. In addition to all this, he also has my.barackobama.com, a social networking site where his supporters can create profiles, network, and write blogs all about how great Barack Obama is. Meanwhile, former Senator John Edwards is facing a setback of his own: two of his bloggers, Amanda Marcotte and Melissa McEwan, were asked to step down for posting entries that upset the Christian community and Bush supporters.

So what’s preventing our young scientists from going Web 2.0 and using blogs? Business networking sites such as LinkedIn have given much-needed value to the business community, compared to sites like Orkut, which caters to the lighter side of networking (although Orkut offers communities too). Shouldn’t it be time to start one for the scientific community? There are a few small steps in this direction, such as:

http://www.cos.com – Community of Science (COS) is the leading global resource for hard-to-find information critical to scientific research and other projects across all disciplines.

http://labcircle.net/  Networking – the new LabCircle.net makes it possible. It is where the global laboratory, analysis, biotech, chemistry and pharma industry meets. Based on the theory of “six degrees of separation”, the club allows members to maintain their personal networks, generate new contacts and actively participate in various forums to exchange information, experiences and opinions.

http://www.scientistsolutions.com/ – an international life science forum

http://linkedin.com/ – reach key decision-makers and find colleagues or others working in your field

Google Video – publish your expertise in tackling the problems you face while running your protocols or project work; tips and tricks, whatever it is, all you need is a webcam.

James from the Research Information Network UK has commented on a previous blog post of mine about an article on how researchers fish for information:

Early in 2006, the Research Information Network commissioned a study as part of its work to promote better arrangements for researchers to find out what information resources relevant to their work are available, where these are, and how they may have access to them. The work has now been concluded, and the report from the study is attached below.

http://www.rin.ac.uk/researchers-discovery-services 

Surprisingly, many people still do not know how to use Google’s search features.

Microarray and genomics consortiums have now started to use more collaboration tools such as Wikipedia and wiki pages. A few good examples are:

https://daphnia.cgb.indiana.edu/83.html and http://en.wikiversity.org/wiki/Portal:Life_Sciences  and http://www.e-biosci.org/

Online Microarray tools

Open source has always been a favourite with scientists. Now, with companies like Google and IBM pushing the concept of software as a service, educational institutions and non-profit organisations alike can offer their efficiencies and expertise to scores of scientists cost-effectively.

For a start, take a look at the online microarray analysis tool offered at the European Bioinformatics Institute:

http://www.embl-ebi.ac.uk/expressionprofiler/

Carnegie Mellon U. Transforms DNA Microarrays With Standard Internet Communications Protocol

Source: Carnegie Mellon University, December 2005

A standard Internet protocol that checks errors made during email transmissions has now inspired a revolutionary method to transform DNA microarray analysis, a common technology used to understand gene activation. The new method, which blends experiment and computation, strengthens DNA microarray analysis, according to its Carnegie Mellon University inventor, who has published his findings in the December issue of Nature Biotechnology with collaborators at the Hebrew University in Israel.

The innovative method combines a new experimental procedure and a new algorithm to identify gene activation captured by DNA microarray analysis with greater sensitivity and specificity. The work holds great promise for vastly improving research on health and disease, according to Ziv Bar-Joseph, assistant professor of computer science and biological sciences at Carnegie Mellon.

“We are very excited about introducing this versatile, powerful method to the research community because it can be used to study a wide range of complex, dynamic systems more comprehensively,” said Bar-Joseph, who also is a member of the Center for Automated Learning and Discovery at the School of Computer Science. “Such systems under study include stress and drug response, cancer and embryo development.”

DNA microarray analysis — a multimillion-dollar-a-year industry — identifies gene activation in living, complex biological systems. DNA microarrays monitor the behavior of thousands of genes over time by detecting changes in the expression of as many as 30,000 different genes on one small chip. The technique has been used to study some of the most important biological systems, including how cells normally divide (the cell cycle) and immune responses to disease and infection.

“Ultimately, we think that the addition of this method to standard DNA microarray analysis will make it more accurate and cost-effective,” Bar-Joseph added.

“While DNA microarrays are very powerful, they present a sampling problem,” Bar-Joseph said. “DNA microarrays only take static snapshots of gene activity over time. In between these snapshots, genes could be activated and we just don’t see them turning on. Our protocol will offer greater overall sensitivity in detecting the expression of any gene, even if a gene turns on when no microarray sampling takes place.”

Bar-Joseph’s procedure is based on a “check-sum” protocol initially developed to ensure that email messages sent via the Internet don’t become garbled in transmission. In the standard Internet check-sum protocol, bits of information that begin as one value (0 or 1) may inadvertently flip to the opposite value as they move from one computer to the next in the form of an email. This data loss, ascribed to noise in the communication channel, is checked by counting the number of 1’s in the message. If this number is odd, then the last bit is set to 1; otherwise it is set to 0. By comparing the number of 1’s on the sending end with the value of the last bit on the receiving end, the recipient’s computer can determine whether the message was accurately received. If not, the recipient’s computer asks the sender’s computer to forward the message again.
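The parity idea itself is simple enough to sketch in a few lines of Python. This is a toy illustration of the check-sum described above, not Bar-Joseph’s actual method:

```python
def parity_bit(bits):
    """Return the parity bit: 1 if the number of 1s is odd, else 0."""
    return sum(bits) % 2

def transmit(bits):
    """Append the parity bit to the message before sending."""
    return bits + [parity_bit(bits)]

def verify(received):
    """Check a received message: data bits plus trailing parity bit."""
    *data, parity = received
    return parity_bit(data) == parity

message = [1, 0, 1, 1, 0]      # three 1s, so the parity bit is 1
sent = transmit(message)       # [1, 0, 1, 1, 0, 1]
assert verify(sent)

corrupted = sent.copy()
corrupted[2] ^= 1              # a single bit flips in transit
assert not verify(corrupted)   # the recipient asks for a resend
```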

Bar-Joseph’s method carries out a similar analysis of the microarray snapshots by “checking” the sum of a set of DNA microarray data points over time (a time series experiment) against the “summary” of the temporal response. If the two sets of results are equal, then what is captured by the DNA microarray time series is real. If the time series results produce a lower value than the microarray summary, the protocol indicates that the researchers have missed a gene’s activation somewhere in their time series.

Just as important, according to Bar-Joseph, is whether a DNA microarray summary value exceeds its time sequence value. If that’s the case, then researchers have likely identified gene activity that should be attributed to changes taking place during an experiment — adding a chemical or changing the temperature, for instance. This aspect of the method provides scientists with the specificity they need to weed out such introduced gene activation from fundamental gene activation pathways that form the hallmark of processes like cancer or immunity. To prove the effectiveness of this new method, Bar-Joseph studied the human cell division cycle. Considered one of the most important biological systems, the cell cycle plays a major role in cancer. Using their new method, Bar-Joseph and his colleagues identified many new human genes that were not previously found to be participants in this system.
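The precise statistics are in the Nature Biotechnology paper; purely as an illustration, the three-way comparison described above might look like the following sketch (the function and variable names are my own invention, not the authors’):

```python
def classify_gene(time_series_total, summary_value, tolerance=1e-6):
    """Hypothetical three-way "check-sum" comparison for one gene.

    time_series_total -- aggregate expression seen across the sampled
                         microarray snapshots (the time series)
    summary_value     -- the independent "summary" measurement of the
                         overall temporal response
    """
    if abs(time_series_total - summary_value) <= tolerance:
        return "consistent"         # the time series captured the real response
    if time_series_total < summary_value:
        return "missed activation"  # the gene turned on between snapshots
    return "introduced activation"  # activity caused by the experiment itself

assert classify_gene(10.0, 10.0) == "consistent"
assert classify_gene(7.5, 10.0) == "missed activation"
assert classify_gene(12.0, 10.0) == "introduced activation"
```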

“This new set of gene discoveries opens the way to new and more accurate models of the cell cycle system, which in turn can lead to new targets for cancer drugs,” said Bar-Joseph.

The new method also overcomes synchronization loss, a vexing problem for scientists who study hundreds or thousands of cells over time, according to Bar-Joseph. Large groups of living cells that start out together at the same biological point in time eventually become asynchronized in their activities, he noted.

“You can compare a group of cells starting out in an experiment like a group of marathoners at the starting line. Over time, some marathoners will be far ahead on the track, while others will fall back.” After the race begins, finding one marathoner among the thousands is difficult. Similarly, with asynchronous cells, trying to sort out a single cell response is virtually impossible. But Bar-Joseph has incorporated mathematical tools in his method that can detect genes affected by such asynchrony in a population of cells.

Bioinformatics Techniques for spam detection

It’s not a new topic: IBM has discovered that it could use many of the pattern-detection techniques and analyses from bioinformatics in other fields as well.

I thought of adding this since bioinformatics and microarrays are growing in popularity, and decided to give a few bytes to such articles as well.

Many of these studies are based on homology detection. Perhaps, going forward, the techniques used in SNP detection on SNP microarrays might also find use in other fields, notably spam detection and share-market or trend analysis.

I found some presentations on the web, and from IBM, on using the famous Teiresias algorithm for spam detection.

Chung-Kwei applies advanced pattern matching algorithms developed in IBM’s bioinformatics group to spam detection. This new classification algorithm can detect complex patterns in messages that go beyond the simple word or word phrases used in most algorithms.

A technique originally designed to analyse DNA sequences is the latest weapon in the war against spam. An algorithm named Chung-Kwei (after a feng-shui talisman that protects the home against evil spirits) can catch nearly 97 per cent of spam.

Chung-Kwei is based on the Teiresias algorithm, developed by the bioinformatics research group at IBM’s Thomas J Watson Research Center in New York, US. Teiresias was designed to search different DNA and amino acid sequences for recurring patterns, which often indicate genetic structures that have an important role.

Instead of chains of characters representing DNA sequences, the research group fed the algorithm 65,000 examples of known spam. Each email was treated as a long, DNA-like chain of characters. Teiresias identified six million recurring patterns in this collection, such as “Viagra”.

Each pattern represented a common sequence of letters and numbers that had appeared in more than one unsolicited message. The researchers then ran a collection of known non-spam (dubbed “ham”) through the same process, and removed the patterns that occurred in both groups.
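Teiresias does sophisticated maximal-pattern discovery; even a crude stand-in using fixed-length substrings shows the spam-minus-ham filtering step. A toy sketch, not IBM’s code:

```python
def substring_patterns(messages, length=6):
    """Collect all fixed-length substrings from a corpus: a crude
    stand-in for Teiresias' recurring-pattern discovery."""
    patterns = set()
    for msg in messages:
        for i in range(len(msg) - length + 1):
            patterns.add(msg[i:i + length])
    return patterns

spam = ["buy viagra now", "cheap viagra online"]
ham = ["meeting notes online", "see you now"]

spam_patterns = substring_patterns(spam)
ham_patterns = substring_patterns(ham)

# Keep only patterns that never occur in legitimate mail.
spam_only = spam_patterns - ham_patterns
assert "viagra" in spam_only
assert "online" not in spam_only  # appears in ham too, so discarded
```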

Incoming email was given a score based on how many spam patterns it contained. A long email with only a few spammy sentences would get a relatively low score, but one with many patterns spread across the length of the message would score much higher. Chung-Kwei correctly identified 64,665 of 66,697 test messages as spam (96.56 per cent). More importantly, its rate of misidentifying genuine email as spam was just 1 in 6000 messages. Losing a single genuine email in a torrent of spam is a greater failing in a filter than letting the occasional spam email through.
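The scoring stage can be sketched just as simply. The threshold here is an arbitrary illustrative choice, not Chung-Kwei’s actual scoring rule:

```python
def spam_score(message, spam_patterns):
    """Score a message by how many distinct spam patterns it contains."""
    return sum(1 for pattern in spam_patterns if pattern in message)

def is_spam(message, spam_patterns, threshold=2):
    """Flag a message once enough spam patterns appear in it."""
    return spam_score(message, spam_patterns) >= threshold

spam_patterns = {"viagra", "increa$e", "act now"}
assert is_spam("buy viagra and act now", spam_patterns)   # score 2
assert not is_spam("lunch at noon?", spam_patterns)       # score 0
```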

Chung-Kwei deals with common spammer strategies to dodge pattern-recognition schemes, such as replacing the letter s with a $, as in “increa$e your $ex power”, using its built-in tolerance for different but functionally equivalent DNA sequences. Just as Teiresias could be taught that the CCC and CCU codons both produce the same amino acid, proline, the anti-spam system can be trained to accept $ and s as identical.
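That equivalence idea amounts to normalising obfuscated characters to a canonical form before matching. The substitution table below is my own guess at an illustration, not IBM’s actual equivalence classes:

```python
# Map common spammer substitutions to canonical characters before
# pattern matching, just as synonymous codons (CCC, CCU -> proline)
# are treated as equivalent in sequence analysis.
EQUIVALENTS = str.maketrans({"$": "s", "0": "o", "1": "i", "@": "a"})

def normalise(message):
    """Lower-case the message and collapse look-alike characters."""
    return message.lower().translate(EQUIVALENTS)

assert normalise("increa$e your $ex power") == "increase your sex power"
assert "viagra" in normalise("V1@gra deals")
```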

IBM intends to include Chung-Kwei in its commercial product, SpamGuru. Justin Mason, who developed SpamAssassin, one of the most popular open-source anti-spam filters, says that Chung-Kwei looks promising.


Standardization in the microarray analysis software industry

Scouting for the right microarray analysis software kept me thinking: despite these programs being used by scores of scientists, no one has come forward to create what could be called a standard for such software. Confusion reigns in this field, as one company’s software data does not work with another’s, and vice versa. For an industry like biology and drug discovery, which is trying to benefit from the knowledge of mathematics, statistics, chemistry and physics, the inability to port data across platforms is a serious roadblock. There are standards such as MIAME and MAGE, but these are just data standards, not software standards. I believe there should be something similar to the ISO standards, SEI CMMI, etc.

The majority of newsgroups and forums are used by graduate students, and at times senior researchers, to find out which is the best software to use. I thought of starting a wiki page where researchers can post their comments, rate the products, and compare features against each other.

Can open source ideals begin to give a real answer to biotech’s future?

Would it be possible to adopt the ideals of open source in microarray development? There have been many research efforts in the biotechnology space that can be hailed as open source in spirit; the Human Genome Project is the perfect example. But apart from a few attempts by academia and non-profit institutions, there have not been many attempts to look at this as a way forward. Microarray development is a lucrative field where such a coalition would accrue great benefits. By releasing research work to others free of cost, it would be possible to bring down the cost of microarrays. There is no doubt this would benefit new research frontiers such as pharmacogenomics and toxicogenomics, by reducing the cost per array closer to that of other screening tests currently adopted in hospitals or used by forensic labs. Microarrays could also reduce reliance on the costly PCR technique by closing in on a more focused set of genes to amplify from.

But it would mean there have to be enough researchers buying these products in the first stage, so that the company involved in such an audacious attempt would recover its costs and make enough profit to continue further work. That is a major hurdle to overcome, as a custom microarray, or at times even an existing one, may not be useful to every researcher even if they are working on the same genome. One person may be in toxicology research and another in ecology or pure genetics; even on the same genome, the controls required and the number of genes of interest would vary vastly across the spectrum. It may take a long time for open source ideals to bear fruit in this arena, but that may be the way forward to bring meaningful results at less cost. Till then, outsourcing can be a start for all things to come.

 Abin paul Xavier

http://www.ocimumbio.com
