Top 100+ Bioperl Interview Questions And Answers
Question 1. What Is Bioperl?
BioPerl is a toolkit of Perl modules useful in building bioinformatics solutions in Perl. It is built in an object-oriented manner, so many modules depend on each other to achieve a task. The collection of modules in the bioperl-live repository makes up the core of BioPerl's functionality. Additionally, auxiliary modules for creating graphical interfaces (bioperl-gui), persistent storage in an RDBMS (bioperl-db), running and parsing the results from hundreds of bioinformatics applications (bioperl-run), and software to automate bioinformatic analyses (bioperl-pipeline) are all available as CVS modules in our repository.
Question 2. What Is The Difference Between 1.5.1 And 1.4.0? What Do You Mean Developer Release?
The 1.4.x series was released in 2004 and represented a stable release series. The 1.5.0 release was made in early 2005, but several annoying bugs were included in it. The 1.5.1 release in October fixed those bugs and also added a number of new modules. See the Changes file for more information.
Developer releases are odd-numbered releases (1.3, 1.5, etc.) not intended to be completely stable (although all tests should pass). Stable releases are even-numbered (1.0, 1.2, 1.6) and intended to provide a stable API, so that modules will continue to respect the API throughout a stable release series. We cannot guarantee that APIs are stable between releases (i.e. 1.6 may not be completely compatible with scripts written for 1.4), but we endeavor to keep the API stable so that upgrading is easy.
The 0.7.x series (0.7.0, 0.7.2) were all released in 2001 and were stable releases on the 0.7 branch. This means they had a set of functionality that was maintained throughout (no experimental modules) and were guaranteed to pass all tests, and subsequent bug-fix releases with the 0.7 designation would not have any API changes.
The 0.9.x series was our first attempt at releasing so-called developer releases. These are snapshots of the actively developed code that, at a minimum, pass all our tests.
Question 3. Can You Explain The Object Model Design And Rationale?
There is no simple answer to this question. Simply put, this is a toolkit which has grown organically. The goals and user audience have evolved. Some decisions were made and we were forced to live by them rather than break backward compatibility. In addition, there are different philosophies of software development. The major developers on the project have tried to impose a set of standards on the code so that the project can be coordinated without every commit being cleared by a few key individuals (see Eric S. Raymond's essay "The Cathedral and the Bazaar" for different styles of running an open source project - we're definitely on the Bazaar end). Advanced BioPerl talks more about specific design goals.
The clear consensus of the project developers is that BioPerl should be consistent. This may cause us to pay the price of some copy-and-paste of code, with the Get/Set accessor methods being a sore spot for some, along with the lack of use of AUTOLOAD. By being consistent we hope that someone can grok the gist of a module from the basic documentation, see example code, and get a set of methods from the API documentation. We aim to make the core object design easy to understand. This has not been realized by any stretch of the imagination, as the toolkit has well over 1000 modules in bioperl-live and bioperl-run alone.
That said, we do want to improve things. We want to experiment with newer modules which make Perl more object-oriented. We have high hopes for some of the promises of Perl6. To try to realize this goal we are encouraging developers to play with new object models in a bioperl-experimental project.
Question 4. How Do I Submit A Patch Or Enhancement To Bioperl?
We suggest the following. Post your idea to the appropriate mailing list. If it is a really new idea, consider taking us through your thought process. We'll help you tease out the necessary information, such as what methods you'll want and how it can interact with other BioPerl modules. If it is a port of something you've already worked on, give us a summary of the current methods. Make sure there is an interface to the module, not just an implementation, and make sure there will be a set of tests in the t/ directory to ensure that your module is tested. If you have a suggested patch and/or code enhancement, the SubmitPatch HOWTO gives suggestions on how to properly submit them through Bugzilla. See also Advanced BioPerl for more information.
Question 5. What Is Bioperl-pedigree?
The Pedigree package was started by Jason Stajich and provides basic tools for interacting with pedigree data and rendering pedigree plots.
Question 6. What Is Bioperl-gui?
The GUI package provides a Graphical User Interface for interacting with sequence and feature objects. It is used as part of the Genquire project.
Question 7. What Is Bioperl-microarray?
The Microarray package provides some basic tools for microarray functionality. It was started by Allen Day and may need some more work before it is a mature product.
Question 8. What Is Bioperl-db?
The BioPerl db package contains interfaces and adaptors that work with a BioSQL database to serialize and de-serialize BioPerl objects. Hilmar Lapp strongly recommends you use the CVS version with the latest biosql-schema.
Question 9. Bioperl-ext Won't Compile The Staden io_lib Part - What Do I Do?
Make sure you read the README about copying files over. Some more items to check off before asking:
Are you sure io_lib is installed where you think it is, and that the install path is visible to Perl (did you answer the questions during perl Makefile.PL)?
Did you copy the various missing .h files (os.h and config.h, if I remember right) from your io_lib source directory into the install include directory when installing io_lib?
When you ran make for io_lib, did you see any errors or messages about how you should probably run "ranlib" on the library object?
Did you run "ranlib" on the installed libread file(s)?
Question 10. What Is Bioperl-ext?
bioperl-ext is a package of code for C-extensions (hence the 'ext') to BioPerl. These include an interface to the Staden IO library (the io_lib library) for reading in chromatogram files, and Bio::Ext::Align, which is a Smith-Waterman implementation.
Question 11. I'm Trying To Run Bio::Tools::Run::StandAloneBlast And I'm Seeing Error Messages Like Can't Locate Bio/Tools/Run/WrapperBase.pm - How Do I Fix This?
This file is missing in version 1.2. There are two possible solutions: install version 1.2.1 or greater, or retrieve WrapperBase.pm and copy it to the proper location.
Question 12. What Does The Future Hold For Running Applications Within Bioperl?
We are trying to build a standard starting point for analysis applications, which will probably look like Bio::Tools::Run::AnalysisFactory and will allow the user to request which type of remote or local server they want to use to run their analyses. This will connect to the Pasteur's PISE server and the EBI's Novella server, as well as be aware of wrappers to run applications locally.
Question 13. Hey, I Want To Run Clustalw Within Bioperl, I Used
Most of the Bio/Tools/Run directory was moved to a new package, bioperl-run, to help make the size of the core code smaller and to separate the more specialized nature of application running from the rest of BioPerl. You can get these modules by installing the bioperl-run package. Download it through Getting BioPerl. This changeover began in the bioperl 1.1 developer release.
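Question 14. How Do I Tell Blast To Search Multiple Databases Using Bio::Tools::Run::StandAloneBlast?
Put the names of the databases in a variable. Like so:
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
# the inner double quotes keep the space-separated list together as one argument
my $dbs = '"/dba/BMC.fsa /dba/ALC.fsa /dba/HCC.fsa"';
my @params = ( d           => "$dbs",
               program     => "BLASTN",
               _READMETHOD => "Blast",
               outfile     => "$dir/est.bls" );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $seqio = Bio::SeqIO->new(-file => 't/amino.fa', -format => 'Fasta');
my $seqobj = $seqio->next_seq();
# run the search against all of the listed databases
my $report = $factory->blastall($seqobj);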
Question 15. How Do I Run Blast From Within Bioperl?
Use the module Bio::Tools::Run::StandAloneBlast. It will give you access to many of the search tools in the NCBI BLAST suite, including blastall, bl2seq, and blastpgp. The basic structure is like this:
use Bio::PrimarySeq;
use Bio::Tools::Run::StandAloneBlast;
my $factory = Bio::Tools::Run::StandAloneBlast->new(p => 'blastn',
                                                    d => 'nt',
                                                    e => '1e-5');
my $seq = Bio::PrimarySeq->new(-id  => 'test1',
                               -seq => 'AGATCAGTAGATGATAGGGGTAGA');
my $report = $factory->blastall($seq); # get back a Bio::SearchIO report
Question 16. How Do I Merge A Set Of Sequences Along With Their Features And Annotations?
Try the cat() method in Bio::SeqUtils:
Bio::SeqUtils->cat(@seqs);     # the first sequence in @seqs is modified in place
my $merged_seq = $seqs[0];
This method uses the first sequence in the array as a foundation and adds the subsequent sequences to it, along with their features and annotations.
Question 17. Can I Query Medline Or Other Bibliographic Repositories Using Bioperl?
Yes! The solution lies in Bio::Biblio*, a set of modules that provide access to MEDLINE and OpenBQS-compliant servers using SOAP.
Question 18. How Do I Do Motif Searches With Bioperl? Can I Do "find All Sequences That Are 75% Identical" To A Given Motif?
There are a number of approaches. Within BioPerl, take a look at Bio::Tools::SeqPattern. Or, take a look at the TFBS package. This BioPerl-compliant package specializes in pattern searching of nucleotide sequences using matrices.
It's also conceivable that the combination of BioPerl and Perl's regular expressions could do the trick. You might also consider the CPAN module String::Approx (this module addresses the percent match question), but experienced users question whether its distance estimates are correct; the Unix agrep command is thought to be faster and more accurate.
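As a rough illustration of the regular-expression route and of the "75% identical" question, here is a plain-Perl sketch that uses neither BioPerl, String::Approx, nor agrep. The helper names find_regex_hits and find_similar_windows, the example sequence, and the choice of an ungapped, position-by-position identity measure are all assumptions made for this example.

```perl
use strict;
use warnings;

# Return the 1-based start positions of every (possibly overlapping)
# match of a regular expression in a sequence string.
sub find_regex_hits {
    my ($seq, $regex) = @_;
    my @starts;
    # The lookahead matches without consuming characters, so
    # overlapping occurrences are reported as well.
    while ($seq =~ /(?=$regex)/g) {
        push @starts, pos($seq) + 1;
    }
    return @starts;
}

# Slide a window the length of the motif along the sequence and keep
# the windows whose fraction of identical positions meets the threshold.
# Each hit is [ 1-based start, window text, identity fraction ].
sub find_similar_windows {
    my ($seq, $motif, $min_identity) = @_;
    my $len = length $motif;
    my @hits;
    for my $start (0 .. length($seq) - $len) {
        my $window = substr($seq, $start, $len);
        # XOR of two equal characters is "\0", so counting "\0"
        # counts the positions where window and motif agree.
        my $matches  = (($window ^ $motif) =~ tr/\0//);
        my $identity = $matches / $len;
        push @hits, [ $start + 1, $window, $identity ]
            if $identity >= $min_identity;
    }
    return @hits;
}

# Made-up example sequence and motif
my $seq = 'GGTATAAAGGCTATCAAGG';
print "exact pattern at: ", join(',', find_regex_hits($seq, 'TATA[AT]A')), "\n";
for my $hit (find_similar_windows($seq, 'TATAAA', 0.75)) {
    printf "start %d  %s  identity %.2f\n", @$hit;
}
```

The sliding window only finds ungapped matches of the motif's exact length; insertions and deletions need a real approximate-matching tool such as the ones mentioned above.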
Question 19. How Do I Find All The Orfs In A Nucleotide Sequence? Antigenic Sites In A Protein? Calculate Nucleotide Melting Temperature? Find Repeats?
In fact, none of these functions are built into BioPerl, but they are all available in the EMBOSS package, as well as many others. The BioPerl developers created a simple interface to EMBOSS such that any and all EMBOSS programs can be run from within BioPerl. See Bio::Factory::EMBOSS for more information; it's in the bioperl-run package.
If you can't find the functionality you want in BioPerl, then make sure to look for it in EMBOSS; these packages integrate quite gracefully with BioPerl. Of course, you will have to install EMBOSS to get this functionality.
In addition, BioPerl after version 1.0.1 contains the Pise/BioPerl modules. The Pise package was designed to provide a uniform interface to bioinformatics applications, and currently provides wrappers to more than 250 such applications! Included among these wrapped apps are HMMER, PHYLIP, BLAST, GENSCAN, and the EMBOSS suite. Use of the Pise/BioPerl modules does not require installation of Pise locally, as it runs over HTTP.
Question 20. I Get The Warning (Old Style Annotation) On New Style Annotation::Collection. What Is Wrong?
You're using an old version! You'll see this error because the modules and interface have changed starting with BioPerl 1.0. Before v1.0 there was a Bio::Annotation module with add_Comment, add_Reference, each_Comment, and each_Reference methods.
After v1.0 there is a Bio::Annotation::Collection module with add_Annotation('comment', $ann) and get_Annotations('comment').
Please update your code in order to avoid seeing these warning messages. In the future the Reference objects will likely be implemented by the Bio::Biblio system, but we hope to keep a compatible API for these.
Question 21. How Do I Get The Reverse-complement Of A Sequence Using The Subseq Method?
One way is to pass the location to subseq in the form of a Bio::LocationI object. This object holds strand information as well as coordinates.
use Bio::Location::Simple;
my $location = Bio::Location::Simple->new(-start  => $start,
                                          -end    => $end,
                                          -strand => "-1");
# assume we already have a sequence object
my $rev_comp_substr = $seq_obj->subseq($location);
Question 22. How Do I Get The Complete Spliced Nucleotide Sequence From The Cds Section?
You can use the spliced_seq method. For example:
my $seq_obj = $db->get_Seq_by_id($gi);
foreach my $feat ( $seq_obj->top_SeqFeatures ) {
    if ( $feat->primary_tag eq 'CDS' ) {
        my $cds_obj = $feat->spliced_seq;
        print "CDS sequence is ", $cds_obj->seq, "\n";
    }
}
Question 23. How Do I Retrieve A Nucleotide Coding Sequence When I Have A Protein Gi Number?
You should go through the protein's feature table and find the coded_by value. The trick is to associate the coded_by nucleotide coordinates with the nucleotide entry, which you'll retrieve using the accession number from the same feature.
my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;
# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;
my $prot_obj = $gp->get_Seq_by_id($protein_gi);
foreach my $feat ( $prot_obj->top_SeqFeatures ) {
    if ( $feat->primary_tag eq 'CDS' ) {
        # example: 'coded_by="U05729.1:1..122"'
        my @coded_by = $feat->each_tag_value('coded_by');
        my ($nuc_acc, $loc_str) = split /:/, $coded_by[0];
        my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
        # create Bio::Location object from a string
        my $loc_object = $loc_factory->from_string($loc_str);
        # create a Feature object by using a Location
        my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
        # associate the Feature object with the nucleotide Seq object
        $nuc_obj->add_SeqFeature($feat_obj);
        my $cds_obj = $feat_obj->spliced_seq;
        print "CDS sequence is ", $cds_obj->seq, "\n";
    }
}
Question 24. How Do I Parse The Cds Join Or Complement Statements In Genbank Or Embl Files To Get The Sub-locations?
For example, how can I get the coordinates 45 and 122 in join(45..122,233..267)?
You could use primary_tag to find the CDS features and the Bio::Location::SplitLocationI object to get the coordinates:
foreach my $feature ( $seqobj->top_SeqFeatures ) {
    if ( $feature->location->isa('Bio::Location::SplitLocationI')
         and $feature->primary_tag eq 'CDS' ) {
        foreach my $location ( $feature->location->sub_Location ) {
            print $location->start, "..", $location->end, "\n";
        }
    }
}
Question 25. How Do I Retrieve All The Features From A Sequence? How About All The Features Which Are Exons Or Have A /note Field That Contains A Certain Gene Name?
To get all the features:
my @features = $seq->all_SeqFeatures();
To get all the features, filtering on only those which have the primary tag (i.e. feature type) exon:
my @genes = grep { $_->primary_tag eq 'exon' } $seq->all_SeqFeatures();
To get all the features, filtering on those which have the /note tag and whose note field contains the requested string $noteval:
my @f_with_note = grep {
    my @a = $_->has_tag('note') ? $_->each_tag_value('note') : ();
    grep { /$noteval/ } @a;
} $seq->all_SeqFeatures();
Question 26. Does Bio::SearchIO Parse The HTML Output That Blast Creates Using The -T Option?
Yes, with a twist. You can modify Bio::SearchIO's _readline() method such that it reads in the HTML and strips it of tags using the HTML::Strip module.
Please note: We do not recommend parsing BLAST HTML output if it can be avoided. We actively support XML, tabular, and text output parsing of NCBI BLAST reports only; we have never supported parsing of NCBI BLAST HTML output directly through BioPerl and will not attempt to rectify problems where parsing HTML output after stripping the tags breaks but parsing text output works. Consider this fair warning.
use HTML::Strip;
use Bio::SearchIO;
my $hs = HTML::Strip->new();
# replace the blast parser's _readline method with one that
# auto-strips HTML:
{
    package Bio::SearchIO::blast;
    sub _readline {
        my ($self, @args) = @_;
        my $line = $self->SUPER::_readline(@args);
        return unless defined $line;
        return $hs->parse($line);
    }
}
# now parse using the BLAST format module
my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
Question 27. Can I Get Domain Number From Hmmpfam Or Hmmsearch Output?
SH2_5: domain 2 of 2, from 349 to 432: score 104.4, E = 1.9e-26
Not directly, but you can compute it, since domains are numbered by their order on the protein:
my @domains = $hit->domains;
my $domainnum = 1;
my $total = scalar @domains;
foreach my $domain ( sort { $a->start <=> $b->start } @domains ) {
    print "domain $domainnum of $total,\n";
    $domainnum++;
}
Question 28. How Do I Get The Frame For A Translated Search?
I'm using Bio::Search* and its frame() to parse BLAST but I'm seeing 0, 1, or 2 instead of the expected -3, -2, -1, +1, +2, +3.
Why am I seeing these different numbers, and how do I get the frame according to BLAST?
These are GFF frames - so +1 is 0 in GFF, and -3 will be encoded with a frame of 2 with the strand being set to -1.
Frames are relative to the hit or query sequence, so you need to query it based on the sequence you are interested in:
$hsp->hit->strand;
$hsp->hit->frame;
# or
$hsp->query->strand;
$hsp->query->frame;
So the value according to a BLAST report of -3 can be constructed as:
my $blastframe = ($hsp->query->frame + 1) * $hsp->query->strand;
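To make the arithmetic concrete, here is a small stand-alone sketch of the same conversion applied to plain numbers rather than HSP objects. The function names blast_frame and gff_frame_and_strand are made up for this illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Convert a GFF-style frame (0, 1, 2) plus a strand (1 or -1) into the
# signed frame shown in a BLAST report (+1..+3 or -1..-3).
sub blast_frame {
    my ($gff_frame, $strand) = @_;
    return ($gff_frame + 1) * $strand;
}

# The reverse direction: split a signed BLAST frame into the
# GFF-style frame and the strand.
sub gff_frame_and_strand {
    my ($blast_frame) = @_;
    my $strand = $blast_frame < 0 ? -1 : 1;
    return (abs($blast_frame) - 1, $strand);
}

# The two cases mentioned in the answer: +1 and -3
for my $pair ([0, 1], [2, -1]) {
    printf "GFF frame %d, strand %2d -> BLAST frame %+d\n",
        @$pair, blast_frame(@$pair);
}
```

Note that a strand of 0 (an untranslated sequence, for instance) makes the product 0, so the formula is only meaningful for the translated side of a search.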
Question 29. How Can I Generate A Pairwise Alignment Of Two Sequences?
Look at Bio::Factory::EMBOSS to see how to use the water and needle alignment programs that are part of the EMBOSS suite. Bio::Factory::EMBOSS is part of the bioperl-run package.
Or you can use the pSW module for DNA alignments or the dpAlign module for protein alignments. These are part of the bioperl-ext package; download it via Getting BioPerl.
You can also use prss34 (part of the FASTA package) to assess the significance of a pairwise alignment by shuffling the sequences.
Question 30. I Want To Parse Fasta Or Ncbi -m7 (XML) Format, How Do I Do This?
It is as easy as parsing text BLAST results - you simply need to specify the format as fasta or blastxml and the parser will load the appropriate module for you. You can use the exact same logic and code for all of these formats, as we have generalized the modules for sequence database searching. The page describing Bio::SearchIO provides a table showing how the formats match up to particular modules. Note that, for parsing BLAST XML output, you will need XML::SAX, and that XML::SAX::ExpatXS is recommended to speed up parsing.
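A minimal sketch of what that looks like in practice, assuming BioPerl is installed. The wrapper name print_hsps, the choice of columns, and the file names in the usage comment are made up for this example; the Bio::SearchIO calls themselves (new, next_result, next_hit, next_hsp) are the standard parsing loop.

```perl
use strict;
use warnings;

# Print one tab-separated line per HSP from a search report.
# $format is a Bio::SearchIO format name such as 'blast',
# 'blastxml' or 'fasta'; only this argument changes between formats.
sub print_hsps {
    my ($file, $format) = @_;
    require Bio::SearchIO;    # loaded on first call; needs BioPerl installed
    my $in = Bio::SearchIO->new(-file => $file, -format => $format);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name,
                           $hsp->evalue, $hsp->percent_identity), "\n";
            }
        }
    }
}

# Usage (hypothetical file name): perl print_hsps.pl report.xml blastxml
print_hsps(@ARGV) if @ARGV == 2;
```

Switching from a text BLAST report to XML or FASTA output should then only mean passing a different format string.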
Question 31. What Was Wrong With Bio::Tools::Blast?
Bio::Tools::Blast* is no longer supported, as of BioPerl version 1.1. Nothing is really wrong with it; it has just been outgrown by a more generic approach to reports. This generic approach allows us to write pluggable modules for FASTA and BLAST parsing while using the same framework. This is completely analogous to the Bio::SeqIO system of parsing sequence files. However, the objects produced are of the Bio::Search rather than the Bio::Seq variety.
Question 32. I Want To Parse Blast, How Do I Do This?
As of version 1.1, BioPerl only supports one approach - the Bio::SearchIO interface. There are other BLAST parsing modules in the package, but they remain only to support older legacy code. Bio::SearchIO supports the text, tabular, and XML forms of NCBI BLAST reports.
Question 33. I Would Like To Make My Own Custom Fasta Header - How Do I Do This?
You want to use the method preferred_id_type().
Here's some example code:
use Bio::SeqIO;
my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');
my $seqout = Bio::SeqIO->new(-fh     => \*STDOUT,
                             -format => 'fasta');
# From Bio::SeqIO::fasta; 'display' puts the display_id in the header
$seqout->preferred_id_type('display');
my $count = 1;
while (my $seq = $seqin->next_seq) {
    # override the regular display_id with your own
    # ('seq_N' is only an illustrative naming scheme)
    $seq->display_id('seq_' . $count++);
    $seqout->write_seq($seq);
}
You can pass one of the following values to preferred_id_type: "accession", "accession.version", "display", "primary". The description line is automatically appended to the preferred id type, but this can also be set.
Question 34. Accession Numbers Are Not Present For Fasta Sequence Files. If You Parse A Fasta Sequence Format File With Bio::SeqIO The Sequences Won't Have The Accession Number. What To Do?
All the information is in $seq->display_id; it just needs to be parsed out. Here is some code to set the accession number.
my ($gi, $acc, $locus);
(undef, $gi, undef, $acc, $locus) = split(/\|/, $seq->display_id);
$seq->accession_number($acc);
Why don't we just go ahead and do this? For one, we don't make any assumptions about the format of the ID part of the sequence. Perhaps the parser code could try to detect whether it is a GenBank-formatted ID and go ahead and set the accession number field. It would be trivial to do; it's just that no one has volunteered the time - put it on the Project priority list if you think it is important and, better yet, volunteer the code patch!
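The splitting step can be tried on its own without BioPerl. The sketch below assumes a GenBank-style ID of the form gi|number|database|accession|locus; the function name parse_genbank_id, the gi number, and the locus name are invented for the example.

```perl
use strict;
use warnings;

# Split a GenBank-style FASTA ID, assumed to look like
# gi|<number>|<db>|<accession>|<locus>, into its parts.
sub parse_genbank_id {
    my ($id) = @_;
    # fields 1 and 3 ("gi" and the database tag) are discarded
    my (undef, $gi, undef, $acc, $locus) = split /\|/, $id;
    return { gi => $gi, accession => $acc, locus => $locus };
}

# Made-up ID for illustration
my $parts = parse_genbank_id('gi|12345|gb|U05729.1|EXAMPLE');
print "accession: $parts->{accession}\n";
```

IDs that do not follow this layout will yield undefined or misplaced fields, which is exactly why the parser does not do this automatically.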
Question 35. How Do I Parse A Sequence File?
Use the Bio::SeqIO system. This will create Bio::Seq objects for you.
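A minimal sketch of the usual Bio::SeqIO reading loop, assuming BioPerl is installed. The wrapper name list_sequences and the file name in the usage comment are made up for this example.

```perl
use strict;
use warnings;

# Read every record from a sequence file, print its ID and length,
# and return the number of records read.
# $format is a Bio::SeqIO format name such as 'fasta' or 'genbank'.
sub list_sequences {
    my ($file, $format) = @_;
    require Bio::SeqIO;    # loaded on first call; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $file, -format => $format);
    my $count = 0;
    while (my $seq = $in->next_seq) {    # each $seq is a Bio::Seq object
        print $seq->display_id, "\t", $seq->length, "\n";
        $count++;
    }
    return $count;
}

# Usage (hypothetical file name): perl list_sequences.pl sequences.fasta fasta
list_sequences(@ARGV) if @ARGV == 2;
```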
Question 36. I Can't Get Sequences With Bio::DB::GenBank Any More, Why Not?
NCBI changed the web CGI script that provided this access, so old BioPerl versions no longer work. You should use a recent version like 1.4.x or 1.5.x.
Question 37. How Can I Get NT_ Or NM_ Or NP_ Accessions From NCBI (Reference Sequences)?
To retrieve GenBank reference sequences, or RefSeqs, use Bio::DB::RefSeq, not Bio::DB::GenBank or Bio::DB::GenPept. This is still an area of active development because the data providers have not provided the best interface for us to query. EBI has provided a mirror with their dbfetch system, which is accessible through the Bio::DB::RefSeq object; however, there are cases where NT_ accession numbers will not be retrievable.
Question 38. How Can I Use Bio::SeqIO To Parse Sequence Data To Or From A String?
Use this code to parse sequence data from a string:
use IO::String;
use Bio::SeqIO;
my $stringfh = IO::String->new($string);
my $seqio = Bio::SeqIO->new(-fh     => $stringfh,
                            -format => 'fasta');
while ( my $seq = $seqio->next_seq ) {
    # process each seq
}
And here is how to write to a string:
my $s;
my $io = IO::String->new($s);
my $seqOut = Bio::SeqIO->new(-format => 'swiss', -fh => $io);
$seqOut->write_seq($seq);   # $seq is a sequence object you already have
print $s; # $s contains the record in swissprot format and is stored in the string
Question 39. How Do I Use Bio::Index::Fasta And Index On Different Ids?
I'm using Bio::Index::Fasta in order to retrieve sequences from my indexed fasta file, but I keep seeing MSG: Did not provide a valid Bio::PrimarySeqI object when I call fetch followed by write_seq() on a Bio::SeqIO handle. Why?
It's likely that fetch didn't retrieve a Bio::Seq object. There are a few possible explanations, but the most common cause is that the id you're passing to fetch is not the key to that sequence in the index. For example, if the FASTA header is >gi|12366 and your id is 12366, then fetch won't find the sequence; it expects to see gi|12366. You need to use the get_id function to specify the key used in indexing, like this:
$inx = Bio::Index::Fasta->new(-filename => $indexname);
# set the id parser before the index is built
$inx->id_parser(\&get_id);
sub get_id {
    my $header = shift;
    # use the digits after "gi|" as the key
    $header =~ /^>gi\|(\d+)/;
    $1;
}
The same issue arises when you use Bio::DB::Fasta, but in that case the code might look like this:
$inx = Bio::DB::Fasta->new($fastaname, -makeid => \&get_id);
Question 40. Cannot Get An Accession From Genbank When I Know It Is There?
I'm using Bio::DB::GenBank to query GenBank and I'm sure that the id is there, but I'm seeing the error MSG: acc does not exist. This is a bug in versions 1.2 and 1.2.1, but it is fixed in 1.2.2. Either upgrade to 1.2.2 or higher, or edit the module Bio::DB::GenBank and change protein to nucleotide in the BEGIN block.