
Tuesday, August 1
 

08:30

Registration
Tuesday August 1, 2017 08:30 - 09:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

09:30

Opening Ceremony
Speakers

Alla Lapidus

Professor, St. Petersburg State University


Tuesday August 1, 2017 09:30 - 10:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

10:00

From Metagenome Sequencing to Genome Mining for New Antibiotics

Recent studies revealed numerous biosynthetic gene clusters (BGCs) across a wide range of bacterial and fungal species amenable to cultivation. However, little is known about the biosynthetic machinery and natural products produced by uncultivated organisms. I discuss the bottleneck of identifying BGCs coding for Peptidic Natural Products (PNPs) from metagenomics data and argue that the future progress in exploration of antibiotics critically depends on a transition from the current one-off process of PNP analysis to a high-throughput PNP discovery, including PNP discovery from metagenomics data. I further describe recent developments of metaSPAdes, truSPAdes, and 10XSPAdes assemblers that significantly increased the contig lengths and opened the door towards genome mining for PNPs in metagenomics datasets. Finally, I will describe recent advances in computational PNP discovery that span bioinformatics techniques ranging from metagenomics to genome mining to peptidogenomics.


Speakers

Tuesday August 1, 2017 10:00 - 11:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:00

Use of high throughput DNA sequencing technologies to develop new approaches to study, diagnose and treat autoimmune diseases

The talk will be devoted to new bioinformatics approaches to diagnosing human autoimmune diseases and studying their history, using ankylosing spondylitis (Bekhterev's disease) as an example. In addition, we will present several strategies for developing new treatments for autoimmune diseases based on analysis of the diversity of T- and B-cell receptors.



Tuesday August 1, 2017 11:00 - 11:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:30

Break
Tuesday August 1, 2017 11:30 - 11:50
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:50

Immunoinformatics: Immunology meets Computing

Recent advances in sequencing technologies have made it possible to sequence adaptive immune receptors: antibodies and T-cell receptors. This progress has allowed many immunological problems to be stated as computational ones and has founded a new field in bioinformatics: immunoinformatics. The starting problem in this young field is accurate reconstruction of the immune receptor repertoire from immunosequencing data. Even the highest-quality immunosequencing data obtained with modern protocols are prone to high error rates. Thus, distinguishing the natural diversity of immune receptors from sample preparation errors is a prerequisite for more advanced immunological problems. One of them is evolutionary analysis of antibody repertoires. An antibody repertoire is the result of fast evolution achieved through various processes of secondary diversification. As a result of multiple cycles of secondary diversification, an antibody repertoire represents a set of clonal lineages with various abundances. Each such lineage can be viewed as a clonal tree. Constructing clonal trees from antibody repertoire sequences during an immune response allows one to detect functional antibodies: typically, the most abundant clonal tree consists of antibodies specific to the invading antigen.

Our immunoinformatics group at the Center for Algorithmic Biotechnology (CAB), SPbU, has developed the following toolset for immunoinformatics problems:
• IgReC: a tool for antibody reconstruction from Illumina MiSeq reads
• BarcodedIgReC: a modification of IgReC that uses information about unique molecular identifiers (UMIs)
• IgQUAST: a tool for quality assessment of antibody repertoire construction
• IgDiversityAnalyzer: a tool for analysis of antibody repertoire diversity

AntEvolo, a novel algorithm for construction of clonal trees for antibody repertoires, is currently being developed at CAB.

Speakers

Andrey Bzikadze

Researcher, Center for Algorithmic Biotechnology, St Petersburg State University


Tuesday August 1, 2017 11:50 - 12:10
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:10

Genome assembly evaluation with QUAST
Current sequencing technologies and software face many complications that impede reconstruction of full chromosomes. Different assembly programs use various heuristic approaches to tackle these challenges, resulting in many differences in the contigs they output. This creates the need to compare assemblies with each other. In this talk, I present QUAST, a universal toolkit for genome assembly quality assessment and comparison. I briefly discuss the key quality metrics that we compute and show the functionality of the current members of the package: QUAST (regular genome assembly evaluation), MetaQUAST (assessment of metagenomic assemblies), and Icarus (contig alignment browser). I also give a few notes about a new tool in the QUAST family, which is under development right now. This tool is intended for assessment of large genomes (up to mammalian size).
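As an illustration of the kind of contiguity metric used in assembly comparison, the sketch below computes N50 from a list of contig lengths. The abstract does not enumerate its metrics, so the choice of N50 and the function name `n50` are assumptions made for this example; this is not QUAST's own code.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Two hypothetical assemblies of the same genome, compared by N50.
assembly_a = [40, 30, 20, 10]
assembly_b = [25, 25, 25, 25]
print(n50(assembly_a), n50(assembly_b))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why it is normally read alongside misassembly counts.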

Speakers

Tuesday August 1, 2017 12:10 - 12:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:30

SPAdes Toolbox
Despite its central role in genomics, accurate de novo genome assembly remains challenging. Moreover, the proliferation of new sequencing and sample-preparation technologies introduces additional complications. The SPAdes genome assembler (Bankevich et al., 2012), which was originally conceived as a scalable and easy-to-modify platform, has gradually been extended into a family of SPAdes tools aimed at various sequencing technologies and applications.

In addition to the constantly updated SPAdes assembler itself, it now includes:
• metaSPAdes assembler for metagenomics data (Nurk et al., 2017)
• rnaSPAdes: de novo RNA‐seq data assembler (Prjibelsky et al., submitted)
• plasmidSPAdes: assembly of plasmids from the whole genome sequencing data (Antipov et al., 2016)
• exSPAnder module for repeat resolution that enables efficient utilization of mate‐pair libraries and even mate‐pairs only assemblies with NexteraMP libraries (Prjibelsky et al., 2014, Vasilinetc et al., 2015)
• hybridSPAdes module for hybrid assembly of accurate short reads with long error‐prone reads, such as Pacific Biosciences and Oxford Nanopore reads (Antipov et al., 2015)
• geneSPAdes tool aimed at accurate reconstruction of biosynthetic gene clusters using their domain structure (Meleshko et al., in preparation)

In this talk I will describe the various tools from the SPAdes toolbox.

Speakers

Anton Korobeynikov

Associate Professor, Saint Petersburg State University


Tuesday August 1, 2017 12:30 - 13:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

13:00

Lunch

Tuesday August 1, 2017 13:00 - 14:00
Lunch Birzhevaya liniya, 10

14:00

Generation and evaluation of complete genomes from metagenomes
Genomes are an integral component of the biological information about an organism and, logically, the more complete the genome, the more informative it is. Shotgun sequencing of microbial communities could potentially generate complete genomes for some organisms, but this has rarely been undertaken. Here, we describe approaches that can, in some cases, generate complete genome sequences. Using ~7000 published complete bacterial isolate genomes, we benchmark cumulative GC skew as a predictor of overall genome accuracy. We use this to identify likely mis-assemblies in some reference genomes and to confirm the topology of complete genomes from metagenomes. Complete genomes for organisms without isolates will substantially advance metabolic and evolutionary analyses.
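To make the cumulative GC skew idea concrete, here is a minimal sketch that computes it over fixed-size windows. The window size, the function name `cumulative_gc_skew`, and the per-window formula (G - C) / (G + C) are assumptions chosen for illustration; the abstract does not specify how the authors compute the statistic.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return the cumulative GC skew along a sequence.

    For each non-overlapping window the skew is (G - C) / (G + C);
    windows without G or C contribute 0. The returned list holds the
    running sum of window skews, one value per window.
    """
    seq = sequence.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        cumulative.append(total)
    return cumulative


# Toy example: a G-rich half followed by a C-rich half.
print(cumulative_gc_skew("GGGGCCCC", window=4))  # [1.0, 0.0]
```

In many circular bacterial genomes the cumulative curve has a single minimum and a single maximum, so additional sharp turning points in an assembled sequence can hint at a misassembly.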

Speakers

Jillian Banfield

Ph.D., UC Berkeley Professor (Earth & Planetary Science Dpt); Lawrence Berkeley National Laboratory Earth Sciences Division staff scientist


Tuesday August 1, 2017 14:00 - 15:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:00

Assembly of metagenomic (series) data with SPAdes instruments

Metagenomic sequencing has emerged as a technology of choice for analyzing bacterial populations and discovery of novel organisms and genes. While different groups have developed specialized tools for de novo metagenomic assembly, the problem of assembling complex microbial communities is far from being resolved.

The first part of this talk will be devoted to our metaSPAdes software, which integrates proven solutions from the SPAdes toolkit with metagenomics-specific techniques. We will highlight key differences between SPAdes and metaSPAdes and advertise recently added (or soon to be released) features, e.g. support for third-generation sequencing technologies in hybrid metagenomic assemblies.


In the second part, we will present our novel (as yet unpublished) pipeline for improved reconstruction of individual organisms from (time or spatial) metagenomic series. While the availability of sequencing data for multiple related samples provides an unprecedented opportunity for accurate reconstruction of individual microbial community members, widely used approaches demonstrate some major deficiencies and limitations. In an attempt to overcome those, we developed the MTS (Metagenomic Time Series) pipeline, which integrates state-of-the-art differential binning approaches with valuable (and largely underappreciated) ideas from early works on metagenomic series analysis.


Speakers

Sergey Nurk

Researcher, Center for Algorithmic Biotechnology, St. Petersburg State University


Tuesday August 1, 2017 15:00 - 15:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:20

Simulating microbial communities
Today, a multitude of different sequencing technologies is available, all of which use different means to obtain the sequence information and therefore have different read lengths, parameters and specific error profiles. Since metagenomic studies involving whole-genome sequencing are still quite expensive, it is crucial to choose the right experimental setup and parameters. Testing the planned setup before doing the actual experiment can help a lot in saving money and designing a better experiment. To facilitate this, we developed an extendable and flexible simulation pipeline which is able to simulate arbitrarily complex metagenomic data sets from just a 16S profile. We include read simulators for the most common sequencing technologies, support multiple samples or communities, and provide a ground truth against which assemblers, binners and profilers can subsequently be tested. This pipeline has already been used successfully to create the data sets for the CAMI challenge (http://biorxiv.org/content/early/2017/01/09/099127). To prove the usefulness of such a pipeline beyond CAMI, we decided to create data sets aimed specifically at answering two questions: how do coverage/sequencing depth and the presence of closely related strains affect assembly quality? To answer these questions, we created a considerable number of small datasets with varying coverage and average nucleotide identity (ANI) values. The findings we made had been discussed in the metagenomics community before, but the extremely controlled environment of both experiments pinpoints the weaknesses of current metagenomic assemblers at very high and very low coverages as well as in the presence of strains with more than 97% ANI.

Speakers

Tuesday August 1, 2017 15:20 - 15:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:40

Break
Tuesday August 1, 2017 15:40 - 16:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

16:00

Oligo-Designer: rational design for easy gene construction and high protein expression

Every biotech company has to create new genetic constructs and express proteins. The most popular gene synthesis method is PCR-based assembly from a set of oligonucleotides. To increase synthesis purity, most scientists design oligonucleotides intuitively, using one or more heuristics, but this often leads to loss of protein expression.

In this work we present Oligo-Designer (a part of the BIOCAD YLab computational platform), a novel oligonucleotide design software that optimizes the probability of successful gene synthesis and protein expression in a specified cell culture. We use a genetic algorithm and a set of rational physics-based metrics to solve common problems such as hairpins, low binding energy and oligonucleotide cross-reactivity. This helps us obtain an optimal set of oligonucleotides. The algorithm can be used in either single-gene or library mode. Using this algorithm, in 2016 we created 466 gene constructs (exact or libraries) with average expression after transient transfection between 150 and 200 mg/l. We also present a fully automated synthesis pipeline using Biosset and Tecan hardware.


Speakers

Pavel Yakovlev

Director of Computational Biology Department, BIOCAD


Tuesday August 1, 2017 16:00 - 16:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

16:40

EDEN: Evolutionary Dynamics within Environments

Metagenomics revolutionized the field of microbial ecology, giving access to Gb-sized datasets of microbial communities under natural conditions. This enables fine-grained analyses of the functions of community members, studies of their association with phenotypes and environments, as well as of their microevolution and adaptation to changing environmental conditions.
Phylogenetic methods for studying adaptation and evolutionary dynamics are not able to cope with big data. Calculating the dN/dS ratio for the large-scale sequence data sets that are being generated in metagenomics and comparative microbial genomics is very challenging due to the excessive run times of current methods.
EDEN is the first software for the rapid detection of protein families and regions under positive selection, as well as their associated biological processes, from meta- and pangenome data. It provides an interactive result visualization for detailed comparative analyses. EDEN is available as a Docker installation under the GPL 3.0 license, allowing its use on common operating systems, at http://www.github.com/hzi-bifo/eden
We applied EDEN to 66 samples of the HMP project from six body sites sampled from healthy individuals. Across all body sites, most protein families with significant signs of positive selection in comparison to all other protein families were annotated with transport and binding functions, suggesting the existence of a functional pan-selectome. We also used EDEN to characterize human gut metagenome samples. EDEN determined a significantly higher dN/dS ratio for the protein coding genes from lean individuals compared to overweight and obese individuals, suggestive of a higher functional diversity in the guts of lean individuals.


Speakers

Tuesday August 1, 2017 16:40 - 17:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

17:00

Antibiotic-associated dysbiosis and its correction by indigenous bacteria
The human microbiota is a complex consortium of microorganisms (archaebacteria, bacteria, viruses, fungi) involved in the proper functioning of almost every system of the organism. A dysbiotic condition, or dysbiosis, is a key pathogenic condition causing many severe infectious and non-infectious diseases. It is well established that the most common cause of dysbiosis is antibiotics, which kill indigenous bacteria. A fast return to the original microbiota in many cases leads to fast recovery from the disease. However, the optimal treatment of dysbiosis is still under discussion. Probiotics may be helpful in many situations; however, despite the fairly long history of probiotic use, they are not always effective. In the present study we evaluated a novel technology, autoprobiotic bacteria, for the treatment of antibiotic-induced dysbiosis, employing a rat model of antibiotic-induced dysbiosis. Six experimental groups of animals that had received antibiotics were treated with different variants of indigenous bacteria prepared for each of them before the development of dysbiosis. The groups received indigenous strains of bifidobacteria, lactobacilli, enterococci, their mixture, feces, and anaerobically grown fecal bacteria. After receiving autoprobiotics for five days, the animals were sacrificed and studied according to a broad range of parameters, including a metagenomic study. Interestingly, after antibiotics all the animals developed an almost identical dysbiotic condition, characterized by a dramatic increase in gammaproteobacteria. However, after different kinds of autoprobiotics, the metagenomic data, bacteriological analysis data, gut epithelium morphology and immunological parameters differed significantly. The data obtained are discussed. The study was supported by RSF grant 16-15-10085.

Speakers

Tuesday August 1, 2017 17:00 - 17:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

17:30

Poster Session
Tuesday August 1, 2017 17:30 - 18:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

18:45

Conference Dinner

Tuesday August 1, 2017 18:45 - 23:00
Pier Universitetskaya naberezhnaya, 13, Saint Petersburg, Russia
 
Wednesday, August 2
 

09:00

Analysis of Nanopore Sequencing Data
The Oxford Nanopore MinION sequences DNA and RNA by measuring the disruption of electric current caused by the molecule passing through a protein nanopore embedded in a membrane. These devices are portable, can sequence extremely long reads, and are sensitive to base modifications like 5-methylcytosine. In my talk I will discuss methods for analysing the current signals measured by these devices including the key developments in basecalling and my group's work on calculating a consensus sequence for an assembled genome and detecting base modifications.

Speakers

Jared Simpson

Ph.D., Principal Investigator, Ontario Institute for Cancer Research; Assistant Professor, Department of Computer Science, University of Toronto


Wednesday August 2, 2017 09:00 - 10:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

10:00

Real-time nanopore sequencing and data analysis: in the field and by the patient
In this talk I will examine the role of portable, real-time genome sequencing for the diagnosis and surveillance of infectious diseases. I will focus on the Oxford Nanopore MinION single molecule nanopore sequencing instrument for understanding the evolution and biology of pathogens. Since its release in mid-2014, this instrument has seen rapid platform improvements and is now capable of generating ~10 gigabases per run with a read error rate of ~10%. During the 2013-2016 Ebola epidemic we deployed nanopore sequencing to West Africa to track Ebola virus evolution. In 2016, in response to the Zika epidemic in the Americas, we established a mobile sequencing laboratory that travelled through Brazil to understand the spread of Zika. Initially we focused on virus applications, but as the output of the nanopore sequencer has increased, bacterial whole genome assembly has become routine, even in field situations. Recently we were part of a group that sequenced a whole human genome on the MinION and developed a new protocol for generating single reads of up to 1 megabase, significantly reducing the complexity of de novo assembly. The nanopore platform permits detection of methylation and base modifications and can also sequence RNA directly, which is important in pathogen biology. I will discuss the bioinformatics challenges associated with working on this platform and the opportunities that near-patient, ubiquitous genome sequencing offers for our ability to fight infectious diseases.

Speakers

Nick Loman

Ph.D., Independent Research Fellow, Institute of Microbiology and Infection, University of Birmingham


Wednesday August 2, 2017 10:00 - 11:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:00

Break
Wednesday August 2, 2017 11:00 - 11:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:20

Direct RNA and cDNA sequencing of C. elegans transcripts
Directly sequencing RNA, now possible through nanopore sequencing technologies, has the potential to improve upon cDNA sequencing efforts by streamlining library preparation, reducing batch effects, PCR bias, and template switching, and allowing for the direct detection of RNA modifications and polyadenylation status. We sought to benchmark direct RNA sequencing approaches against cDNA-based approaches by sequencing in parallel protein-coding RNA transcripts from wild-type C. elegans. We compared data quality, transcript abundance and full-length transcript recovery between the two sample preparation approaches. We tested multiple split read aligners (Exonerate, GMAP, LAST) to compare accuracy, identity and splice variant detection between them. We also assessed the extent to which direct RNA sequencing was able to detect known 3’ UTR isoforms and poly(A) tails, which are of biological interest due to their conserved and vital roles in post-transcriptional gene regulation, and validated splice variant detection using a known splicing mutant. Together, our results benchmark the sensitivity and accuracy of feature detection between direct RNA and cDNA sequencing.

Speakers

Rachael Workman

Research specialist, Johns Hopkins University, Department of Biomedical Engineering


Wednesday August 2, 2017 11:20 - 11:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:40

Metagenome assembly from synthetic long reads using de Bruijn graphs
Despite rapid progress in sequencing technologies, metagenome assembly remains challenging: short reads result in fragmented assemblies, while long reads (e.g., reads generated using Pacific Biosciences or Oxford Nanopore technologies) remain expensive for metagenomics applications. Although the recently introduced 10X GemCode Synthetic Long Reads (SLR) technology promises to become a step forward in metagenome assembly, it faces a number of computational challenges. In this talk, we will present the cloudSPAdes algorithm for metagenome assembly from GemCode SLRs which, unlike previous approaches to metagenome SLR assembly, utilizes the de Bruijn graph for resolving repeats and strain variations. We will demonstrate that cloudSPAdes produces assemblies superior to those of existing state-of-the-art SLR assembly approaches.
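For readers unfamiliar with the data structure named in this abstract, the following toy sketch builds a de Bruijn graph from a set of reads. It is a textbook construction under simplifying assumptions (error-free reads, a single strand, no barcode information) and is not the cloudSPAdes implementation; the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers. Every k-mer found in a read adds an edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    Returns a dict mapping each node to a sorted list of successors.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


# Two reads that share a prefix and then diverge, as two strains might.
graph = de_bruijn_graph(["ACGT", "ACGA"], k=3)
print(graph)  # {'AC': ['CG'], 'CG': ['GA', 'GT']}
```

The branch at node `CG` is the kind of ambiguity (a repeat or a strain variant) that an assembler has to resolve with extra information, such as the barcodes attached to synthetic long reads.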

Speakers

Wednesday August 2, 2017 11:40 - 12:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:00

Integration of sequencing instruments and bioinformatics to accelerate analysis and interpretation
Speakers

Laurent Spiess

Executive Account Manager | Enterprise Informatics EMEA | illumina, Paris


Wednesday August 2, 2017 12:00 - 12:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:20

Lunch

Wednesday August 2, 2017 12:20 - 13:30
Lunch Birzhevaya liniya, 10

13:30

Genome 10K Project at the Dobzhansky Center in St. Petersburg

The Genome 10K project was begun in 2009 by an international consortium of biologists and genome scientists determined to facilitate the whole-genome sequencing and analysis of 10,000 vertebrate species. Since then the number of species selected and accomplished has risen from ~30 to over 350 species sequenced or ongoing with funding, an increase of over 1000% in eight years. I shall summarize the advances and responsibilities that have occurred to date and lay out the achievements and present challenges of reaching the goal. I shall review the status of known vertebrate genome projects, recommend standards for pronouncing a species genome as sequenced or completed, and provide a present and future view of the landscape of Genome 10K.

At the Theodosius Dobzhansky Center for Genome Bioinformatics, we have contributed comparative analyses of 12 of the 38 living species of Felidae, a remarkable example of worldwide species radiation and adaptation to various environments. Our study included analyses of the genome sequences of lion (Panthera leo), tiger (Panthera tigris), snow leopard (Panthera uncia), leopard (Panthera pardus), jaguar (Panthera onca), caracal (Caracal caracal), lynx (Lynx lynx), Asian leopard cat (Prionailurus bengalensis), fishing cat (Prionailurus viverrinus), puma (Puma concolor), cheetah (Acinonyx jubatus), and domestic cat (Felis catus), covering six lineages of the family (Panthera, Caracal, Lynx, Asian leopard cat, Puma, and Domestic cat). For each, the whole-genome assembled sequence was assessed and annotated, including genes, repeats, variants and other features. A structural alignment of the genomes was performed to identify homology and rearrangements between them. Homozygosity regions were determined based on single nucleotide variants called in the sequenced specimens. Differences and similarities between the annotated genomes are interpreted in terms of the evolutionary process that took place 10.8 million years ago and initiated branching from the last common Felid ancestor.

The Genome 10K endeavor is ambitious, bold, expensive and uncertain, but together the Genome 10K Consortium of Scientists (G10KCOS) and the world genomics community are moving deliberately toward their goal of delivering to the coming generation a gift of genome empowerment for many vertebrate species. 



Wednesday August 2, 2017 13:30 - 14:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

14:30

Correlation analysis of the microbial metacommunity in Lake Baikal based on high-throughput sequencing data
Bacteria and unicellular eukaryotes coexist in aquatic ecosystems by forming networks of interspecies relationships. The use of high-throughput sequencing to analyze the composition of microbial communities, in combination with statistical methods, makes it possible to reveal features of the functioning of microbial communities. This work aims to determine the relationships between the representatives of the microbial metacommunity and the environmental parameters of the photic layer in Lake Baikal. Water samples were collected in the upper layer (0-25 m) at 30 stations in Lake Baikal in early June 2012. Total DNA was extracted from the samples, and the V3-V4 region of the 16S rRNA gene and the V3 region of the 18S rRNA gene were amplified. The amplicons were sequenced using a GS FLX 454 genomic sequencer (Roche, USA) (LIN SB RAS). The pyrosequencing results were analyzed using Mothur 1.19.0. The analysis revealed 867 operational taxonomic units (OTUs) at a genetic distance of 0.03 for Bacteria, and 2442 OTUs for Eukaryota. The OTUs obtained were identified using the SILVA and NCBI databases. To analyze the relationship between metacommunity components and the environment, a correlation analysis was performed between the number of OTU sequences and physical and chemical parameters. Most of the strong (r ≥ 0.5), significant (α = 0.05) correlations in bacterial communities involved OTUs of Actinobacteria and Bacteroidetes. OTUs of autotrophic and unclassified eukaryotes had most of the strong, significant correlations (r ≥ 0.45, α = 0.05) in eukaryotic communities. Positive correlations were primarily found within either Bacteria or Eukaryota, while both positive and negative correlations were found between Bacteria and Eukaryota (|r| ≥ 0.45, α = 0.05). A small group of taxa correlated with the lake’s environmental parameters (NO3, PO4, Si, O2 and temperature).
Thus, the structure of Lake Baikal’s microbial metacommunity in spring is mainly influenced by relationships between different groups of microorganisms, and to a lesser degree by abiotic parameters. This is probably caused by constant habitat conditions in the lake during the period under study. The study was carried out as part of FASO topic No. 0345-2016-0005 "Experimental studies of genomes and proteomes of biota of freshwater ecosystems".
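The correlation step described in this abstract can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the station values below are invented purely to show the call; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two samples.

    Raises ValueError if the samples differ in length, have fewer
    than two points, or if either sample has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant samples")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: read counts of one OTU and water temperature
# at five stations (illustrative numbers only).
otu_counts = [12, 30, 45, 60, 81]
temperature = [3.1, 3.8, 4.4, 5.0, 5.9]
r = pearson_r(otu_counts, temperature)
print(round(r, 3), "strong" if abs(r) >= 0.5 else "weak")
```

A significance test at α = 0.05, as reported in the abstract, would be applied on top of the coefficient; that step is omitted from this sketch.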

Speakers

Wednesday August 2, 2017 14:30 - 14:50
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

14:50

Metagenomic exploration of horizontal gene transfer events and phage infections in a South African deep subsurface bacterial population
In deep subsurface environments, viruses in general have been shown to play a role in altering biogeochemical cycles, microbial diversity profiles and their genetic content. The role of phages in deep marine and terrestrial environments has rarely been considered and has therefore attracted interest in recent research. In this study, the main objectives were to identify phage genes and horizontally acquired genes in the South African subsurface using bioinformatics approaches, in order to infer their effect on the deep subsurface bacterial communities in terms of evolution and survival. Sampling of fracture water from the South African deep subsurface resulted in identification of phages belonging to the Myoviridae and Podoviridae using TEM analysis. Whole-metagenome sequencing of the fracture water microbial population detected phages belonging to the order Caudovirales, with the Siphoviridae family being the most abundant. The presence of the Myoviridae and Podoviridae families further confirmed the phage characterization obtained using TEM. The majority of the identified phage sequences were from phages that infect hosts from the phylum Proteobacteria, which is the most abundant phylum in the fracture water according to the metagenomic diversity analysis. Partially complete prophages were detected and annotated. The presence of prophages indicated that phages in this environment can be both lytic and lysogenic. Horizontal gene transfer (HGT) was studied by focusing on genomes from binned Proteobacteria. CRISPRs, mobile/transposable elements, transposases and retrons were detected within the binned metagenome data, suggesting possible phage-mediated HGT events. Specific gene products of HGT events were identified as part of the nitrogen fixation pathway, cobalamin synthesis, sulfide reduction pathways as well as motility and sporulation.
This indicates that HGT and viral infections are prevalent evolution events in the studied population as these events would confer novel capabilities to the host for survivability and evolution in the extreme deep subsurface environment.


Wednesday August 2, 2017 14:50 - 15:10
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:10

Break
Wednesday August 2, 2017 15:10 - 15:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:30

Machine learning identifies unique taxa differentiating proximal and distal human colonic microbiota
Colorectal cancer (CRC) remains a leading cause of death worldwide. Tumors of the proximal (right) and distal (left) colon are morphologically and genetically distinct. Previous work from our group found that microbial dysbiosis is associated with the development of colorectal cancer tumors in studies of both mice and humans. Analysis of the fecal microbiota from healthy and CRC patients further revealed different microbial signatures associated with disease. In this study, we extended our observations of the fecal microbiome to analysis of the proximal and distal human colon. We used a two-colonoscope approach on subjects who had not undergone the standard bowel preparation procedure. This technique allowed us to characterize the native proximal and distal luminal and mucosal microbiome without prior chemical disruption. 16S rRNA gene sequencing was performed on proximal and distal mucosal biopsies, luminal samples and exit stool from 20 healthy individuals. Diversity analysis of each location revealed that each site contained a diverse community, and that a patient’s samples were more similar to each other than to those of other individuals. Since we could not differentiate sites along the colon based on community structure or community membership alone, we employed the machine-learning algorithm Random Forest to identify key species that distinguish biogeographical sites. Random Forest classification models were built using taxa abundance and sample location and revealed distinct populations that were found in each location. Peptoniphilus, Anaerococcus, Enterobacteriaceae, Pseudomonas and Actinomyces were most likely to be found in mucosal samples versus luminal samples (AUC = 0.925). The classification model performed well (AUC = 0.912) when classifying mucosal samples into proximal or distal sides, but separating luminal samples from each side proved more challenging (AUC = 0.755). The left mucosa was found to have high populations of Finegoldia, Murdochiella and Porphyromonas. Proximal and distal luminal samples were composed of many of the same taxa, likely reflecting the fact that stool moves along the colon from the proximal to the distal end. Finally, comparison of all samples to fecal samples taken at exit showed that the feces were most similar to samples taken from the left lumen, again reflecting the anatomical structure of the colon. Taken together, our results have identified bacterial populations characteristic of the proximal and distal colon. Further investigation of these bacteria may elucidate if and how these groups contribute to differential oncogenesis processes on the respective sides of the colon.
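
For readers unfamiliar with the AUC values quoted in this abstract: AUC is the probability that a classifier scores a randomly chosen sample of one class (for example, mucosal) higher than a randomly chosen sample of the other class (luminal). The sketch below computes it from first principles in plain Python. The scores are made-up numbers for illustration only, not data from the study, and the function name is hypothetical. It does not reproduce the Random Forest models themselves.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve by pairwise comparison; ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (predicted probability of "mucosal").
mucosal = [0.9, 0.8, 0.4]
luminal = [0.7, 0.3, 0.2]
print(round(auc(mucosal, luminal), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why 0.755 for luminal samples reads as a noticeably harder problem than 0.912 for mucosal samples.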

Speakers
Kaitlin Flynn

Postdoctoral Fellow, University of Michigan Medical School


Wednesday August 2, 2017 15:30 - 15:50
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:50

"Mutation grade" bacterial sequences: the Bacillus cereus group as an example
Analysis of point mutations in bacteria using NGS technologies requires a very precise sequence of the wild-type (WT) strain to serve as the template for variant analysis. Our experience in analysis of mutations in several strains of the Bacillus cereus group (B. cereus sensu stricto, B. thuringiensis, B. weihenstephanensis) indicated that the available GenBank sequences, produced with different technologies, are generally unsuitable for this purpose. Thorough scrutiny of the WT strain sequence must be done in parallel. As an example, the available sequence of B. cereus ATCC14579T, produced by Sanger technology, contains about 2000 sequencing errors. Read alignment with more recent sequencing data, like that of B. thuringiensis 407, produced using 454 technology, produces about 200 variations, which completely hampers direct mutation analysis. The most suitable sequence that we analyzed was that of the B. weihenstephanensis KBAB4 strain, containing, by our estimate, about 20 errors. The available sequence of this strain, mainly assembled after Sanger sequencing, was refined before submission with Illumina technology. We therefore propose the concept of a so-called "mutation grade" sequence: one containing fewer than 1-2 errors per Mb (about 0.0001%) in the whole resolvable area and thus directly suitable for point mutation mapping. Our experience in generating such sequences and in point mutation analysis for B. cereus ATCC14579T and a conjugation-competent B. weihenstephanensis strain will be presented in this talk. It must be noted that generation of an absolutely error-free sequence is impossible in practice, even for bacteria. This is mostly due to "unresolvable" locations in the genomes, usually containing complex repetitive sequences. Such locations are specific to each bacterial group and should be taken into account during variation analysis. The mutations in B. cereus ATCC14579 were mapped in the contexts of selection of psychrotolerant clones and conjugation experiments. Our study demonstrates that the level of sequence variation existing between strains, used in different laboratories or produced in "blind" experiments, like conjugation, is about 2-4 mutations/genome. Although it looks negligible, such a level of variation can be frustrating and can lead to misinterpretation of data when bacterial sequences are analyzed at such high precision. The sequence of the conjugation-competent B. weihenstephanensis strain was completed using Illumina technology, different assemblers, including SPAdes, which produced the minimal number of contigs, and subsequent semi-manual sequence verification assisted by the complete KBAB4 sequence for gap closure. Mapping of "blind" mutations accumulated during conjugation experiments showed that the produced sequence of this strain is of "mutation grade" quality. The work was partially supported by the French National Research Agency (ANR project PathoBactEvol).
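
The "mutation grade" threshold is a simple rate, and the arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, the threshold uses the looser bound of 2 errors per Mb, and the genome size of 5.4 Mb is an assumed round figure that the abstract does not state.

```python
def errors_per_mb(error_count, genome_size_bp):
    """Sequencing errors per megabase of sequence."""
    return error_count / (genome_size_bp / 1_000_000)


def is_mutation_grade(error_count, genome_size_bp, max_errors_per_mb=2.0):
    """True if the error rate is below the assumed 'mutation grade' bound."""
    return errors_per_mb(error_count, genome_size_bp) < max_errors_per_mb


# One error per megabase, expressed as a percentage of bases.
print(f"{1 / 1_000_000 * 100:.4f}%")  # 0.0001%

ASSUMED_GENOME_BP = 5_400_000  # assumption for illustration, not from the abstract
for label, errors in [("about 2000 errors", 2000), ("about 20 errors", 20), ("5 errors", 5)]:
    rate = errors_per_mb(errors, ASSUMED_GENOME_BP)
    print(label, round(rate, 1), "per Mb, mutation grade:",
          is_mutation_grade(errors, ASSUMED_GENOME_BP))
```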

Wednesday August 2, 2017 15:50 - 16:10
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

16:10

Plasmid Diversity from De Novo Assembly in Salmonella Typhimurium
Salmonella enterica serovar Typhimurium is the most prevalent serotype of Salmonella associated with disease. It is the most common serovar in zoonotic reservoirs for human infection and in the environment, and contains a number of variants able to infect multiple animal hosts and humans, as well as others that have become highly restricted to a single host. There is also variation in levels of antimicrobial resistance and virulence, often facilitated through horizontally acquired or mobile elements, leading to differential impact on food safety and human health from different lineages. These factors make Typhimurium an excellent focus for the study of the evolution of pathogenesis. The whole genome sequences of strains representing much of the genotypic diversity of Typhimurium were investigated to identify microevolution associated with distinct epidemiological features. Short-read sequences were used to construct a phylogeny to which Bayesian methods were applied to define clades. Long-read sequence assemblies were then generated for representative isolates of each clade to create a series of reference sequences. Extra-chromosomal genome sequences from the references were then investigated in detail to determine the makeup and diversity of plasmid sequences. These findings were then compared with the de novo assembly of plasmid sequences from short-read data using plasmidSPAdes to produce a cohort of sequences better representing the full diversity exhibited in Salmonella Typhimurium. We report on the full diversity of plasmid variation within Typhimurium and describe the incidence of antibiotic resistance and virulence gene transfer within clades and at the serovar level. Differences in gene content are analysed and compared with phenotypic variation of strains and discussed with reference to epidemiological consequences. Long-read sequencing is a useful tool to accurately determine the nucleotide sequence and gene content of large plasmids. However, newly available algorithmic methods are able to capture a large degree of plasmid sequence from less expensive and more routinely available short-read data. The combination of these methods and subsequent analysis has valuable consequences for epidemiological surveillance.

Speakers
Matt Bawn

Earlham Institute
I began my undergraduate studies in physics at the University College of Wales Aberystwyth. After this I worked in industry for a number of years in the field of high-field superconducting magnet manufacture, before returning to university in 2007 to undertake postgrad...


Wednesday August 2, 2017 16:10 - 16:30
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

17:00

Museum
Wednesday August 2, 2017 17:00 - 18:00
SPbU Museum 7-9 Universitetskaya Nabereznaya, Saint Petersburg, Russia
 
Thursday, August 3
 

09:00

What’s old is new again: assembly and alignment for the long-read era

Computational methods that were once successful for capillary sequencing have not worked well for massively parallel short-read sequencing. This sparked a flurry of new short-read mapping and assembly methods. More recently, long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore have emerged, producing extremely long, but noisy, reads. Again, this fundamental shift in data type has required new computational methods for routine bioinformatics tasks, but is also creating many new opportunities. I will discuss applications of long-read sequencing to the problems of genome assembly, alignment, and metagenomics, including the possibility of complete, haplotype-resolved vertebrate genomes and real-time analysis of complex metagenomic samples.


Speakers
Adam Phillippy

Ph.D., Investigator, Head, Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH


Thursday August 3, 2017 09:00 - 10:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

10:00

Some components for assembling large genomes and metagenomes
I will talk about the integration of some recent algorithms into software that addresses the challenges of de novo assembly of large genomes and metagenomes. In particular, I will highlight (1) a minimal perfect hashing technique that is capable of indexing billions of elements quickly and in low memory, (2) an efficient unitig graph construction software (BCALM 2), and (3) recent developments in the Minia 3 assembler regarding multi-k contig assembly that draw inspiration from the SPAdes assembler. These components are integrated into a software pipeline called Minia-pipeline, which recently provided high-ranking assemblies in the Critical Assessment of Metagenomic Interpretation challenge. References and software: Chikhi R et al, in preparation, https://github.com/GATB/gatb-minia-pipeline; Rizk G et al, in submission, https://github.com/rizkg/BBHash; Chikhi R et al, ISMB 2016, https://github.com/GATB/bcalm; Sahlin et al, Bioinformatics 2016, https://github.com/ksahlin/BESST

Thursday August 3, 2017 10:00 - 10:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

10:20

Algorithmic challenges in de novo transcriptome and metatranscriptome assembly
The ability to generate huge amounts of RNA-Seq data has created a demand for novel tools capable of analyzing large transcriptomic data sets. While reference-based transcriptome analysis prevails in medical studies, many research projects require de novo transcriptome assembly tools. Due to varying expression levels across genes, RNA-Seq data sets are characterized by highly uneven coverage depth, which is one of the challenges in de novo transcriptome assembly. Since the recently developed SPAdes assembler successfully addresses this problem, we decided to expand its functionality to enable high-quality transcriptome and metatranscriptome assemblies from short reads. We present rnaSPAdes, a new SPAdes-based RNA-Seq assembler, demonstrate its application on various datasets and compare it with state-of-the-art transcriptome assembly tools. We also show some algorithmic and biological challenges of metatranscriptome assembly and how rnaSPAdes copes with them to construct accurate, reliable transcripts from dissimilar organisms.


Thursday August 3, 2017 10:20 - 10:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

10:40

Break
Thursday August 3, 2017 10:40 - 11:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

11:00

Dealing with mass of genomic data. From optimized data structures to advanced memory architectures

Raw data generated by sequencing machines represent hundreds of gigabytes of information that are systematically processed to extract useful information. However, this mass of genomic data contains a lot of redundancy that can be captured by optimized data structures, such as the de Bruijn graph, allowing the full information to fit into a standard computer memory. Alternatively, dedicated memory architectures offer another way to process this mass of information quickly. The talk will discuss these two alternatives by first presenting an optimized implementation of the de Bruijn graph and an associated tool box called GATB (Genomic Analysis Tool Box). In the second part, we will introduce the PIM (Processing in Memory) concept and present preliminary results on genomic applications using a PIM chip currently developed by the UPMEM company.
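
To make the redundancy argument concrete, here is a minimal sketch of a node-centric de Bruijn graph in plain Python: every distinct k-mer is stored once, however many reads contain it, and edges are inferred from (k-1)-overlaps between stored k-mers. The function names and toy reads are hypothetical, and this is not GATB's implementation, which relies on far more compact data structures.

```python
def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(reads, k):
    """Map each distinct k-mer to the stored k-mers that can follow it."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    graph = {}
    for node in nodes:
        suffix = node[1:]
        # A successor shares the (k-1)-suffix of this node as its prefix.
        graph[node] = sorted(suffix + base for base in "ACGT" if suffix + base in nodes)
    return graph


reads = ["ACGTAC", "CGTACG", "GTACGT"]  # toy overlapping reads
graph = build_de_bruijn(reads, k=3)
total = sum(len(kmers(r, 3)) for r in reads)
print(total, "k-mer occurrences collapse into", len(graph), "nodes")
for node in sorted(graph):
    print(node, "->", graph[node])
```

In this toy input, twelve k-mer occurrences reduce to four nodes, which is the kind of compression that lets high-coverage sequencing data fit in memory.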


Speakers
Dominique Lavenier

Ph.D., Research Director at Centre National de la Recherche Scientifique


Thursday August 3, 2017 11:00 - 12:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:00

Analytic combinatorics for bioinformatics: applications to seeding and genome assembly
One of the main challenges of modern bioinformatics is to meet the ever-growing demand for sequencing-based analyses. Most sequenced reads are aligned to a reference genome, which explains why large efforts have been invested in developing efficient alignment algorithms. The algorithms rely on seed-based heuristics, where short regions of near-identity are used to rapidly zoom in on candidate targets. This makes the algorithms faster, but it also makes them inexact, creating a risk of missing the best hit. Surprisingly, the chance that sequencing reads contain seeds of different lengths is not part of the common knowledge in bioinformatics. The answer depends on the sequencing technology, so the problem has so far remained without a general solution. Meanwhile, the rapidly evolving field of “analytic combinatorics” has initiated a radical shift towards answering such problems. The principle is to use symbolic manipulations to construct a mathematical function encapsulating the solution to the problem, instead of using standard approximations. Using analytic combinatorics, I provide a practical solution to estimate the probabilities of occurrence of seeds for arbitrary error models. With this knowledge, I propose a new way to calibrate seeding heuristics, giving more accurate mapping qualities and allowing better rationalization of the mapping process. The approach works for all error models, including empirical ones, and for inexact seeds with up to one error. The analytic combinatorics approach also has applications to the assembly problem, where it provides very accurate estimates for the expected contig number and contig size, even in regimes where the classical estimators fail. Overall, this work shows how to use the novel and fruitful approach of analytic combinatorics to solve some modern problems in bioinformatics.
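
As a point of reference for the seeding question, the simplest case can be computed exactly with a short recurrence. The sketch below assumes independent errors at a uniform per-base rate and exact seeds only. It is a plain dynamic program rather than the analytic-combinatorics method of the talk, and the function name is hypothetical.

```python
def prob_exact_seed(read_len, seed_len, error_rate):
    """P(a read contains at least one run of seed_len consecutive correct bases)."""
    if seed_len <= 0:
        raise ValueError("seed_len must be positive")
    match = 1.0 - error_rate
    # state[j]: probability that no seed has occurred yet and the current
    # run of correct bases has length j (0 <= j < seed_len).
    state = [0.0] * seed_len
    state[0] = 1.0
    hit = 0.0
    for _ in range(read_len):
        nxt = [0.0] * seed_len
        for run, prob in enumerate(state):
            nxt[0] += prob * error_rate      # an error resets the run
            if run + 1 == seed_len:
                hit += prob * match          # the run reaches seed length
            else:
                nxt[run + 1] += prob * match
        state = nxt
    return hit


# Illustrative parameters only: 100 bp reads, 20 bp seeds.
for rate in (0.01, 0.05, 0.15):
    print(rate, round(prob_exact_seed(100, 20, rate), 4))
```

The recurrence already shows how quickly seed probability drops as the error rate grows, which is the regime where careful calibration of seeding heuristics matters most.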

Thursday August 3, 2017 12:00 - 12:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:20

Concentrate – graphical tool for feature scale genomic data analysis
Numerous tools for genomic data visualization exist today, both web-based (UCSC Genome Browser, Ensembl Genome Browser) and standalone (Integrative Genomics Viewer, Artemis, Tablet, etc.). All of them share the same principle: genomic data is displayed in the linear coordinates of a reference genome, and the user navigates with pan and zoom controls. This classical approach is the most straightforward and works well for low-level sequencing data (e.g. BAM files), but it cannot readily support scenarios that arise when working with genome annotation data and interpreting sequencing results. Genomic data is sparse: elements of interest to the researcher (e.g. potentially disease-causing variants) may be separated by thousands or millions of bases or located on different contigs. Often the exact location of an annotation feature matters less than the presence and number of such features (as in the search for monogenic recessive disorders in sequencing data, where the presence of two pathogenic variants means potential disease) or the relations between features, such as intersections of genetic variants and protein functional sites. Such properties of the data are hard to discover with the pan-and-zoom approach and difficult to visualize on a linear scale. We propose a different approach to genomic data visualization that uses the element interaction event as the unit of the graphical scale. As long as an element does not interact (overlap, cover, etc.) with other elements, its visual size equals one unit, and every interaction site adds one more unit to its size. A physical size scale that is stretched and shrunk according to the visual size of the elements preserves information about the physical size of the displayed objects. This ensures the most efficient use of screen space in terms of object density and makes element interaction events straightforward to detect. We implement this principle in Concentrate, an open-source application that visualizes genomic data. In addition to interaction-based scaling, it provides intra- and inter-track filtering capabilities with data-type-based attribute discovery and rich element interactions based on logical operators. For example, it can restrict the displayed genetic variants to those that have a frequency below 0.05 or are annotated as pathogenic and intersect exons of the CFTR gene, where the gene data comes from a BED file and the variation data from an unrelated VCF file. Such capabilities enable deep data analysis within the visualization software without external tools for filtering and region manipulation such as vcftools or bedtools, which is currently not possible in other genome browsers. Concentrate is written in Java and JavaScript and distributed as a single jar file that can run on any machine with a modern web browser and Java installed. It is based on a client-server architecture and can be run both locally and as a service. The source code is licensed under GNU Affero GPLv3 and is available on GitHub.
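
The interaction-based scale can be illustrated with a small sketch. It assumes that an "interaction" is a coordinate overlap between two features and that each overlapping partner counts as one interaction site. The class and function names are hypothetical, and the code is written in Python for brevity; it is not taken from Concentrate, which is a Java and JavaScript application.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def visual_sizes(features):
    """One unit per element, plus one unit per element it interacts with."""
    sizes = {}
    for f in features:
        interactions = sum(1 for g in features if g is not f and overlaps(f, g))
        sizes[f.name] = 1 + interactions
    return sizes


track = [
    Feature("exon", 100, 1000),
    Feature("variant_1", 150, 151),
    Feature("variant_2", 900, 901),
    Feature("distant_variant", 5_000_000, 5_000_001),
]
print(visual_sizes(track))
```

Under these assumptions the exon occupies three units and the distant variant one unit, regardless of the five million bases between them, which is the space saving the abstract describes.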

Speakers
Anton Bragin

Head of Bioinformatics Department, Parseq Lab



Thursday August 3, 2017 12:20 - 12:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

12:40

InFusion: advancing discovery of fusion genes and chimeric transcripts from RNA-sequencing data
Functional fusion genes and chimeric transcripts have been shown to occur in cancers due to genomic rearrangements as well as in non-cancerous cells due to trans-splicing or transcriptome machinery failure. Careful deactivation of fusions can stop further arrangement and growth of cancer. Therefore correct and detailed detection of fusions is important both in scientific research and in precision medicine. RNA sequencing has proved to be an effective method for discovery of fusions. We have designed and implemented a novel toolkit called InFusion for chimeric transcript discovery from RNA-seq data. In comparison to other existing methods, our approach introduces several unique features such as discovery of fusions involving intergenic regions and detection of anti-sense chimeras based on the strand-specificity of the sequencing library. Additionally, the toolkit includes several advanced post-analysis steps such as comparison of results among well-known existing tools and design of sequences for further experimental validation. Using simulated and public data we demonstrated that InFusion has superior detection sensitivity and high specificity compared to other existing methods. Additionally, the toolkit was able to discover a wider spectrum of fusion events that can occur in the transcriptome. To further confirm this we also performed deep RNA sequencing of two prostate cancer cell lines. From this experimental data analysis we discovered in silico and verified in vitro 26 novel fusion events, including alternatively spliced fusion isoforms and chimeric RNAs involving non-exonic regions. Moreover, we confirmed four fusions that involve intergenic regions. To our knowledge, discovery of such events has not been addressed previously, despite their potential to encode functional proteins or regulate gene transcription. The detailed landscape of the chimeric RNAs, mechanisms underlying their genesis and their functional roles are yet to be studied. InFusion may prove to be a useful tool for detecting the whole scope of possible events. The manuscript describing the method is published in PLOS One and the open-source software toolkit is available for download at: http://bitbucket.org/kokonech/infusion


Thursday August 3, 2017 12:40 - 13:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

13:00

Lunch

Thursday August 3, 2017 13:00 - 14:00
Lunch Birzhevaya liniya, 10

14:00

Studying chromatin 3D structure in eukaryotes
Recent advances enabled by the Hi-C technique have unraveled many principles of chromosomal folding that have subsequently been linked to disease and gene regulation. In particular, Hi-C revealed that chromosomes of mammals and fruit flies are organized into Topologically Associating Domains (TADs), evolutionarily conserved compact chromatin domains that influence gene expression. However, we still know remarkably little about the mechanism of TAD formation and chromatin organization in general. In my talk, I will address several questions: (i) the effect of lamin depletion, followed by chromatin detachment from the nuclear lamina, on fine chromatin architecture in Drosophila melanogaster; (ii) the hypothesis that the mechanism of TAD self-assembly is based on the ability of nucleosomes from inactive chromatin to aggregate, and the lack of this ability in acetylated nucleosomal arrays; (iii) the principles of chromosomal folding in a popular model organism, the soil-living amoeba Dictyostelium discoideum.

Speakers
Ekaterina Khrameeva

Skolkovo Institute of Science and Technology


Thursday August 3, 2017 14:00 - 15:00
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:00

Evolutionary Genomics of Antibiotic Resistance
Antibiotic discovery needs not only novel compounds with antibacterial activity but also tools to assess and prioritize new drug candidates by the likelihood of acquired drug resistance in target pathogens. To that end, we have developed an approach based on experimental evolution monitored by population deep sequencing followed by systems-level mechanistic analysis. The experimental evolution of drug resistance is performed in a morbidostat, a modification of the chemostat approach enabling a constant selective pressure via a gradual (software-controlled) increase in drug concentration. The established workflow was optimized and validated in a model of experimental evolution of resistance to the metabolic drug triclosan (TCL) in E. coli. The known target of TCL, a popular biocide widely used in consumer products, is enoyl-ACP-reductase FabI, an essential enzyme in bacterial fatty acid biosynthesis. In the course of 4 x 24 h consecutive evolutionary cycles in six parallel reactors, we observed a gradual (up to 20-fold) increase in minimal inhibitory concentration (MIC) at the level of populations and individual selected colonies. The bioinformatics analysis of numerous single nucleotide variants (SNVs) appearing and disappearing in the course of evolution revealed common aspects of otherwise unique evolutionary trajectories observed in all six reactors. Most importantly, early-stage resistance mechanisms, which emerge at relatively low TCL via recruitment of certain natural stress-response pathways, are ultimately outcompeted by a single most robust mechanism implemented via resistance mutations in the primary drug target, FabI. Notably, all of the 13 detected SNVs in the fabI gene were mapped to the active site area of the enzyme.

Speakers
Semen Leyn

Postdoctoral Associate, IITP RAS / Sanford Burnham Medical Discovery Institute


Thursday August 3, 2017 15:00 - 15:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:20

Intragenic multiplications in diatom protein-coding genes
Diatoms are unicellular algae belonging to the kingdom Chromista, remarkable for their ability to create a species-specific siliceous cell wall. They appeared circa 240 mya (Sorhannus 2007). Their genomes are usually 20-30 Mbp long and often include long non-coding insertions and short repeats (Vardi et al. 2010). A multiplication was recently shown in the diatom silicon transporter (SIT) gene (Marchenkov et al. 2016). The aim of this work was to elucidate the scale of intragenic duplications in diatom genomes using our own as well as published genomic and transcriptomic sequences. We used sequences of 31 diatom species in total; the analysis was performed with our own software pipeline. For the purposes of this work, a gene was considered multiplicated if and only if it has an analogue containing a single copy of the repetitive element, whether in the same organism or another. We also excluded genes rich in low-complexity regions, which are often repetitive as well, although with shorter repeat elements. It turned out that up to two percent of protein-coding genes detected in the transcriptomic and genomic datasets used in this study contain multiplications corresponding to hundreds of amino acids. Often the genes that encode these proteins are, quite literally, several copies of a protein sequence concatenated within a single ORF. For comparison, we searched for similar genes in other model eukaryotes. Less than one percent of Arabidopsis thaliana and Drosophila melanogaster genes fulfill these criteria, and none at all do in Saccharomyces cerevisiae. The distribution of multiplicated genes among diatoms suggests that these events did not happen simultaneously, which led to different repertoires of repetitive genes among the studied species. GO terms for most major functional groups and subcellular localisations are present in the multiplicated genes; membrane proteins are enriched, although they do not form a majority. We should also note that mRNA is transcribed from these genes, as evidenced by their presence in the transcriptomic datasets, which increases the copy number per single-copy element. It is probable that some of them are functional in the multicopy form, while others are processed. Thus, in this work we have, for the first time, estimated the scale of intragenic duplications within diatom protein-coding genes. It is higher than in other eukaryotes and varies between 0.8% and 2% of protein-coding genes in different diatom species. This work was funded by the FASO projects #0345-2016-0005 and #0345-2015-0031.
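
The criterion for calling a gene multiplicated can be expressed as a toy check. The sketch below assumes exact-match repeats and a known repeat unit, whereas the authors' own pipeline is not described in the abstract and presumably tolerates divergence between copies. All names and sequences are hypothetical.

```python
def max_tandem_copies(protein, unit):
    """Largest number of back-to-back exact copies of unit found in protein."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    for start in range(len(protein) - len(unit) + 1):
        copies = 0
        pos = start
        while protein.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


def is_multiplicated(candidate, analogue, unit):
    """Candidate repeats the unit in tandem; its analogue carries it exactly once."""
    return max_tandem_copies(candidate, unit) >= 2 and analogue.count(unit) == 1


unit = "GSTPAK"                      # made-up repeat element
multi = "MA" + unit * 3 + "LV"       # three concatenated copies in one ORF
single = "MA" + unit + "LV"          # analogue with a single copy
print(max_tandem_copies(multi, unit), is_multiplicated(multi, single, unit))
```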

Thursday August 3, 2017 15:20 - 15:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

15:40

Advanced visual analysis of genomic variations (SNV, inDel, SV) using open source New Genome Browser (NGB)

The talk covers visual analysis of simple (SNV, InDel) and structural variations with the help of New Genome Browser (http://lifescience.opensource.epam.com/ngb, https://github.com/epam/NGB).

NGB is a fast and user-friendly web-based genome browser that responds to requirements and recommendations from the research and clinical communities.

NGB provides various visual tools for DNA and RNA sequence analysis, exon/domain easy integration with annotation databases, cloud-based data support, embedded protein structure viewer etc.

For structural variation analysis, NGB displays fusion proteins with their domain/exon structure. All these tools will be demonstrated during the talk.


Speakers
Mariia Zueva

Software Engineer, EPAM


Thursday August 3, 2017 15:40 - 16:20
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia

16:20

Closing Remarks
Speakers
Alla Lapidus

Professor, St.Petersburg State University


Thursday August 3, 2017 16:20 - 16:40
Graduate School of Management Building, room 309 Volkhovskiy Pereulok, 3, St. Petersburg, Russia
 
Friday, August 4
 

10:00

Linux basics
Strongly recommended for those who are not very familiar with the Linux command line.

Friday August 4, 2017 10:00 - 11:30
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

11:30

Break
Friday August 4, 2017 11:30 - 11:45
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

11:45

13:15

Lunch
Friday August 4, 2017 13:15 - 14:00
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

14:00

15:30

Break
Friday August 4, 2017 15:30 - 15:45
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

15:45

17:15

Break
Friday August 4, 2017 17:15 - 17:30
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

17:30

 
Saturday, August 5
 

10:00

11:30

Break
Saturday August 5, 2017 11:30 - 11:45
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

11:45

13:15

Lunch
Saturday August 5, 2017 13:15 - 14:00
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

14:00

14:00

14:00

15:30

Break
Saturday August 5, 2017 15:30 - 15:45
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

15:45

15:45

15:45

17:15

Break
Saturday August 5, 2017 17:15 - 17:30
Department of Microbiology, Saint Petersburg State University 16 linia V.O, 29, Saint Petersburg

17:30

17:30

17:30

19:00

Dinner
Saturday August 5, 2017 19:00 - 23:00
Pogreeb Griboyedova 11