Users of CLC's bioinformatics software for DNA, RNA, and protein sequence analyses enjoy the user-friendliness, the advanced bioinformatics functionalities, and the top-quality user support that we provide.

Improvements archive

CLC bio: Latest improvements

CLC Genomics Workbench 4.0.3

October 28, 2010

Improvements:

  • Enhanced usability of the GSEA analysis wizard: "Remove duplicates" is now a check box that can be switched on and off. Previously, duplicate removal was switched off implicitly by choosing Feature ID as the identifier; this choice is now explicit.
  • Improved performance when rendering large tables, particularly those with HTML formatting.

Bug fixes

  • SNP and DIP detection previously ignored overlapping pairs. Overlapping pairs now count (as one read) if they fulfill the quality criteria (SNP detection). Where the two reads of a pair disagree, the pair does not count. We recommend re-running all SNP and DIP detections on data sets with overlapping pairs (this is the case if the minimum distance used when mapping the reads is less than twice the read length). There is no need to re-run the mappings, only the SNP/DIP detection.
  • ChIP-Seq: the reported "nearest gene" was not always correct. This affected the last peak on each chromosome, as well as cases where the order of the gene annotations in the reference file did not correspond to the order of the annotations on the actual sequence. We recommend re-running all ChIP-Seq analyses to get correct reporting of nearest genes. There is no need to re-run the mappings.
  • SNP Annotation Using BLAST failed with certain query sequences (the result could not be shown)
  • Fixed crash of 454 import on certain Linux and Mac systems
  • SOLiD import accepts read names with -P2 at the end
  • Improved import of SAM/BAM files:
    • Better support for files from SOLiD Bioscope
    • Preliminary support for Complete Genomics files. (The actual alignment is not represented completely: insertions that relate to a consensus sequence are represented as unaligned ends in the imported mappings. Take this into account when looking for variations.)
  • In Sequencing Data Analysis -> Assemble Sequences to Reference, conflict resolution was disabled when the reference sequence was not included in the output.
  • When importing sequences from GenBank files, mRNA annotations now take their name from "locus_tag" in preference to "product".
  • Various minor bug fixes
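
The overlapping-pair rule for SNP detection can be illustrated with a minimal Python sketch. It is not the Workbench's implementation: the `Read` type, its fields, the function names and the default quality threshold of 20 are hypothetical, and it assumes that every mate covering the position must pass the quality check.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Read:
    """Hypothetical observation of one read at a single reference position."""
    pair_id: Optional[str]  # None for unpaired reads
    base: str               # base called by this read at the position
    quality: int            # Phred quality of that base


def pairs_may_overlap(min_distance: int, read_length: int) -> bool:
    """Mates can overlap when the minimum pair distance is below twice the read length."""
    return min_distance < 2 * read_length


def count_alleles(reads: Iterable[Read], min_quality: int = 20) -> Counter:
    """Count bases at one position, treating an overlapping pair as a single read."""
    counts: Counter = Counter()
    pairs: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        if read.pair_id is None:
            if read.quality >= min_quality:
                counts[read.base] += 1
        else:
            pairs[read.pair_id].append(read)
    for mates in pairs.values():
        # Assumption: every mate covering the position must pass the quality threshold.
        if any(m.quality < min_quality for m in mates):
            continue
        bases = {m.base for m in mates}
        if len(bases) == 1:  # mates agree: count the pair once
            counts[bases.pop()] += 1
        # mates disagree: the pair is not counted at all
    return counts
```

Collapsing the pair before counting keeps the same DNA fragment from being counted twice as independent evidence for a variant.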

CLC Genomics Workbench 4.0.2

August 19, 2010

Bug fixes:

  • Fixed error when importing 454 SFF files
  • Fixed error when importing SOLiD data with quality scores when the reads contained "."
  • Fixed error when mapping large data sets on Windows 64-bit systems
  • Fixed error when opening tables generated by the Transfac plug-in and the primer search tool
  • Fixed errors when running analyses on experiments generated from RNA-Seq results
  • GenBank export: annotations on the negative strand were not in the right order
  • Fixed memory and performance issues related to import of many sequences, e.g. from ACE files.
  • Various minor bug fixes

CLC Genomics Workbench 4.0.1

August 10, 2010

New features:

  • Improved performance of table filtering. Removed limit on the number of rows that can be filtered.
  • Option to search for read names in mapping results (and also sequence lists and BLAST results).
  • Improved performance of conflict table.
  • Better layout of graphics export and printing of mapping results: the reference and consensus sequences are repeated on every page to provide orientation context.
  • Extraction of the consensus sequence from mapping tables now runs in the background to provide a better user experience.

Bug-fixes:

  • Base-space data could erroneously be mapped in color space. Under special circumstances, the user settings file contained wrong default parameters, which caused the mapping to run in color space rather than base space. We recommend re-running mappings performed in Genomics Workbench 4.0 with Genomics Workbench 4.0.1.
  • Fixed problem with SNP detection on large data sets suddenly running very slowly.
  • Scalability improvements in mapping and de novo assembly, with drastic improvements in performance
  • Fixed various problems regarding editing of alignments and read mappings.
  • In the detailed mapping report, the zero coverage section was empty when there was only one reference sequence.
  • Various smaller bug fixes.

CLC Genomics Workbench 4

June 15, 2010

New features:

  • Small RNA Analysis
    • Brand new tool for analyzing small RNA (including miRNA) data sets
    • Adapter trimming
    • Counting of tags
    • Annotation using miRBase and other resources
    • Visualization of miRNA variants
    • Expression analysis
  • Renaming and redefining concepts
    • Reference assembly -> Read mapping. This follows the term commonly used today for aligning sequencing reads to a reference sequence.
    • Contig -> Read mapping. The result of read mapping was previously called a contig (i.e. the alignment of reads to a reference sequence). Now, the term "contig" is used exclusively for results from de novo assembly. The result of mapping reads is called a "read mapping".
    • Paired-end -> Paired. We now distinguish during import between Paired-end and Mate pair data. Once imported, there is no difference, and they are both called "Paired".
  • Trim redesign
    • Brand new adapter trimming including library of adapters
    • Performance improved
    • Multiple data sets supported as input
    • Summary report of the trimming
  • Improved SAM/BAM import
    • BAM format now supported, both import and export
    • More robust implementation
    • Better performance
    • Preview panel making it easier to match reference and SAM/BAM file
    • Spaces in reference sequence names are automatically converted to underscores when comparing with the SAM/BAM file
  • High-throughput Sequence Data Import
    • Gzip support
    • SOLiD fastq format supported (when downloading SOLiD data from the Sequence Read Archive, SRA).
    • 454 paired data: support for both FLX and Titanium linkers (with the possibility to add custom non-palindromic linkers).
    • Improved support for SOLiD paired-end data.
    • Support for data from Illumina Pipeline 1.5.
    • Import of tabular alignment files: it is now possible to specify a read name from the file to be imported with the read.
  • Better compression of reference sequences (lower memory footprint and disk space usage)
  • Performance improvement of read mapping algorithm
  • Improved memory management in general: lower memory footprint and shorter management overhead pauses.
  • Improved memory handling of large tabular data sets.
  • RNA-Seq:
    • Directional RNA-Seq.
    • Exon-intron reads are now counted under Total exon reads. When comparing new and old samples, please re-run the analysis on the old samples to ensure consistency.
  • A new de novo assembler has replaced the old one, making the de novo assembly plug-in obsolete.
  • SNP and DIP detection
    • Dialog usability improved by adding an advanced panel for advanced users
    • Minimum counts have been made clearer by splitting them into a Minimum count and a Sufficient count
  • Contig report has been renamed to Detailed Mapping Report and has been split up to support data with many reference sequences (e.g. when mapping against contigs from de novo assembly).
  • Redesign of product graphics
  • Improved consistency of data handling including faster listing of folder contents
  • Performance when saving small files significantly improved
  • Performance of ACE export improved, especially for long reference sequences or read mapping tables.
  • Sequence annotations are packed to lower memory footprint and disk space usage, especially for SNP, DIP, and Conflict annotations.
  • Improved performance of reading data files from shared drives.
  • REBASE collection of enzymes updated to latest version
  • BLAST: In the overview BLAST table, it is now possible to extract query sequences.
  • Process tagged sequences: it is now possible to input barcodes as a comma-separated list.
  • Folder structure (expanded/collapsed folders) is preserved through the lifetime of a wizard (e.g. when selecting input data and reference for read mapping)
  • Find in Side Panel: separators are allowed when performing position search (e.g. 1.000.000 or 1,000,000 or 1'000'000 or 1 000 000).
  • It is now possible to pause and restart processes involving read mapping and de novo assembly (except the accelerated mapping part of the analyses).
  • Normalization of expression data: it is now possible to do "Reads per 1,000,000"-style normalization of count-based data.
  • New preference group called "Data" to hold information about adapter sequences and Gateway cloning primer additions.
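
As a rough illustration of "Reads per 1,000,000"-style normalization, the sketch below scales each count by the sample total. It assumes the total is the sum of the feature counts in the sample (the notes do not say which total the Workbench uses), and the function names are hypothetical. For comparison it also shows RPKM under its common definition (Mortazavi et al.), which additionally divides by feature length in kilobases.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale counts so that the sample sums to 1,000,000 (assumed total: sum of counts)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {feature: n * 1_000_000 / total for feature, n in counts.items()}


def rpkm(reads_in_feature: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    # Equivalent to reads / (length / 1e3) / (total / 1e6).
    return reads_in_feature * 1e9 / (feature_length_bp * total_mapped_reads)
```

Per-million scaling makes samples sequenced to different depths comparable; RPKM also corrects for longer transcripts collecting more reads.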

Bug-fixes:

  • Print of folder content now takes settings in the Side Panel into account
  • Process tagged sequences of paired data: it was not possible to specify one read without sequence (necessary for Illumina barcodes using paired data)
  • Better memory handling in conflict table
  • Read mapping: fixed Windows errors on large data sets, fixed color space errors
  • RNA-Seq: the maximum number of mismatches for color space data could be set to three in the dialog but did not take effect. The dialog now enforces the limit of two.
  • Find in Side Panel: spaces are now allowed
  • Genbank import: sequence name (LOCUS) was truncated to 18 characters
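
The separator handling in the Find in Side Panel position search could work roughly as follows. This Python sketch is an assumption, not the Workbench's parser: `parse_position` is a hypothetical name, and it accepts either plain digits or digits grouped in threes by one consistent separator (period, comma, apostrophe or space).

```python
SEPARATORS = ".,' "  # period, comma, apostrophe, space


def parse_position(text: str) -> int:
    """Parse a position such as "1.000.000", "1,000,000", "1'000'000" or "1 000 000"."""
    s = text.strip()
    if s.isdecimal():
        return int(s)
    for sep in SEPARATORS:
        groups = s.split(sep)
        # Require a 1-3 digit leading group followed by groups of exactly three digits.
        if (len(groups) > 1
                and all(g.isdecimal() for g in groups)
                and 1 <= len(groups[0]) <= 3
                and all(len(g) == 3 for g in groups[1:])):
            return int("".join(groups))
    raise ValueError(f"not a valid position: {text!r}")
```

Insisting on groups of three rejects ambiguous input such as "12,34" instead of silently reading it as 1234.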

Targeted resequencing plug-in 1.0

March 31, 2010

Release of a new plug-in supporting targeted resequencing workflows.

Annotate with GFF plug-in 2.0

March 30, 2010

New features

  • Better support for GFF3-formatted files
  • Full support for Ensembl GTF files for direct use in RNA-Seq
  • Better user control over naming and typing of annotations
  • Improved feedback of annotation results

CLC Genomics Workbench 3.7.1

February 04, 2010

Bug fixes:

  • Fixed error concerning naming of dots in PCA plot
  • RNA-seq: reads that extend over more than two exons are now shown correctly
  • Fixed an error in the folder editor that prevented all elements from being shown
  • Documentation on trim using quality scores has been updated
  • Results from reference assemblies are now named according to the input data
  • Fixed error preventing manual editing of contigs under special circumstances
  • Various bug fixes

CLC Genomics Workbench 3.7

December 15, 2009

New features:

  • Global alignment for long reads when running reference assembly algorithm
  • Gapped color-space alignment when running reference assembly
  • Significantly improved speed of all operations with large data sets
  • RNA-Seq analysis:
    • Performance optimization: a run of 44 million reads against the mouse genome now takes 32 minutes on an eight-core computer with 32 GB RAM, compared with more than two hours previously. The previous version created and deleted many small temporary files, which took a long time and impaired the computer's general responsiveness; the new version creates only a small fraction of those temporary files.
    • New option to specify the minimum required exon overlap of reads spanning an exon-exon junction.
    • New RNA-Seq report which gives a statistical overview of the assembly process.
    • Result table now reports number of exon-exon and intron-exon junction spanning reads.
    • Result table now reports chromosome location of genes.
    • Visualization of reads that span exon-exon junctions.
    • Reads mapping equally well to intron-exon and exon-exon boundaries are now identified as unique exon-exon spanning reads.
    • RPKM is better defined in the user manual.
    • Default setting for multi-hits is now 10, as in the Mortazavi paper.
    • Very short reads are now assembled allowing more mismatches.
  • Expression analysis:
    • Volcano plots: you can now choose the values to plot on the x-axis. Choose between "Difference" and "Fold change".
    • Table view of bar plots shows the same intervals as are shown in the bar plot.
    • Generic importer for expression array data in tabular format.
    • Generic importer for expression experiment annotation data in tabular format.
    • Gene Ontology (GO) files can now be used to annotate an expression experiment.
    • Tag profiling: You are no longer allowed to annotate tag samples, only experiments
    • Side panel of experiment table has been re-organized to provide a better overview.
  • Import high-throughput sequencing data
    • Import tool moved from Toolbox to File menu and tool bar.
    • Import and export of the SAM alignment format.
    • Import of alignment data in tab-delimited format, including the ELAND alignment format.
    • Import of Illumina QSEQ file format.
    • Linker in 454 data is also found for non-perfect matches.
  • Enhanced visualization of contigs:
    • Un-aligned nucleotides on the inside of paired-end reads are now shown
    • Paired-end reads have a single line connecting the pair rather than gaps
    • Drag handles for moving the aligned/unaligned border are only shown when the bases of the reads are visible. This means that you need to zoom in to 100% or more and choose the Compactness level "Not compact" or "Low". Otherwise the drag handles are not available (this keeps the visual overview simple).
    • It is now possible to display overlapping pairs
  • Unassembled reads from an assembly now preserve their paired-end status (this also means that you can get two lists: one with pairs and one with the remaining reads from broken pairs)
  • SNP detection output table now reports if multiple non-synonymous SNPs exist in the same codon
  • SNP detection dialog: Quality filtering is no longer disabled when quality scores are missing, because checking whether quality scores are present would be too costly in terms of performance. Instead, SNP detection simply skips quality score filtering when quality scores are not present.
  • SNP detection: it is now possible to detect variants with a frequency below 1 percent.
  • The contig report now includes coverage information for both covered regions and the whole reference.
  • Opening a consensus sequence including gaps now also adds Ns before the start and after the end of the consensus sequence
  • The trim functionality now includes the option to trim a predefined number of nucleotides from either end of a read.
  • Gateway cloning: simple, easy-to-use support for creating Gateway entry and expression clones.
  • Search for matches among all your saved primers. The Find Binding Sites tool has been greatly improved and now lets you search among all your primers. It also produces a tabular output of the binding sites and possible fragments.
  • In silico PCR: create a PCR product based on a primer pair and a template sequence (including primer extensions). As part of the improved Find Binding Sites and Create Fragments tool, you can extract the PCR product from the list of fragments through a right-click menu.
  • Check primer specificity. As part of the improved Find Binding Sites and Create Fragments tool, you can search with a primer pair in a list of potential target sequences and see an overview table of binding sites and mismatches as well as potential PCR fragments.
  • Deployment
    • You can set a path to the default data location used when the Workbench starts for the first time. This helps system administrators control where new installations save their data by default.
    • Support for removing tools that access the internet (NCBI BLAST, update notifications etc.).
  • General import and export
    • Support for import of complex regions from GFF files
    • Export tables and reports in Excel format.
    • The import section of the user manual has been restructured to provide a better overview. Expression data importers are now described in technical detail in a separate section.
    • You can now export multiple sequence lists in fasta format
    • Forced import of zip files is now supported (it will force import the contents of the zip file)
    • The standard import now accepts gzip and tar files as well as zip
    • If a forced import fails, there will be more technical information about what went wrong, allowing you to identify bad formatting of the import files
    • Both the GenBank and GFF importers now make several attempts at naming genes that do not have a gene name. They iteratively try the following qualifiers: "product", "locus_tag", "protein_id" and "transcript_id"
    • When importing GenBank files where the stated length does not match the actual sequence, a warning is shown but the sequence is accepted.
    • When exporting in csv format, the locale settings determine whether commas or semicolons are used as delimiters (commas are used for US locales)
    • GFF plug-in has been updated to accept complex annotations
  • Miscellaneous
    • Advanced retyping of annotations using the annotation table.
    • Improved reporting of situations when a full disk prevents saving of data
    • Downloading sequences using drag and drop from the search table no longer creates a "Downloading..." node in the folder. The download process can be monitored in the Processes tab.
    • Primer design now supports PCR fragments longer than 5000 bp.
    • Extract Sequences moved from the File menu to Toolbox -> General Sequence Analysis.
    • Better progress feedback on various dialogs
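
The GenBank/GFF gene-naming fallback can be sketched as an ordered search over qualifiers. In this hypothetical Python helper the default order is the one listed above; the `annotation_name` function and the precedence given to an explicit "gene" qualifier are assumptions. Passing a different order models the 4.0.3 change, where mRNA annotations prefer "locus_tag" over "product".

```python
from typing import Mapping, Optional, Sequence

# Fallback order listed in the 3.7 release notes.
DEFAULT_FALLBACK = ("product", "locus_tag", "protein_id", "transcript_id")


def annotation_name(qualifiers: Mapping[str, str],
                    fallback: Sequence[str] = DEFAULT_FALLBACK) -> Optional[str]:
    """Return the gene name, else the first non-empty fallback qualifier, else None."""
    # Assumption: an explicit "gene" qualifier always takes precedence.
    for key in ("gene", *fallback):
        value = qualifiers.get(key, "").strip()
        if value:
            return value
    return None
```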

Bug-fixes:

  • Problem with the order of genes when setting up RNA-Seq experiments: if the order of input sequences was not the same for all samples, the resulting experiment was incorrect.
  • Fixed wrong orientation of SOLiD mate-pair data
  • Fixed a problem with naming of tabs. With this fix, tabs containing unsaved data on Windows and Linux are now marked with a * instead of a bold italic tab name. (This has always been the behavior on Mac OS X.)
  • Fixed problem displaying the "Copying..." label when copying data and then updating the folder
  • Misleading label when assembling reads shorter than 15 bp. The label now states that these reads are ignored in the assembly.

CLC Genomics Workbench 3.6.5

August 18, 2009

New features:

  • Export of annotations in GFF format (note that annotations with joined regions are not supported)
  • Export of sequence data in fastq format
  • Now possible to perform detailed manual editing of contigs with up to 100,000 reads
  • Improved performance when zooming large contigs displaying a coverage graph
  • Now possible to change the linker used when importing 454 paired-end data
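
For reference, a FASTQ record stores each base quality as a single character, namely the Phred score plus an offset: 33 in the Sanger convention and 64 in the Illumina 1.3 to 1.7 pipelines. This minimal Python sketch with hypothetical function names shows the encoding; it is not the Workbench's exporter.

```python
from typing import List, Sequence

SANGER_OFFSET = 33        # Sanger FASTQ convention
ILLUMINA_OLD_OFFSET = 64  # Illumina pipelines 1.3 to 1.7


def fastq_record(name: str, seq: str, quals: Sequence[int],
                 offset: int = SANGER_OFFSET) -> str:
    """Format one FASTQ record, encoding each Phred score as chr(score + offset)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    encoded = "".join(chr(q + offset) for q in quals)
    return f"@{name}\n{seq}\n+\n{encoded}\n"


def decode_qualities(encoded: str, offset: int = SANGER_OFFSET) -> List[int]:
    """Turn a FASTQ quality line back into Phred scores."""
    return [ord(ch) - offset for ch in encoded]
```

Reading a file with the wrong offset shifts every score by 31, which is why importers need to know which variant they are given.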

Bug-fixes:

  • Fixed problems importing expression annotation files
  • Fixed error when trimming for vector sequences
  • Fixed tblastn numbering issue
  • Various bug-fixes

This update is recommended for all users.

CLC Genomics Workbench 3.6.1

July 09, 2009

Issues resolved with this release include:

  • Problem when adding annotations to an Illumina array file
  • Error handling annotated tag-data
  • DNA Strider files could lose their name upon import
  • Rare misplacement of annotations when editing very large sequences
  • Problem when importing color space data alongside a .cas file

CLC Genomics Workbench 3.6

July 02, 2009

New features

  • Tag profiling: tag-based transcriptomics.
  • ChIP-Seq analysis can now (optionally) use a control sample.
  • Advanced view of elements in a folder, including batch editing.
  • Create new contig from selection.
  • Import high-throughput sequencing data: you can now import without quality scores.
  • Reference assembly of short reads: users can now choose between local and global alignment.
  • Reference and de novo assembly output options have been changed so that you no longer need to decide whether you want a contig table or single contigs. Whenever more than one contig is produced, the Workbench automatically creates a contig table.
  • Contig report for reference assemblies: GC content of the reference sequence is now included
  • Improvements to Extract Sequences:
    • Now contig tables, overview BLAST tables and RNA-Seq samples can be used
    • User feedback in the dialog is improved
    • Problem with extracting paired-end reads correctly is fixed
  • mRNA Sequencing tool changed name to RNA-Seq Analysis to reflect the consensus about this naming in the NGS community
  • Heat maps and clustering improved:
    • You can now perform different clusterings on an experiment and save them all. In the Side Panel you can switch between the different clusterings to show the corresponding heat map.
    • Terminology change in the clustering dialogs: "similarity measure" and "cluster distance metric" are replaced by "distance measure" and "cluster linkage", respectively.
  • Annotating samples or experiments for expression analysis:
    • This is now possible even if the number of features doesn't match the number of annotations
    • You can now decide which column in the annotation file to use for matching to the sample or experiment.
    • Because of this extra option, you can no longer include an annotation file when setting up an experiment. You need to add the annotations afterwards
  • Microarray import improved: added support for import of more versions of native Illumina BeadChip and GEO expression files
  • "Find" in text view now accepts Enter as command to find the next hit
  • Importing VectorNTI archives previously resulted in a sequence list. Now it imports as single sequences.
  • Import list of sequences in csv format: each line in the file represents a sequence with name, optional description, and sequence. Typically useful for importing lists of oligos.
  • You can now drag results from NCBI searches into the view area to open directly (previously you could only drag into a folder to save)

Bug fixes

  • Assembly against many reference sequences could run out of memory. This has been significantly improved.
  • Integration with the Genomics Server: fixed an error when selecting contigs from a contig table for analysis. This is no longer possible (i.e. you have to save the contig first).
  • Microarray import: Fixed a bug that prevented import of expression data with white spaces in column names.
  • Various bug fixes

CLC Genomics Workbench 3.5.1

June 11, 2009

Issues resolved with this release include

  • Rare failure when importing very large Illumina files
  • Memory problem when mapping against many (>20,000) references
  • Rare concurrency issue when translating DNA->protein in e.g. SNP detection
  • Problem rendering scatter plots without lines
  • Graphics export of contigs
  • ChIP-seq table did not show the right distance to nearest gene
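
Finding a peak's nearest gene does not require the gene annotations to be in any particular order, which was the cause of the "nearest gene" problem fixed in 4.0.3. The following Python sketch is a brute-force illustration, not the Workbench's algorithm: the `Feature` type and function names are hypothetical, coordinates are assumed inclusive, and overlapping features get distance 0.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    """Hypothetical genomic interval with inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def interval_distance(a: Feature, b: Feature) -> int:
    """0 if the intervals overlap, otherwise the distance between their closest ends."""
    return max(0, b.start - a.end, a.start - b.end)


def nearest_gene(peak: Feature, genes: Iterable[Feature]) -> Optional[Tuple[str, int]]:
    """Return (gene name, distance) for the closest gene on the peak's chromosome."""
    candidates = [g for g in genes if g.chrom == peak.chrom]
    if not candidates:
        return None
    # min() examines every candidate, so the result does not depend on annotation
    # order, and the last peak on a chromosome is handled like any other.
    best = min(candidates, key=lambda g: (interval_distance(peak, g), g.start))
    return best.name, interval_distance(peak, best)
```

For genome-scale data one would sort genes once and use an interval index, but the brute-force version makes the correctness requirement explicit.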

CLC Genomics Workbench 3.5

June 2, 2009

Data formats:

  • Data generated with version 3.5 cannot be read in earlier versions

New features:

  • New ChIP seq tool
  • Contig report that records various statistics and graphs for contigs, including N75, N50 and N25 statistics, coverage distribution and contig size distributions.
  • Extension of RNA-seq functionality to also handle color space data
  • RNA-seq now outputs and can use unique and total gene/exon reads as well as median coverage as measures of expression.
  • Implementations of statistical tests for comparing expression levels of count-based expression measures as may be produced in RNA-seq
    • Kal's test for differences of proportions in single-sample-to-single-sample comparisons.
    • Baggerley's test for differences of proportions in comparisons of two groups with replicates.
  • New filter options in SNP and DIP detection.
    • SNP and DIP detection: as a supplement to the minimum variant frequency in percent, you can also specify a minimum variant count.
    • SNP detection: as in DIP detection, there is now a maximum coverage filter
    • SNP detection: there is now a "ploidy" setting just as for DIPs. This is used to mark SNPs as "complex". The "Genetic code" drop-down box has been moved to step 3.
  • Alignment of SNP and DIP tabular output to allow easy merging of SNP and DIP tables into complete variant tables
  • Support for Sanger Institute defined FASTQ and new Illumina format (QSEQ)
  • Import of NGS data now allows discarding of sequence names for large savings in disk space and processing time
  • Performance optimization when adding sequences to a sequence list. This now works for NGS data also.
  • SNP and DIP detection can now be performed directly on RNA-seq output contig tables
  • Exporting coverage graph to csv file now has an option to include or exclude gaps. Excluding gaps will make the file use the reference sequence coordinates
  • Much improved memory performance and processing time of large data sets
  • Improved performance when handling trace data. Trace data now takes up 50% less disk space, so it is opened and saved much faster and uses less memory.
  • You can now specify minimum length of contigs to be reported in de novo assembly
  • "Reverse Contig" has been renamed to "Reverse Complement Contig." Functionality is un-changed.
  • Import of Illumina expression bead arrays and bead array annotation files
  • Import of Affymetrix Chp files (CHP / PSI)
  • Transformation of expression values now supports square root transformation.
  • Better feedback on processes: there is a tool tip showing details and start time.
  • Translation of DNA to protein in sequence views can now be set to follow existing CDS/ORF annotations.
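
The N75, N50 and N25 statistics in the contig report follow the usual Nx definition: sort contigs from longest to shortest and report the length at which the running total first reaches x percent of the total assembly length. A minimal Python sketch, assuming that common definition (the function name is hypothetical):

```python
from typing import Iterable


def n_statistic(lengths: Iterable[int], percent: float) -> int:
    """Return Nx: the contig length at which the running total reaches x% of all bases."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    sizes = sorted((n for n in lengths if n > 0), reverse=True)
    if not sizes:
        raise ValueError("no contigs")
    threshold = sum(sizes) * percent / 100
    running = 0
    for size in sizes:
        running += size
        if running >= threshold:
            return size
    return sizes[-1]  # defensive: only reachable through floating-point rounding
```

For contig lengths 2 to 10 (total 54), N50 is 8, because 10 + 9 + 8 = 27 is the first running total to reach half of 54.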

Bug fixes:

  • Fixed error when trimming reads for vectors
  • Fixed out-of-memory error in mRNA sequencing
  • Fixed error in mRNA sequencing when gene annotations were present outside the reference sequence
  • Fixed error when parsing files from Clone Manager (cm5-files)
  • UniProt search works again

Note:

  • This version introduces a new data format which is not readable by older versions of the software.

CLC Genomics Workbench 3.2.0

March 12, 2009

New features:

  • DIP detection - automatic examination and reporting of insertions/deletions in reference assembly contigs. In the Toolbox under High-Throughput Sequencing. Can be used together with SNP detection to systematically examine positions where the reads differ from the reference sequence. This eliminates the need for manually inspecting gaps and conflicts in the contig.
  • 15% less disk space usage of imported NGS data sets.
  • 25% faster assembly of NGS data sets.

Various bug fixes:

  • Under certain circumstances, trim failed on Mac OS X
  • mRNA Sequencing: downstream/upstream options are now disabled when using un-annotated reference sequences
  • Color space information is now shown by default for mixed data sets that include color space reads
  • De novo assembly report: the number of reverse matches was sometimes reported as negative
  • Corrections to the ACE export
  • Better performance with files containing many annotations
  • Fixed an error in RNA Structure Evaluation
  • Fixed error and improved performance of the Join Sequences tool
  • Fixed error in Find Binding Sites on Sequence: it no longer distinguishes between lower and upper case
  • Various small fixes

CLC Genomics Workbench 3.1.0

February 26, 2009

New features:

  • Support for reference assembly of SOLiD data in color space. You need to reimport your data to make use of color space.
  • Viewing of color space data in contig results.
  • Option of using non-annotated sequences (e.g. EST-library) for RNA-seq.
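
SOLiD reads are encoded as colors, where each color describes a pair of adjacent bases. The well-known di-base scheme assigns A=0, C=1, G=2, T=3 and takes the XOR of the two codes. The Python sketch below (hypothetical function names, not the Workbench's code) converts in both directions, given a known first base.

```python
# Two-bit codes of the SOLiD di-base encoding; a color is the XOR of two adjacent codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_colors(seq: str) -> str:
    """Translate a base sequence into colors (one color per adjacent base pair)."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def decode_colors(first_base: str, colors: str) -> str:
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)
```

Decoding is cumulative, so a single color error changes every downstream base: `decode_colors("A", "121")` gives `ACTG` instead of the true `ACGT` (colors `131`). This is one reason to align reads in color space rather than decoding them first.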

Various bug fixes:

  • Assembly and mRNA sequencing errors ("Empty match not allowed" and "Could not read from temporary file") fixed
  • Under special circumstances, quality scores were not aligned correctly
  • SNP detection with an RNA sequence as reference failed
  • SNP detection performance for annotated sequences improved
  • Find in the Side Panel did not support spaces when searching for annotations
  • In the cloning editor under special circumstances, an error occurred when replacing a selection with fragment
  • Sequence statistics: codon counts were not correct when using RNA sequences

CLC Genomics Workbench 3.0.1

February 3, 2009

The following has been updated:

  • Fixed an error when trimming NGS data
  • Fixed an error in the contig view when deleting a sequence that was selected
  • Fixed an error when changing the filter of a sorted table
  • Fixed error when assembling a mix of paired ends and single reads under special circumstances
  • Fixed error in import of cas file based on SOLiD data from the CLC NGS Cell
  • Fixed a rare error when running SNP detection on a contig table
  • Made mRNA Sequencing accept a sequence list as reference
  • Fixed table view of contigs: sometimes an empty entry would appear which did not reflect the reads at the current position

CLC Genomics Workbench 3.0

January 27, 2009

New features

Transcriptomics

  • Support for both microarray- and sequencing-based (RNA-Seq) expression data
  • Visualization: Interactive heat map, table and scatter plot views
  • Transformation and normalization tools
  • Quality control tools including principal component analysis, MA- and boxplots
  • Experimental design tools for two- or multiple group comparisons
  • T-tests and ANOVA analysis with support for paired/repeated measures
  • Multiple testing corrected p-values (Bonferroni and/or FDR)
  • Clustering algorithms: hierarchical clustering, k-means and Partitioning Around Medoids (PAM) with support for various distance and linkage measures.
  • Ability to import NetAffx annotation arrays and add annotations to experiments
  • Tools for Gene Set Enrichment Analysis (GSEA) and for Hyper-Geometric based tests for overrepresented annotation categories (e.g. 'GO'stats or specific protein pathways).
  • Ability to work with Expression Arrays and RNA-seq results at the same time, enabling comparison of results
  • Facility for annotating sequences from GFF or GTF files (as used by Ensembl and the UCSC Genome Browser), useful for annotating reference genomes before assembly
  • Statistics on numbers of matching and unique gene, exon and exon-exon boundary spanning reads
  • Calculation of gene expression measures (RPKM) from mRNA sequence data and generation of gene expression profiles (RNA-Seq analysis)
  • Discovery of novel transcripts/exons through mapping of mRNA reads to whole chromosomes or genomes, comparing matches with known exons
  • Interactive views of assemblies and derived gene expression data
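
The two multiple-testing corrections named above can be sketched in a few lines of Python. Bonferroni multiplies each p-value by the number of tests; for FDR the sketch uses the Benjamini-Hochberg step-up procedure, a common choice, though the notes do not say which FDR method the Workbench implements. The function names are hypothetical.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the chance of any false positive and becomes very strict with thousands of genes; FDR control tolerates a small expected proportion of false discoveries, which usually suits expression screens better.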

Assembly

  • Long reads assembly significantly faster
  • No upper limit on number of reads in de novo assembly (there is still a limit regarding the size of the genome)
  • New simple output option for de novo assembly: only generate consensus sequences instead of full contigs. At the last step of the de novo assembly wizard, you can now choose between "Full contigs" and "Simple contig sequences". The latter option will result in a sequence list with all the consensus sequences. This is much faster and less demanding for the computer. You can always create full contigs later by running a reference assembly with the consensus sequences as references.
  • Quality of trimming for contamination from own sequences improved. It is now possible to trim off smaller primer sequences.
  • SNP detection:
    • Accepts multiple contigs and table of contigs (the table output includes a new column for the name of the contig)
    • For coding regions (annotated with CDS/ORF annotations): changes at the amino acid level resulting from a SNP are now reported (both in the table and in the annotations).
    • General performance improvements
  • Right-clicking a graph (e.g. coverage) on a contig lets you export the data points to a csv file.
  • Contig table shows the Latin and common name of reference sequences. This is useful if you perform a reference assembly against references from different species.
  • Multiplexing - Process Tagged Sequences now has an option to filter out groups with few sequences. This is useful if you have very ambiguous barcode definitions where sequencing errors would lead to many "false" groups. These groups can now be filtered out because of their small size. (The option is called "Minimum number of sequences" and is found in the third step of the wizard.)
  • Coverage information is now included when you export a table of contigs in ACE format. (It is written as a "Contig Tag" of type comment (a CT clause) containing a textual description of the coverage in the form "Average coverage: 14.65".)
  • Coverage info is put into the description of consensus sequences extracted from a table of contigs (this means that if you export to fasta, this information will be included).
  • Importing assemblies with more than one contig creates multi contig tables (ace and cas file import)
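The "Minimum number of sequences" filter for Process Tagged Sequences can be illustrated with a small sketch. It assumes, purely for illustration, a fully ambiguous barcode of fixed length at the start of each read, so every distinct observed tag forms its own group; groups created by sequencing errors in the tag stay small and are dropped. The function name `process_tagged` and this grouping rule are hypothetical and do not describe the Workbench's actual barcode matching.

```python
from collections import defaultdict
from typing import Dict, List


def process_tagged(reads: List[str], tag_length: int, min_sequences: int) -> Dict[str, List[str]]:
    """Group reads by their leading tag and drop groups smaller than min_sequences.

    Hypothetical model: the first tag_length bases are a fully ambiguous barcode,
    so each distinct tag observed becomes its own group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        groups[tag].append(insert)  # store the read with its tag removed
    # Small groups are most likely caused by sequencing errors in the tag
    return {tag: seqs for tag, seqs in groups.items() if len(seqs) >= min_sequences}


reads = ["ACGTTTTT", "ACGTTTTT", "ACGTTTTT", "ACGATTTT"]  # last tag has an error
print(process_tagged(reads, tag_length=4, min_sequences=2))
# {'ACGT': ['TTTT', 'TTTT', 'TTTT']}
```

With `min_sequences=1` the erroneous "ACGA" group would be kept as a separate, "false" group.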

Improved user experience of processes

  • Non-modal feedback from processes:
    • When there is a message (e.g. from a BLAST search: no hits found)
    • If you have chosen to save the results in the last step of the wizard, you will be notified when the process is done.
    • Processes running on the CLC Science Server will notify when they are done.
  • Possibility to open results by clicking the button next to the process
  • Possibility to find and select results in the Navigation Area by clicking the button next to the process
  • You can see a log of your process by clicking the button next to the process (even if you did not choose to see the log in the last step of the wizard)

Support for interacting with CLC Science Server

  • Read more at http://www.clcbio.com/index.php?id=1260

3D editor re-design

  • The 3D editor now allows you to select individual structure subunits, residues, active sites, disulfide bridges and even atoms, and to customize their appearance

General improvements

  • Limited mode: when using a license server, you can still access your data if there are no licenses left. The Workbench then runs in Limited mode, where only a few tools are available (corresponding to the tools found in CLC Sequence Viewer). Click "Limited Mode" in the license dialog.
  • Tables:
    • New advanced filter for filtering on numerical data and combining several filter criteria. Click the small button next to the normal filter to see the advanced filter.
    • Visual feedback when sorting and filtering tables
    • Improved automatic detection of column width
  • Performance of graphs and plots improved
  • Local BLAST is upgraded to use NCBI BLAST version 2.2.19
  • More elaborate error reports including error logs
  • You can specify which folder the Workbench should use for temporary files
  • Extract sequences from a sequence list, contig or alignment by right-clicking the white empty space. You will then be able to extract the sequences into a list or as separate sequences.
  • The "Find" option in the Side Panel of sequence views automatically detects if you have entered a position instead of a sequence.

Plug-ins

  • Extract Annotations plug-in has been improved:
    • Possibility to specify the naming of the sequences (based on annotation name, type etc)
    • Performance improvements to make it possible to extract annotations of large genomes.
  • MLST plug-in: various bug fixes

Bug fixes

  • Locale settings were not automatically set correctly on the first start-up. The locale settings determine whether . or , is used as the decimal separator. For new installations of the Workbench, the locale is now set to that of the computer's operating system. For existing installations, you will have to change this in the Edit->Preferences dialog.
  • Fixed problem when BLASTing with an empty sequence
  • Various performance improvements and bug fixes

CLC Genomics Workbench 2.1.1

November 21, 2008

The following has been updated:

  • Reference assembly: fixed an error which meant that, in some cases, reference assembly could produce different results depending on the amount of memory available.
  • SNP Detection: Reads were dismissed because of gaps even though the reference sequence also had gaps.
  • The Side Panel's Find only highlighted the first hit. This is now fixed.
  • Fixed error when importing 454 fna/qual files
  • Extract sequences: fixed an error when extracting paired-ends sequences from contigs and sequence lists
  • Local BLAST: solved problem applying command-line parameters, now a checkbox determines whether command-line options should take effect
  • BLAST: it was possible to use a BLAST result as input and database
  • Trace data: fixed an error when deleting parts of an unsaved sequence with traces
  • Better performance when zooming a dot plot
  • Better performance when using the Side Panel's Find in large contigs and sequence lists
  • When right-clicking a CDS annotation and translating into protein, gaps were erroneously introduced into the protein sequence
  • There was an error related to selecting sequences in the Cloning editor
  • Multi-select (using Ctrl / Command key) did not work for sequence lists
  • Various bug fixes

CLC Genomics Workbench 2.1

November 11, 2008

The following has been updated:

  • Support for paired-end Sanger reads
  • Support for paired-end FASTA reads
  • Improved user interface of High-throughput Sequencing Data import dialog
  • Assembly report includes information about assembly parameters
  • Corrected error when opening multiple consensus sequences
  • Fixed problem with import of NGS data in FASTA format
  • Improved error handling for assembly
  • Fixed issue with contig selections while scrolling
  • Corrected error introduced by overlapping mate-pairs

CLC Genomics Workbench 2.0.4

October 8, 2008

The following has been updated:

  • Fixed problems when assembling large or mixed data sets
  • Ensured correct setting of limit for assembly of short reads

CLC Genomics Workbench 2.0.3

October 6, 2008

The following has been updated:

  • Fixed problems with de novo assembly
  • Status properly updated when a conflict is resolved
  • Assembly programs now run on older versions of Linux

CLC Genomics Workbench 2.0.2

October 2, 2008

The following has been updated:

  • Fixed problems when scrolling very large sequences
  • Fixed problem when importing very large GenBank files
  • Improved possibilities for navigating contigs
  • Improved stability when importing non-standard data
  • Improved memory handling and stability of assembly algorithms
  • Support for import of Illumina long insert paired-end data

CLC Genomics Workbench 2.0.1

September 18, 2008

New features

General performance improvements
  • Improved performance when handling large data sets
High-throughput Sequencing Assembly
  • Support for reference assembly against the human genome (i.e. reference sequences of any size)
  • New and much faster algorithm for assembling short reads (less than 55 nucleotides)
  • Significant performance improvements of reference assembly.
  • True support for reference assembly of mixed data sets in one go. Sequencing data from different platforms (and both single and paired ends) can now be assembled together. Previously this could be accomplished by making separate assemblies and joining the contigs afterwards, but now this process is automated.
  • Reference sequences can be masked based on annotations. This could be used to e.g. mask off repeat regions or only include exons in the assembly. The reference sequences have to be annotated in order to use masking.
  • Assembly report includes the number of contigs produced
  • Contigs from a reference assembly can also be shown in an overview table. This was previously only possible for De novo assembly. In the last step of the reference assembly wizard, there is an option: Create overview table including all contigs.
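Masking a reference based on annotations means that annotated regions are either excluded (e.g. repeats) or are the only regions kept (e.g. exons). This hypothetical sketch represents annotations as 0-based, half-open (start, end) intervals and replaces masked bases with N. How the Workbench actually excludes masked regions from read matching is not described here, so treat this only as an illustration of the idea.

```python
from typing import List, Tuple


def mask_reference(seq: str, intervals: List[Tuple[int, int]],
                   keep_only: bool = False, mask_char: str = "N") -> str:
    """Mask a reference sequence using annotation intervals.

    intervals are 0-based, half-open (start, end) pairs.
    keep_only=False masks the annotated regions (e.g. repeats);
    keep_only=True masks everything except them (e.g. keep only exons).
    """
    annotated = [False] * len(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(len(seq), end)):
            annotated[i] = True
    # A base survives when annotated == keep_only: unannotated bases in
    # masking mode, annotated bases in keep-only mode
    return "".join(base if annotated[i] == keep_only else mask_char
                   for i, base in enumerate(seq))


print(mask_reference("ACGTACGTAC", [(2, 5)]))                  # ACNNNCGTAC
print(mask_reference("ACGTACGTAC", [(2, 5)], keep_only=True))  # NNGTANNNNN
```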
Import and export
  • Support for large numbers of Sanger sequencing reads with trace information. Using the Import functionality under High-throughput Sequencing, you can import huge amounts of e.g. abi files. This imports quality scores but discards trace data, producing a sequence list in the Workbench, which makes it possible to assemble thousands of Sanger reads.
  • SOLiD import of paired-ends data improved. In some cases paired-ends data also contains single reads which are now removed during import.
  • Possible to import cas files created by the CLC NGS Cell (the command-line version of the assembly algorithms of the CLC Genomics Workbench)
  • Contigs can be exported in ACE format
  • Improvement of ACE file importer
  • Trim information in sff files can be used during import
  • Support for import of SCARF files (from Illumina Genome Analyzer systems)
  • Export of graph data points in csv format
Various high-throughput sequencing improvements
  • SNP detection now reports the position relative to the reference sequence as well as the consensus sequence. The table includes both positions by default (each can be switched on and off), and the user decides where annotations should be added.
  • SNP detection table includes information about the name of annotations covering the SNPs. Previously only the annotation type was reported.
  • Trimming now also supports paired-ends data. If one of the reads in a pair is trimmed off, the whole pair will be removed.
  • Partially matched reads are reported as a graph along the contig.
  • Possibility to open consensus sequence with gaps. Right-click the label of the consensus sequence in the contig view and select: Open Copy of Sequence Including Gaps. The gaps will be represented by Ns in the new sequence.
  • Dynamic consensus graph removed from contig view. Since contigs now have a "real" consensus sequence which is also updated to reflect changes in the reads, the dynamic consensus sequence which is switched on in the Side Panel has been removed.
  • Annotations can be transferred from reference to consensus sequence in bulk. Right-click one of the annotations and choose "Copy to Consensus Sequence" or "Copy Annotations of Type xx to Consensus Sequence".
  • Multiplexing now also possible for paired-end reads
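The paired-end trimming rule (if one read of a pair is trimmed off, the whole pair is removed) can be sketched as below. The quality-based 3' trimming, the default threshold of 20 and the `min_length` parameter are assumptions chosen for illustration, not the Workbench's trimming settings.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)


def trim_3prime(read: Read, min_quality: int) -> Read:
    """Remove low-quality bases from the 3' end of a read."""
    bases, quals = read
    end = len(bases)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return bases[:end], quals[:end]


def trim_pairs(pairs: List[Tuple[Read, Read]], min_quality: int = 20,
               min_length: int = 1) -> List[Tuple[Read, Read]]:
    """Trim both reads of each pair; drop the pair if either read is trimmed off."""
    kept = []
    for first, second in pairs:
        t1 = trim_3prime(first, min_quality)
        t2 = trim_3prime(second, min_quality)
        if len(t1[0]) >= min_length and len(t2[0]) >= min_length:
            kept.append((t1, t2))
    return kept


pairs = [
    (("ACGT", [30, 30, 30, 10]), ("GGCC", [30, 30, 30, 30])),  # kept, first read trimmed to ACG
    (("TTTT", [5, 5, 5, 5]), ("AAAA", [30, 30, 30, 30])),      # first read fully trimmed: pair dropped
]
print(trim_pairs(pairs))
# [(('ACG', [30, 30, 30]), ('GGCC', [30, 30, 30, 30]))]
```

Dropping the whole pair, rather than keeping the surviving mate, keeps the data set purely paired so that later steps can rely on every read having its partner.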
Plug-in updates
  • New plug-in! GFF/GTF support: You can now annotate a sequence using a GFF/GTF file. The plug-in is available for all Workbenches (not CLC Sequence Viewer). Once installed, you will find it in Toolbox->General Sequence Analysis->Annotate from GFF/GTF File.
  • Extract annotations plug-in updated: it now uses the name of the annotation as the name of the new sequence.
Annotation handling
  • Annotation table has been greatly improved:
    • supports very long, heavily annotated genomes
    • usability of the filtering has been improved with feedback on the filtering process
  • Advanced renaming options.
Bug fixes
  • Fixed bugs related to contig editing.
  • Various bug fixes.
  • Fixed problem with import in 2.0 release.

CLC Genomics Workbench 1.1.1

July 10, 2008

New features

  • Scrollbars can be adjusted manually
Problems fixed
  • Fixed problems when aligning sequences with lowercase characters
  • Fixed import of trace files without quality scores
  • Fixed problem when removing location
  • A new sequence list can be created from a selection in the table view
  • Better memory handling and management of large contigs
  • User definable scrollbar areas for contig views
  • A few other minor bugs have been fixed.

CLC Genomics Workbench 1.1

June 27, 2008

New features

  • Increased speed of de novo assembly
  • Option of generating a contig table as a result of de novo assembly. This way the workspace is not polluted by a large number of contigs.
  • Multi contig table has options for opening contigs or extracting consensus sequences for further analysis
  • Much smoother scrolling on contigs when there is very high coverage
Problems fixed
  • Problems with import of .ACE files
  • Problems with excessive generation of files when doing de novo assembly of short read data
  • Problems with the use of quality scores in SNP detection
Copyright © CLC bio Japan, Inc.