Overview

GenomeSet is an interactive, web-based bioinformatics platform for managing, exploring, and analyzing genomic data, aimed at researchers, students, and bioinformaticians. Users can upload their own datasets or retrieve publicly available genomic data directly from trusted repositories such as NCBI. A modular interface combined with configurable analysis tools supports a variety of sequence-based studies.

Upon accessing the GenomeSet homepage, users are presented with a navigation bar that organizes core system functionalities into structured categories: Home, Documentation, Start Analyzing, Additional Genomic Tools, and Contact & Support. The primary operations of the platform are easily accessible through three central features displayed on the homepage:

  • Upload Data: Allows registered users to upload genomic data files in supported formats such as FASTA, CSV, or TSV, and prepare them for further analysis.

  • Species Explorer: Enables users to select and download genome data for specific organisms from NCBI without requiring registration, for direct analysis or comparative studies.

  • GEO Dataset Explorer: Provides access to gene expression datasets from the Gene Expression Omnibus (GEO) database for integrative functional genomics analysis.

GenomeSet incorporates a user profile system where registered users can manage their personal data, uploaded files, saved projects, analysis results, and custom sequence subsets. Uploaded data objects can be selected and directed to the Analyzer, where a range of computational tools are available for sequence property analysis, including length distribution, GC-content profiling, CpG island detection, k-mer analysis, nucleotide composition, codon usage, and amino acid frequency evaluations.
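
As a rough illustration of two of the sequence property analyses named above (GC-content profiling and length distribution), the following Python sketch shows how such values can be computed. It is not GenomeSet's implementation: the function names gc_content and length_distribution, the bin size, and the example sequences are assumptions made for this example only.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    # Ambiguous symbols such as N are ignored (an assumption of this sketch).
    return gc / total if total else 0.0


def length_distribution(seqs: dict, bin_size: int = 100) -> dict:
    """Count sequences per length bin; each key is the lower bound of a bin."""
    bins = Counter((len(s) // bin_size) * bin_size for s in seqs.values())
    return dict(sorted(bins.items()))


# Hypothetical example data, not taken from GenomeSet.
sequences = {
    "seq1": "ATGCGC",
    "seq2": "ATATATATATAT",
    "seq3": "GGGCCCNN",
}
gc_by_id = {name: gc_content(s) for name, s in sequences.items()}
length_bins = length_distribution(sequences, bin_size=5)
```

Here gc_by_id maps each sequence identifier to its GC fraction, and length_bins groups the three sequences into length bins of width 5.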

The Analyzer interface is organized into three main sections:

  1. Selected Objects: Displays organisms and datasets chosen for analysis, with real-time download and parsing progress for genome and annotation files from NCBI.

  2. Analysis Functions: Lists available computational tools, customized for different sequence types (cDNA, CDS, UTRs, Genes), enabling users to perform in-depth analyses tailored to their research needs.

  3. Results Display: Visualizes analysis outputs via interactive charts, adjustable property settings, and detailed result tables, with options for filtering, exporting, and saving findings.
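
Because the Analysis Functions are tailored to the sequence type, some analyses only make sense for certain inputs. Codon usage, for example, applies to coding sequences (CDS). The sketch below is one possible way to count in-frame codons in Python; the names codon_usage and relative_usage are hypothetical, and the way GenomeSet handles partial or ambiguous codons is not specified in this document.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of a CDS, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts: Counter) -> dict:
    """Convert raw codon counts into fractions of all counted codons."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Hypothetical CDS: ATG GCT GCT AAA TAA
counts = codon_usage("ATGGCTGCTAAATAA")
fractions = relative_usage(counts)
```

Reading frame 0 is assumed throughout; a real CDS with a non-zero phase would need to be trimmed first.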

Registered users can also save and manage projects, share analysis setups, and archive result datasets or selected sequence subsets for future reuse.

In summary, GenomeSet is designed to streamline the workflow of genomic research by offering a unified, accessible, and scalable platform for both exploratory data analysis and targeted bioinformatics investigations.

Key Features

GenomeSet offers an integrated environment with the following features for simplifying and accelerating genomic data analysis workflows:

  • User Data Management
    – Upload and manage personal genomic datasets in supported formats (FASTA, CSV, TSV) via a secure, user-friendly interface.
    – Access personal profiles for managing uploaded files, saved projects, analysis results, and sequence subsets.
  • Species Explorer (NCBI Integration)
    – Download genome assemblies and annotation files (FASTA, GFF) for selected organisms directly from the NCBI repository.
    – Use an interactive taxonomic tree and lineage browser to explore organisms by kingdom, division, class, or species.
  • GEO Dataset Explorer
    – Retrieve and manage gene expression datasets from the Gene Expression Omnibus (GEO) for expression-based studies.
  • Modular Analyzer Page
    – Analyze sequences by type (cDNA, CDS, genes, 5' UTR, 3' UTR) with selectable analysis options.
    – Follow genome and annotation file downloads, parsing progress, and dataset preparation in real time.
  • Comprehensive Analysis Functions
    – Compute sequence length distributions, GC-content, CpG-island frequency, k-mer distributions, nucleotide and codon composition, amino acid content, and positional sequence statistics.
    – Filter, customize, and adjust sequence regions or parameter intervals for tailored analysis.
  • Interactive Result Visualizations
    – Present results using interactive charts, tables, and filters for detailed exploration and interpretation of outputs.
  • Project Management for Registered Users
    – Save analysis steps, results, and filtered subsets as projects.
    – Share projects with other users and retrieve previously saved projects for re-analysis.
  • Expandable Additional Genomic Tools (Under Development)
    – Planned specialized tools for GFF3 parsing, large-scale genome data exploration, and genome visualization.
  • Comprehensive Documentation & Support
    – Integrated documentation with tutorials, FAQs, troubleshooting guides, and direct support access via the Contact & Support section.
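
To make two of the analysis functions listed above more concrete, the following Python sketch counts overlapping k-mers and computes the observed/expected CpG ratio, a quantity commonly used when screening for CpG islands. The formula used here is the widely cited one (number of CpG dinucleotides times sequence length, divided by the product of the C and G counts). Whether GenomeSet uses this exact definition, and the names kmer_counts and cpg_obs_exp, are assumptions of this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in seq."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.count("CG") * len(seq) / (c * g)


# Hypothetical example sequence.
dimers = kmer_counts("ACGCGT", 2)
ratio = cpg_obs_exp("ACGCGT")
```

In practice such a ratio is usually evaluated over sliding windows together with a GC-content threshold; the window size and thresholds would be parameters of the analysis.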

Workflow Overview

GenomeSet provides an organized, stepwise workflow to simplify the process of preparing and analyzing genomic data. The system offers three primary entry points depending on your data source and research goals:

Upload Data

Users with personal or locally stored genomic datasets can upload their files directly into the system through the Upload Data page.

Key steps:

  • Sign up or log in to your GenomeSet account.
  • Navigate to Start Analyzing → Upload Data.
  • Choose your data files in supported formats (FASTA, CSV, or TSV).
  • Define the file type and data format via the options menu.
  • Upload the data into the system.
  • After uploading, open your User Profile to view uploaded files, select objects for analysis, and proceed to the Analyzer.

This method is ideal for researchers working with unpublished, custom, or experimental genomic datasets.
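
Before uploading, it can help to check locally that a FASTA file is well formed. The following Python sketch reads FASTA records from a text stream. It is an illustration only and says nothing about how GenomeSet parses uploads; the function name read_fasta and the example records are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The identifier is the first whitespace-delimited token.
            name, chunks = (header[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical two-record FASTA content; a real file would be opened with open().
example = io.StringIO(">seq1 demo record\nATGC\nGGTA\n>seq2\nTTAA\n")
records = dict(read_fasta(example))
```

Sequence lines belonging to one record are concatenated, so multi-line records are returned as a single string.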

Species Explorer

For users interested in analyzing publicly available genomes, the Species Explorer enables direct selection and download of genomic data from the NCBI database.

Key steps:

  • Open Start Analyzing → Species Explorer.
  • Use the taxonomic tree or lineage navigation tools to explore available organisms.
  • Select up to five species for analysis.
  • Add selected organisms to your buffer list.
  • Click Go to Analyzer to fetch genome data (FASTA and GFF files) from NCBI and prepare them for analysis.

This approach requires no account registration and is suited to comparative or reference-based genome studies.

GEO Dataset Explorer

The GEO Dataset Explorer provides access to gene expression datasets stored in the Gene Expression Omnibus (GEO) repository.

Key steps:

  • Visit Start Analyzing → GEO Dataset Explorer.
  • Browse, search, and select gene expression datasets of interest.
  • Download and integrate selected datasets for downstream analysis alongside genome sequence data.

This feature supports expression-based studies, adding transcriptomic data analysis capabilities to GenomeSet workflows.

Additional Genomic Tools

In addition to its core analysis and data management features, GenomeSet aims to provide a collection of specialized genomic tools to support a wide range of research needs. These tools are designed to complement the primary workflow and offer extended analytical and visualization capabilities.

Planned Tools and Features:

  • GFF3 Parser:

    A utility for parsing GFF3 files to extract genomic annotations, feature statistics, and coordinate data. This tool will allow users to convert and interpret annotation files independently or in combination with uploaded or downloaded genome sequences.

  • Massive Genome Research Tool:

    A high-throughput genome analysis module capable of processing multiple genome files simultaneously. It will offer functions for bulk property calculations, comparative genomics, and genome-scale visualization.

  • Genome Visualization Tool (Future Plan):

    An interactive, graphical genome map viewer. Users will be able to visualize gene locations, sequence regions, feature annotations, and analysis results in a dynamic, zoomable genome browser interface.
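
Since the GFF3 Parser described above is still planned, its interface is not documented here. As a hedged illustration of what parsing GFF3 involves, the Python sketch below splits a feature line into the nine tab-separated columns defined by the GFF3 format and decodes the attribute column. The names Feature and parse_gff3_line and the example lines are hypothetical.

```python
from collections import Counter
from typing import NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[unquote(key)] = unquote(value)  # values are URL-escaped
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attrs)


# Hypothetical annotation lines.
lines = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=abc%20def",
    "chr1\tRefSeq\tCDS\t1200\t1500\t.\t+\t0\tID=cds1;Parent=gene1",
]
features = [f for f in map(parse_gff3_line, lines) if f is not None]
type_counts = Counter(f.type for f in features)
gene_length = features[0].end - features[0].start + 1
```

Feature statistics such as counts per feature type or feature lengths follow directly from the parsed records, as type_counts and gene_length show.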

Why It Matters:

These additional tools extend the main Analyzer with more flexible and scalable capabilities for managing and analyzing large or complex genomic datasets.

New tools and modules will be continuously added based on user feedback, research trends, and collaborative projects.