    Repositories list

    • velox

      Public
      A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
      C++
      Apache License 2.0
      1.1k20819 Updated Oct 3, 2024
    • oap-mllib

      Public
      Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
      Scala
      Apache License 2.0
      1220378 Updated Sep 25, 2024
    • .github

      Public
      Other
      0000 Updated Aug 19, 2024
    • raydp

      Public
      RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
      Python
      Apache License 2.0
      683083512 Updated Jul 31, 2024
    • vllm-fork

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.1k000 Updated Jul 23, 2024
    • text2sql-gluten

      Public archive
      Python
      3500 Updated Jul 11, 2024
    • English SDK for Apache Spark
      Python
      Apache License 2.0
      124101 Updated Jul 9, 2024
    • libhdfs3

      Public
      HDFS file read access for ClickHouse
      C++
      Apache License 2.0
      55200 Updated Jul 5, 2024
    • oap-tools

      Public archive
      Tools for building and packaging OAP, plus OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S.
      Jupyter Notebook
      Apache License 2.0
      131692 Updated Mar 27, 2024
    • Gluten: Plugin to Double SparkSQL's Performance
      Scala
      Apache License 2.0
      421000 Updated Mar 26, 2024
    • Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
      Scala
      Apache License 2.0
      122031 Updated Mar 15, 2024
    • protobuf

      Public
      An Intel-customized Protocol Buffers - Google's data interchange format
      C++
      Other
      15k001 Updated Nov 21, 2023
    • Gluten-Trino

      Public archive
      Gluten: Plugin to Boost Trino's Performance
      Java
      Apache License 2.0
      156961 Updated Oct 25, 2023
    • cloudtik

      Public archive
      Cloud Scale Platform for Distributed Analytics and AI
      Python
      Apache License 2.0
      72311 Updated Oct 12, 2023
    • pmem-shuffle

      Public archive
      Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics. It leverages the RDMA network and remote persistent memory (for read) to provide extremely high-performance, low-latency shuffle solutions for Spark*.
      C++
      Apache License 2.0
      914151 Updated Sep 18, 2023
    • recdp

      Public archive
      Python
      Apache License 2.0
      4210 Updated Sep 18, 2023
    • oap-project.github.io

      Public archive
      The OAP project web site
      HTML
      Apache License 2.0
      4000 Updated Sep 5, 2023
    • arrow

      Public
      Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
      C++
      Apache License 2.0
      3.5k6021 Updated May 18, 2023
    • gazelle_plugin

      Public archive
      Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
      Scala
      Apache License 2.0
      7725619124 Updated Feb 21, 2023
    • solution-navigator

      Public archive
      Example solutions or code for using OAP features.
      Jupyter Notebook
      Apache License 2.0
      3000 Updated Jan 25, 2023
    • sql-ds-cache

      Public archive
      Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
      Scala
      Apache License 2.0
      2537154 Updated Jan 3, 2023
    • libhdfs3-downstream

      Public archive
      A native C/C++ HDFS client (downstream fork from apache-hawq)
      C++
      Apache License 2.0
      54000 Updated Jan 3, 2023
    • arrow-data-source

      Public archive
      Spark DataSource plugin for reading files from various formats like Parquet into Arrow compatible columnar vectors.
      Scala
      Apache License 2.0
      10630 Updated Jan 3, 2023
    • pmem-spill

      Public archive
      Spark plug-in package for accelerating Spark runtime spill functions using PMem, such as the RDD cache PMem extension.
      Scala
      Apache License 2.0
      57111 Updated Dec 15, 2021
    • pmem-common

      Public archive
      Common library for accessing PMEM native library functions including memkind, vmemcache and so on.
      Java
      Apache License 2.0
      7331 Updated Dec 14, 2021