Data Science @ Scale

HDP and IBM Data Science Experience

Data science will change the game for businesses

Data science is an interdisciplinary field that combines machine learning, statistics, advanced analytics, and programming. It is a new form of art that draws out hidden insights and puts data to work in the knowledge era.

IBM Data Science Experience (DSX) is an enterprise platform for data scientists and data engineers. It offers out-of-the-box open-source and commercial data science tools including RStudio, Apache Spark, Jupyter, and Zeppelin notebooks. DSX supports the entire data science lifecycle from data preparation and ETL to model development and deployment. With DSX, companies can build predictive and machine learning models using their favorite tools, technologies, and libraries, while leveraging the scale, security and governance of the HDP platform.

The data science lifecycle
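
The lifecycle stages named above (data preparation, model development, deployment) can be illustrated with a minimal, self-contained sketch. It uses only the Python standard library and does not call any DSX or HDP API; the function names `prepare`, `train`, `deploy`, and `predict`, and the sample records, are hypothetical and chosen only for illustration.

```python
import json
from statistics import mean

def prepare(rows):
    """Data preparation: drop records with missing values and cast to float."""
    return [(float(r["x"]), float(r["y"])) for r in rows
            if r.get("x") is not None and r.get("y") is not None]

def train(points):
    """Model development: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def deploy(model):
    """Deployment: serialize the model so a separate process could load it."""
    return json.dumps(model)

def predict(serialized_model, x):
    """Scoring: load the serialized model and apply it to a new input."""
    model = json.loads(serialized_model)
    return model["slope"] * x + model["intercept"]

# Hypothetical raw records; one has a missing value and is dropped in prepare().
raw = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": None, "y": 4}, {"x": 3, "y": 7}]
artifact = deploy(train(prepare(raw)))
print(predict(artifact, 4))
```

In a real project each stage would typically be replaced by the tools listed in this document, for example Spark for preparation and a machine learning library for training.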


Access to community

DSX provides a social environment where data scientists can research and share articles, data sets, notebooks, and tutorials. Data scientists and analysts can get up to speed by taking courses in R, Python, or Scala, copying content into a Jupyter or Zeppelin notebook, or working in an embedded RStudio environment.

  • Find tutorials and datasets
  • Connect with data scientists and ask questions
  • Research articles and papers
  • Fork and share projects

Use familiar open source tools and libraries

With DSX, data scientists can create new Jupyter or Zeppelin notebooks in R, Python, or Scala, or import an existing notebook. DSX includes popular open source libraries such as PySpark, matplotlib, and SparkML, along with machine learning and deep learning APIs. Data scientists can use DSX to tell a compelling story with open source visualization libraries like Brunel and PixieDust, and can install other open source libraries of their choice.

  • Code in Scala, Python, R, Apache Spark and SQL
  • Visualize and share code using Zeppelin & Jupyter Notebooks
  • Leverage RStudio IDE and Shiny
  • Use your favorite libraries, including Scikit-learn, XGBoost, Spark MLlib, TensorFlow, Caffe, Keras, and MXNet

Operationalize models with one click

With DSX, administrators can deploy models with one click and monitor all runtime environments and services.

  • Data Shaping Pipeline UI
  • Auto-data preparation & modeling
  • Advanced Visualizations
  • Model management & deployment
  • Documented Model APIs

Scale and enterprise security

The combination of HDP and DSX empowers enterprises to run data science at scale by leveraging all the data in the data lake, as well as deploying enterprise-grade security, governance, and operations.

  • Data Science at Scale - Run Spark Jobs on HDP Cluster
  • Secure Hadoop Support using Apache Ranger
  • Support for attribute-based access control (ABAC) using Apache Ranger
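
To make the idea of attribute-based access control concrete, here is a small conceptual sketch in Python. It is not Apache Ranger's API or policy format; the `Policy` class, the `is_allowed` function, and the example tags and attributes are all hypothetical. The point it shows is that an ABAC decision depends on attributes of the user and the resource rather than on a fixed list of user names.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """Hypothetical policy: grants actions on resources carrying a given tag
    to any user whose attributes match all of the required values."""
    resource_tag: str
    required_user_attrs: dict
    allowed_actions: set

def is_allowed(policies, user_attrs, resource_tags, action):
    """Return True if at least one policy grants the action; deny by default."""
    for policy in policies:
        if policy.resource_tag not in resource_tags:
            continue  # policy does not apply to this resource
        if action not in policy.allowed_actions:
            continue  # policy does not cover this action
        if all(user_attrs.get(k) == v
               for k, v in policy.required_user_attrs.items()):
            return True
    return False

# Hypothetical policy: EU analysts may read data tagged "PII".
policies = [Policy("PII", {"department": "analytics", "region": "EU"}, {"read"})]

eu_analyst = {"department": "analytics", "region": "EU"}
us_analyst = {"department": "analytics", "region": "US"}

print(is_allowed(policies, eu_analyst, {"PII"}, "read"))
print(is_allowed(policies, us_analyst, {"PII"}, "read"))
```

A production deployment would define such rules in the access control system itself rather than in application code.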
