Big Data Fundamentals

In today’s era of Big Data explosion, analyzing large data sets has become a key basis of competition, productivity growth, innovation, and consumer surplus for Communications Service Providers (CSPs).

Operators are seeking new ways to increase operational efficiency by leveraging Big Data technologies. For example, by utilizing technologies such as the Hadoop Distributed File System (HDFS) and cloud-based analytics, CSPs can achieve significant cost savings in data storage.

Business leaders and data-oriented managers in the telecommunications industry now have to deal with the implications of Big Data in order to meet objectives such as an improved bottom line, better customer experience, intelligent network planning, and reduced customer churn.

In this course, participants will study the technological landscape of Big Data and learn its fundamentals in order to develop and implement strategies and overcome challenges. The course imparts the analytical skills participants need to make sense of large volumes of data and to drive real-time, actionable insights and business decision making.

Who Should Attend
  • Network Operation Managers
  • Financial Managers
  • CRM Managers
  • Top IT Managers in Telco CIO Office
  • Business Analysts in Telco
  • CFO Office Managers / Analysts
  • Operational Managers
  • QA Managers
Instructor-Led Training
[Classroom: 3 days / LIVE Virtual*: 21 hours]
*Note:
  • A minimum of 5 participants is required for a company-based LIVE Virtual course to commence
  • LIVE Virtual courses can be conducted for 5 hours or 7 hours daily. Please note that the number of training days will be extended if you opt for 5 hours daily.
Course Outline
  1. Business Overview – Why Big Data Business Intelligence in Telco
  • Case Studies from T-Mobile, Verizon, etc.
  • Big Data adoption rate in North American Telcos and how they are aligning their future business model and operation around Big Data BI
  • Broad Scale Application Area
  • Network and Service Management
  • Customer Churn Management
  • Data Integration & Dashboard Visualization
  • Fraud management
  • Business Rule Generation
  • Customer Profiling
  • Localized Ad Pushing
  2. Big Data Introduction I
  • Main Characteristics of Big Data – Volume, Variety, Velocity and Veracity; MPP Architecture for Volume
  • Data Warehouses – Static Schema, Slowly Evolving Dataset
  • MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc.
  • Hadoop Based Solutions – No Conditions on Structure of Dataset
  • Typical Pattern: HDFS, MapReduce (crunch), Retrieve from HDFS
  • Batch – Suited for Analytical/Non-Interactive
  • Velocity: CEP Streaming Data
  • Typical Choices – CEP Products (e.g. Infostreams, Apama, MarkLogic, etc.)
  • Less Production Ready – Storm/S4
  • NoSQL Databases – (Columnar and Key-value): Best suited as Analytical Adjunct to Data Warehouse/Database
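To make the streaming idea concrete, here is a minimal Python sketch of CEP-style windowed aggregation: events are processed as they arrive, and only a small window is kept in memory. The throughput figures are invented for illustration; products such as Apama or Storm add distribution, fault tolerance, and a query language on top of this pattern.

```python
from collections import deque

def sliding_window_average(events, window_size):
    """Yield the rolling average of the last `window_size` readings
    as each event arrives -- the core idea behind CEP-style streaming
    aggregation: process data in motion, never store the full set."""
    window = deque(maxlen=window_size)     # old events fall off the back
    for value in events:
        window.append(value)
        yield sum(window) / len(window)

# Example: a stream of per-second network throughput samples (invented)
stream = [10, 20, 30, 40, 50]
print(list(sliding_window_average(stream, window_size=3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```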
  3. Big Data Introduction II
  • NoSQL Solutions
  • KV Store – Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store – Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB
  • KV Store (Hierarchical) – GT.M, Caché
  • KV Store (Ordered) – TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache – Memcached, Repcached, Coherence, Infinispan, eXtremeScale, JBossCache, Velocity, Terracotta
  • Tuple Store – Gigaspaces, Coord, Apache River
  • Object Database – ZopeDB, db4o, Shoal
  • Document Store – CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store – BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
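The simplest of these models, the key-value store, can be sketched with Python's standard-library `shelve` module. Real KV stores such as Memcached or Dynamo-style systems expose a similar get/put interface but add networking, partitioning, and replication; the subscriber record below is invented for illustration.

```python
import os
import shelve
import tempfile

# A toy persistent key-value store using Python's stdlib `shelve`.
# Real KV stores layer networking, partitioning and replication
# on top of the same get/put interface shown here.
path = os.path.join(tempfile.mkdtemp(), "kvdemo")

with shelve.open(path) as db:
    # put: arbitrary picklable value under a string key
    db["subscriber:42"] = {"plan": "prepaid", "minutes_used": 310}

with shelve.open(path) as db:
    print(db["subscriber:42"]["plan"])   # get -> prints: prepaid
```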
  4. Varieties of Data: Introduction to Data Cleaning Issues in Big Data
  • RDBMS – Static Structure/Schema, doesn’t Promote Agile, Exploratory Environment
  • NoSQL – Semi-structured, enough Structure to Store Data without Exact Schema before Storing Data
  • Data Cleaning Issues
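A minimal sketch of the cleaning problem for semi-structured records, using invented subscriber data: fields may be missing, inconsistently typed, or untrimmed, and the cleaning step has to decide what to normalize and what to drop.

```python
# Invented semi-structured (NoSQL-style) subscriber records:
# fields can be missing, inconsistently typed, or untrimmed.
raw_records = [
    {"msisdn": " 6591234567 ", "plan": "Prepaid", "data_mb": "512"},
    {"msisdn": "6598765432", "plan": None, "data_mb": 1024},
    {"plan": "postpaid"},                 # no subscriber id -> dropped
]

def clean(record):
    msisdn = str(record.get("msisdn", "")).strip()
    if not msisdn:
        return None                       # cannot identify the subscriber
    return {
        "msisdn": msisdn,                            # trimmed
        "plan": (record.get("plan") or "unknown").lower(),  # normalized
        "data_mb": int(record.get("data_mb") or 0),  # coerced to int
    }

cleaned = [c for c in map(clean, raw_records) if c is not None]
print(cleaned)   # two usable records survive out of three
```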
  5. Big Data Introduction III
  • When to Select Hadoop/Spark/Kafka
  • STRUCTURED – Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
  • SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB)
  • Warehousing data = HUGE effort and static even after implementation
  • For Variety & Volume of Data, Crunched on Commodity Hardware – HADOOP
  • Commodity H/W needed to Create a Hadoop Cluster
  6. Introduction to MapReduce / HDFS
  • MapReduce – Distribute Computing over Multiple Servers
  • HDFS – Make Data Available Locally for the Computing Process (with Redundancy)
  • Data – Can be Unstructured/Schema-less (unlike RDBMS)
  • Developer Responsibility to make Sense of Data
  • Programming MapReduce = Working with Java (pros/cons), Manually Loading Data into HDFS
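The MapReduce pattern itself can be illustrated with the canonical word count, simulated here on a single machine in Python. On Hadoop the map and reduce calls run on many nodes and a shuffle phase moves the intermediate (key, value) pairs between them; this sketch only shows the programming model.

```python
from collections import defaultdict
from itertools import chain

# The canonical MapReduce word count, simulated on one machine.

def map_phase(line):
    return [(word, 1) for word in line.split()]   # emit (key, 1) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)                 # group values by key
    return groups

def reduce_phase(key, values):
    return key, sum(values)                       # aggregate per key

lines = ["big data big insight", "big value"]
grouped = shuffle(chain.from_iterable(map_phase(l) for l in lines))
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 3, 'data': 1, 'insight': 1, 'value': 1}
```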
  7. Big Data Ecosystem – Building Big Data ETL
  • Spark vs. Other NoSQL Solutions
  • For Interactive, Random Access to Data
  • HBase (Column-oriented Database) on Top of Hadoop or Spark
  • Random Access to Data but Restrictions Imposed (max 1 PB)
  • Not Good for Ad-hoc Analytics, good for Logging, Counting, Time-series
  • Sqoop – Import from Databases to Hive or HDFS (JDBC/ODBC access)
  • Flume – Stream Data (e.g. Log Data) into HDFS
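Sqoop's role can be pictured in miniature with Python's standard-library `sqlite3` and `csv` modules: pull rows out of a relational table and serialize them as flat, delimited text for HDFS or Hive to ingest. The call-record table below is invented; Sqoop itself does this over JDBC, in parallel map tasks.

```python
import csv
import io
import sqlite3

# Sqoop's job in miniature: export relational rows as flat,
# delimited text that HDFS/Hive can ingest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (caller TEXT, seconds INTEGER)")
conn.executemany("INSERT INTO cdr VALUES (?, ?)",
                 [("6591234567", 120), ("6598765432", 45)])

out = io.StringIO()
writer = csv.writer(out)
for row in conn.execute("SELECT caller, seconds FROM cdr"):
    writer.writerow(row)                 # one delimited line per record

print(out.getvalue().strip())
```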
  8. Big Data Management System & Brokers
  • Moving Parts, Compute Nodes Start/Fail: ZooKeeper – For Configuration/Coordination/Naming Services
  • Complex Pipeline/Workflow: Oozie – Manage Workflow, Dependencies, Daisy Chain
  • Deploy, Configure, Cluster Management, Upgrade, etc. (System Admin): Ambari
  • In Cloud: Whirr
  • Kafka, the King of Broker Systems
  • MQTT for IoT-based Brokerage
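The log-based design behind Kafka can be sketched in a few lines: each topic is an append-only log, and every consumer tracks its own offset, so reading a message does not destroy it. This toy class is illustrative only; Kafka adds partitions, replication, and durable storage on top of this model.

```python
from collections import defaultdict

class ToyBroker:
    """Illustrative log-based broker in the spirit of Kafka:
    each topic is an append-only log, and each consumer keeps
    its own offset, so messages survive being read."""
    def __init__(self):
        self.logs = defaultdict(list)

    def produce(self, topic, message):
        self.logs[topic].append(message)          # append to the log

    def consume(self, topic, offset):
        # Return messages at or after `offset`, plus the next offset
        # the consumer should remember for its next poll.
        log = self.logs[topic]
        return log[offset:], len(log)

broker = ToyBroker()
broker.produce("alarms", "cell-17 down")
broker.produce("alarms", "cell-17 restored")
messages, next_offset = broker.consume("alarms", offset=0)
print(messages, next_offset)  # ['cell-17 down', 'cell-17 restored'] 2
```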
  9. Predictive Analytics in Business Intelligence 1 – Fundamental Techniques & Machine Learning-Based BI
  • Introduction to Machine learning
  • Learning Classification Techniques
  • Bayesian Prediction – Preparing Training File
  • Support Vector Machine
  • Neural Network
  • Big Data Large Variable Problem – Random Forest (RF)
  • Big Data Automation Problem – Multi-Model Ensemble RF
  • Agile Learning
  • Agent Based Learning – Example from Telco Operation
  • Distributed Learning – Example from Telco Operation
  • Introduction to Open Source Tools for Predictive Analytics: R, RapidMiner, Mahout
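The Random Forest idea can be sketched as a majority-vote ensemble of weak, randomized classifiers. The toy below uses depth-1 "stumps", each looking at one randomly chosen feature; real Random Forests also bootstrap-sample the rows and grow full trees. The churn data (monthly minutes, support calls) is invented for illustration.

```python
import random
from collections import Counter

# Invented churn data: (monthly_minutes, support_calls) -> churned (1) or not (0)
data = [((900, 0), 0), ((850, 1), 0), ((120, 4), 1), ((100, 5), 1),
        ((800, 1), 0), ((150, 6), 1), ((950, 0), 0), ((90, 7), 1)]

def train_stump(rows):
    feature = random.randrange(2)            # random feature per stump
    threshold = sum(x[feature] for x, _ in rows) / len(rows)
    # Label the "low" side by its majority class; the other side gets
    # the opposite label (binary problem).
    left = [y for x, y in rows if x[feature] <= threshold]
    left_label = Counter(left).most_common(1)[0][0]
    return lambda x: left_label if x[feature] <= threshold else 1 - left_label

random.seed(7)
forest = [train_stump(data) for _ in range(15)]

def predict(x):
    votes = Counter(stump(x) for stump in forest)   # majority vote
    return votes.most_common(1)[0][0]

print(predict((110, 5)), predict((880, 1)))  # 1 0  (churn, no churn)
```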
  10. Step-by-Step Procedure to Replace a Legacy Data System with a Big Data System
  • Understanding Practical Big Data Migration Roadmap
  • What is the Important Information Needed before Architecting a Big Data Implementation?
  • What are the Different Ways of Calculating Volume, Velocity, Variety and Veracity of Data?
  • How to Estimate Data Growth?
  • Case Studies in 2 Telcos
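The data-growth question can be approached with a simple compound-growth model. The function and its parameters (current volume, monthly growth rate, a replication factor of 3 as in HDFS's default) are illustrative assumptions, not a sizing methodology.

```python
# Compound-growth sketch for "How to Estimate Data Growth?":
# project raw disk needs from current volume, a monthly growth
# rate, and a replication factor (HDFS defaults to 3 copies).
def projected_storage_tb(current_tb, monthly_growth, months, replication=3):
    return current_tb * (1 + monthly_growth) ** months * replication

# Illustrative inputs: 50 TB of data growing 5% per month,
# planned 24 months ahead.
need = projected_storage_tb(50, 0.05, 24)
print(round(need, 1), "TB of raw disk across the cluster")
```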
  11. Review of Big Data Vendors and their Products, with Q&A Session
Prerequisites
  • A basic knowledge of business operations and data systems in Telecom
  • A basic understanding of SQL / Oracle or relational database
  • A basic understanding of Statistics (at Excel level)
Delivery Format: Virtual Training
