1 ForeSite Predictive Analytics
Overall Architecture
Server Cluster Configuration
- Allows all 3 software components to scale across all nodes
- Allows scaling across future nodes that may be added
- Total 6 nodes:
- 2 management nodes
- 3 data nodes
- 1 edge node
- SROM will be installed on the Edge and Data Nodes
- The Edge node will be the Primary SROM with the webserver
- SROM nodes will need an NFS mount
- The same storage will be mounted on all nodes, and the mount path should be identical on each
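The layout above can be captured as a small data structure, which makes the SROM placement and NFS rules easy to check. The following is a minimal sketch, not part of the ForeSite tooling; the host names (`mgmt1`, `data1`, and so on) and the mount path `/mnt/srom` are placeholders assumed for illustration.

```python
from collections import namedtuple

# Hypothetical host names and NFS path; replace with the real values.
Node = namedtuple("Node", ["name", "role", "nfs_mount"])

NFS_PATH = "/mnt/srom"  # assumed path; it must be identical on every node

CLUSTER = [
    Node("mgmt1", "management", NFS_PATH),
    Node("mgmt2", "management", NFS_PATH),
    Node("data1", "data", NFS_PATH),
    Node("data2", "data", NFS_PATH),
    Node("data3", "data", NFS_PATH),
    Node("edge1", "edge", NFS_PATH),
]


def srom_nodes(cluster):
    """SROM is installed on the edge node and the data nodes."""
    return [n for n in cluster if n.role in ("edge", "data")]


def primary_srom(cluster):
    """The edge node hosts the primary SROM and the web server."""
    edges = [n for n in cluster if n.role == "edge"]
    if len(edges) != 1:
        raise ValueError("expected exactly one edge node")
    return edges[0]


def check_nfs_paths(cluster):
    """Return True when every SROM node has the mount at one common path."""
    paths = {n.nfs_mount for n in srom_nodes(cluster)}
    return len(paths) == 1 and None not in paths
```

With this layout, `srom_nodes(CLUSTER)` returns the three data nodes plus the edge node, and `check_nfs_paths(CLUSTER)` fails as soon as one SROM node is given a different mount path or none at all.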
Hardware Requirements
Management Node 1
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdb1 | 200 GB | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
Management Node 2
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdb1 | 200 GB | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
Data Node 1
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdb1 | 1 TB | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
Data Node 2
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdb1 | 1 TB | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
Data Node 3
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdc1 | 1 TB | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
Edge Node - SROM (Webserver) and DataStage
| Filesystem | Size | Mounted on |
| --- | --- | --- |
| /dev/sda2 | 160 GB | / |
| /dev/sda1 | 1 GB | /boot |
| /dev/sda3 | 20 GB | /var |
| /dev/sda4 | 50 GB | /home |
| /dev/sdc1 | 1 TB* | /data |

| Resource | Value |
| --- | --- |
| Swap | 2X system memory |
| CPU/cores | 8 |
| Memory | 32 GB |
* 1TB is not required. 200 GB will be sufficient.
Additional Notes:
- The allocations presented above are for a sandbox environment and not typical of a Hadoop cluster deployed on the IBM Cloud or on the IBM BigInsights reference architecture.
- Adjustments may need to be made as the workload is further understood.
- For reference see ForeSite R1 IBMServerClusterRequirements
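As a quick sanity check on the hardware tables, the swap rule and the cluster totals can be worked out from the per-node figures. This sketch only restates numbers from the tables (8 cores and 32 GB of memory per node, 6 nodes) and assumes the edge node has 32 GB like the others; the constant and function names are illustrative.

```python
NODE_COUNT = 6
CORES_PER_NODE = 8
MEMORY_GB_PER_NODE = 32


def swap_size_gb(memory_gb):
    """Swap is sized at twice the system memory."""
    return 2 * memory_gb


total_cores = NODE_COUNT * CORES_PER_NODE          # 6 * 8
total_memory_gb = NODE_COUNT * MEMORY_GB_PER_NODE  # 6 * 32

print(swap_size_gb(MEMORY_GB_PER_NODE))  # 64
print(total_cores)                       # 48
print(total_memory_gb)                   # 192
```

Each node therefore needs 64 GB of swap on top of the listed filesystems, which is worth accounting for when the disks are partitioned.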
Software Components Configuration
SROM RunTime Web/Analytics Server
Software Requirements:
- Red Hat Enterprise Linux OS
- Python 2.7
- Python pip
- Spark 2.0.1
- SQLite3
- libsqlite3-dev
- Nginx server
- Docker
Python Packages Dependencies (Use the latest versions for all)
- Virtualenv
- Django
- Pandas
- Requests
- Py4j
- Pydoop
Python Packages Dependencies on Spark Cluster
- Pandas
- Scikit-learn (imported as sklearn)
- Scipy
- Requests
Note: Install on each node
Configuration Requirements
- PYTHONPATH environment variable pointing to PySpark
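One way to satisfy this requirement is to derive the PYTHONPATH entries from the Spark installation path. In a standard Spark distribution, the PySpark sources live under `python/` and the bundled Py4J archive under `python/lib/`; the sketch below assumes that layout, and the helper names are illustrative.

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return the directory and archives PySpark needs on PYTHONPATH."""
    python_dir = os.path.join(spark_home, "python")
    # The Py4J archive name includes its version, so match it with a glob.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    return [python_dir] + sorted(glob.glob(pattern))


def pythonpath_value(spark_home, existing=""):
    """Build a PYTHONPATH string, keeping any entries already present."""
    parts = pyspark_paths(spark_home)
    if existing:
        parts.append(existing)
    return os.pathsep.join(parts)
```

For example, `pythonpath_value("/opt/spark", os.environ.get("PYTHONPATH", ""))` returns the value to export, where `/opt/spark` is a placeholder for the actual Spark installation path.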
Environment information required
- IP address and host name of the SROM server
- Spark master
- Hive thrift server URL and port number
- Spark installation path
- Orchestration layer’s web service hostname
- Orchestration layer’s web service port number
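The values listed above can be gathered into one settings structure and checked before SROM is configured. The following is a minimal sketch with basic validation; all key names and example values are hypothetical and are not taken from the SROM configuration format.

```python
# Hypothetical key names, one per item in the list above.
REQUIRED_KEYS = [
    "srom_ip",
    "srom_hostname",
    "spark_master",
    "hive_thrift_url",
    "hive_thrift_port",
    "spark_home",
    "orchestration_host",
    "orchestration_port",
]

# Placeholder values for illustration only.
EXAMPLE_ENV = {
    "srom_ip": "192.0.2.10",
    "srom_hostname": "srom.example.com",
    "spark_master": "spark://spark-master.example.com:7077",
    "hive_thrift_url": "hive.example.com",
    "hive_thrift_port": 10000,
    "spark_home": "/opt/spark",
    "orchestration_host": "orchestration.example.com",
    "orchestration_port": 8080,
}


def validate_environment(env):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = ["missing: " + key for key in REQUIRED_KEYS if not env.get(key)]
    for key in ("hive_thrift_port", "orchestration_port"):
        value = env.get(key)
        if value and not (str(value).isdigit() and 0 < int(value) < 65536):
            problems.append("invalid port: " + key)
    return problems
```

Running `validate_environment` on the collected values before installation gives a short checklist of what still has to be requested from the environment owner.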
Special Instructions for Python Package
- Command to install these packages: sudo pip install pandas scikit-learn scipy requests
- If pip is not available on the machine: sudo yum install python-pip
- If the above command to install pip does not work:
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
sudo python get-pip.py
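After installation, it can be useful to confirm on each Spark node that the packages actually import. The sketch below is one way to do that; note that scikit-learn is imported under the module name `sklearn`. The helper name `missing_modules` is illustrative.

```python
import importlib

# Import names of the packages required on every Spark node.
SPARK_NODE_MODULES = ["pandas", "sklearn", "scipy", "requests"]


def missing_modules(names):
    """Return the module names that cannot be imported on this machine."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(SPARK_NODE_MODULES)` on a node returns an empty list when everything is installed, or the names that still need `pip install`.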
Additional Notes
- Spark must be able to access the Hive DB
Data Integration Server
Software List:
- InfoSphere DataStage
Special Instructions:
- The client data JAR file must be transferred to the server via SFTP
Big Data Cluster (RHEL 7.2, minimum 2-node RHEL cluster)
Software List:
- SROM Java (latest version)
- BigInsights 4.2
- Ranger 0.5.2
- Phoenix (4.6.1)
- Titan (1.0.0)
- Apache SystemML (0.10.0)
- Ambari (2.2.0)
- Flume (1.6.0)
- Hadoop (2.7.2)
- HBase (1.2.0)
- Kafka (0.9.0.1)
- Knox (0.7.0)
- Slider (0.90.2)
- Solr (5.5)
- Spark (1.6.1)
- BigSheets (4.2)
- Big SQL (4.2)
- Big R (4.2)
- Text Analytics (4.2)
- Python (latest 2.7 version)
- Pydot (1.0.28) *specific version required
- Pyparsing (1.5.6) *specific version required
- Numpy (latest version)
- Scipy (latest version)
- Scikit-learn (latest version)
- Orange (latest version)
- Ibm-db (latest version)
- Pandas (latest version)
- Statsmodels (latest version)
- Dill (latest version)
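Because two of the Python packages above require exact versions while the rest use the latest release, that part of the list can be expressed as pip requirement specifiers. A minimal sketch follows; the distribution names are assumptions about how each package is published (for example, `scikit-learn` and `ibm-db`), and Orange is left out because its distribution name depends on the Orange release in use.

```python
# None means "latest version"; a string pins an exact version.
PACKAGES = [
    ("pydot", "1.0.28"),
    ("pyparsing", "1.5.6"),
    ("numpy", None),
    ("scipy", None),
    ("scikit-learn", None),
    ("ibm-db", None),
    ("pandas", None),
    ("statsmodels", None),
    ("dill", None),
]


def requirement_lines(packages):
    """Format (name, version) pairs as pip requirement specifiers."""
    return [
        name if version is None else "%s==%s" % (name, version)
        for name, version in packages
    ]
```

Writing the result of `requirement_lines(PACKAGES)` to a requirements file, one entry per line, keeps the two pinned versions from being upgraded by accident when the other packages are installed.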