PySpark JDBC: reading and writing MySQL, Oracle, and Hive

    You need to download the dependency jars: mysql-connector-java.jar (https://mvnrepository.com/artifact/mysql/mysql-connector-java) and ojdbc6.jar (https://www.oracle.com/database/technologies/jdbc-upc-downloads.html), then place both jars in the SPARK_HOME/jars directory. Note: make sure the jar versions match your database server and JDK.
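Alternatively, the driver jars can be supplied when the session is created, via the spark.jars configuration, instead of being copied into SPARK_HOME/jars. A minimal sketch; the jar paths below are placeholders to replace with your own:

 from pyspark.sql import SparkSession

 # point spark.jars at the downloaded driver jars (placeholder paths)
 spark = (
     SparkSession.builder
     .appName('jdbc-demo')
     .config('spark.jars', '/path/to/mysql-connector-java.jar,/path/to/ojdbc6.jar')
     .getOrCreate()
 )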

The code is as follows:

 

 MYSQL

 from pyspark import SparkContext
 from pyspark.sql.session import SparkSession

 sc = SparkContext()
 spark = SparkSession(sc)

 # mysql
 mysql_url = 'jdbc:mysql://192.168.0.3:3306/mysql'
 mysql_table = 'titanic'
 mysql_user = 'user'
 mysql_password = 'password'
 mysql_driver = 'com.mysql.jdbc.Driver'  # for mysql-connector-java 8.x, use 'com.mysql.cj.jdbc.Driver'
 mysql_properties = {
     'user': mysql_user,
     'password': mysql_password,
     'driver': mysql_driver
 }
 # Method 1: read the whole table via spark.read.jdbc() with a properties dict
 df = spark.read.jdbc(mysql_url, mysql_table, properties=mysql_properties)
 # Method 2 (equivalent): generic DataFrameReader with a subquery as dbtable
 df = spark.read.format('jdbc').options(
     url=mysql_url,
     driver=mysql_driver,
     dbtable='(select * from titanic) titanic',
     user=mysql_user,
     password=mysql_password
 ).load()
 df.write.jdbc(mysql_url, 'test', mode='overwrite', properties=mysql_properties)
 # or use df.write.format('jdbc').options(...).save()
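The comment above refers to the generic DataFrameWriter API; a minimal sketch of that equivalent write, reusing the connection variables defined earlier and the same target table 'test':

 # equivalent write using the generic DataFrameWriter API
 df.write.format('jdbc').options(
     url=mysql_url,
     driver=mysql_driver,
     dbtable='test',
     user=mysql_user,
     password=mysql_password
 ).mode('overwrite').save()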

 

 ORACLE

 oracle_url = 'jdbc:oracle:thin:@//192.168.0.3:1521/xe'
 oracle_table = 'TITANIC'
 oracle_user = 'user'
 oracle_password = 'password'
 oracle_driver = 'oracle.jdbc.driver.OracleDriver'
 oracle_properties = {
     'user': oracle_user,
     'password': oracle_password,
     'driver': oracle_driver
 }
 # Method 1: read the whole table via spark.read.jdbc() with a properties dict
 df = spark.read.jdbc(oracle_url, oracle_table, properties=oracle_properties)
 # Method 2 (equivalent): generic DataFrameReader with a rownum-limited subquery
 df = spark.read.format('jdbc').options(
     url=oracle_url,
     driver=oracle_driver,
     dbtable='(select * from TITANIC where rownum < 1002) TITANIC',
     user=oracle_user, 
     password=oracle_password 
 ).load() 
 df.write.jdbc(oracle_url, 'test', mode='overwrite', properties=oracle_properties)
 # or use df.write.format('jdbc').options(...).save()

 HIVE
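The original post stops at this heading. Below is a minimal sketch for Hive: unlike MySQL and Oracle, Hive is usually accessed through Spark's built-in Hive support rather than JDBC. It assumes hive-site.xml is visible to Spark, and the table names (default.titanic, default.test) are placeholders:

 from pyspark.sql import SparkSession

 # enableHiveSupport() requires hive-site.xml in SPARK_HOME/conf (or on the classpath)
 spark = SparkSession.builder \
     .appName('hive-demo') \
     .enableHiveSupport() \
     .getOrCreate()

 # read a Hive table with Spark SQL (placeholder table name)
 df = spark.sql('select * from default.titanic')

 # write back to Hive as a managed table (placeholder table name)
 df.write.mode('overwrite').saveAsTable('default.test')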
