Unable to read csv file uploaded on google cloud storage bucket(无法读取上传到谷歌云存储桶的 csv 文件)
问题描述
Goal - To read csv file uploaded on google cloud storage bucket.
Environment - Run Jupyter notebook using SSH instance on Master node. Using python on Jupyter notebook trying to access a simple csv file uploaded onto google cloud storage bucket.
Approaches -
1st approach - Write a simple python program
Wrote following program
import csv
f = open('gs://python_test_hm/train.csv' , 'rb' )
csv_f = csv.reader(f)
for row in csv_f
print row
Results - Error message "No such file or directory"
2nd Approach - Using gcloud Package tried to access the train.csv file. The sample code is shown below. Below code is not the actual code. The file on google Cloud storage in my version of code was referred to "gs:///Filename.csv" Results - Error message "No such file or directory"
Load data from CSV
import csv
from gcloud import bigquery
from gcloud.bigquery import SchemaField
client = bigquery.Client()
dataset = client.dataset('dataset_name')
dataset.create() # API request
SCHEMA = [
SchemaField('full_name', 'STRING', mode='required'),
SchemaField('age', 'INTEGER', mode='required'),
]
table = dataset.table('table_name', SCHEMA)
table.create()
with open('csv_file', 'rb') as readable:
table.upload_from_file(
readable, source_format='CSV', skip_leading_rows=1)
3rd Approach -
import csv
import urllib
url = 'https://storage.cloud.google.com/<bucket>/train.csv'
response = urllib.urlopen(url)
cr = csv.reader(response)
print cr
for row in cr:
print row
Results - Above code doesn't result in any error but it displays the XML content of the google page as shown below. I am interested in viewing the data of the train csv file.
['<!DOCTYPE html>']
['<html lang="en">']
[' <head>']
[' <meta charset="utf-8">']
[' <meta content="width=300', ' initial-scale=1" name="viewport">']
[' <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074- BPEVmcpBxF6Gwf0MSgQXZs">']
[' <title>Sign in - Google Accounts</title>']
Can someone throw some light on what could be possibly wrong here and how do I achieve my goal? Your help is highly appreciated.
Thanks so much for your help!
I assume you are using Jupyter notebook running on a machine in Google Cloud Platform (GCP)? If that's the case, you will already have the Google Cloud SDK running on that machine (by default).
With this setup you have 2 easy options to work with Google Cloud Storage (GCS):
Use the gcloud/gsutil commands in Jupyter
Writing to GCS:
gsutil cp train.csv gs://python_test_hm/train.csv
Reading from GCS:
gsutil cp gs://python_test_hm/train.csv train.csv
Use google-cloud python library
Writing to GCS:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = bucket.blob('train.csv')
blob.upload_from_string('this is test content!')
Reading from GCS:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = storage.Blob('train.csv', bucket)
content = blob.download_as_string()
这篇关于无法读取上传到谷歌云存储桶的 csv 文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:无法读取上传到谷歌云存储桶的 csv 文件
基础教程推荐
- 将 YAML 文件转换为 python dict 2022-01-01
- 如何在 Python 中检测文件是否为二进制(非文本)文 2022-01-01
- 使 Python 脚本在 Windows 上运行而不指定“.py";延期 2022-01-01
- 使用Python匹配Stata加权xtil命令的确定方法? 2022-01-01
- 使用 Google App Engine (Python) 将文件上传到 Google Cloud Storage 2022-01-01
- 哪些 Python 包提供独立的事件系统? 2022-01-01
- 合并具有多索引的两个数据帧 2022-01-01
- 症状类型错误:无法确定关系的真值 2022-01-01
- 如何在Python中绘制多元函数? 2022-01-01
- Python 的 List 是如何实现的? 2022-01-01