Create new csv file in Google Cloud Storage from cloud function
Problem description
First time working with Google Cloud Storage. Below I have a cloud function which is triggered whenever a csv file gets uploaded to my-folder inside my bucket. My goal is to create a new csv file in the same folder, read the contents of the uploaded csv, and convert each line to a URL that will go into the newly created csv. The problem is that I'm having trouble just creating the new csv in the first place, let alone actually writing to it.
My code:
import os.path
import csv
import sys
import json
from csv import reader, DictReader, DictWriter
from google.cloud import storage
from io import StringIO

def generate_urls(data, context):
    if context.event_type == 'google.storage.object.finalize':
        storage_client = storage.Client()
        bucket_name = data['bucket']
        bucket = storage_client.get_bucket(bucket_name)
        folder_name = 'my-folder'
        file_name = data['name']

        if not file_name.endswith('.csv'):
            return
These next few lines came from an example in GCP's GitHub repo. This is where I would expect the new csv to be created, but nothing happens.
        # Prepend 'URL_' to the uploaded file name for the name of the new csv
        destination = bucket.blob(bucket_name + '/' + file_name[:14] + 'URL_' + file_name[14:])
        destination.content_type = 'text/csv'
        sources = [bucket.get_blob(file_name)]
        destination.compose(sources)
        output = bucket_name + '/' + file_name[:14] + 'URL_' + file_name[14:]

        # Transform uploaded csv to string - this was recommended on a similar SO post, not sure if this works or is the right approach...
        blob = bucket.blob(file_name)
        blob = blob.download_as_string()
        blob = blob.decode('utf-8')
        blob = StringIO(blob)
        input_csv = csv.reader(blob)
On the next line I get an error: No such file or directory: 'myProjectId/my-folder/URL_my_file.csv'
        with open(output, 'w') as output_csv:
            csv_dict_reader = csv.DictReader(input_csv, )
            csv_writer = csv.DictWriter(output_csv, fieldnames=['URL'], delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
            csv_writer.writeheader()
            line_count = 0
            for row in csv_dict_reader:
                line_count += 1
                url = ''
                ...
                # code that converts each line
                ...
                csv_writer.writerow({'URL': url})
            print(f'Total rows: {line_count}')
If anyone has any suggestions on how I could get this to create the new csv and then write to it, it would be a huge help. Thank you!
Recommended answer
I have a few questions about the code and the design of the solution:
As I understand it, on one hand the cloud function is triggered by a finalize event (see Google Cloud Storage Triggers); on the other hand, you would like to save the newly created file into the same bucket. On success, the appearance of a new object in that bucket will trigger another instance of your cloud function. Is that the intended behaviour? Is your cloud function ready for that?
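One way to make the function tolerate being re-triggered by its own output is to check the object name at the top of the handler and return early. The following is only a sketch: the helper name should_process and the constant OUTPUT_PREFIX are hypothetical, and it assumes output files can be recognised by the 'URL_' prefix used in the question.

```python
import posixpath

# Assumption: files written by the function are recognisable by this prefix,
# following the naming scheme described in the question.
OUTPUT_PREFIX = "URL_"


def should_process(object_name: str) -> bool:
    """Return True only for uploaded CSVs that are not the function's own output."""
    base_name = posixpath.basename(object_name)
    if not base_name.endswith(".csv"):
        return False
    # Skip objects the function created itself, so a finalize event
    # on the generated file does not start an endless trigger loop.
    return not base_name.startswith(OUTPUT_PREFIX)
```

Inside generate_urls this could be called as should_process(data['name']) before doing any other work.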
Ontologically, there is no such thing as a folder in GCS. Thus in this code:
folder_name = 'my-folder'
file_name = data['name']
the first line is a bit redundant, unless you want to use that variable and value for something else. Also, file_name gets the object name including all prefixes (you may think of them as "folders").
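Because the object name already carries its prefix, the name of the new object can be derived by splitting on the last '/' instead of slicing at a fixed index. A minimal sketch, where output_object_name is a hypothetical helper and the 'URL_' prefix is taken from the question:

```python
import posixpath


def output_object_name(object_name: str, prefix: str = "URL_") -> str:
    """Insert `prefix` before the base name, keeping any "folder" prefix intact."""
    # GCS object names always use '/' as the separator, so posixpath is
    # used here rather than os.path (which is platform dependent).
    folder, base_name = posixpath.split(object_name)
    return posixpath.join(folder, prefix + base_name)


print(output_object_name("my-folder/my_file.csv"))
```

Note that the result is an object name only; the bucket name is not part of it.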
The example you refer to, storage_compose_file.py, is about how a few objects in GCS can be composed into one. I am not sure that example is relevant to your case, unless you have some additional requirements.
Now, let's have a look at this snippet:
destination = bucket.blob(bucket_name + '/' + file_name[:14] + 'URL_' + file_name[14:])
destination.content_type = 'text/csv'
sources = [bucket.get_blob(file_name)]
destination.compose(sources)
a. bucket.blob is a factory constructor (see the API description for buckets). I am not sure you really want to use bucket_name as an element of its argument...
b. sources becomes a list with only one element: a reference to the existing object in the GCS bucket.
c. destination.compose(sources): is this an attempt to make a copy of the existing object? If successful, it may trigger another instance of your cloud function.
- About the type change:
blob = bucket.blob(file_name)
blob = blob.download_as_string()
After the first line the blob variable has the type google.cloud.storage.blob.Blob. After the second it is bytes. Python allows such rebinding, but do you really want it? BTW, the download_as_string method is deprecated (see the Blobs / Objects API).
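As a sketch of how the download step could look without reusing one variable for several types, the helper below reads the object as text and parses it in a single pass. read_csv_rows is a hypothetical name; it assumes a version of google-cloud-storage recent enough to provide Blob.download_as_text, and it accepts any object exposing that method.

```python
import csv
import io


def read_csv_rows(source_blob):
    """Download a CSV object and return its rows as a list of dicts.

    `source_blob` is expected to behave like google.cloud.storage.Blob,
    i.e. to provide download_as_text().
    """
    # download_as_text() returns str, so no separate decode() step is needed.
    csv_text = source_blob.download_as_text(encoding="utf-8")
    # Wrap the text in a file-like object so the csv module can iterate it.
    return list(csv.DictReader(io.StringIO(csv_text)))
```

With the names from the question, this might be called as read_csv_rows(bucket.blob(file_name)).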
- About the output:
output = bucket_name + '/' + file_name[:14] + 'URL_' + file_name[14:]
with open(output, 'w') as output_csv:
Bear in mind that all of this happens inside the memory of the cloud function. It has nothing to do with GCS buckets or blobs. If you would like to use temporary files within a cloud function, you have to use them in the /tmp directory (see "Write temporary files from Google Cloud Function"). I would guess that you get the error because of this issue.
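To illustrate the point about /tmp, here is a minimal sketch that writes the URL csv into the system temporary directory instead of a path built from the bucket name. write_urls_to_tmp and the default file name are hypothetical, and it assumes tempfile.gettempdir() resolves to the writable temporary directory of the runtime.

```python
import csv
import os
import tempfile


def write_urls_to_tmp(urls, file_name="URL_output.csv"):
    """Write one URL per row to a csv in the temp directory and return its path."""
    local_path = os.path.join(tempfile.gettempdir(), file_name)
    # newline="" lets the csv module control line endings itself.
    with open(local_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["URL"], quoting=csv.QUOTE_ALL)
        writer.writeheader()
        for url in urls:
            writer.writerow({"URL": url})
    return local_path
```

The local file still has to be uploaded explicitly before it shows up in GCS, for example with blob.upload_from_filename(local_path) on a blob created with the desired object name.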
=> Coming to some suggestions.
You probably want to download the object into the cloud function's memory (into the /tmp directory). Then you would process the source file and save the result next to the source. Then you would upload the result to another (not the source) bucket. If my assumptions are correct, I would suggest implementing those things step by step, and checking that you get the desired result at each step.
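The three steps could be sketched roughly as follows. This is not a definitive implementation: generate_url_csv is a hypothetical helper, the to_url callable stands in for the conversion logic that the question elides, and the result is built in an in-memory buffer rather than a file under /tmp. The bucket arguments only need to behave like google.cloud.storage.Bucket objects.

```python
import csv
import io
import posixpath


def generate_url_csv(source_bucket, destination_bucket, object_name, to_url):
    """Download a csv, convert each row to a URL, upload the result elsewhere."""
    # Step 1: download the uploaded object as text.
    source_text = source_bucket.blob(object_name).download_as_text()

    # Step 2: transform every row into a URL, writing into an in-memory buffer.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["URL"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    line_count = 0
    for row in csv.DictReader(io.StringIO(source_text)):
        writer.writerow({"URL": to_url(row)})  # to_url is supplied by the caller
        line_count += 1

    # Step 3: upload the result under a new name to the destination bucket.
    folder, base_name = posixpath.split(object_name)
    new_name = posixpath.join(folder, "URL_" + base_name)
    destination_bucket.blob(new_name).upload_from_string(
        buffer.getvalue(), content_type="text/csv"
    )
    return new_name, line_count
```

In the cloud function, the two buckets might be obtained with storage.Client().bucket(...), using data['bucket'] for the source and a separately chosen bucket name for the destination. Writing to a different bucket avoids re-triggering the function on its own output.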