Checking whether .xls and .csv files are empty

Question 1: I need to check whether .xls or .csv files are empty. This is the code I use:

import os

try:
    if os.stat(fullpath).st_size > 0:
        readfile(fullpath)
    else:
        print("empty file")
except OSError:
    print("No file")

An empty .xls file has size greater than 5.6kb so it is not obvious whether it has any contents. How can I check if an xls or csv file is empty?

Question 2: I need to check the header of the file. How can I tell python that files which are just a single row of headers are empty?

import xlrd
def readfile(fullpath)
    xls=xlrd.open_workbook(fullpath)  
    for sheet in xls.sheets():
        number_of_rows = sheet.nrows 
        number_of_columns = sheet.ncols
        sheetname = sheet.name
        header = sheet.row_values(0) #Then if it contains only headers, treat it as empty.

This is my attempt. How do I continue this code?

Please provide a solution for both questions. Thanks in advance.

5

6 answers

Here is the pandas way. Do this:

import pandas as pd

df = pd.read_csv(filename) # or pd.read_excel(filename) for xls file
df.empty # will return True if the dataframe is empty or False if not.

It also returns True for a file that contains only headers, as in:

>>> df = pd.DataFrame(columns=['A', 'B'])
>>> df.empty
True
5
added
Thanks for your answer. I use xlrd and do not want to install other packages such as pandas.
added by bob marti

Question 1: How to check whether an entire .xls file is empty.

import xlrd

def readfile(fullpath):
    xls = xlrd.open_workbook(fullpath)

    # Assume the workbook is empty until a sheet with data is found.
    is_empty = True

    for sheet in xls.sheets():
        # 0 rows: blank sheet; 1 row: header only. Both are treated as empty.
        if sheet.nrows > 1:
            is_empty = False
            break

    if is_empty:
        print('xls is empty')

Question 2: How can I check the header of the file? If the file has only one row (I mean only the header row), I should not process the file. How can I do that?

import csv
with open('test/empty.csv', 'r') as csvfile:
    csv_dict = [row for row in csv.DictReader(csvfile)]
    if len(csv_dict) == 0:
        print('csv file is empty')

Tested with Python: 3.4.2

2
added
Your answer may be correct, but I need to check both CSV and xls.
added by bob marti
For CSV, there is no need to iterate over all rows with DictReader. You can just check whether the second line of the file is empty: f.readline() == '' (or b'' when the file is opened in binary mode). See my full answer.
added by tsh
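
A minimal sketch of that idea using the standard csv module, so that quoted fields and blank or whitespace-only lines are handled too. The function name csv_has_data is hypothetical, and treating whitespace-only rows as "no data" is an assumption, not something the question specifies.

```python
import csv

def csv_has_data(path):
    """Return True if the CSV at `path` has a non-blank row after the header."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (does nothing if the file is empty)
        # any() stops at the first row that contains a non-blank cell,
        # so a large file with data is not read to the end
        return any(any(cell.strip() for cell in row) for row in reader)
```

A file for which csv_has_data(path) is False can then be skipped as empty, whether it has zero bytes or just a header row.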

I don't think Stack Overflow allows two questions at once, but let me give you my answer for the Excel part:

import xlrd
from pprint import pprint

wb = xlrd.open_workbook("temp.xlsx")

empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols == 0]
non_empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols > 0]

# printing names of empty sheets
pprint([sheet.name for sheet in empty_sheets])

# writing non empty sheets to database 
pass # write code yourself or ask another question 

About the header: let me give you a hint: test for sheet.nrows == 1.

1
added
@bobmarti What do you mean? We don't know what you want to do! Do you want only the non-empty sheets?
added by Elmex80s
@bobmarti I don't understand what you mean.
added by Elmex80s
@bobmarti I updated the code to answer your updated question. Let me know if it works or if you have more questions.
added by Elmex80s
It should check all the sheets at once. Also, what should I do if sheet 1 has data and sheet 2 is empty?
added by bob marti
I want to check all the sheets, report the empty ones as empty, and store the sheets that have values.
added by bob marti
Excel file. Then I want to check all the sheets. Whichever sheets are empty, print "empty"; if a sheet is not empty, then into the database ...
added by bob marti
Please look at my second question.
added by bob marti
Your solution gives me an idea. How can I apply it to my question? You iterate over wb.sheets() twice, whereas I use "for sheet in xls.sheets():" only once. How should I continue my code from there?
added by bob marti
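
A minimal single-pass sketch of what the last comment asks for. It assumes only that the workbook object offers sheets() and that each sheet has name and nrows, as xlrd's objects do. The names split_sheets and min_data_rows are hypothetical, and a header is assumed to occupy exactly one row.

```python
def split_sheets(workbook, min_data_rows=2):
    """Walk workbook.sheets() once; return (names of empty sheets, sheets with data)."""
    empty_names, data_sheets = [], []
    for sheet in workbook.sheets():
        # nrows == 0: blank sheet; nrows == 1: header only (assumed to be row 0)
        if sheet.nrows < min_data_rows:
            empty_names.append(sheet.name)
        else:
            data_sheets.append(sheet)
    return empty_names, data_sheets

# Usage with xlrd (file name taken from the answer above):
# import xlrd
# empty_names, data_sheets = split_sheets(xlrd.open_workbook("temp.xlsx"))
# print(empty_names)          # report the empty sheets
# for sheet in data_sheets:   # hand the rest to your own database code
#     ...
```

Passing min_data_rows=1 would treat only completely blank sheets as empty and keep header-only sheets.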

For your Excel code, I like the pandas solution someone came up with, but I assume you may be at work and unable to install it. In that case, I think you were almost there: you already have a loop that goes through every sheet. So you can test the number of rows in each sheet and take the appropriate action if it is empty, like so:

import xlrd

xlFile = "MostlyEmptyBook.xlsx"

def readfile(xlFile):
    xls = xlrd.open_workbook(xlFile)
    for sheet in xls.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols
        sheetname = sheet.name
        if number_of_rows <= 1:
            # sheet is empty or has just a header
            # do what you want here
            print(xlFile + " is empty.")
        else:
            # read the header only when the sheet has rows;
            # row_values(0) raises IndexError on a blank sheet
            header = sheet.row_values(0)

Note: I added a variable for the filename so that it can be changed in one place. I also added the : that was missing from your function declaration. If you want the test to match header-only sheets (mine also matches completely blank sheets), change <= to ==.

About the related CSV problem: a CSV is just a text file. Using a coding approach like the one below, you can be reasonably sure that a file is empty apart from its header. Try this code on a sample of your files; you may want to adjust the arithmetic. For example, it may be enough to use + 1 in the comparison instead of * 1.5. My thinking is that, if whitespace or a few characters were entered by mistake, the multiplier gives a good size cushion, combined with the character-count test on the second line in the coding logic.

This was written on the assumption that you want to know whether the file is empty before loading some giant file into your computer. If that assumption is wrong, you can use my test logic and then keep the file open, or even read in more lines to make sure there is no empty row followed by additional content (in a badly formatted input file):

import os

def convert_bytes(num):
    """
    this function will convert bytes to MB.... GB... etc
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0


def file_size(file_path):
    """
    this function will return the file size
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)


# testing if a csv file is empty in Python (header has bytes so not zero)

fileToTest = "almostEmptyCSV.csv"

def hasContentBeyondHeader(fileToTest):
    answer = [ True, 0, 0, 0]
    with open(fileToTest) as f:
        lis = [ f.readline(), f.readline() ] 
        answer[1] = len(lis[0])                # length header row
        answer[2] = len(lis[1])                # length of next row
        answer[3] = file_size(fileToTest)      # size of file

        # these conditions should be high confidence file is empty or nearly so
        sizeMult = 1.5   # test w/ your files and adjust as appropriate (but should work)
        charLimit = 5

        if answer[1] * sizeMult > answer[2] and answer[2] == 0:
            answer[0] = False
        elif answer[1] * sizeMult > answer[2] and answer[2] < charLimit:
            # separate condition in case you want to remove it
            # returns False if only a small number of chars (charLimit) on 2nd row
            answer[0] = False
        else:
            answer[0] = True   # added for readability (or delete else and keep default)
    return answer

hasContentBeyondHeader(fileToTest)  # False if believed to be empty except for header

During testing, the readline calls pulled this content from the file:

['year,sex,births\n', '']

Sample output:

[False, 16, 0, '17.0 bytes']

This approach means you can read the test result, True or False, from element [0] of the returned list. The remaining elements expose the inputs that went into the decision, in case you want to refine it later.

This code starts with a custom file size function. If you are looking for shorter code, you could replace it with the following, depending on your preferences. It would take the place of the first two small functions:

import os    
os.path.getsize(fullpathhere)
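
Note that os.path.getsize returns a plain integer number of bytes rather than a formatted string such as '17.0 bytes'. A minimal sketch of a drop-in wrapper, assuming you want to keep the isfile guard from file_size above; the name file_size_bytes is hypothetical.

```python
import os

def file_size_bytes(file_path):
    """Return the size in bytes as an int, or None if the path is not a regular file."""
    if os.path.isfile(file_path):
        return os.path.getsize(file_path)
    return None
```

An integer result can be compared directly, for example file_size_bytes(path) == 0 for a file with no bytes at all.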
1
added

What about something like this:

with open(path, "r") as file:
    file_content = file.read()
if file_content == "":
    print("File '{}' is empty".format(path))
else:
    rows = file_content.split("\n", 1)
    # rows has a single element when the header line has no trailing newline
    if len(rows) == 1 or rows[1] == "":
        print("File '{}' contains headers only.".format(path))

where path is the path to your xls or csv file.

1
added
It does not work for xls.
added by bob marti
Then I suppose this code cannot work for xls files, because of the specific encoding of that file format...
added by PurpleJo

For your question:

Question 2: I need to check the header of the file. How can I tell Python that files which are just a single row of headers are empty?

You can just check the lines in the file.

with open('empty_csv_with_header.csv') as f:
    f.readline()  # skip header
    line = f.readline()
    if line == '':  # the file is opened in text mode, so compare with str, not bytes
        print('Empty csv')
0
added